When In Doubt, Go YOLO

A Brief Look Into the YOLO Algorithm for Object Detection


Walk around the streets of Los Angeles and I bet you five bucks a Model 3 will pass by every minute or so. Tesla’s best-selling car comes at a “bargain” price of just under $40k and is popular for good reason. The company takes pride in being the first to popularize the “smart” electric vehicle, and its strongest selling point is Autopilot. This feature lets a seemingly ordinary car steer, accelerate, and brake on its own with minimal input from the driver, a piece of technology futurists once could only imagine. How do they achieve this? Put simply, their cars are fitted with cameras and sensors that gather data about the car’s surroundings. This data comes in many forms, but for now let’s just look at the images coming from the cameras. These images are fed to vision-processing algorithms which, according to Tesla, are built on top of a “deep neural network”.

  1. Processing the image by means of analysis and manipulation techniques.
  2. Producing the output, either a report or an altered image, based on the preceding processing steps.
YOLOv1 uses 24 convolutional layers and 2 fully connected layers. Source: Redmon et al. (2016)


YOLO is an abbreviation of “You Only Look Once”. In the paper that introduced it, its creators frame object detection as a regression problem: a single network looks at the image once and directly predicts spatially separated bounding boxes and their associated class probabilities. At its core, YOLO is one convolutional neural network (CNN) that handles the whole detection pipeline. Because of this, it can be optimized end to end for detection performance. Keep in mind that YOLO treats detection as regression, whereas earlier detectors repurposed classifiers to perform detection. YOLO has three major advantages over traditional object detection models:

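  • Speed: YOLO is extremely fast. Its base network processes images in real time at 45 frames per second, and a smaller version, Fast YOLO, runs at more than 150 frames per second.
  • Accuracy: YOLO may make more localization errors, but it is less likely than traditional models to predict false positives on the background.
  • Learning capabilities: YOLO learns very general representations of objects, which helps in cases such as generalizing from natural images to artwork.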
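To make the regression framing concrete, here is a minimal sketch of the size of YOLOv1's output. It assumes the settings Redmon et al. (2016) used on PASCAL VOC: an S × S grid with S = 7, B = 2 boxes per grid cell, and C = 20 classes. Each box carries 5 numbers (x, y, w, h, confidence) and each cell adds C class probabilities. The function name is hypothetical, chosen for illustration.

```python
from __future__ import annotations


def yolo_v1_output_shape(S: int = 7, B: int = 2, C: int = 20) -> tuple[int, int, int]:
    """Shape of the tensor YOLOv1 regresses in one forward pass (hypothetical helper).

    Each of the S x S grid cells predicts B boxes of 5 numbers each
    (x, y, w, h, confidence) plus C class probabilities shared by the cell.
    """
    return (S, S, B * 5 + C)


if __name__ == "__main__":
    shape = yolo_v1_output_shape()
    print(shape)                            # (7, 7, 30)
    print(shape[0] * shape[1] * shape[2])   # 1470 real-valued outputs
```

Every number in that 7 × 7 × 30 tensor is a real-valued output of the same network, which is why detection can be treated as one regression problem rather than a chain of separate classifiers.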

Residual blocks

Input image divided into a grid of cells. Source: Section.io
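In the paper, the grid is S × S, and the cell that contains an object's center is responsible for detecting that object; the box center is then expressed as an offset within that cell (Redmon et al., 2016). The sketch below finds that cell. The function name and the example coordinates are assumptions for illustration; the 448 × 448 image size matches the input resolution the paper uses.

```python
from __future__ import annotations


def responsible_cell(cx: float, cy: float, img_w: float, img_h: float,
                     S: int = 7) -> tuple[int, int, float, float]:
    """Locate the grid cell that contains an object's center (hypothetical helper).

    Returns (row, col, x_off, y_off), where x_off and y_off are the center's
    position inside that cell, between 0 and 1.
    """
    if not (0 <= cx < img_w and 0 <= cy < img_h):
        raise ValueError("object center must lie inside the image")
    # Center position measured in grid-cell units.
    gx = cx * S / img_w
    gy = cy * S / img_h
    # Integer part picks the cell; the clamp guards against float rounding at the edge.
    col = min(int(gx), S - 1)
    row = min(int(gy), S - 1)
    # Fractional part is the offset inside the cell.
    return row, col, gx - col, gy - row


if __name__ == "__main__":
    print(responsible_cell(100, 300, 448, 448))  # (4, 1, 0.5625, 0.6875)
```

For a 448 × 448 image with S = 7, each cell is 64 pixels wide, so an object centered at (100, 300) lands in row 4, column 1.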

Bounding boxes

Source: Towards Data Science
  1. bh, bw: height and width of the box
  2. bx, by: center of the box
  3. c: class of the object
  4. pc: confidence that an object is present
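A box described by its center and size is often converted to corner coordinates before it is compared with other boxes. Here is a minimal sketch, assuming pixel units and a single string label for c; the Box class is hypothetical, and bh denotes the box height alongside the width bw.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Box:
    """One predicted box in center/size form (hypothetical class).

    pc: confidence that an object is present
    bx, by: center of the box, in pixels (assumed units)
    bw, bh: width and height of the box, in pixels
    c: class label of the object
    """
    pc: float
    bx: float
    by: float
    bw: float
    bh: float
    c: str

    def to_corners(self) -> tuple[float, float, float, float]:
        """Convert center/size to (x_min, y_min, x_max, y_max)."""
        half_w = self.bw / 2
        half_h = self.bh / 2
        return (self.bx - half_w, self.by - half_h,
                self.bx + half_w, self.by + half_h)


if __name__ == "__main__":
    cat = Box(pc=0.9, bx=120, by=80, bw=60, bh=40, c="cat")
    print(cat.to_corners())  # (90.0, 60.0, 150.0, 100.0)
```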

Intersection Over Union (IOU)

A cat that seems to be plotting something. Source: Koderunners
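IOU divides the area where two boxes overlap by the total area they cover together, so 1 means a perfect match and 0 means no overlap. YOLO uses it to measure how well a predicted box matches the ground-truth box. The sketch below assumes boxes given as (x_min, y_min, x_max, y_max) corners; the function name is hypothetical.

```python
from typing import Tuple

# A box given by its corners: (x_min, y_min, x_max, y_max).
Corners = Tuple[float, float, float, float]


def iou(a: Corners, b: Corners) -> float:
    """Intersection over union of two axis-aligned boxes, in [0, 1]."""
    # Overlap rectangle; width and height clamp to 0 when the boxes do not touch.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    # Union counts the overlap only once.
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0, 0, 2, 2)
    ground_truth = (1, 1, 3, 3)
    print(round(iou(predicted, ground_truth), 4))  # 0.1429 (overlap 1, union 7)
```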

All at once

YOLO from start to finish. Source: Redmon et al. (2016)
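At test time, Redmon et al. (2016) multiply each cell's conditional class probabilities by each box's confidence (defined as Pr(Object) × IOU) to get class-specific confidence scores. The sketch below does this for a single grid cell with plain Python lists. The function names, the two-class example values, and the 0.2 cut-off are hypothetical, chosen only for illustration.

```python
from __future__ import annotations


def class_specific_scores(box_confidences: list[float],
                          class_probs: list[float]) -> list[list[float]]:
    """Class-specific confidence scores for one grid cell.

    Pr(Class_i | Object) * Pr(Object) * IOU = Pr(Class_i) * IOU,
    where each box confidence already equals Pr(Object) * IOU.
    Returns a B x C nested list: row j holds box j's score for every class.
    """
    return [[conf * p for p in class_probs] for conf in box_confidences]


def keep_detections(scores: list[list[float]],
                    threshold: float = 0.2) -> list[tuple[int, int, float]]:
    """Return (box_index, class_index, score) for every score >= threshold.

    The 0.2 default is a hypothetical cut-off chosen for illustration.
    """
    return [
        (j, i, s)
        for j, row in enumerate(scores)
        for i, s in enumerate(row)
        if s >= threshold
    ]


if __name__ == "__main__":
    # Hypothetical cell with B = 2 boxes and C = 2 classes.
    scores = class_specific_scores([0.5, 0.25], [0.75, 0.25])
    print(scores)                   # [[0.375, 0.125], [0.1875, 0.0625]]
    print(keep_detections(scores))  # [(0, 0, 0.375)]
```

This sketch stops before non-maximal suppression, which the paper uses to remove duplicate boxes for the same object.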


As an object detection algorithm, YOLO is relatively simple, in both theory and practice, compared with traditional detection pipelines. It is arguably one of the fastest state-of-the-art general-purpose object detectors; its creators described it as the fastest in the literature when it was published. Because it looks at the whole image in a single pass of one network, it can detect objects in real time. YOLO also generalizes well to new domains, making it ideal for applications that rely on fast and robust object detection.


J. Redmon, S. Divvala, R. Girshick and A. Farhadi, “You Only Look Once: Unified, Real-Time Object Detection,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779–788, doi: 10.1109/CVPR.2016.91.

Run-of-the-mill college student
