What Is Computer Vision?
Computer vision is the field of AI that enables machines to interpret and act on visual information from the world: photos, videos, medical scans, satellite imagery, and live camera feeds.
What seems effortless for humans (recognizing a friend's face, reading a sign, catching a ball) requires sophisticated algorithms when done by machines. Computer vision has become reliable enough for widespread production use only in roughly the past decade, thanks largely to deep learning breakthroughs.
Core Tasks in Computer Vision
Image classification assigns a label to an entire image: is this a cat or a dog? Object detection goes further, locating multiple objects within an image and drawing bounding boxes around each one.
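To make the difference concrete, here is a minimal sketch of what the two tasks might return. The `Box` type, the coordinate layout, and the sample values are illustrative assumptions, not the output format of any particular library. The `iou` helper computes intersection over union, a widely used way to score how well a predicted box overlaps a reference box.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical layout)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        # Clamp at zero so an "inverted" box (no overlap) has zero area.
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes, in the range [0, 1]."""
    overlap = Box(
        max(a.x_min, b.x_min),
        max(a.y_min, b.y_min),
        min(a.x_max, b.x_max),
        min(a.y_max, b.y_max),
    )
    intersection = overlap.area()
    union = a.area() + b.area() - intersection
    return intersection / union if union > 0 else 0.0


# Classification: one label for the whole image (made-up value).
classification_output = "dog"

# Detection: one (label, box) pair per object found (made-up values).
detection_output = [
    ("dog", Box(10, 20, 110, 120)),
    ("cat", Box(150, 40, 230, 100)),
]
```

A predicted box that exactly matches the reference scores 1.0, and boxes that do not overlap at all score 0.0.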
Semantic segmentation assigns a class label to every pixel, capturing not just where objects are but their exact boundaries. Pose estimation identifies the position of human joints for fitness apps and motion capture. Optical character recognition (OCR) reads text from images, powering document scanning and license plate readers.
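As a rough illustration of per-pixel labeling, a segmentation result can be pictured as a grid with one class id per pixel. The tiny mask, the class names, and the helper functions below are made up for this sketch. A real model would produce a much larger mask of the same kind.

```python
from collections import Counter

# Hypothetical class ids for this example only.
CLASS_NAMES = {0: "background", 1: "cat"}

# A 5 x 5 segmentation mask: each entry is the class id of one pixel.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]


def pixel_counts(mask):
    """Count how many pixels were assigned to each class name."""
    return Counter(CLASS_NAMES[value] for row in mask for value in row)


def boundary_pixels(mask, class_id):
    """Return (row, col) of pixels of class_id that touch another class
    or the image edge, using 4-neighbour connectivity."""
    rows, cols = len(mask), len(mask[0])
    found = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != class_id:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < rows and 0 <= nc < cols)
                if outside or mask[nr][nc] != class_id:
                    found.append((r, c))
                    break
    return found


counts = pixel_counts(mask)
cat_edge = boundary_pixels(mask, 1)
```

Because every pixel carries a label, the object's outline falls out of the mask directly: here the eight outer "cat" pixels form the boundary and the center pixel is interior.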
How It Works Under the Hood
Most computer vision systems use convolutional neural networks (CNNs) or, increasingly, vision transformers (ViTs). CNNs apply small filters across an image to detect features like edges, textures, and shapes, building up more complex representations in deeper layers.
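The filtering step can be sketched in a few lines of plain Python. This is a simplified illustration, not how a production framework implements it. The filter below is hand-written to respond to vertical edges, whereas a trained CNN learns its filter weights from data. The function and variable names are invented for this example.

```python
def convolve2d(image, kernel):
    """Slide a small filter over a 2D grid without padding.

    Like most deep learning layers, this computes a cross-correlation:
    the kernel is not flipped before it is applied.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


# Toy grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

# Hand-picked filter that responds where brightness rises left to right.
VERTICAL_EDGE = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, VERTICAL_EDGE)
# feature_map == [[0, 27, 27, 0], [0, 27, 27, 0]]
```

The output is zero over the flat regions and large where the window straddles the dark-to-bright transition, which is the sense in which a filter "detects" an edge.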
Vision transformers split an image into patches and treat them much as language transformers treat words, using self-attention to model how different parts of the image relate to each other. This approach has achieved state-of-the-art results on many benchmarks.
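The two ideas, patch splitting and self-attention, can be sketched as follows. This is a heavily simplified illustration under stated assumptions: the patches themselves serve as queries, keys, and values, while a real vision transformer adds learned projections, position information, multiple attention heads, and many stacked layers. All function names are invented for this example.

```python
import math


def split_into_patches(image, patch):
    """Cut an H x W grid into non-overlapping patch x patch tiles,
    each flattened into one vector, in row-major order."""
    rows, cols = len(image), len(image[0])
    if rows % patch or cols % patch:
        raise ValueError("image size must be a multiple of the patch size")
    return [
        [image[r + i][c + j] for i in range(patch) for j in range(patch)]
        for r in range(0, rows, patch)
        for c in range(0, cols, patch)
    ]


def softmax(values):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with queries, keys, and values
    all equal to the input tokens (no learned projections; a simplification)."""
    dim = len(tokens[0])
    outputs, all_weights = [], []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        # Each output is a weighted mix of every token in the image.
        outputs.append(
            [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(dim)]
        )
        all_weights.append(weights)
    return outputs, all_weights


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
patches = split_into_patches(image, 2)  # four patches of four values each
mixed, attention = self_attention([[1.0, 0.0], [0.0, 1.0]])
```

Each row of `attention` says how much one patch draws on every other patch, which is how distant regions of an image can influence each other within a single layer.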
Applications Transforming Industries
In healthcare, computer vision reads X-rays and MRI scans with accuracy that, on some specific diagnostic tasks, rivals that of specialists. In manufacturing, it inspects products on assembly lines for defects. Autonomous vehicles rely on it to understand road conditions, and retailers use it for inventory management.
Self-driving car systems combine computer vision with lidar and radar for a more complete understanding of their surroundings. For more on this, see AI in transportation.