Supervised Learning: Learning from Labels
In supervised learning, you provide the algorithm with input-output pairs — labeled examples. Want a model to identify cats in photos? Give it thousands of images labeled "cat" or "not cat". The model learns the mapping from inputs to desired outputs.
Common supervised tasks include classification (spam or not spam, tumor or not tumor) and regression (predicting a number like house prices or stock values). The approach is well-understood, widely used, and the go-to choice when labeled data is available.
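The labeled-pairs idea can be sketched with a tiny 1-nearest-neighbour classifier. The data, features, and labels below are illustrative, not from any real dataset; a single numeric feature stands in for image pixels.

```python
# Minimal supervised-learning sketch: classify a new input by copying the
# label of its closest labeled example (1-nearest-neighbour).

def predict_1nn(train, x):
    """Return the label of the training pair whose feature is closest to x."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

# Labeled examples as (feature, label) pairs -- the "input-output pairs"
train = [(1.0, "cat"), (1.2, "cat"), (5.0, "not cat"), (5.3, "not cat")]

print(predict_1nn(train, 1.1))  # → cat
print(predict_1nn(train, 4.9))  # → not cat
```

Real systems use far richer models, but the contract is the same: labeled pairs in, a learned input-to-output mapping out.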
Unsupervised Learning: Finding Hidden Patterns
Unsupervised learning works with unlabeled data, letting algorithms discover structure on their own. Clustering groups similar items together (customer segmentation, document grouping). Dimensionality reduction compresses data while preserving important relationships.
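Clustering is easy to see in miniature. Below is a toy k-means sketch on 1-D data; the points, k=2, the naive initialisation, and the fixed iteration count are all simplifying assumptions.

```python
# Minimal k-means sketch: no labels are given -- the algorithm discovers
# two groups purely from the distances between points.

def kmeans_1d(points, k=2, iters=10):
    centroids = points[:k]  # naive initialisation: first k points
    clusters = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to its cluster's mean
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.1, 0.9, 8.0, 8.2, 7.9]
centroids, clusters = kmeans_1d(points)
# The two centroids settle near 1.0 and 8.0 -- two natural groups
```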
Anomaly detection identifies unusual data points — critical for fraud detection and system monitoring. The model learns what normal looks like and flags anything that deviates significantly.
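The "learn what normal looks like" idea can be sketched with simple statistics: fit a mean and standard deviation on normal data, then flag anything far outside that range. The 3-sigma threshold and the readings are illustrative choices.

```python
# Minimal anomaly-detection sketch: model "normal" as mean ± 3 std devs
# and flag points that fall outside it.
import statistics

def fit(normal_readings):
    """Learn what normal looks like from unlabeled normal data."""
    return statistics.mean(normal_readings), statistics.stdev(normal_readings)

def is_anomaly(x, mean, std, z=3.0):
    """Flag x if it deviates from the mean by more than z standard deviations."""
    return abs(x - mean) > z * std

normal = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2]
mean, std = fit(normal)

print(is_anomaly(10.1, mean, std))  # typical reading → False
print(is_anomaly(25.0, mean, std))  # far outside normal → True
```

Production fraud or monitoring systems use far more robust models, but the shape is the same: fit on normal, flag large deviations.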
Semi-Supervised and Self-Supervised Learning
In practice, the boundary between supervised and unsupervised is blurry. Semi-supervised learning uses a small amount of labeled data combined with a large amount of unlabeled data — a practical compromise when labeling is expensive.
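One common semi-supervised recipe is self-training: fit on the few labeled points, pseudo-label the unlabeled points the model is confident about, and fold them back in. The data and the distance-based confidence threshold below are illustrative assumptions.

```python
# Self-training sketch: two expensive labels, five cheap unlabeled points.
# Unlabeled points close to a labeled one inherit its label.

def nearest(train, x):
    return min(train, key=lambda pair: abs(pair[0] - x))

labeled = [(1.0, "a"), (9.0, "b")]      # small labeled set
unlabeled = [1.2, 0.8, 8.7, 9.3, 5.0]   # large unlabeled set

for x in unlabeled:
    feat, label = nearest(labeled, x)
    if abs(feat - x) < 1.0:             # "confident" pseudo-label only
        labeled.append((x, label))

# 5.0 sits far from both groups, so it is left unlabeled
print(len(labeled))  # → 6
```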
Self-supervised learning creates its own labels from the data. Language models, for instance, learn by predicting a masked or upcoming word, so the labels come from the text itself. This approach powers the pretraining of models like BERT (masked-word prediction) and GPT (next-word prediction).
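Here is how a label can fall out of raw text with no human annotation. This is a toy version of masked-word pretraining; the `[MASK]` token and the uniform masking scheme are simplifying assumptions.

```python
# Self-supervised sketch: hide one word from a sentence and use the hidden
# word itself as the training target -- the label comes from the data.
import random

def make_masked_example(tokens, rng):
    """Mask one token; return (masked input, target word, masked position)."""
    idx = rng.randrange(len(tokens))
    inp = tokens.copy()
    target = inp[idx]
    inp[idx] = "[MASK]"
    return inp, target, idx

rng = random.Random(0)
tokens = "the cat sat on the mat".split()
inp, target, idx = make_masked_example(tokens, rng)
# inp has "[MASK]" at position idx; target is the original word there
```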
Choosing the Right Approach
Start by assessing your data. If you have labeled examples, supervised learning is usually the best first step. If labels are scarce or nonexistent, explore unsupervised or semi-supervised methods.
Many production systems combine approaches. A supervised classifier might use features discovered by an unsupervised algorithm. For the bigger picture on how these fit together, see our AI fundamentals guide.
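The supervised-on-unsupervised-features pattern can be sketched in a few lines. The centroid, data, and label names below are hypothetical; imagine the centroid came from a clustering step like k-means.

```python
# Combination sketch: an unsupervised step supplies a centroid; the
# distance to it becomes a feature for a simple supervised classifier.

centroid = 10.0  # assumed output of an earlier unsupervised clustering step

def features(x):
    """Engineered feature: distance to the discovered cluster centre."""
    return abs(x - centroid)

# Supervised part: labeled examples expressed in the new feature space
labeled = [(features(9.8), "member"), (features(2.0), "outsider")]

def classify(x):
    f = features(x)
    return min(labeled, key=lambda pair: abs(pair[0] - f))[1]

print(classify(10.3))  # near the centroid → member
print(classify(2.5))   # far from the centroid → outsider
```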