From Pixels to Features
One backbone to rule them all.
A journey through visual feature extraction, from hand-crafted descriptors to foundation models.
— Codemotion · October 28–29, 2025 · Milan
At the Conference
Talk Summary
CORE THEME
We explored what a backbone is, how it turns pixels into a reusable feature space, and how to harness it to prototype quickly.
From SIFT to Foundation Models
The talk traced the evolution of visual features: from SIFT and HOG to deep CNNs, then to Vision Transformers and today's foundation models (CLIP, DINOv2, SAM). Each step redefined what "understanding an image" means.
Key takeaway: With modern backbones, you no longer need millions of labeled samples to solve a new vision task; a handful of examples can be enough when you start from the right representation.
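To make the takeaway concrete, here is a minimal sketch of few-shot classification on top of frozen features. It is not code from the talk. It assumes the feature vectors were already produced by some pretrained backbone. The three-dimensional vectors, the "cat"/"dog" labels, and the helper names (`l2_normalize`, `class_prototypes`, `predict`) are all hypothetical stand-ins chosen for illustration. The technique shown is nearest-prototype classification with cosine similarity, one simple way to reuse a feature space without training a network.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def class_prototypes(features, labels):
    """Average the normalized features of each class into one prototype."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        vec = l2_normalize(vec)
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: l2_normalize([s / counts[label] for s in sums[label]])
        for label in sums
    }


def predict(feature, prototypes):
    """Return the label whose prototype is most cosine-similar to the feature."""
    query = l2_normalize(feature)
    return max(
        prototypes,
        key=lambda label: sum(a * b for a, b in zip(query, prototypes[label])),
    )


# Toy stand-ins for backbone embeddings (assumption: real ones would have
# hundreds of dimensions and come from a pretrained model).
support_features = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.8, 0.3],
]
support_labels = ["cat", "cat", "dog", "dog"]

prototypes = class_prototypes(support_features, support_labels)
prediction = predict([0.7, 0.3, 0.1], prototypes)
print(prediction)
```

With only two labeled examples per class, the classifier is just a pair of averaged vectors; all the heavy lifting is assumed to have happened inside the backbone that produced the features.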