Class-agnostic counting (CAC) aims to estimate the number of objects in
images without being restricted to predefined categories. However, while
current exemplar-based CAC methods offer flexibility at inference time, they
still rely heavily on labeled data for training, which limits scalability and
generalization to many downstream use cases. In this paper, we introduce
CountingDINO, the first training-free exemplar-based CAC framework that
exploits a fully unsupervised feature extractor. Specifically, our approach
employs self-supervised vision-only backbones to extract object-aware features,
and it eliminates the need for annotated data throughout the entire proposed
pipeline. At inference time, we extract latent object prototypes via ROI-Align
from DINO features and use them as convolutional kernels to generate similarity
maps. These are then transformed into density maps through a simple yet
effective normalization scheme. We evaluate our approach on the FSC-147
benchmark, where we outperform a baseline under the same label-free setting.
Our method also achieves competitive — and in some cases superior — results
compared to training-free approaches relying on supervised backbones, as well
as several fully supervised state-of-the-art methods. This demonstrates that
training-free CAC can be both scalable and competitive. Website:
https://lorebianchi98.github.io/CountingDINO/
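
The inference steps described above (prototype extraction, similarity map, density map) can be sketched on a toy feature map. This is a minimal NumPy illustration under stated assumptions, not the CountingDINO implementation: a synthetic two-channel array stands in for DINO features, an integer crop stands in for ROI-Align, and the normalization (dividing by the total response of the prototype to itself) is an assumed choice, since the abstract does not specify the scheme. All function names are hypothetical.

```python
import numpy as np


def crop_prototype(features, box):
    """Crop an exemplar prototype from a (C, H, W) feature map.

    box = (y0, x0, y1, x1) in feature-map cells, end-exclusive.
    Assumption: an integer crop replaces ROI-Align for simplicity.
    """
    y0, x0, y1, x1 = box
    return features[:, y0:y1, x0:x1]


def correlate_same(features, kernel):
    """Zero-padded cross-correlation, summed over channels; returns (H, W)."""
    _, kh, kw = kernel.shape
    assert kh % 2 == 1 and kw % 2 == 1, "sketch assumes odd kernel sizes"
    ph, pw = kh // 2, kw // 2
    padded = np.pad(features, ((0, 0), (ph, ph), (pw, pw)))
    h, w = features.shape[1:]
    out = np.zeros((h, w))
    for dy in range(kh):
        for dx in range(kw):
            window = padded[:, dy:dy + h, dx:dx + w]
            out += np.einsum("c,chw->hw", kernel[:, dy, dx], window)
    return out


def density_map(features, box):
    """Use the exemplar prototype as a kernel and normalize the response."""
    proto = crop_prototype(features, box)
    sim = np.maximum(correlate_same(features, proto), 0.0)

    # Assumed normalization: the total response one isolated exemplar
    # produces against its own prototype should integrate to 1.
    _, kh, kw = proto.shape
    isolated = np.pad(proto, ((0, 0), (kh - 1, kh - 1), (kw - 1, kw - 1)))
    unit = np.maximum(correlate_same(isolated, proto), 0.0).sum()
    if unit == 0:
        raise ValueError("prototype has no response to itself")
    return sim / unit


def make_toy_features():
    """Three 3x3 'objects' in channel 0 and one distractor in channel 1."""
    feats = np.zeros((2, 12, 12))
    for y, x in [(1, 1), (1, 7), (7, 4)]:
        feats[0, y:y + 3, x:x + 3] = 1.0
    feats[1, 7:10, 8:11] = 1.0  # different "category", should not be counted
    return feats


features = make_toy_features()
density = density_map(features, box=(1, 1, 4, 4))  # exemplar = first object
print(float(density.sum()))  # 3.0
```

In this toy setting the estimated count is the sum of the density map. The distractor contributes nothing because its features are orthogonal to the prototype. With real features, neighboring objects and background responses would make the result sensitive to the normalization choice.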



