Artificial Intelligence RB14-20: Place Recognition from Images – Visual Place Recognition (VPR)
1. What is an Image Descriptor?
- An image descriptor is a numerical vector (e.g., 256–4096 floats) that summarizes the visual content of an image.
- In VPR, descriptors are designed so that:
  - Two images of the same place → their descriptors are close in vector space.
  - Two images of different places → their descriptors are far apart.
- A descriptor therefore acts like a fingerprint of a place, allowing efficient comparison and retrieval.
2. Steps to Create an Image Descriptor for VPR
Step 1 – Preprocessing
- Resize the image to a standard size (e.g., 224×224) and normalize its pixel values.
- Convert the result to a tensor for input to the network.
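The preprocessing step can be sketched with NumPy alone. This is a minimal sketch rather than a reference pipeline: the function name `preprocess`, the nearest-neighbor resize, and the ImageNet mean/std constants are assumptions made for illustration. The notes themselves only specify resizing to a standard size such as 224×224, normalizing, and converting to a tensor.

```python
import numpy as np

# Assumed normalization constants: the per-channel ImageNet statistics commonly
# used with pretrained backbones. The notes do not name specific values.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(image, size=224):
    """Turn an HxWx3 uint8 RGB image into a 1x3xSIZExSIZE float32 array."""
    h, w, _ = image.shape
    # Nearest-neighbor resize: pick the source row/column for each target pixel.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = image[rows[:, None], cols[None, :]]         # (size, size, 3)
    scaled = resized.astype(np.float32) / 255.0           # values in [0, 1]
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD  # per-channel
    # Channels-first layout plus a batch axis (the PyTorch-style convention).
    return normalized.transpose(2, 0, 1)[None, ...]


# Hypothetical input: a random 480x640 image standing in for a camera frame.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)
batch = preprocess(frame)
print(batch.shape, batch.dtype)
```

In practice an image library would usually handle the resize with bilinear or bicubic interpolation; nearest-neighbor is used here only to keep the sketch dependency-free.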
Step 2 – Feature Extraction (Backbone CNN)
- Pass the image through a Convolutional Neural Network (CNN) backbone (e.g., ResNet, VGG).
- Output: feature maps (multi-channel activation maps that respond to patterns in the image).
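To make the shape of the backbone output concrete, the sketch below runs a single randomly initialized 3×3 convolution followed by ReLU. It is only a stand-in: a real backbone such as ResNet or VGG stacks many trained layers, and the function name `conv_relu`, the filter count, and the stride are hypothetical choices.

```python
import numpy as np


def conv_relu(x, filters, stride=2):
    """Valid 2D convolution (cross-correlation, as in deep learning) + ReLU.

    x: (C_in, H, W) input, filters: (C_out, C_in, k, k).
    """
    c_out, c_in, k, _ = filters.shape
    _, h, w = x.shape
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    out = np.zeros((c_out, out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i * stride:i * stride + k, j * stride:j * stride + k]
            # Each output channel is the dot product of one filter with the patch.
            out[:, i, j] = np.tensordot(filters, patch, axes=3)
    return np.maximum(out, 0.0)  # ReLU keeps only positive activations


rng = np.random.default_rng(0)
image = rng.standard_normal((3, 32, 32)).astype(np.float32)        # toy input
filters = rng.standard_normal((8, 3, 3, 3)).astype(np.float32)     # 8 random filters
feature_maps = conv_relu(image, filters)
print(feature_maps.shape)  # (8, 15, 15): 8 channels on a 15x15 spatial grid
```

Each of the 8 channels is one activation map; a trained backbone produces hundreds or thousands of such channels on a coarser grid.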
Step 3 – Pooling to Global Descriptor
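- Convert feature maps into a single compact vector.
- Common methods:
  - Average/Max Pooling – simple baselines, usually less discriminative than the learned alternatives below.
  - NetVLAD – clusters local features and aggregates their residuals to the cluster centers into a robust descriptor.
  - GeM (Generalized Mean pooling) – pooling with an exponent p that interpolates between average pooling (p = 1) and max pooling (p → ∞); often used in retrieval.
  - CosPlace – strictly a training method rather than a pooling layer: the network (GeM pooling followed by a fully connected layer) is trained as a classification problem with a large-margin cosine loss.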
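The pooling variants above can be compared on the same feature maps. The sketch below is illustrative only: `gem_pool` is a hypothetical helper name, the feature maps are random stand-ins, and the exponent p is fixed at 3 here, whereas in GeM it is typically a learned parameter.

```python
import numpy as np


def gem_pool(feature_maps, p=3.0, eps=1e-6):
    """Generalized Mean pooling over the spatial axes of a (C, H, W) array."""
    clamped = np.maximum(feature_maps, eps)  # keep values positive before x**p
    return np.mean(clamped ** p, axis=(1, 2)) ** (1.0 / p)


rng = np.random.default_rng(0)
# Hypothetical post-ReLU feature maps: 512 channels on a 7x7 grid.
fmap = np.maximum(rng.standard_normal((512, 7, 7)), 0.0)

avg_desc = fmap.mean(axis=(1, 2))   # average pooling
max_desc = fmap.max(axis=(1, 2))    # max pooling
gem_desc = gem_pool(fmap, p=3.0)    # GeM with a fixed exponent

print(avg_desc.shape, max_desc.shape, gem_desc.shape)
```

For every channel the GeM value lies between the average and the maximum, which is why it is often described as a tunable compromise between the two.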
Step 4 – Normalization
- Apply L2 normalization so the descriptor lies on the unit sphere.
- With unit-length descriptors, cosine similarity reduces to a dot product and ranks candidates in the same order as Euclidean distance, so comparisons are consistent.
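A minimal sketch of the normalization step follows. The helper name `l2_normalize` and the two toy vectors are assumptions for illustration. The last line checks the identity ‖a − b‖² = 2 − 2·cos(a, b), which holds for unit vectors and explains why cosine similarity and Euclidean distance agree after normalization.

```python
import numpy as np


def l2_normalize(desc, eps=1e-12):
    """Scale a descriptor (or each row of a matrix) to unit Euclidean length."""
    norm = np.linalg.norm(desc, axis=-1, keepdims=True)
    return desc / np.maximum(norm, eps)  # eps guards against division by zero


a = l2_normalize(np.array([3.0, 4.0]))   # approximately [0.6, 0.8]
b = l2_normalize(np.array([4.0, 3.0]))   # approximately [0.8, 0.6]

cosine = float(a @ b)                    # dot product of unit vectors
sq_dist = float(np.sum((a - b) ** 2))    # squared Euclidean distance
print(cosine, sq_dist, 2.0 - 2.0 * cosine)  # the last two values agree
```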
Step 5 – Similarity Comparison
- For two descriptors, compute a distance or similarity:
  - Cosine similarity
  - Euclidean distance (L2)
- Smaller distance (or, equivalently, higher cosine similarity) → more likely the images represent the same place.
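Step 5 amounts to a nearest-neighbor search over a database of descriptors. The sketch below uses hypothetical toy data and a hypothetical helper `rank_by_cosine`, and it assumes all descriptors are already L2-normalized.

```python
import numpy as np


def rank_by_cosine(query, database):
    """Return database indices sorted from most to least similar to the query.

    Both inputs are assumed to be L2-normalized, so the dot product equals
    the cosine similarity.
    """
    sims = database @ query
    order = np.argsort(-sims)
    return order, sims[order]


# Hypothetical toy data: 3 database descriptors and 1 query, 4 dimensions each.
database = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0, 0.0],
])
query = np.array([0.8, 0.6, 0.0, 0.0])

order, sims = rank_by_cosine(query, database)
dists = np.linalg.norm(database - query, axis=1)

print(order)              # [2 0 1]: ranking by descending cosine similarity
print(np.argsort(dists))  # [2 0 1]: same ranking by ascending L2 distance
```

For large databases the exhaustive dot product is usually replaced by an approximate nearest-neighbor index, but the ranking criterion stays the same.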
3. Algorithms Commonly Used in VPR
Classical (pre-deep learning)
- SIFT / SURF / ORB: detect local keypoints and compute a descriptor for each.
- Aggregation methods: Bag-of-Words, Fisher Vectors.
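As an illustration of the aggregation idea, the sketch below builds a Bag-of-Words histogram from a set of local descriptors. The helper name `bow_histogram`, the tiny 2-D vocabulary, and the descriptors are all hypothetical; in practice the vocabulary is typically obtained by clustering (e.g., k-means) many real-valued descriptors such as SIFT.

```python
import numpy as np


def bow_histogram(local_descs, vocabulary):
    """Bag-of-Words: count how many local descriptors fall on each visual word.

    local_descs: (N, D) real-valued local descriptors from one image.
    vocabulary:  (K, D) visual words, assumed to be precomputed.
    """
    # Squared Euclidean distance from every descriptor to every word: (N, K).
    d2 = ((local_descs[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # index of the nearest word per descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(np.float64)
    return hist / max(np.linalg.norm(hist), 1e-12)  # L2-normalize the histogram


# Toy vocabulary of 3 visual words and 4 local descriptors in 2-D.
vocabulary = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
local_descs = np.array([[0.5, 0.2], [9.0, 1.0], [9.5, -0.5], [1.0, 9.0]])

image_desc = bow_histogram(local_descs, vocabulary)
print(image_desc)  # proportional to the word counts [1, 2, 1]
```

The resulting histogram is a global descriptor of the image and can be compared with cosine similarity like any other.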
Deep Learning
- NetVLAD (2016) – CNN backbone + differentiable VLAD pooling.
- GeM pooling (2017) – generalized mean pooling with a learnable exponent.
- CosPlace (2022) – training recipe that casts place recognition as a classification problem.
- DELG / Radenović methods – global + local descriptors combined.
- LoFTR / SuperPoint + SuperGlue – local feature matching, used to verify or re-rank retrieved candidates.
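The VLAD pooling used by NetVLAD can be sketched in NumPy as soft-assignment residual aggregation. This is a simplified illustration, not the NetVLAD layer itself: the function name `soft_vlad`, the sharpness parameter `alpha`, and the random inputs are assumptions, and the assignment is tied to fixed cluster centers, whereas NetVLAD learns its assignment weights and centers end to end.

```python
import numpy as np


def soft_vlad(local_feats, centers, alpha=10.0):
    """VLAD with soft assignment, the aggregation idea behind NetVLAD.

    local_feats: (N, D) local features (e.g., one per spatial cell of a feature map).
    centers:     (K, D) cluster centers, treated here as fixed inputs.
    """
    residuals = local_feats[:, None, :] - centers[None, :, :]  # (N, K, D)
    d2 = (residuals ** 2).sum(axis=2)                          # (N, K)
    logits = -alpha * d2
    logits -= logits.max(axis=1, keepdims=True)                # stabilize softmax
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)                # rows sum to 1
    vlad = (assign[:, :, None] * residuals).sum(axis=0)        # (K, D)
    # Normalize each cluster's residual sum, then the flattened vector.
    vlad /= np.maximum(np.linalg.norm(vlad, axis=1, keepdims=True), 1e-12)
    vlad = vlad.ravel()
    return vlad / max(np.linalg.norm(vlad), 1e-12)


rng = np.random.default_rng(0)
local_feats = rng.standard_normal((49, 16))  # e.g., a 7x7 grid of 16-D features
centers = rng.standard_normal((4, 16))       # 4 hypothetical cluster centers

descriptor = soft_vlad(local_feats, centers)
print(descriptor.shape)  # (64,): K * D values in one global descriptor
```

Because every operation is differentiable, the same computation can sit on top of a CNN and be trained jointly with it, which is the key step from classical VLAD to NetVLAD.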