Artificial Intelligence RB14-20: Recognizing a Location from an Image (Visual Place Recognition, VPR)

1. What is an Image Descriptor?

  • An image descriptor is a fixed-length numerical vector (e.g., 256–4096 floats) that summarizes the visual content of an image.

  • In VPR, descriptors are designed so that:

    • Two images of the same place → their descriptors are close in vector space.

    • Two images of different places → their descriptors are far apart.

A descriptor therefore acts like a fingerprint of a place: comparing two images reduces to comparing two vectors, which keeps retrieval over a large image database efficient.

2. Steps to Create an Image Descriptor for VPR

Step 1 – Preprocessing

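  • Resize the image to a standard size (e.g., 224×224) and normalize its pixel values (typically with the per-channel mean and standard deviation the backbone was trained with).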

  • Convert to a tensor for input to the network.
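The preprocessing step can be sketched in NumPy as follows. This is a minimal illustration, not a reference pipeline: the `preprocess` helper, the nearest-neighbour resize, and the use of ImageNet channel statistics are assumptions made here for the example. A real pipeline would normally use the resize and normalization utilities of its deep learning framework.

```python
import numpy as np

# ImageNet channel statistics, commonly used with pretrained ResNet/VGG backbones.
# Assumption: replace these with whatever statistics your backbone was trained with.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(image, size=224):
    """Turn an H x W x 3 uint8 RGB image into a 3 x size x size float32 array."""
    h, w = image.shape[:2]
    # Nearest-neighbour resize: choose a source row/column for each output pixel.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = image[rows[:, None], cols[None, :]]       # size x size x 3
    scaled = resized.astype(np.float32) / 255.0         # pixel values in [0, 1]
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD
    return normalized.transpose(2, 0, 1)                # channels first (C, H, W)


# Hypothetical input: a random 480 x 640 RGB array standing in for a photo.
rng = np.random.default_rng(0)
photo = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)
tensor = preprocess(photo)
print(tensor.shape, tensor.dtype)
```

The channels-first layout matches what most CNN frameworks expect; a batch dimension would still have to be added before the array is passed to a network.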

Step 2 – Feature Extraction (Backbone CNN)

  • Pass the image through a Convolutional Neural Network (CNN) (e.g., ResNet, VGG).

  • The output is a set of feature maps: a multi-channel grid of activations in which each channel responds to a particular visual pattern in the image.

Step 3 – Pooling to Global Descriptor

  • Convert feature maps into a single compact vector.
    Common methods:

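    • Average/Max Pooling: simple, but it discards all spatial layout and usually gives less discriminative descriptors.

    • NetVLAD: softly assigns local features to learned cluster centers and aggregates their residuals into a single robust descriptor.

    • GeM (Generalized Mean pooling): a power mean with a learnable exponent p that sits between average pooling (p = 1) and max pooling (p → ∞); widely used in image retrieval.

    • CosPlace: a modern method whose contribution is mainly the training scheme (a classification-style loss designed for VPR) rather than a new pooling layer.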
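To make the difference between the simpler pooling options concrete, here is a small NumPy sketch of average, max, and GeM pooling over a single C × H × W feature map. The function names and the random feature map are placeholders for illustration. In a trained model the GeM exponent `p` is usually a learned parameter; it is fixed to 3 here as an assumption.

```python
import numpy as np


def average_pool(fmap):
    """Mean over the spatial dimensions of a C x H x W feature map."""
    return fmap.mean(axis=(1, 2))


def max_pool(fmap):
    """Maximum over the spatial dimensions of a C x H x W feature map."""
    return fmap.max(axis=(1, 2))


def gem_pool(fmap, p=3.0, eps=1e-6):
    """Generalized-mean pooling: (mean(x^p))^(1/p) per channel.

    Activations are clamped to eps so the power is well defined;
    p = 1 gives average pooling and large p approaches max pooling.
    """
    clamped = np.clip(fmap, eps, None)
    return (clamped ** p).mean(axis=(1, 2)) ** (1.0 / p)


# Hypothetical feature map: 512 channels on a 7 x 7 grid, non-negative like ReLU outputs.
rng = np.random.default_rng(0)
feature_map = rng.random((512, 7, 7)).astype(np.float32)

avg_desc = average_pool(feature_map)
max_desc = max_pool(feature_map)
gem_desc = gem_pool(feature_map)
print(avg_desc.shape, max_desc.shape, gem_desc.shape)  # each is one value per channel
```

For non-negative activations the GeM value of every channel lies between its average and its maximum, which is why it is often described as a tunable compromise between the two.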

Step 4 – Normalization

  • Apply L2 normalization so the descriptor lies on the unit sphere.

  • This makes similarity comparisons (e.g., cosine similarity) consistent.
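L2 normalization itself is a one-liner. The sketch below uses a hypothetical `l2_normalize` helper and adds a small `eps` guard (an assumption, to avoid dividing by zero for an all-zero vector).

```python
import numpy as np


def l2_normalize(descriptor, eps=1e-12):
    """Scale a descriptor (or each row of a batch) to unit Euclidean length."""
    norm = np.linalg.norm(descriptor, axis=-1, keepdims=True)
    return descriptor / np.maximum(norm, eps)


d = np.array([3.0, 4.0])
unit = l2_normalize(d)
print(unit)                   # [0.6 0.8]
print(np.linalg.norm(unit))   # length is 1 up to floating-point rounding
```

After this step the dot product of two descriptors equals their cosine similarity, so the magnitude of the raw activations no longer affects the comparison.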

Step 5 – Similarity Comparison

  • For two descriptors, compute distance or similarity:

    • Cosine similarity

    • Euclidean distance (L2)

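  • Smaller distance (or, equivalently, higher cosine similarity) → more likely the images show the same place. For L2-normalized descriptors the two measures give the same ranking, since ||a - b||^2 = 2 - 2 cos(a, b).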
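The comparison step can be illustrated with a toy retrieval example. Everything below is synthetic: the database descriptors are random vectors, the query is a noisy copy of one of them, and `rank_database` is a hypothetical helper name. The sketch also checks numerically that cosine similarity and Euclidean distance agree for unit-length descriptors.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each vector (last axis) to unit Euclidean length."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def rank_database(query, database):
    """Return database indices from most to least similar, plus the similarities.

    Assumes the query and every database row are already L2-normalized,
    so the dot product is the cosine similarity.
    """
    sims = database @ query
    order = np.argsort(-sims)
    return order, sims


rng = np.random.default_rng(0)
# Five synthetic 256-dimensional descriptors standing in for reference places.
database = l2_normalize(rng.normal(size=(5, 256)))
# Synthetic query: reference place 2 seen again, with a small perturbation.
query = l2_normalize(database[2] + 0.02 * rng.normal(size=256))

order, sims = rank_database(query, database)
dists = np.linalg.norm(database - query, axis=1)

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
assert np.allclose(dists ** 2, 2.0 - 2.0 * sims)
print("best match:", order[0])
```

With a large database the exhaustive dot product is usually replaced by an approximate nearest-neighbour index, but the ranking criterion stays the same.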


3. Algorithms Commonly Used in VPR

Classical (pre-deep learning)

  • SIFT / SURF / ORB: detect local keypoints and compute a descriptor for each one.

  • Aggregation methods: Bag-of-Words, Fisher Vectors.

Deep Learning

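  1. NetVLAD (2016): CNN backbone followed by a differentiable, trainable VLAD pooling layer.

  2. GeM pooling (2017): learnable generalized-mean pooling that sits between average and max pooling.

  3. CosPlace (2022): training recipe that treats place recognition as a classification-like problem.

  4. DELG / Radenović et al. methods: retrieval models built on learned global descriptors; DELG additionally extracts local features from the same network for re-ranking.

  5. LoFTR / SuperPoint + SuperGlue: local feature matchers, used to verify retrieved candidates rather than to produce global descriptors.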
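As an illustration of the first entry in the list above, the aggregation step of a NetVLAD-style layer can be sketched in NumPy. This is only a sketch under stated assumptions: in the real method the cluster centers and the soft-assignment weights and biases are learned end to end, whereas here they are random placeholders, and the function name `netvlad_aggregate` is made up for the example.

```python
import numpy as np


def netvlad_aggregate(local_feats, centroids, weights, biases, eps=1e-12):
    """Aggregate N local D-dim features into one K*D descriptor.

    local_feats: N x D, centroids: K x D, weights: K x D, biases: K.
    """
    # Soft assignment of every local feature to every cluster (softmax over K).
    logits = local_feats @ weights.T + biases               # N x K
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    assign = np.exp(logits)
    assign = assign / assign.sum(axis=1, keepdims=True)

    # Weighted sum of residuals (feature minus cluster center) per cluster.
    residuals = local_feats[:, None, :] - centroids[None, :, :]   # N x K x D
    vlad = (assign[:, :, None] * residuals).sum(axis=0)           # K x D

    # Intra-normalization per cluster, then flatten and L2-normalize the whole vector.
    vlad = vlad / np.maximum(np.linalg.norm(vlad, axis=1, keepdims=True), eps)
    flat = vlad.ravel()
    return flat / max(np.linalg.norm(flat), eps)


# Placeholder inputs: 49 local features (a 7 x 7 grid) of dimension 128, 8 clusters.
rng = np.random.default_rng(0)
feats = rng.normal(size=(49, 128))
centers = rng.normal(size=(8, 128))
w = rng.normal(size=(8, 128))
b = rng.normal(size=8)

descriptor = netvlad_aggregate(feats, centers, w, b)
print(descriptor.shape)   # one vector of length K * D
```

The descriptor length grows with the number of clusters (K × D), which is why NetVLAD outputs are often reduced with PCA before being stored.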
