Artificial Intelligence RB14-18: What Is PASCAL VOC 2007 – a Benchmark Dataset and Competition

PASCAL VOC 2007 is a labeled dataset of 20 everyday object classes, with bounding boxes, used for training and evaluating object detection models.

PASCAL VOC stands for Pattern Analysis, Statistical Modelling and Computational Learning – Visual Object Classes.

It was a benchmark dataset and competition created in 2007 to push forward object detection and recognition.

Researchers trained models on its images and compared results using a standard evaluation metric, mean Average Precision (mAP).

Why It’s Important

  • It became a standard benchmark for early detection models like YOLOv1, Faster R-CNN, SSD.

  • It is smaller and simpler than today’s COCO dataset, so it’s still used for teaching, prototyping, and quick experiments.


2. What’s Inside the Dataset?

  • Images: about 9,963 images of real-world scenes.

  • Objects: about 24,000 labeled objects.

  • Classes: 20 categories, including:

    • Person

    • Animals (dog, cat, bird, horse, sheep, cow)

    • Vehicles (car, bus, motorbike, bicycle, train, aeroplane, boat)

    • Household items (chair, sofa, tv/monitor, dining table, bottle, potted plant)


3. Annotations (Bounding Boxes)

Each image comes with an XML annotation file that specifies:

  • What object is present (e.g., “dog”).

  • Where it is located (bounding box: top-left corner and bottom-right corner pixel coordinates).

This means, for example, that there is a dog inside the rectangle defined by those pixel coordinates.
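For illustration, here is a shortened, made-up annotation in the VOC XML style, parsed with Python's built-in xml.etree.ElementTree. The file name and coordinate values are invented; real files sit in the Annotations/ folder and may contain several objects per image.

import xml.etree.ElementTree as ET

# Shortened, made-up example of a VOC 2007 annotation file.
SAMPLE_XML = """
<annotation>
  <filename>000001.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox>
      <xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax>
    </bndbox>
  </object>
</annotation>
"""

root = ET.fromstring(SAMPLE_XML)
for obj in root.findall("object"):
    name = obj.find("name").text          # class label, e.g. "dog"
    box = obj.find("bndbox")
    xmin = int(box.find("xmin").text)     # top-left corner
    ymin = int(box.find("ymin").text)
    xmax = int(box.find("xmax").text)     # bottom-right corner
    ymax = int(box.find("ymax").text)
    print(f"{name}: ({xmin}, {ymin}) -> ({xmax}, {ymax})")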

Let’s build Step 1: a Python program that will:

  1. Download the PASCAL VOC 2007 dataset

  2. List the labels (20 classes)

  3. Print the annotation structure (so you see how labels are stored)

  4. Show a few images with their bounding boxes

These libraries are built into Python’s standard library and do not require pip install:

  • os → already in Python, for file system paths.

  • tarfile → already in Python, for extracting .tar archives.

  • urllib.request → already in Python, for downloading files.

  • xml.etree.ElementTree → already in Python, for parsing XML annotation files.

The only things you need to install manually (via pip) are external libraries that are not in the standard library:

  • matplotlib → plotting (drawing images + bounding boxes).

  • pillow (PIL) → image loading and manipulation.

And since patches comes from matplotlib (from matplotlib import patches), installing matplotlib is enough.

The dataset can be downloaded from Kaggle:
https://www.kaggle.com/datasets/zaraks/pascal-voc-2007

After downloading and extracting, point the script to:

  • Train/Val set
    D:\temp\PASCAL-VOC-2007\VOCtrainval_06-Nov-2007\VOCdevkit\VOC2007

  • Test set
    D:\temp\PASCAL-VOC-2007\VOCtest_06-Nov-2007\VOCdevkit\VOC2007
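Putting the pieces above together, a minimal sketch of Step 1 might look like the following. It assumes the Kaggle archive has already been downloaded and extracted to the train/val path above, so the download step is skipped; the paths, the helper names, and the choice to show the first three images are assumptions, not the exact program from this post.

import os
import xml.etree.ElementTree as ET

import matplotlib.pyplot as plt
from matplotlib import patches
from PIL import Image

# Assumed local path to the extracted train/val set (adjust to your machine).
VOC_ROOT = r"D:\temp\PASCAL-VOC-2007\VOCtrainval_06-Nov-2007\VOCdevkit\VOC2007"

# The 20 PASCAL VOC classes.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]

def load_annotation(xml_path):
    """Return a list of (class_name, xmin, ymin, xmax, ymax) for one image."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.findall("object"):
        name = obj.find("name").text
        b = obj.find("bndbox")
        boxes.append((name,
                      int(b.find("xmin").text), int(b.find("ymin").text),
                      int(b.find("xmax").text), int(b.find("ymax").text)))
    return boxes

def show_image_with_boxes(image_id):
    """Display one image with its ground-truth bounding boxes drawn on top."""
    img = Image.open(os.path.join(VOC_ROOT, "JPEGImages", image_id + ".jpg"))
    boxes = load_annotation(os.path.join(VOC_ROOT, "Annotations", image_id + ".xml"))
    fig, ax = plt.subplots()
    ax.imshow(img)
    for name, xmin, ymin, xmax, ymax in boxes:
        rect = patches.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin,
                                 linewidth=2, edgecolor="red", facecolor="none")
        ax.add_patch(rect)
        ax.text(xmin, ymin - 2, name, color="red")
    plt.show()

if __name__ == "__main__":
    print("Classes:", VOC_CLASSES)
    # Show the first few annotated images (IDs taken from the Annotations folder).
    ids = sorted(f[:-4] for f in os.listdir(os.path.join(VOC_ROOT, "Annotations")))[:3]
    for image_id in ids:
        show_image_with_boxes(image_id)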


Next, we extend the script so that, in addition to writing everything into the text file, it also prints a dataset summary to the console at the end. This will include:

  • Total number of images processed

  • Total number of objects found

  • Average objects per image

  • A per-class breakdown (how many times each category appears across the dataset)


 Full Program with Console Summary
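One possible shape for that program (a sketch, not necessarily the exact script referred to here): it walks the Annotations folder, writes one line per object to a text file, and prints the totals and the per-class breakdown at the end. The output file name and the dataset path are assumptions.

import os
import xml.etree.ElementTree as ET
from collections import Counter

VOC_ROOT = r"D:\temp\PASCAL-VOC-2007\VOCtrainval_06-Nov-2007\VOCdevkit\VOC2007"  # assumed path
OUTPUT_TXT = "voc2007_annotations.txt"                                            # assumed name

ann_dir = os.path.join(VOC_ROOT, "Annotations")
class_counts = Counter()
total_images = 0
total_objects = 0

with open(OUTPUT_TXT, "w", encoding="utf-8") as out:
    for xml_name in sorted(os.listdir(ann_dir)):
        if not xml_name.endswith(".xml"):
            continue
        total_images += 1
        root = ET.parse(os.path.join(ann_dir, xml_name)).getroot()
        for obj in root.findall("object"):
            name = obj.find("name").text
            b = obj.find("bndbox")
            coords = tuple(int(b.find(k).text) for k in ("xmin", "ymin", "xmax", "ymax"))
            out.write(f"{xml_name[:-4]} {name} {coords}\n")
            class_counts[name] += 1
            total_objects += 1

# Console summary.
print(f"Total images processed : {total_images}")
print(f"Total objects found    : {total_objects}")
print(f"Average objects/image  : {total_objects / max(total_images, 1):.2f}")
print("Per-class breakdown:")
for name, count in class_counts.most_common():
    print(f"  {name:<15} {count}")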

Everything we’ve built so far is about reading and analyzing the PASCAL VOC 2007 dataset (images + XML annotations).

That’s the preparation stage:

  • Making sure the dataset is accessible.

  • Parsing the XML annotations.

  • Summarizing what’s inside.

So far, no AI model is involved; we are just inspecting the dataset.


IoU = Intersection over Union

Definition:
IoU measures how much the predicted bounding box overlaps with the ground-truth bounding box.

Formula:

IoU = \frac{\text{Area of Overlap}}{\text{Area of Union}}

  • Area of Overlap = the common region between the predicted box and the real box.

  • Area of Union = total area covered by both boxes together.

So IoU ranges from 0 to 1:

  • IoU = 1 → Perfect match (boxes identical).

  • IoU = 0 → No overlap at all.

  • IoU = 0.5 → The overlap covers half of the combined (union) area.

In PASCAL VOC 2007, a detection is considered correct (TP) only if:

IoU \geq 0.5

If IoU < 0.5 → the box does not overlap enough → it’s treated as a False Positive (FP).

This means:

  • The overlap between your predicted box and the true box must be at least half of their combined (union) area for the detection to count as correct.


True Positive (TP) Example

Ground truth box (dog):
x = 50, y = 50, width = 100, height = 100

Predicted box (dog):
x = 55, y = 55, width = 100, height = 100

Step 1: Compute Overlap (Intersection)

  • The overlapping region width = 95 (since 100 − 5)

  • The overlapping region height = 95

  • Overlap area = 95 × 95 = 9025

Step 2: Compute Union

\text{Union} = \text{Area(pred)} + \text{Area(gt)} - \text{Overlap}

\text{Union} = 100 \times 100 + 100 \times 100 - 9025 = 10975

Step 3: Compute IoU

IoU = \frac{9025}{10975} \approx 0.822

Interpretation:
IoU = 0.82 ≥ 0.5 → this detection matches the ground truth well,
so it is counted as a True Positive (TP).


False Positive (FP) Example

Ground truth box (dog):
x = 50, y = 50, width = 100, height = 100

Predicted box (dog):
x = 70, y = 70, width = 100, height = 100

→ Overlap = 80×80 = 6400
→ Union = 13600
→ IoU = 6400 / 13600 = 0.47

IoU < 0.5 → this detection is too far off, counted as False Positive (FP).
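Both worked examples can be checked with a small helper for axis-aligned boxes given as (x, y, width, height), as above; this is just a sketch for verification.

def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x, y, width, height)."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    # Intersection rectangle (zero if the boxes do not overlap).
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

gt = (50, 50, 100, 100)
print(iou(gt, (55, 55, 100, 100)))  # ~0.822 -> TP (IoU >= 0.5)
print(iou(gt, (70, 70, 100, 100)))  # ~0.47  -> FP (IoU < 0.5)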


🔎 Summary:

Case         IoU    Result   Meaning
TP Example   0.82   ✅ TP    Good overlap (correct detection)
FP Example   0.47   ❌ FP    Poor overlap (wrong detection)

Mean Average Precision (mAP)

mAP measures how well an object detector finds and localizes objects across all categories.
High mAP = fewer false positives + fewer missed detections.

  • Compute AP for each class separately.

  • Take the mean across all classes:

    mAP = \frac{1}{N}\sum_{c=1}^{N} AP_c

    where N = number of classes.

Example:

  • AP(car) = 0.70

  • AP(person) = 0.80

  • AP(dog) = 0.60
    → mAP = (0.70 + 0.80 + 0.60) / 3 = 0.70
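The same arithmetic in a couple of lines of Python (class names and AP values taken from the example above):

# AP values per class from the example above.
ap_per_class = {"car": 0.70, "person": 0.80, "dog": 0.60}

# mAP is simply the mean of the per-class APs.
map_value = sum(ap_per_class.values()) / len(ap_per_class)
print(f"mAP = {map_value:.2f}")  # 0.70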

Confusion Matrix Terms

Every prediction you make can be categorized into one of four cases:

1. TP = True Positive

  • The model predicts an object, and it is correct.

  • Example: The model predicts a “dog” box, and there really is a dog in that location (IoU ≥ 0.5 with ground truth).

2. FP = False Positive

  • The model predicts an object, but it is wrong.

  • Example: The model predicts a “dog” in the image, but there is no dog there (or IoU < 0.5).

  • Another case: The class is wrong (predicted “cat” but it’s a “dog”).

3. FN = False Negative

  • The model misses a real object.

  • Example: There is a dog in the image, but the model does not detect it at all.

4. TN = True Negative (rarely used in detection)

  • The model correctly says “nothing here.”

  • In image classification TN is common, but in object detection, where the space of possible boxes is huge, TN is usually not counted.


🔹 How They Work Together

  • Precision = TP / (TP + FP)
    → “Of all boxes I predicted, how many were correct?”

  • Recall = TP / (TP + FN)
    → “Of all real objects, how many did I find?”


🔎 Example

Imagine an image has 2 dogs:

  • Your model predicts 3 boxes:

    • 2 match the dogs (good → TP=2)

    • 1 is wrong (bad → FP=1)

  • And it misses no dogs (FN = 0).

So:

  • TP = 2

  • FP = 1

  • FN = 0

Precision = 2 / (2+1) = 0.67
Recall = 2 / (2+0) = 1.0
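The same counts in code (a tiny sketch; the guard against division by zero is an addition for safety):

def precision_recall(tp, fp, fn):
    """Precision and recall from detection counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

print(precision_recall(tp=2, fp=1, fn=0))  # ≈ (0.67, 1.0) for the two-dog example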


✅ In short:

  • TP = correct detection

  • FP = false alarm

  • FN = missed object


How to Interpret

  • The score (labeled mAP50, or mPA in some environments) is given as a fraction (0–1) or a percentage (0–100).

  • Example:

    • 16 → 16% mAP@0.5 (very poor model; most detections are wrong or missing).

    • 79 → 79% mAP@0.5 (quite strong model; most objects are detected and localized correctly).

    • 21 → 21% mAP@0.5 (weak; better than random but still low).

In YOLOv8, the metric labeled mAP50 (and sometimes displayed as mPA depending on your environment or UI) represents mean Average Precision at IoU 0.5, following the PASCAL VOC evaluation standard.

Here’s a precise breakdown of what YOLOv8 reports:


1. Key Metrics YOLOv8 Prints During Training

When you train or validate a model, you’ll see metrics like:

Class   Images   Instances   Box(P)   Box(R)   mAP50   mAP50-95
all     500      1300        0.91     0.88     0.90    0.68
  • Box(P) → Precision = TP / (TP + FP)

  • Box(R) → Recall = TP / (TP + FN)

  • mAP50 → mean Average Precision at IoU = 0.5

    • Same as VOC 2007 style (looser overlap requirement).

  • mAP50-95 → mean AP averaged over IoU thresholds 0.5 → 0.95 (COCO-style).

If you see mPA = 79, that is mAP50 = 0.79 — meaning your model correctly detects objects about 79% of the time at IoU ≥ 0.5.


2. How YOLOv8 Calculates mAP

  • It checks every prediction against ground truth using IoU (Intersection over Union).

  • A prediction counts as True Positive (TP) if IoU ≥ 0.5 and the class matches.

  • Otherwise, it’s a False Positive (FP).

  • Using all predictions, YOLOv8 builds Precision–Recall curves per class.

  • The area under each curve = AP (Average Precision) for that class.

  • The mean across all classes = mAP.


3. Typical YOLOv8 Performance Indicators

Metric      Description                            Good Range
Precision   How often detections are correct       > 0.85
Recall      How many real objects are detected     > 0.80
mAP50       Detection accuracy at IoU ≥ 0.5        > 0.75
mAP50-95    Average accuracy (stricter IoU range)  > 0.60

4. Example Meaning

If your YOLOv8 output shows:

mPA@0.5 = 0.79

That means:

  • Your model’s mean Average Precision (mAP) at IoU = 0.5 is 79%.

  • It is detecting and localizing objects correctly 79% of the time across all classes.


So to confirm:

In YOLOv8, “mPA” ≈ “mAP@0.5” (mean Average Precision at IoU ≥ 0.5)
It is not pixel accuracy — it’s your detection performance metric.

