Artificial Intelligence RB14-18: What is PASCAL VOC 2007 – benchmark dataset and competition
PASCAL VOC 2007 is a labeled dataset of 20 everyday object classes, with bounding boxes, used for training and evaluating object detection models.
PASCAL VOC stands for Pattern Analysis, Statistical Modelling and Computational Learning – Visual Object Classes.
It was a benchmark dataset and competition created in 2007 to push forward object detection and recognition.
Researchers trained models on its images and compared results using a standard evaluation metric (mAP).
Why It’s Important
- It became a standard benchmark for early detection models like YOLOv1, Faster R-CNN, and SSD.
- It is smaller and simpler than today's COCO dataset, so it's still used for teaching, prototyping, and quick experiments.
2. What’s Inside the Dataset?
- Images: about 9,963 images of real-world scenes.
- Objects: about 24,000 labeled objects.
- Classes: 20 categories (the full list of class names appears below), including:
  - Person
  - Animals (dog, cat, bird, horse, sheep, cow)
  - Vehicles (car, bus, motorbike, bicycle, train, aeroplane, boat)
  - Household items (chair, sofa, tv/monitor, dining table, bottle, potted plant)
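For reference, here are the 20 class names exactly as they appear in the <name> field of the annotation files (note the lowercase, concatenated spellings such as diningtable and tvmonitor); the variable name VOC_CLASSES below is just for illustration:

# The 20 PASCAL VOC object classes, spelled as they appear in the
# <name> field of the XML annotation files.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]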
3. Annotations (Bounding Boxes)
Each image comes with an XML annotation file that specifies:
- What object is present (e.g., “dog”).
- Where it is located (bounding box: top-left corner and bottom-right corner pixel coordinates).
<object>
  <name>dog</name>
  <bndbox>
    <xmin>50</xmin>
    <ymin>60</ymin>
    <xmax>200</xmax>
    <ymax>220</ymax>
  </bndbox>
</object>
This means there is a dog in the rectangle defined by those pixel coordinates.
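As a quick check, here is a minimal sketch that parses an <object> element like the one above using only the standard library's xml.etree.ElementTree; the variable names are just for illustration:

import xml.etree.ElementTree as ET

# A single <object> element, exactly like the annotation snippet above.
snippet = """
<object>
  <name>dog</name>
  <bndbox>
    <xmin>50</xmin>
    <ymin>60</ymin>
    <xmax>200</xmax>
    <ymax>220</ymax>
  </bndbox>
</object>
"""

obj = ET.fromstring(snippet)
name = obj.find("name").text
box = obj.find("bndbox")
xmin, ymin = int(box.find("xmin").text), int(box.find("ymin").text)
xmax, ymax = int(box.find("xmax").text), int(box.find("ymax").text)
print(name, (xmin, ymin, xmax, ymax))   # dog (50, 60, 200, 220)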
Now let's build Step 1: a Python program that will:
- Download the PASCAL VOC 2007 dataset
- List the labels (20 classes)
- Print the annotation structure (so you see how labels are stored)
- Show a few images with their bounding boxes
pip install matplotlib pillow
These libraries are built into Python's standard library and do not require a pip install:
- os → already in Python, for file system paths.
- tarfile → already in Python, for extracting .tar archives.
- urllib.request → already in Python, for downloading files.
- xml.etree.ElementTree → already in Python, for parsing XML annotation files.
The only things you need to install manually (via pip) are external libraries that are not in the standard library:
- matplotlib → plotting (drawing images + bounding boxes).
- pillow (PIL) → image loading and manipulation.
And since patches comes from matplotlib (from matplotlib import patches), installing matplotlib is enough.
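Since tarfile and urllib.request were mentioned above for downloading and extracting, here is a minimal sketch of that step. The URLs below are the official VOC 2007 archives as hosted on host.robots.ox.ac.uk; treat them as an assumption (mirrors come and go) and fall back to the Kaggle copy linked below if they are unavailable.

import os
import tarfile
import urllib.request

# Assumed official VOC 2007 archive URLs (verify they are still live,
# or use the Kaggle dataset linked below instead).
VOC_URLS = [
    "http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar",
    "http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar",
]
dest_dir = r"D:\temp\PASCAL-VOC-2007"   # same root folder used later in this post
os.makedirs(dest_dir, exist_ok=True)

for url in VOC_URLS:
    tar_path = os.path.join(dest_dir, os.path.basename(url))
    if not os.path.exists(tar_path):
        print("Downloading", url)
        urllib.request.urlretrieve(url, tar_path)

    # Extract each archive into its own subfolder, so the result matches the
    # VOCtrainval_06-Nov-2007\VOCdevkit\VOC2007 layout used later in this post.
    extract_to = os.path.join(dest_dir, os.path.basename(url).replace(".tar", ""))
    print("Extracting", tar_path, "->", extract_to)
    with tarfile.open(tar_path) as tar:
        tar.extractall(extract_to)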
https://www.kaggle.com/datasets/zaraks/pascal-voc-2007
PASCAL-VOC-2007/
    PASCAL_VOC/
        PASCAL_VOC/
    VOCtest_06-Nov-2007/
        VOCdevkit/
            VOC2007/
                Annotations/
                ImageSets/
                    Layout/
                    Main/
                    Segmentation/
                JPEGImages/
                SegmentationClass/
                SegmentationObject/
    VOCtrainval_06-Nov-2007/
        VOCdevkit/
            VOC2007/
                Annotations/
                ImageSets/
                    Layout/
                    Main/
                    Segmentation/
                JPEGImages/
                SegmentationClass/
                SegmentationObject/
In the script below, the dataset paths point to:
- Train/Val set → D:\temp\PASCAL-VOC-2007\VOCtrainval_06-Nov-2007\VOCdevkit\VOC2007
- Test set → D:\temp\PASCAL-VOC-2007\VOCtest_06-Nov-2007\VOCdevkit\VOC2007
import os
import xml.etree.ElementTree as ET
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image
import random

# ==========================================================
# 1. Dataset paths
# ==========================================================
voc_trainval = r"D:\temp\PASCAL-VOC-2007\VOCtrainval_06-Nov-2007\VOCdevkit\VOC2007"
voc_test = r"D:\temp\PASCAL-VOC-2007\VOCtest_06-Nov-2007\VOCdevkit\VOC2007"

# Choose split
voc_dir = voc_trainval

ann_dir = os.path.join(voc_dir, "Annotations")
img_dir = os.path.join(voc_dir, "JPEGImages")

# ==========================================================
# 2. Parse annotation file
# ==========================================================
def parse_annotation(xml_file):
    tree = ET.parse(xml_file)
    root = tree.getroot()

    objects = []
    for obj in root.findall("object"):
        cls = obj.find("name").text
        bbox = obj.find("bndbox")
        xmin = int(bbox.find("xmin").text)
        ymin = int(bbox.find("ymin").text)
        xmax = int(bbox.find("xmax").text)
        ymax = int(bbox.find("ymax").text)
        objects.append((cls, xmin, ymin, xmax, ymax))
    return objects

# ==========================================================
# 3. Show image side by side (raw vs annotated)
# ==========================================================
def show_side_by_side(img_file, objects):
    img = Image.open(img_file)

    # Create two subplots: raw + annotated
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 7))
    fig.suptitle(f"File: {os.path.basename(img_file)}\nFull path: {img_file}",
                 fontsize=12, weight="bold")

    # Left: raw image
    ax1.imshow(img)
    ax1.set_title("Raw Image")
    ax1.axis("off")

    # Right: annotated image
    ax2.imshow(img)
    for (cls, xmin, ymin, xmax, ymax) in objects:
        rect = patches.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin,
                                 linewidth=2, edgecolor='red', facecolor='none')
        ax2.add_patch(rect)
        ax2.text(xmin, ymin - 5, cls, color='red', fontsize=10, weight="bold")
    ax2.set_title("With Annotations")
    ax2.axis("off")

    plt.tight_layout()
    plt.show()

# ==========================================================
# 4. Switch: random or manual
# ==========================================================
mode = "manual"   # "random" or "manual"

if mode == "random":
    example_ann = random.choice(os.listdir(ann_dir))
elif mode == "manual":
    file_number = "000005"   # <-- choose manually here
    example_ann = file_number + ".xml"
else:
    raise ValueError("Mode must be 'random' or 'manual'")

example_ann_path = os.path.join(ann_dir, example_ann)
objects = parse_annotation(example_ann_path)
example_img = os.path.join(img_dir, example_ann.replace(".xml", ".jpg"))

# ==========================================================
# 5. Print summary and show side by side
# ==========================================================
print("=== IMAGE SUMMARY ===")
print("File name:", os.path.basename(example_img))
print("Image path:", example_img)
print("Annotation path:", example_ann_path)
print("Objects:")
for obj in objects:
    cls, xmin, ymin, xmax, ymax = obj
    print(f" - {cls} (bbox: {xmin},{ymin},{xmax},{ymax})")

# Show raw + annotated images side by side
show_side_by_side(example_img, objects)
Next, let's extend the script so that, in addition to writing everything into a text file, it also prints a dataset summary to the console at the end. This will include:
- Total number of images processed
- Total number of objects found
- Average objects per image
- A per-class breakdown (how many times each category appears across the dataset)
Full Program with Console Summary
import os
import xml.etree.ElementTree as ET
from collections import Counter

# ==========================================================
# 1. Dataset path
# ==========================================================
voc_trainval = r"D:\temp\PASCAL-VOC-2007\VOCtrainval_06-Nov-2007\VOCdevkit\VOC2007"
voc_dir = voc_trainval   # or switch to voc_test

ann_dir = os.path.join(voc_dir, "Annotations")
img_dir = os.path.join(voc_dir, "JPEGImages")

# ==========================================================
# 2. Parse annotation XML
# ==========================================================
def parse_annotation(xml_file):
    tree = ET.parse(xml_file)
    root = tree.getroot()

    objects = []
    for obj in root.findall("object"):
        cls = obj.find("name").text
        bbox = obj.find("bndbox")
        xmin = int(bbox.find("xmin").text)
        ymin = int(bbox.find("ymin").text)
        xmax = int(bbox.find("xmax").text)
        ymax = int(bbox.find("ymax").text)
        objects.append((cls, xmin, ymin, xmax, ymax))
    return objects

# ==========================================================
# 3. Process all files
# ==========================================================
annotations = [f for f in os.listdir(ann_dir) if f.endswith(".xml")]
print(f"Start reading {len(annotations)} annotation files...")

summary_file = "voc2007_summary.txt"
total_objects = 0
class_counter = Counter()

with open(summary_file, "w", encoding="utf-8") as f:
    for idx, ann in enumerate(annotations, start=1):
        ann_path = os.path.join(ann_dir, ann)
        img_path = os.path.join(img_dir, ann.replace(".xml", ".jpg"))
        objects = parse_annotation(ann_path)

        total_objects += len(objects)
        for cls, xmin, ymin, xmax, ymax in objects:
            class_counter[cls] += 1

        # Write detailed info to file
        f.write(f"File name: {os.path.basename(img_path)}\n")
        f.write(f"Image path: {img_path}\n")
        f.write(f"Annotation path: {ann_path}\n")
        f.write("Objects:\n")
        for cls, xmin, ymin, xmax, ymax in objects:
            f.write(f" - {cls} (bbox: {xmin},{ymin},{xmax},{ymax})\n")
        f.write("\n" + "="*50 + "\n\n")

        # Print progress every 10 files
        if idx % 10 == 0:
            print(".", end="", flush=True)

# ==========================================================
# 4. Print summary to console
# ==========================================================
print("\n\n=== DATASET SUMMARY ===")
print(f"Total files processed: {len(annotations)}")
print(f"Total objects found: {total_objects}")
print(f"Average objects per image: {total_objects / len(annotations):.2f}")
print("\nObjects per class:")
for cls, count in class_counter.most_common():
    print(f" - {cls}: {count}")

print(f"\n✅ Detailed summary written to {summary_file}")
Everything we've built so far is about reading and analyzing the PASCAL VOC 2007 dataset (images + XML annotations).
That’s the preparation stage:
- Making sure the dataset is accessible.
- Parsing the XML annotations.
- Summarizing what's inside.
So far, no AI model is involved; we are just examining the dataset.
IoU = Intersection over Union
Definition:
IoU measures how much the predicted bounding box overlaps with the ground-truth bounding box.
Formula:
IoU = \frac{\text{Area of Overlap}}{\text{Area of Union}}
- Area of Overlap = the common region between the predicted box and the real box.
- Area of Union = the total area covered by both boxes together.
So IoU ranges from 0 to 1:
- IoU = 1 → perfect match (boxes identical).
- IoU = 0 → no overlap at all.
- IoU = 0.5 → the overlap region is half the size of the combined (union) area.
In PASCAL VOC 2007, a detection is considered correct (TP) only if:
IoU \geq 0.5
If IoU < 0.5 → the box does not overlap enough → it’s treated as a False Positive (FP).
This means the predicted box must overlap the true box substantially: the intersection must be at least half of their combined (union) area for the detection to count as correct (see the IoU sketch below).
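To make the definition concrete, here is a minimal sketch of an IoU computation for axis-aligned boxes given as (xmin, ymin, xmax, ymax); the function name iou is just for illustration:

def iou(box_a, box_b):
    # Boxes are (xmin, ymin, xmax, ymax) in pixel coordinates.
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection rectangle (clamped to zero if the boxes do not overlap)
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union = area A + area B - intersection
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0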
✅ True Positive (TP) Example
Ground truth box (dog):
x = 50, y = 50, width = 100, height = 100
Predicted box (dog):
x = 55, y = 55, width = 100, height = 100
Step 1: Compute Overlap (Intersection)
- The overlapping region width = 95 (since 100 − 5)
- The overlapping region height = 95
- Overlap area = 95 × 95 = 9025
Step 2: Compute Union
\text{Union} = \text{Area(pred)} + \text{Area(gt)} - \text{Overlap} = 100 \times 100 + 100 \times 100 - 9025 = 10975
Step 3: Compute IoU
IoU = \frac{9025}{10975} \approx 0.822
Interpretation:
IoU = 0.82 ≥ 0.5 → this detection matches the ground truth well,
so it is counted as a True Positive (TP).
❌ False Positive (FP) Example
Ground truth box (dog):
x = 50, y = 50, width = 100, height = 100
Predicted box (dog):
x = 70, y = 70, width = 100, height = 100
→ Overlap = 80 × 80 = 6400
→ Union = 10000 + 10000 − 6400 = 13600
→ IoU = 6400 / 13600 ≈ 0.47
IoU < 0.5 → this detection is too far off, counted as False Positive (FP).
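Using the iou sketch from above, both worked examples can be checked in a couple of lines (values rounded as in the text):

gt = (50, 50, 150, 150)        # ground truth: x=50, y=50, w=100, h=100
tp = (55, 55, 155, 155)        # TP example prediction
fp = (70, 70, 170, 170)        # FP example prediction

print(round(iou(gt, tp), 2))   # 0.82 -> counted as TP (>= 0.5)
print(round(iou(gt, fp), 2))   # 0.47 -> counted as FP (< 0.5)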
🔎 Summary:
Case | IoU | Result | Meaning |
---|---|---|---|
TP Example | 0.82 | ✅ TP | Good overlap (correct detection) |
FP Example | 0.47 | ❌ FP | Poor overlap (wrong detection) |
Mean Average Precision (mAP)
mAP measures how well an object detector finds and localizes objects across all categories.
High mAP = fewer false positives + fewer missed detections.
- Compute AP for each class separately.
- Take the mean across all classes:
mAP = \frac{1}{N}\sum_{c=1}^{N} AP_c
where N = number of classes.
Example:
- AP(car) = 0.70
- AP(person) = 0.80
- AP(dog) = 0.60
→ mAP = (0.70 + 0.80 + 0.60) / 3 = 0.70
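The same example as a few lines of Python (the class names and AP values are taken from the example above):

ap_per_class = {"car": 0.70, "person": 0.80, "dog": 0.60}

# mAP = mean of the per-class Average Precision values
mAP = sum(ap_per_class.values()) / len(ap_per_class)
print(f"mAP = {mAP:.2f}")   # mAP = 0.70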
Confusion Matrix Terms
Every prediction you make can be categorized into one of four cases:
1. TP = True Positive
- The model predicts an object, and it is correct.
- Example: The model predicts a “dog” box, and there really is a dog in that location (IoU ≥ 0.5 with ground truth).
2. FP = False Positive
- The model predicts an object, but it is wrong.
- Example: The model predicts a “dog” in the image, but there is no dog there (or IoU < 0.5).
- Another case: the class is wrong (predicted “cat” but it’s a “dog”).
3. FN = False Negative
- The model misses a real object.
- Example: There is a dog in the image, but the model does not detect it at all.
4. TN = True Negative (rarely used in detection)
- The model correctly says “nothing here.”
- In image classification it’s common, but in object detection, with its large search space, TN is not usually counted.
🔹 How They Work Together
- Precision = TP / (TP + FP)
  → “Of all boxes I predicted, how many were correct?”
- Recall = TP / (TP + FN)
  → “Of all real objects, how many did I find?”
🔎 Example
Imagine an image has 2 dogs:
- Your model predicts 3 boxes:
  - 2 match the dogs (good → TP = 2)
  - 1 is wrong (bad → FP = 1)
- It misses 0 dogs (FN = 0).
So:
- TP = 2
- FP = 1
- FN = 0
Precision = 2 / (2+1) = 0.67
Recall = 2 / (2+0) = 1.0
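A minimal sketch of the same arithmetic in Python (the tp/fp/fn values come from the example above):

tp, fp, fn = 2, 1, 0

precision = tp / (tp + fp)   # of all predicted boxes, how many were correct
recall = tp / (tp + fn)      # of all real objects, how many were found

print(f"Precision = {precision:.2f}")   # 0.67
print(f"Recall    = {recall:.2f}")      # 1.00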
✅ In short:
- TP = correct detection
- FP = false alarm
- FN = missed object
How to Interpret
- mPA is given as a percentage (0–100); as explained below, in YOLOv8 this value is really mAP@0.5.
- Example:
  - 16 mPA → 16% (very poor model, most objects missed or misdetected).
  - 79 mPA → 79% (quite strong model, most objects detected and localized correctly).
  - 21 mPA → 21% (weak, better than random but still low).
In YOLOv8, the metric labeled mAP50 (and sometimes displayed as mPA, depending on your environment or UI) represents mean Average Precision at IoU 0.5, following the PASCAL VOC evaluation standard.
Here’s a precise breakdown of what YOLOv8 reports:
1. Key Metrics YOLOv8 Prints During Training
When you train or validate a model, you’ll see metrics like:
- Box(P) → Precision = TP / (TP + FP)
- Box(R) → Recall = TP / (TP + FN)
- mAP50 → mean Average Precision at IoU = 0.5
  - Same as VOC 2007 style (looser overlap requirement).
- mAP50-95 → mean AP averaged over IoU thresholds 0.5 → 0.95 (COCO-style).
If you see mPA = 79, that is mAP50 = 0.79 — meaning your model correctly detects objects about 79% of the time at IoU ≥ 0.5.
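As a sketch of where these numbers come from in practice, the Ultralytics API exposes them after validation. This is only an assumed workflow: the attribute names (box.mp, box.mr, box.map50, box.map) and the bundled VOC.yaml dataset config reflect recent Ultralytics releases, so check them against your installed version.

from ultralytics import YOLO   # pip install ultralytics

# Assumed workflow: fine-tune a small YOLOv8 model on VOC, then validate.
# "VOC.yaml" is a dataset config bundled with Ultralytics (VOC 2007+2012);
# it downloads the data on first use.
model = YOLO("yolov8n.pt")
model.train(data="VOC.yaml", epochs=1, imgsz=640)   # tiny demo run, not a real training budget
metrics = model.val(data="VOC.yaml")

print("Precision :", metrics.box.mp)      # Box(P)
print("Recall    :", metrics.box.mr)      # Box(R)
print("mAP50     :", metrics.box.map50)   # VOC-style, IoU >= 0.5
print("mAP50-95  :", metrics.box.map)     # COCO-style average over IoU thresholds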
2. How YOLOv8 Calculates mAP
- It checks every prediction against the ground truth using IoU (Intersection over Union).
- A prediction counts as a True Positive (TP) if IoU ≥ 0.5 and the class matches.
- Otherwise, it is a False Positive (FP).
- Using all predictions, YOLOv8 builds Precision–Recall curves per class.
- The area under each curve = AP (Average Precision) for that class (see the sketch below).
- The mean across all classes = mAP.
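To show what “area under the Precision–Recall curve” means, here is a simplified AP computation. It assumes the recall/precision points have already been computed from confidence-sorted predictions, and it is not the exact routine Ultralytics uses; the toy numbers at the end are purely illustrative.

import numpy as np

def average_precision(recalls, precisions):
    """Area under a precision-recall curve (all-point interpolation)."""
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([0.0], precisions, [0.0]))

    # Make precision monotonically non-increasing from right to left
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])

    # Sum rectangle areas where recall increases
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# Toy PR points for one class (hypothetical numbers, just to illustrate)
recalls = np.array([0.2, 0.4, 0.6, 0.8])
precisions = np.array([1.0, 0.9, 0.75, 0.6])
print(round(average_precision(recalls, precisions), 3))   # 0.65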
3. Typical YOLOv8 Performance Indicators
Metric | Description | Good Range |
---|---|---|
Precision | How often detections are correct | >0.85 |
Recall | How many real objects are detected | >0.80 |
mAP50 | Detection accuracy at IoU ≥ 0.5 | >0.75 |
mAP50-95 | Average accuracy (stricter IoU) | >0.60 |
4. Example Meaning
If your YOLOv8 output shows, for example, mAP50 = 0.79, that means:
- Your model's mean Average Precision (mAP) at IoU = 0.5 is 79%.
- It is detecting and localizing objects correctly about 79% of the time across all classes.
So to confirm:
In YOLOv8, “mPA” ≈ “mAP@0.5” (mean Average Precision at IoU ≥ 0.5)
It is not pixel accuracy — it’s your detection performance metric.