PyTorch Artificial Intelligence Course – RB109-02

Neural Network Fundamentals (ANN – Artificial Neural Network)


Overview

This lesson explains how a simple Artificial Neural Network (ANN) is trained in PyTorch to learn the function:

[
y = 2x + 1
]

The focus is on understanding the training process mathematically and conceptually, not just running code.
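
As a concrete starting point, the training data for this function can be generated directly in PyTorch. This is a minimal sketch; the sample count and value range are illustrative assumptions, not taken from the lesson code:

```python
import torch

# Illustrative training data for y = 2x + 1:
# 100 input values in [-5, 5], each stored as a 1-element row.
x = torch.linspace(-5, 5, 100).unsqueeze(1)   # shape: (100, 1)
y = 2 * x + 1                                 # targets, shape: (100, 1)
```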


Neural Network Training Process — Step by Step

Training a neural network is a process that repeats for every batch of data.
Each step has a clear role, and only one step actually changes the weights and bias.


Typical Order Per Batch

| Step | Name | What this step does | Effect on weights and bias |
|---|---|---|---|
| 1 | Forward pass (model) | Computes the model output using the current weights and bias (the prediction) | No change |
| 2 | Loss calculation (loss function) | Measures how wrong the prediction is compared to the target | No change |
| 3 | Clear gradients (optimizer.zero_grad()) | Removes gradients from the previous batch | No change |
| 4 | Backpropagation (loss.backward()) | Computes derivatives of the loss with respect to each weight and bias | No direct change |
| 5 | Parameter update (optimizer.step()) | Updates weights and bias using the computed derivatives | Weights and bias are updated |
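
The same five steps, in the same order, form the body of a typical PyTorch training loop. The sketch below is illustrative: the network architecture, learning rate, and variable names are assumptions, not the lesson's original code.

```python
import torch
import torch.nn as nn

# Illustrative setup: a small network, MSE loss, and SGD.
model = nn.Sequential(nn.Linear(1, 8), nn.ReLU(), nn.Linear(8, 1))
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.linspace(-5, 5, 100).unsqueeze(1)
y = 2 * x + 1

for epoch in range(200):
    y_hat = model(x)            # 1. forward pass: prediction with current weights
    loss = criterion(y_hat, y)  # 2. loss: how wrong the prediction is
    optimizer.zero_grad()       # 3. clear gradients from the previous batch
    loss.backward()             # 4. backpropagation: compute gradients
    optimizer.step()            # 5. the only step that changes weights and bias
```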

Detailed Explanation of Each Step


1. Forward Pass (Model)

Input data is passed through the neural network.

Each hidden layer performs:

  • Multiplication by weights
  • Addition of bias
  • An activation function (ReLU)

The final layer applies only weights and bias and produces the prediction:

[
\hat{y}
]

Important:

  • Uses existing weights and bias
  • No learning happens here

2. Loss Calculation (Loss Function)

The loss function compares:

  • ( \hat{y} ) (prediction)
  • ( y ) (ground truth)

It outputs a single scalar value called loss.

Mean Squared Error (MSE):

[
L = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2
]

Interpretation:

  • Small loss → good prediction
  • Large loss → bad prediction

Important:

  • No weights or bias are changed
  • Loss only measures error
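
A quick sketch (the numbers are arbitrary examples) showing that PyTorch's nn.MSELoss computes exactly this mean of squared differences:

```python
import torch
import torch.nn as nn

y_hat = torch.tensor([2.5, 0.0, 2.0])   # example predictions
y     = torch.tensor([3.0, -0.5, 2.0])  # example targets

# The built-in MSE loss and the explicit formula give the same scalar.
loss_builtin = nn.MSELoss()(y_hat, y)
loss_manual  = ((y_hat - y) ** 2).mean()
print(loss_builtin.item(), loss_manual.item())   # both ~0.1667
```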

3. Clear Old Gradients (optimizer.zero_grad())

PyTorch accumulates gradients by default.

This step:

  • Resets all gradients to zero
  • Prevents gradient accumulation across batches

Important:

  • Does not change weights or bias
  • Only clears stored gradient values
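
A small sketch demonstrating the accumulation behavior this step prevents (the values are arbitrary; w.grad.zero_() here plays the role optimizer.zero_grad() plays for every parameter):

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

loss = (w * 3.0) ** 2    # L = 9w^2, so dL/dw = 18w = 18
loss.backward()
print(w.grad)            # tensor(18.)

loss = (w * 3.0) ** 2
loss.backward()
print(w.grad)            # tensor(36.)  gradients were added, not replaced

w.grad.zero_()           # reset, as optimizer.zero_grad() would do
print(w.grad)            # tensor(0.)
```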

4. Backpropagation (loss.backward())

This is the mathematical core of learning.

What happens:

  • Computes derivatives of the loss with respect to:
    • Every weight
    • Every bias
  • Uses the chain rule from calculus
  • Computation flows from output back to input

Gradients are stored internally:

  • weight.grad
  • bias.grad

Gradient meaning:

  • Positive gradient → increasing parameter increases loss → parameter must decrease
  • Negative gradient → increasing parameter decreases loss → parameter must increase
  • Magnitude → how strong the change should be

Important:

  • No parameters are updated here
  • Only direction and strength are computed
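
A sketch (layer size and data are illustrative) showing that loss.backward() fills the .grad fields without touching the parameters themselves:

```python
import torch
import torch.nn as nn

layer = nn.Linear(1, 1)
x = torch.tensor([[2.0]])
y = torch.tensor([[5.0]])

weight_before = layer.weight.clone()

loss = nn.MSELoss()(layer(x), y)
loss.backward()

print(layer.weight.grad)                        # a gradient is now stored
print(torch.equal(layer.weight, weight_before)) # True: the weight is unchanged
```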

5. Parameter Update (optimizer.step())

This is the only step that changes weights and bias.

Gradient descent rule:

[
W \leftarrow W - \eta \frac{\partial L}{\partial W}
]

[
b \leftarrow b - \eta \frac{\partial L}{\partial b}
]

Where:

  • ( \eta ) is the learning rate

Different optimizers (SGD, Adam, RMSprop):

  • Use different update rules
  • All rely on the same gradients
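
For plain SGD, optimizer.step() applies exactly the rule above. A minimal sketch with a single made-up parameter and an illustrative learning rate, comparing the manual update to the optimizer's update:

```python
import torch

lr = 0.1
w = torch.tensor(2.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=lr)

loss = (w - 5.0) ** 2     # dL/dw = 2(w - 5) = -6 at w = 2
loss.backward()

manual = w.item() - lr * w.grad.item()  # W <- W - eta * dL/dW
optimizer.step()                        # SGD performs the same update
print(w.item(), manual)                 # both 2.6
```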

Key Concepts Summary

  • Forward pass → produces predictions
  • Loss → measures error
  • Backward pass → computes how each parameter affects the error
  • Optimizer → applies the correction

One-Sentence Intuition

Backpropagation decides how each weight and bias should move, and the optimizer moves them to reduce the error.


Mathematical Explanation of Neural Network Training

(Based on PyTorch ANN – RB109-02)


1. Model Definition (Mathematical Form)

The network is a fully connected feed-forward network:

[
x \rightarrow \text{Linear}_1 \rightarrow \text{ReLU} \rightarrow \text{Linear}_2 \rightarrow \text{ReLU} \rightarrow \text{Linear}_3 \rightarrow \hat{y}
]

Each linear layer:

[
z = Wx + b
]

Where:

  • ( W ) – weight matrix
  • ( b ) – bias vector
  • ( x ) – input vector
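
A minimal PyTorch sketch of such a network (the hidden-layer width of 16 and the class name are illustrative assumptions, not taken from the lesson):

```python
import torch.nn as nn

class ANN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1, 16)    # Linear_1: z1 = W1 x + b1
        self.fc2 = nn.Linear(16, 16)   # Linear_2: z2 = W2 a1 + b2
        self.fc3 = nn.Linear(16, 1)    # Linear_3 (output): y_hat = W3 a2 + b3
        self.relu = nn.ReLU()

    def forward(self, x):
        a1 = self.relu(self.fc1(x))
        a2 = self.relu(self.fc2(a1))
        return self.fc3(a2)            # no activation on the output layer
```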

2. Forward Pass Mathematics

First Layer

[
z_1 = W_1 x + b_1
]

[
a_1 = \text{ReLU}(z_1) = \max(0, z_1)
]


Second Layer

[
z_2 = W_2 a_1 + b_2
]

[
a_2 = \text{ReLU}(z_2) = \max(0, z_2)
]


Output Layer

[
\hat{y} = z_3 = W_3 a_2 + b_3
]
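
The same computation written out explicitly with tensors. The weights below are random and the sizes are illustrative; the point is only to trace the sequence of operations:

```python
import torch

x = torch.tensor([[3.0]])      # one input sample

# Randomly initialized parameters, purely to illustrate the flow.
W1, b1 = torch.randn(16, 1), torch.randn(16)
W2, b2 = torch.randn(16, 16), torch.randn(16)
W3, b3 = torch.randn(1, 16), torch.randn(1)

z1 = x @ W1.T + b1             # first linear layer
a1 = torch.relu(z1)            # ReLU: max(0, z1)
z2 = a1 @ W2.T + b2            # second linear layer
a2 = torch.relu(z2)
y_hat = a2 @ W3.T + b3         # output layer, no activation
```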


3. Loss Function (Mean Squared Error)

[
L = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2
]

  • ( \hat{y}_i ) – predicted value
  • ( y_i ) – true value
  • ( N ) – batch size

4. Objective of Training

[
\min_{W,b} L(W,b)
]

Find the weights and biases that minimize prediction error.


5. Backpropagation (Derivatives)

Output Layer

[
\frac{\partial L}{\partial W_3}, \quad \frac{\partial L}{\partial b_3}
]

Using chain rule:

[
\frac{\partial L}{\partial W_3}
=
\frac{\partial L}{\partial \hat{y}}
\cdot
\frac{\partial \hat{y}}{\partial W_3}
]


Hidden Layers

[
\frac{\partial L}{\partial W_2}
=
\frac{\partial L}{\partial \hat{y}}
\cdot
\frac{\partial \hat{y}}{\partial a_2}
\cdot
\frac{\partial a_2}{\partial z_2}
\cdot
\frac{\partial z_2}{\partial W_2}
]

ReLU derivative:

[
\frac{d}{dz}\text{ReLU}(z) =
\begin{cases}
1 & z > 0 \\
0 & z \le 0
\end{cases}
]
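
These derivatives can be checked with autograd on a deliberately tiny example (one weight, one ReLU, one squared error; all numbers are illustrative):

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
x, y = torch.tensor(3.0), torch.tensor(1.0)

z = w * x              # "linear layer": z = w * x = 6
a = torch.relu(z)      # ReLU: a = 6 (z > 0, so the ReLU derivative is 1)
loss = (a - y) ** 2    # squared error: (6 - 1)^2 = 25

loss.backward()

# Chain rule by hand: dL/dw = 2(a - y) * 1 * x = 2 * 5 * 3 = 30
print(w.grad)          # tensor(30.)
```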


6. Meaning of the Gradient

Each gradient answers:

“If this parameter increases, does the loss increase or decrease?”

  • Positive → decrease parameter
  • Negative → increase parameter
  • Magnitude → update strength
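
A one-parameter worked example of this rule (the numbers are illustrative): if

[
L = (w - 4)^2, \quad w = 10
]

then

[
\frac{\partial L}{\partial w} = 2(w - 4) = 12 > 0
]

so increasing ( w ) would increase the loss, and gradient descent therefore decreases ( w ) toward 4.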

7. Why Learning Works

  • Forward pass → prediction
  • Loss → error measurement
  • Backpropagation → error attribution
  • Optimizer → correction

Repeated many times:

[
\hat{y} \approx 2x + 1
]
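
The trained network can then be sanity-checked against the true rule. This sketch assumes the model variable from the training-loop sketch above; how close the output gets to 9 depends on the actual training run:

```python
import torch

# Assumes `model` has already been trained as sketched earlier.
with torch.no_grad():
    test_x = torch.tensor([[4.0]])
    print(model(test_x))    # expected to be close to 2 * 4 + 1 = 9
```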


8. Final Insight

Even though the true function is linear, the network:

  • Learns through nonlinear layers
  • Discovers the rule from data
  • Uses calculus, not hard-coded logic

This is the foundation of all deep learning models.

