PyTorch Artificial Intelligence Course – RB109-02

Neural Network Fundamentals (ANN – Artificial Neural Network)


Overview

This lesson explains how a simple Artificial Neural Network (ANN) is trained in PyTorch to learn the function:

[
y = 2x + 1
]

The focus is on understanding the training process mathematically and conceptually, not just running code.
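
As a concrete starting point, the training data for this function can be generated directly in PyTorch. This is a minimal sketch; the sample count and value range are illustrative assumptions, not taken from the lesson code:

```python
import torch

# Illustrative training data for y = 2x + 1:
# 100 input values in [-5, 5], each stored as a 1-element row.
x = torch.linspace(-5, 5, 100).unsqueeze(1)   # shape: (100, 1)
y = 2 * x + 1                                 # targets, shape: (100, 1)
```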


Neural Network Training Process — Step by Step

Training a neural network is a process that repeats for every batch of data.
Each step has a clear role, and only one step actually changes the weights and bias.


Typical Order Per Batch

| Step | Name | What this step does | Effect on weights and bias |
|---|---|---|---|
| 1 | Forward pass (model) | Computes the model output using the current weights and bias (the prediction) | No change |
| 2 | Loss calculation (loss function) | Measures how wrong the prediction is compared to the target | No change |
| 3 | Clear gradients (optimizer.zero_grad()) | Removes gradients from the previous batch | No change |
| 4 | Backpropagation (loss.backward()) | Computes derivatives of the loss with respect to each weight and bias | No direct change |
| 5 | Parameter update (optimizer.step()) | Updates weights and bias using the computed derivatives | Weights and bias are updated |
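
The same five steps, in the same order, form the body of a typical PyTorch training loop. The sketch below is illustrative: the network architecture, learning rate, and variable names are assumptions, not the lesson's original code.

```python
import torch
import torch.nn as nn

# Illustrative setup: a small network, MSE loss, and SGD.
model = nn.Sequential(nn.Linear(1, 8), nn.ReLU(), nn.Linear(8, 1))
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.linspace(-5, 5, 100).unsqueeze(1)
y = 2 * x + 1

for epoch in range(200):
    y_hat = model(x)            # 1. forward pass: prediction with current weights
    loss = criterion(y_hat, y)  # 2. loss: how wrong the prediction is
    optimizer.zero_grad()       # 3. clear gradients from the previous batch
    loss.backward()             # 4. backpropagation: compute gradients
    optimizer.step()            # 5. the only step that changes weights and bias
```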

Detailed Explanation of Each Step


1. Forward Pass (Model)

Input data is passed through the neural network.

Each hidden layer performs:

  • Multiplication by weights
  • Addition of bias
  • An activation function (ReLU)

The final layer applies only weights and bias and produces the prediction:

[
\hat{y}
]

Important:

  • Uses existing weights and bias
  • No learning happens here

2. Loss Calculation (Loss Function)

The loss function compares:

  • ( \hat{y} ) (prediction)
  • ( y ) (ground truth)

It outputs a single scalar value called loss.

Mean Squared Error (MSE):

[
L = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2
]

Interpretation:

  • Small loss → good prediction
  • Large loss → bad prediction

Important:

  • No weights or bias are changed
  • Loss only measures error
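
A quick sketch (the numbers are arbitrary examples) showing that PyTorch's nn.MSELoss computes exactly this mean of squared differences:

```python
import torch
import torch.nn as nn

y_hat = torch.tensor([2.5, 0.0, 2.0])   # example predictions
y     = torch.tensor([3.0, -0.5, 2.0])  # example targets

# The built-in MSE loss and the explicit formula give the same scalar.
loss_builtin = nn.MSELoss()(y_hat, y)
loss_manual  = ((y_hat - y) ** 2).mean()
print(loss_builtin.item(), loss_manual.item())   # both ~0.1667
```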

3. Clear Old Gradients (optimizer.zero_grad())

PyTorch accumulates gradients by default.

This step:

  • Resets all gradients to zero
  • Prevents gradient accumulation across batches

Important:

  • Does not change weights or bias
  • Only clears stored gradient values
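
A small sketch demonstrating the accumulation behavior this step prevents (the values are arbitrary; w.grad.zero_() here plays the role optimizer.zero_grad() plays for every parameter):

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

loss = (w * 3.0) ** 2    # L = 9w^2, so dL/dw = 18w = 18
loss.backward()
print(w.grad)            # tensor(18.)

loss = (w * 3.0) ** 2
loss.backward()
print(w.grad)            # tensor(36.)  gradients were added, not replaced

w.grad.zero_()           # reset, as optimizer.zero_grad() would do
print(w.grad)            # tensor(0.)
```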

4. Backpropagation (loss.backward())

This is the mathematical core of learning.

What happens:

  • Computes derivatives of the loss with respect to:
    • Every weight
    • Every bias
  • Uses the chain rule from calculus
  • Computation flows from output back to input

Gradients are stored internally:

  • weight.grad
  • bias.grad

Gradient meaning:

  • Positive gradient → increasing parameter increases loss → parameter must decrease
  • Negative gradient → increasing parameter decreases loss → parameter must increase
  • Magnitude → how strong the change should be

Important:

  • No parameters are updated here
  • Only direction and strength are computed
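
A sketch (layer size and data are illustrative) showing that loss.backward() fills the .grad fields without touching the parameters themselves:

```python
import torch
import torch.nn as nn

layer = nn.Linear(1, 1)
x = torch.tensor([[2.0]])
y = torch.tensor([[5.0]])

weight_before = layer.weight.clone()

loss = nn.MSELoss()(layer(x), y)
loss.backward()

print(layer.weight.grad)                        # a gradient is now stored
print(torch.equal(layer.weight, weight_before)) # True: the weight is unchanged
```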

5. Parameter Update (optimizer.step())

This is the only step that changes weights and bias.

Gradient descent rule:

[
W \leftarrow W - \eta \frac{\partial L}{\partial W}
]

[
b \leftarrow b - \eta \frac{\partial L}{\partial b}
]

Where:

  • ( \eta ) is the learning rate

Different optimizers (SGD, Adam, RMSprop):

  • Use different update rules
  • All rely on the same gradients
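
For plain SGD, optimizer.step() applies exactly the rule above. A minimal sketch with a single made-up parameter and an illustrative learning rate, comparing the manual update to the optimizer's update:

```python
import torch

lr = 0.1
w = torch.tensor(2.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=lr)

loss = (w - 5.0) ** 2     # dL/dw = 2(w - 5) = -6 at w = 2
loss.backward()

manual = w.item() - lr * w.grad.item()  # W <- W - eta * dL/dW
optimizer.step()                        # SGD performs the same update
print(w.item(), manual)                 # both 2.6
```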

Key Concepts Summary

  • Forward pass → produces predictions
  • Loss → measures error
  • Backward pass → computes how each parameter affects the error
  • Optimizer → applies the correction

One-Sentence Intuition

Backpropagation decides how each weight and bias should move, and the optimizer moves them to reduce the error.


Mathematical Explanation of Neural Network Training

(Based on PyTorch ANN – RB109-02)


1. Model Definition (Mathematical Form)

The network is a fully connected feed-forward network:

[
x \rightarrow \text{Linear}_1 \rightarrow \text{ReLU} \rightarrow \text{Linear}_2 \rightarrow \text{ReLU} \rightarrow \text{Linear}_3 \rightarrow \hat{y}
]

Each linear layer:

[
z = Wx + b
]

Where:

  • ( W ) – weight matrix
  • ( b ) – bias vector
  • ( x ) – input vector
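
A minimal PyTorch sketch of such a network (the hidden-layer width of 16 and the class name are illustrative assumptions, not taken from the lesson):

```python
import torch.nn as nn

class ANN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1, 16)    # Linear_1: z1 = W1 x + b1
        self.fc2 = nn.Linear(16, 16)   # Linear_2: z2 = W2 a1 + b2
        self.fc3 = nn.Linear(16, 1)    # Linear_3 (output): y_hat = W3 a2 + b3
        self.relu = nn.ReLU()

    def forward(self, x):
        a1 = self.relu(self.fc1(x))
        a2 = self.relu(self.fc2(a1))
        return self.fc3(a2)            # no activation on the output layer
```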

2. Forward Pass Mathematics

First Layer

[
z_1 = W_1 x + b_1
]

[
a_1 = \text{ReLU}(z_1) = \max(0, z_1)
]


Second Layer

[
z_2 = W_2 a_1 + b_2
]

[
a_2 = \text{ReLU}(z_2) = \max(0, z_2)
]


Output Layer

[
\hat{y} = z_3 = W_3 a_2 + b_3
]
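
The same computation written out explicitly with tensors. The weights below are random and the sizes are illustrative; the point is only to trace the sequence of operations:

```python
import torch

x = torch.tensor([[3.0]])      # one input sample

# Randomly initialized parameters, purely to illustrate the flow.
W1, b1 = torch.randn(16, 1), torch.randn(16)
W2, b2 = torch.randn(16, 16), torch.randn(16)
W3, b3 = torch.randn(1, 16), torch.randn(1)

z1 = x @ W1.T + b1             # first linear layer
a1 = torch.relu(z1)            # ReLU: max(0, z1)
z2 = a1 @ W2.T + b2            # second linear layer
a2 = torch.relu(z2)
y_hat = a2 @ W3.T + b3         # output layer, no activation
```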


3. Loss Function (Mean Squared Error)

[
L = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2
]

  • ( \hat{y}_i ) – predicted value
  • ( y_i ) – true value
  • ( N ) – batch size

4. Objective of Training

[
\min_{W,b} L(W,b)
]

Find the weights and biases that minimize prediction error.


5. Backpropagation (Derivatives)

Output Layer

[
\frac{\partial L}{\partial W_3}, \quad \frac{\partial L}{\partial b_3}
]

Using chain rule:

[
\frac{\partial L}{\partial W_3}
=
\frac{\partial L}{\partial \hat{y}}
\cdot
\frac{\partial \hat{y}}{\partial W_3}
]


Hidden Layers

[
\frac{\partial L}{\partial W_2}
=
\frac{\partial L}{\partial \hat{y}}
\cdot
\frac{\partial \hat{y}}{\partial a_2}
\cdot
\frac{\partial a_2}{\partial z_2}
\cdot
\frac{\partial z_2}{\partial W_2}
]

ReLU derivative:

[
\frac{d}{dz}\text{ReLU}(z) =
\begin{cases}
1 & z > 0 \\
0 & z \le 0
\end{cases}
]
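
These derivatives can be checked with autograd on a deliberately tiny example (one weight, one ReLU, one squared error; all numbers are illustrative):

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
x, y = torch.tensor(3.0), torch.tensor(1.0)

z = w * x              # "linear layer": z = w * x = 6
a = torch.relu(z)      # ReLU: a = 6 (z > 0, so the ReLU derivative is 1)
loss = (a - y) ** 2    # squared error: (6 - 1)^2 = 25

loss.backward()

# Chain rule by hand: dL/dw = 2(a - y) * 1 * x = 2 * 5 * 3 = 30
print(w.grad)          # tensor(30.)
```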


6. Meaning of the Gradient

Each gradient answers:

“If this parameter increases, does the loss increase or decrease?”

  • Positive → decrease parameter
  • Negative → increase parameter
  • Magnitude → update strength
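
A one-parameter worked example of this rule (the numbers are illustrative): if

[
L = (w - 4)^2, \quad w = 10
]

then

[
\frac{\partial L}{\partial w} = 2(w - 4) = 12 > 0
]

so increasing ( w ) would increase the loss, and gradient descent therefore decreases ( w ) toward 4.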

7. Why Learning Works

  • Forward pass → prediction
  • Loss → error measurement
  • Backpropagation → error attribution
  • Optimizer → correction

Repeated many times:

[
\hat{y} \approx 2x + 1
]
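
The trained network can then be sanity-checked against the true rule. This sketch assumes the model variable from the training-loop sketch above; how close the output gets to 9 depends on the actual training run:

```python
import torch

# Assumes `model` has already been trained as sketched earlier.
with torch.no_grad():
    test_x = torch.tensor([[4.0]])
    print(model(test_x))    # expected to be close to 2 * 4 + 1 = 9
```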


8. Final Insight

Even though the true function is linear, the network:

  • Learns through nonlinear layers
  • Discovers the rule from data
  • Uses calculus, not hard-coded logic

This is the foundation of all deep learning models.

