PyTorch Artificial Intelligence Course RB109-02: Neural Network Fundamentals
```python
# -*- coding: utf-8 -*-
"""PyTorch - ANN - simple.ipynb

Automatically generated by Colab.

Original file is located at
    https://colab.research.google.com/drive/1gHQ0kLmXtZWb9Q6CCurP3_PmzyEhECc_
"""

# Simple ANN in PyTorch to learn y = 2x + 1
# - plots training + validation loss history
# - predicts y for x=2
# - shows true value and percent error

# This is an ANN: specifically, a fully connected feed-forward neural network.
# What kind of ANN it is:
# - ANN (Artificial Neural Network)
# - Feed-forward (no loops, no memory), fully connected (dense layers)
# - Regression ANN (predicts a real number)

import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader
import matplotlib.pyplot as plt

# -----------------------------
# 1) Reproducibility
# -----------------------------
torch.manual_seed(0)
np.random.seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"

# -----------------------------
# 2) Create data: y = 2x + 1
# -----------------------------
N = 200
x = np.linspace(-10, 10, N, dtype=np.float32).reshape(-1, 1)
print("x", x)
y = (2.0 * x + 1.0).astype(np.float32)
print("y", y)

# Train/Val split
idx = np.arange(N)
print("idx", idx)
np.random.shuffle(idx)
print("idx", idx)

train_size = int(0.8 * N)
train_idx, val_idx = idx[:train_size], idx[train_size:]
print(f"Train size: {train_size}")
print(f"Val size: {N - train_size}")
print("train_idx", train_idx)
print("\nval_idx", val_idx)

x_train, y_train = x[train_idx], y[train_idx]
x_val, y_val = x[val_idx], y[val_idx]
print("x_train", x_train)
print("\ny_train", y_train)
print("\nx_val", x_val)
print("\ny_val", y_val)

# Torch tensors: convert NumPy -> PyTorch tensor, then move the tensor to CPU or GPU
x_train_t = torch.from_numpy(x_train).to(device)
y_train_t = torch.from_numpy(y_train).to(device)
x_val_t = torch.from_numpy(x_val).to(device)
y_val_t = torch.from_numpy(y_val).to(device)

train_loader = DataLoader(TensorDataset(x_train_t, y_train_t), batch_size=32, shuffle=True)

# -----------------------------
# 3) Define a tiny ANN
# -----------------------------
model = nn.Sequential(
    nn.Linear(1, 16),   # input layer: 1 feature connected to 16 neurons
    nn.ReLU(),
    nn.Linear(16, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
).to(device)

loss_fn = nn.MSELoss()  # MSE = Mean Squared Error
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # Adam assumed; any gradient-based optimizer works

# -----------------------------
# 4) Train + record history
# -----------------------------
epochs = 300
train_loss_hist = []
val_loss_hist = []

for epoch in range(1, epochs + 1):
    model.train()  # sets the model to training mode
    running = 0.0
    for xb, yb in train_loader:
        # pred = model(xb) runs per batch; across the whole epoch you eventually cover all
        # training samples, batch by batch, so one epoch still trains on (almost) all the
        # training data, just not in a single forward pass.
        pred = model(xb)          # forward pass -> prediction
        loss = loss_fn(pred, yb)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running += loss.item() * xb.size(0)
    train_loss = running / train_size

    model.eval()
    with torch.no_grad():
        val_pred = model(x_val_t)
        val_loss = loss_fn(val_pred, y_val_t).item()

    train_loss_hist.append(train_loss)
    val_loss_hist.append(val_loss)

# -----------------------------
# 5) Plot history (train vs val)
# -----------------------------
plt.figure()
plt.plot(train_loss_hist, label="train loss")
plt.plot(val_loss_hist, label="val loss")
plt.xlabel("epoch")
plt.ylabel("MSE loss")
plt.title("Learning History: y = 2x + 1")
plt.legend()
plt.grid(True)
plt.show()

# -----------------------------
# 6) Predict for x = 2
# -----------------------------
x_test = torch.tensor([[2.0]], dtype=torch.float32, device=device)
model.eval()
with torch.no_grad():
    y_pred = model(x_test).item()

y_true = 2.0 * 2.0 + 1.0  # = 5
abs_err = abs(y_pred - y_true)
pct_err = (abs_err / abs(y_true)) * 100.0

print("x=2")
print(f"Predicted y: {y_pred:.6f}")
print(f"True y:      {y_true:.6f}")
print(f"Abs error:   {abs_err:.6f}")
print(f"% error:     {pct_err:.4f}%")
```
PyTorch Artificial Intelligence Course – RB109-02
Neural Network Fundamentals (ANN – Artificial Neural Network)
Overview
This lesson explains how a simple Artificial Neural Network (ANN) is trained in PyTorch to learn the function:
[
y = 2x + 1
]
The focus is on understanding the training process mathematically and conceptually, not just running code.
Neural Network Training Process — Step by Step
Training a neural network is a repeating process applied to every batch of data.
Each step has a clear role, and only one step actually changes the weights and bias.
Typical Order Per Batch
| Step | Name | What this step does | Effect on weights and bias |
|---|---|---|---|
| 1 | Forward pass (model) | Computes model output using current weights and bias (prediction) | No change |
| 2 | Loss calculation (loss function) | Measures how wrong the prediction is compared to the target | No change |
| 3 | Clear gradients (optimizer.zero_grad()) | Removes gradients from the previous batch | No change |
| 4 | Backpropagation (loss.backward()) | Computes derivatives of the loss with respect to each weight and bias | No direct change |
| 5 | Parameter update (optimizer.step()) | Updates weights and bias using the computed derivatives | Weights and bias are updated |
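To make this order concrete, here is a minimal sketch of one batch in PyTorch. The toy nn.Linear(1, 1) model, the SGD optimizer, and the hand-made batch are stand-ins chosen only for illustration; the point is the order of the five calls.

```python
import torch
import torch.nn as nn

# Tiny stand-in model and one synthetic batch, just to show the per-batch order of steps.
model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

xb = torch.tensor([[1.0], [2.0], [3.0]])
yb = 2.0 * xb + 1.0                 # targets from y = 2x + 1

pred = model(xb)                    # 1) forward pass: uses current weights, changes nothing
loss = loss_fn(pred, yb)            # 2) loss: one scalar measuring the error
optimizer.zero_grad()               # 3) clear gradients left over from the previous batch
loss.backward()                     # 4) backpropagation: fills weight.grad / bias.grad
optimizer.step()                    # 5) the only step that actually changes weights and bias

print(loss.item(), model.weight.grad, model.bias.grad)
```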
Detailed Explanation of Each Step
1. Forward Pass (Model)
Input data is passed through the neural network.
Each layer performs:
- Multiplication by weights
- Addition of bias
- Activation function (ReLU)
The final layer produces the prediction:
[
\hat{y}
]
Important:
- Uses existing weights and bias
- No learning happens here
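A small sketch, using a single toy nn.Linear layer rather than the lesson's full model, showing that the forward pass only reads the current parameters and produces a prediction without modifying them:

```python
import torch
import torch.nn as nn

layer = nn.Linear(1, 1)
w_before = layer.weight.clone()
b_before = layer.bias.clone()

x = torch.tensor([[3.0]])
y_hat = layer(x)   # forward pass: y_hat = W x + b with the current parameters

# The parameters are untouched by the forward pass.
print(torch.equal(layer.weight, w_before), torch.equal(layer.bias, b_before))  # True True
print(y_hat)
```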
2. Loss Calculation (Loss Function)
The loss function compares:
- ( \hat{y} ) (prediction)
- ( y ) (ground truth)
It outputs a single scalar value called loss.
Mean Squared Error (MSE):
[
L = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2
]
Interpretation:
- Small loss → good prediction
- Large loss → bad prediction
Important:
- No weights or bias are changed
- Loss only measures error
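A short illustration, with hand-picked numbers, of how nn.MSELoss behaves: predictions close to the targets give a small loss, far-off predictions give a large one, and the result matches the MSE formula above.

```python
import torch
import torch.nn as nn

loss_fn = nn.MSELoss()

y_true = torch.tensor([5.0, 7.0, 9.0])
good_pred = torch.tensor([5.1, 6.9, 9.0])   # close to the targets
bad_pred = torch.tensor([0.0, 0.0, 0.0])    # far from the targets

print(loss_fn(good_pred, y_true))           # small loss
print(loss_fn(bad_pred, y_true))            # large loss

# Manual MSE, matching L = (1/N) * sum((y_hat - y)^2)
manual = ((good_pred - y_true) ** 2).mean()
print(torch.allclose(manual, loss_fn(good_pred, y_true)))  # True
```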
3. Clear Old Gradients (optimizer.zero_grad())
PyTorch accumulates gradients by default.
This step:
- Resets all gradients to zero
- Prevents gradient accumulation across batches
Important:
- Does not change weights or bias
- Only clears stored gradient values
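A quick sketch (toy layer and data chosen only for demonstration) of why clearing gradients matters: calling backward() twice without clearing makes the gradients add up.

```python
import torch
import torch.nn as nn

layer = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
x = torch.tensor([[1.0]])
y = torch.tensor([[3.0]])

loss_fn(layer(x), y).backward()
g1 = layer.weight.grad.clone()

loss_fn(layer(x), y).backward()                     # no zero_grad: gradients accumulate
print(torch.allclose(layer.weight.grad, 2 * g1))    # True: the stored gradient doubled

layer.zero_grad(set_to_none=False)                  # same effect as optimizer.zero_grad()
print(layer.weight.grad)                            # back to zeros
```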
4. Backpropagation (loss.backward())
This is the mathematical core of learning.
What happens:
- Computes derivatives of the loss with respect to:
- Every weight
- Every bias
- Uses the chain rule from calculus
- Computation flows from output back to input
Gradients are stored internally:
- weight.grad
- bias.grad
Gradient meaning:
- Positive gradient → increasing parameter increases loss → parameter must decrease
- Negative gradient → increasing parameter decreases loss → parameter must increase
- Magnitude → how strong the change should be
Important:
- No parameters are updated here
- Only direction and strength are computed
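A minimal sketch showing that loss.backward() fills the .grad fields but leaves the parameters themselves untouched (the toy layer and numbers are arbitrary):

```python
import torch
import torch.nn as nn

layer = nn.Linear(1, 1)
loss_fn = nn.MSELoss()

w_before = layer.weight.clone()
print(layer.weight.grad)                         # None: no gradient computed yet

loss = loss_fn(layer(torch.tensor([[1.0]])), torch.tensor([[3.0]]))
loss.backward()                                  # chain rule, from the loss back to every parameter

print(layer.weight.grad, layer.bias.grad)        # gradients are now stored
print(torch.equal(layer.weight, w_before))       # True: the weights themselves did not move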
5. Parameter Update (optimizer.step())
This is the only step that changes weights and bias.
Gradient descent rule:
[
W \leftarrow W - \eta \frac{\partial L}{\partial W}
]
[
b \leftarrow b - \eta \frac{\partial L}{\partial b}
]
Where:
- ( \eta ) is the learning rate
Different optimizers (SGD, Adam, RMSprop):
- Use different update rules
- All rely on the same gradients
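As a sketch of the update rule, plain SGD (no momentum) reproduces exactly the gradient-descent rule above; the expected values here are computed by hand before calling optimizer.step().

```python
import torch
import torch.nn as nn

layer = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
lr = 0.1
optimizer = torch.optim.SGD(layer.parameters(), lr=lr)

loss = loss_fn(layer(torch.tensor([[1.0]])), torch.tensor([[3.0]]))
optimizer.zero_grad()
loss.backward()

# Expected result of W <- W - lr * dL/dW, computed by hand before stepping
expected_w = layer.weight.detach() - lr * layer.weight.grad
expected_b = layer.bias.detach() - lr * layer.bias.grad

optimizer.step()
print(torch.allclose(layer.weight.detach(), expected_w))  # True
print(torch.allclose(layer.bias.detach(), expected_b))    # True
```

Adam and RMSprop modify this rule (adaptive step sizes, momentum terms), but they consume exactly the same gradients stored by loss.backward().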
Key Concepts Summary
- Forward pass → produces predictions
- Loss → measures error
- Backward pass → computes how each parameter affects the error
- Optimizer → applies the correction
One-Sentence Intuition
Backpropagation decides how each weight and bias should move, and the optimizer moves them to reduce the error.
Mathematical Explanation of Neural Network Training
(Based on PyTorch ANN – RB109-02)
1. Model Definition (Mathematical Form)
The network is a fully connected feed-forward network:
[
x \rightarrow \text{Linear}_1 \rightarrow \text{ReLU} \rightarrow \text{Linear}_2 \rightarrow \text{ReLU} \rightarrow \text{Linear}_3 \rightarrow \hat{y}
]
Each linear layer:
[
z = Wx + b
]
Where:
- ( W ) – weight matrix
- ( b ) – bias vector
- ( x ) – input vector
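A short check, with an arbitrary 3-input, 2-output layer, that nn.Linear really computes z = Wx + b (PyTorch stores W with shape (out_features, in_features)):

```python
import torch
import torch.nn as nn

layer = nn.Linear(in_features=3, out_features=2)   # W has shape (2, 3), b has shape (2,)
x = torch.randn(1, 3)                              # one input vector with 3 features

manual = x @ layer.weight.T + layer.bias           # z = W x + b, written with a batch dimension
print(torch.allclose(layer(x), manual))            # True
print(layer.weight.shape, layer.bias.shape)
```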
2. Forward Pass Mathematics
First Layer
[
z_1 = W_1 x + b_1
]
[
a_1 = \text{ReLU}(z_1) = \max(0, z_1)
]
Second Layer
[
z_2 = W_2 a_1 + b_2
]
[
a_2 = \text{ReLU}(z_2) = \max(0, z_2)
]
Output Layer
[
\hat{y} = z_3 = W_3 a_2 + b_3
]
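These equations can be applied by hand to a network with the lesson's 1 → 16 → 16 → 1 shape. The sketch below (random, untrained weights) checks that the manual layer-by-layer computation matches model(x):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(
    nn.Linear(1, 16), nn.ReLU(),
    nn.Linear(16, 16), nn.ReLU(),
    nn.Linear(16, 1),
)
x = torch.tensor([[2.0]])

# Manual forward pass, layer by layer, following z = Wx + b and a = ReLU(z)
W1, b1 = model[0].weight, model[0].bias
W2, b2 = model[2].weight, model[2].bias
W3, b3 = model[4].weight, model[4].bias

a1 = torch.relu(x @ W1.T + b1)      # z1 = W1 x + b1, a1 = max(0, z1)
a2 = torch.relu(a1 @ W2.T + b2)     # z2 = W2 a1 + b2, a2 = max(0, z2)
y_hat = a2 @ W3.T + b3              # output layer has no activation

print(torch.allclose(y_hat, model(x)))  # True
```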
3. Loss Function (Mean Squared Error)
[
L = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2
]
- ( \hat{y}_i ) – predicted value
- ( y_i ) – true value
- ( N ) – batch size
4. Objective of Training
[
\min_{W,b} L(W,b)
]
Find the weights and biases that minimize prediction error.
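As a small end-to-end illustration of this objective (a simplified setup: a single linear neuron instead of the lesson's hidden layers), repeated gradient-descent steps drive the loss toward zero and the parameters toward the true values 2 and 1:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.linspace(-10, 10, 200).reshape(-1, 1)
y = 2.0 * x + 1.0

model = nn.Linear(1, 1)            # a single linear neuron is enough for y = 2x + 1
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for step in range(200):
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 50 == 0:
        print(step, loss.item())   # the loss shrinks as W, b approach 2 and 1

print(model.weight.item(), model.bias.item())  # close to 2.0 and 1.0
```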
5. Backpropagation (Derivatives)
Output Layer
[
\frac{\partial L}{\partial W_3}, \quad \frac{\partial L}{\partial b_3}
]
Using chain rule:
[
\frac{\partial L}{\partial W_3} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial W_3}
]
Hidden Layers
[
\frac{\partial L}{\partial W_2} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial a_2} \cdot \frac{\partial a_2}{\partial z_2} \cdot \frac{\partial z_2}{\partial W_2}
]
ReLU derivative:
[
\frac{d}{dz}\text{ReLU}(z) =
\begin{cases}
1 & z > 0 \\
0 & z \le 0
\end{cases}
]
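These derivatives can be verified numerically. The sketch below uses a deliberately tiny network (one scalar hidden ReLU unit, arbitrary values) and compares the hand-applied chain rule with what autograd stores in .grad:

```python
import torch

# One input, one hidden ReLU unit, one output; all scalars so the chain rule is easy to check.
x, y = torch.tensor(2.0), torch.tensor(5.0)
w1 = torch.tensor(0.5, requires_grad=True)
b1 = torch.tensor(0.1, requires_grad=True)
w2 = torch.tensor(-1.5, requires_grad=True)
b2 = torch.tensor(0.3, requires_grad=True)

z1 = w1 * x + b1
a1 = torch.relu(z1)
y_hat = w2 * a1 + b2
loss = (y_hat - y) ** 2
loss.backward()

# Hand-computed chain rule, using ReLU'(z) = 1 if z > 0 else 0
dL_dyhat = (2 * (y_hat - y)).detach()
relu_prime = 1.0 if z1.item() > 0 else 0.0

print(torch.allclose(w2.grad, dL_dyhat * a1.detach()))                   # dL/dw2
print(torch.allclose(b2.grad, dL_dyhat))                                 # dL/db2
print(torch.allclose(w1.grad, dL_dyhat * w2.detach() * relu_prime * x))  # dL/dw1
print(torch.allclose(b1.grad, dL_dyhat * w2.detach() * relu_prime))      # dL/db1
```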
6. Meaning of the Gradient
Each gradient answers:
“If this parameter increases, does the loss increase or decrease?”
- Positive → decrease parameter
- Negative → increase parameter
- Magnitude → update strength
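A tiny illustration with one weight and one sample (values chosen by hand), showing how the gradient sign encodes the direction of the needed change:

```python
import torch

# Loss for a single weight w on one sample: L(w) = (w * x - y)^2, with x = 1, y = 3.
x, y = 1.0, 3.0

w = torch.tensor(5.0, requires_grad=True)   # too large: increasing w would increase the loss
((w * x - y) ** 2).backward()
print(w.grad)                               # positive gradient -> w should decrease

w = torch.tensor(1.0, requires_grad=True)   # too small: increasing w would decrease the loss
((w * x - y) ** 2).backward()
print(w.grad)                               # negative gradient -> w should increase
```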
7. Why Learning Works
- Forward pass → prediction
- Loss → error measurement
- Backpropagation → error attribution
- Optimizer → correction
Repeated many times:
[
\hat{y} \approx 2x + 1
]
8. Final Insight
Even though the true function is linear, the network:
- Learns through nonlinear layers
- Discovers the rule from data
- Uses calculus, not hard-coded logic
This is the foundation of all deep learning models.