Artificial Intelligence RB108-6: How a Neuron Learns

Part A:

x = [-3, -2, -1, 0, 1, 2, 3]

y = [-5, -3, -1, 1, 3, 5, 7]


These are the true values, generated by the line y = 2x + 1: y = [-5, -3, -1, 1, 3, 5, 7]

1) The Neuron Model

The neuron predicts:

y_predict = w * x + b

Starting values:

w = 1
b = 0
learning_rate = 0.05
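
A minimal Python sketch of this setup (the data and the names w, b, learning_rate follow the text; the predict helper is an illustrative name, not from the text):

# Data from Part A
x_data = [-3, -2, -1, 0, 1, 2, 3]
y_true = [-5, -3, -1, 1, 3, 5, 7]

# Starting values
w = 1.0
b = 0.0
learning_rate = 0.05

def predict(x, w, b):
    # The neuron's prediction: y_predict = w * x + b
    return w * x + b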

2) Forward Pass (Prediction)

For each x:

y_predict = w * x + b
x     y_true   y_predict
-3    -5       -3
-2    -3       -2
-1    -1       -1
 0     1        0
 1     3        1
 2     5        2
 3     7        3
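
As a sketch, the forward pass above is just one prediction per sample (continuing with the names from the sketch in part 1):

# Forward pass: one prediction per sample
for x, y in zip(x_data, y_true):
    print(x, y, predict(x, w, b))
# Prints the x / y_true / y_predict rows of the table above.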


3) Error and Loss Function

We start with w = 1, b = 0.

y_true comes from the data (the underlying line is y = 2x + 1).

y_predict = w * x + b (with w = 1, b = 0 this is just y_predict = x).

x    y_true   y_predict = 1·x + 0   error = y_predict - y_true   loss = (y_true - y_predict)²
-3   -5       -3                    +2                            4
-2   -3       -2                    +1                            1
-1   -1       -1                     0                            0
 0    1        0                    -1                            1
 1    3        1                    -2                            4
 2    5        2                    -3                            9
 3    7        3                    -4                           16

Totals for Step 3

Sum of errors:


2 + 1 + 0 – 1 – 2 – 3 – 4 = -7

Sum of loss values:


4 + 1 + 0 + 1 + 4 + 9 + 16 = 35

Mean Loss (The Value Used for Learning)

The Mean Loss is:

mean loss = (sum of all losses) / (number of samples)

In our case:

  • Total loss = 35

  • Number of samples = 7

So:

mean loss = 35 / 7 = 5

This mean loss is the MSE (mean squared error), also called the cost function.
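
Continuing the sketch from above, the same numbers in Python:

# Error, loss, and mean loss (MSE) with w = 1, b = 0
errors = [predict(x, w, b) - y for x, y in zip(x_data, y_true)]
losses = [e ** 2 for e in errors]

print(sum(errors))                 # -7   (raw errors partly cancel)
print(sum(losses))                 # 35
print(sum(losses) / len(losses))   # 5.0  -> the mean loss (MSE)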

 


3.1 Why we use the Loss and not the Error

Example:

error = +3
error = -3

Loss converts both to:

loss = 9
loss = 9

So the neuron learns how big the mistake is, not just the sign.

Raw error cancels out:

(+3) + (-3) = 0 → looks perfect, but actually very wrong.

Loss does NOT cancel out → it always gives a usable signal for learning.
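
A two-line Python illustration of the cancellation problem:

demo_errors = [+3, -3]
print(sum(demo_errors))                  # 0  -> looks perfect, but both predictions are wrong
print(sum(e ** 2 for e in demo_errors))  # 18 -> the loss still reports both mistakes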


4) Gradients are computed from the loss

Why We Don’t Solve With Derivative = 0 (Except for One Neuron)

1) For a single neuron (like our example):

y = w * x + b

This is a simple line.

We can take the derivative of the loss, set it to 0, and solve for the exact best:

w = 2, b = 1

This works because the loss surface is a perfect parabola (a simple bowl shape).
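
For this one-neuron case the exact answer can be checked directly. A sketch using the standard least-squares formulas (a textbook result, not taken from the text itself): w = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b = ȳ − w·x̄, continuing with x_data and y_true from the earlier sketches:

# Closed-form least squares for y = w * x + b
n = len(x_data)
x_mean = sum(x_data) / n
y_mean = sum(y_true) / n

w_best = sum((x - x_mean) * (y - y_mean) for x, y in zip(x_data, y_true)) \
         / sum((x - x_mean) ** 2 for x in x_data)
b_best = y_mean - w_best * x_mean

print(w_best, b_best)  # 2.0 1.0 -> the loss is exactly zero here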


2) But for many neurons (real neural networks):

When you add:

  • more neurons

  • more layers

  • activation functions (ReLU, sigmoid)

  • millions of parameters

The loss surface becomes:

  • twisted

  • curved

  • full of valleys and hills

  • impossible to solve with algebra

You cannot write equations like:

∂L/∂w_i = 0

for millions of nonlinear parameters.

There is no formula that gives all w and b directly.
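
Tying this back to the section title: differentiating the mean loss for y_predict = w * x + b gives ∂L/∂w = (2/n) · Σ error · x and ∂L/∂b = (2/n) · Σ error, where error = y_predict − y_true. A sketch of one gradient-descent update, starting from w = 1, b = 0 with learning_rate = 0.05 (names continue from the earlier sketches):

# One gradient-descent step on the mean squared error
n = len(x_data)
errors = [predict(x, w, b) - y for x, y in zip(x_data, y_true)]

grad_w = (2 / n) * sum(e * x for e, x in zip(errors, x_data))  # -8.0
grad_b = (2 / n) * sum(errors)                                 # -2.0

# Step against the gradient
w = w - learning_rate * grad_w   # 1.0 - 0.05 * (-8.0) = 1.4
b = b - learning_rate * grad_b   # 0.0 - 0.05 * (-2.0) = 0.1
print(w, b)  # 1.4 0.1 -> one step closer to the true w = 2, b = 1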