RL 01
Let's say we move from (0, 0) to (1, 0):
max(Q((1,0), all actions)) =
max(q_table[0][1][0], # ↑
q_table[0][1][1], # ↓
q_table[0][1][2], # ←
q_table[0][1][3]) # →
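That max is the bootstrap term of the Q-learning update. Below is a minimal sketch of how the update could look with this [y][x][action] table; alpha and gamma are assumed names for the learning rate and discount factor, not taken from the original code.

alpha, gamma = 0.1, 0.9  # assumed learning rate and discount factor

def update_q(q_table, state, action, reward, next_state):
    x, y = state                      # positions are (x, y); the table is indexed [y][x]
    nx, ny = next_state
    best_next = max(q_table[ny][nx])  # max over the 4 actions available in the next state
    td_target = reward + gamma * best_next
    q_table[y][x][action] += alpha * (td_target - q_table[y][x][action])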
The full 6×6 grid of Q-table state indices ([y][x]):

[0][0] [0][1] [0][2] [0][3] [0][4] [0][5]
[1][0] [1][1] [1][2] [1][3] [1][4] [1][5]
[2][0] [2][1] [2][2] [2][3] [2][4] [2][5]
[3][0] [3][1] [3][2] [3][3] [3][4] [3][5]
[4][0] [4][1] [4][2] [4][3] [4][4] [4][5]
[5][0] [5][1] [5][2] [5][3] [5][4] [5][5]
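One way such a table could be initialized, assuming 6×6 states and 4 actions in the order ↑, ↓, ←, → (a sketch, not the original setup code):

q_table = [[[0.0] * 4 for _ in range(6)] for _ in range(6)]  # indexed [y][x][action], all zeros
print(q_table[0][1])  # the 4 action values for the state at y=0, x=1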
If the agent tries to go to an invalid position (like (-1, 0)), for example by taking ← from (0, 0), the entry being updated is:

q_table[0][0][2]  # y=0, x=0, action index 2 = ←

Then:
- The move is detected as out of bounds
- The agent gets a reward of -1
- The episode ends
- The agent still updates the Q-value for the action it tried (even though it failed)
- The Q-value will slowly approach -1 over time
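The points above could translate into a step function like the sketch below; GRID_SIZE, MOVES and the 0/-1 rewards are assumptions for illustration, not the original environment code.

GRID_SIZE = 6
MOVES = [(0, -1), (0, 1), (-1, 0), (1, 0)]  # ↑, ↓, ←, → as (dx, dy), assuming y grows downward

def step(q_table, state, action, alpha=0.1, gamma=0.9):
    x, y = state
    dx, dy = MOVES[action]
    nx, ny = x + dx, y + dy
    if not (0 <= nx < GRID_SIZE and 0 <= ny < GRID_SIZE):
        reward, done = -1, True            # out of bounds: reward -1, episode ends
        target = reward                    # no next state to bootstrap from
        next_state = state
    else:
        reward, done = 0, False            # assumed reward for an ordinary move
        target = reward + gamma * max(q_table[ny][nx])
        next_state = (nx, ny)
    # the Q-value of the attempted action is updated either way
    q_table[y][x][action] += alpha * (target - q_table[y][x][action])
    return next_state, reward, done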
For example:

q_table[0][0][2]  # y=0, x=0, action index 2 = ←

The Q-table stores a value here, but not for (-1, 0). Instead, it stores the value of trying to go left (←) from (0, 0), which ends up very close to -1 after 100 trials.
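To see why that stored value ends up near -1, here is a tiny simulation of 100 failed ← attempts from (0, 0), assuming a learning rate of 0.1:

q, alpha = 0.0, 0.1
for trial in range(100):
    q += alpha * (-1 - q)   # failed move: the update target is just the reward -1
print(q)                    # about -0.99997, i.e. effectively -1 after 100 trials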
So the agent will choose:

- Action index = 3
- Direction = → (right)
- Move to = (1, 0)
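That greedy choice can be written as an argmax over the 4 values stored for (0, 0); the ACTIONS list is an assumed helper for printing directions:

ACTIONS = ["↑", "↓", "←", "→"]
values = q_table[0][0]                          # the 4 Q-values stored for state (0, 0)
best = max(range(4), key=lambda a: values[a])   # argmax over the action indices
print(best, ACTIONS[best])                      # 3 → when "right" has the highest value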