RB1001-1: PPO is a Reinforcement Learning (RL) algorithm
- RL = an agent learns by trial and error, guided by a reward signal.
- PPO (Proximal Policy Optimization) = one of the most stable and popular RL algorithms.
- Full trajectory
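
PPO's reputation for stability is usually attributed to its clipped surrogate objective, which limits how far a single update can move the policy. Below is a minimal sketch of that objective for one sample. The function name `ppo_clip_objective`, its parameter names, and the default `eps=0.2` are illustrative assumptions and do not come from these notes.

```python
def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Clipped surrogate objective for a single (state, action) sample.

    ratio     : new_policy_prob / old_policy_prob for the action taken
    advantage : estimate of how much better the action was than average
    eps       : clipping range (assumed default, chosen for illustration)
    """
    # Restrict the probability ratio to the interval [1 - eps, 1 + eps].
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Good action, policy already moved far toward it: the gain is capped.
    print(ppo_clip_objective(ratio=1.5, advantage=1.0))
    # Bad action, policy moved toward it: the penalty is not capped.
    print(ppo_clip_objective(ratio=1.5, advantage=-1.0))
```

In training, this value would be averaged over the samples collected from the agent's trial-and-error interaction and then maximized; the sketch leaves out that loop and the advantage estimation.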