RB1001-1 : PPO is a Reinforcement Learning (RL) algorithm

  • RL = an agent learns by trial and error, guided by a reward signal.

  • PPO (Proximal Policy Optimization) = one of the most widely used RL algorithms, known for stable training.

  • Full trajectory