Deep Reinforcement Learning
Module 2
Policy Gradient Algorithms
VPG, TRPO, and PPO — the on-policy family of algorithms that directly optimize the policy objective.
Readings
1
Vanilla Policy Gradient (VPG)
12 min
2
Trust Region Policy Optimization (TRPO)
12 min
3
Proximal Policy Optimization (PPO)
12 min
Quizzes
Labs