-
Book Overview & Buying
-
Table Of Contents
-
Feedback & Rating

Deep Reinforcement Learning Hands-On
By :

Next, we will take a look at the approaches used to improve the stability of the stochastic policy gradient method. Some attempts have been made to make the policy improvement more stable, and in this chapter, we will focus on three methods:
In addition, we will compare those methods to a relatively new off-policy method called soft actor-critic (SAC), which is a development of the deep deterministic policy gradients (DDPG) method described in Chapter 17, Continuous Action Space. To compare them to the A2C baseline, we will use several environments from the Roboschool library created by OpenAI.
The overall motivation of the methods that we will look at is to improve the stability of the policy update during the training. There is a dilemma: on the one hand, we...