At a high level, our training contains a repetition of the following steps:
1. Select an action in the environment using the current network and execute it
2. Put the resulting sample into the replay buffer
3. Sample a random training batch from the replay buffer
4. Train the network on this batch
The purpose of the first two steps is to populate the replay buffer with samples from the environment (each consisting of an observation, an action, a reward, and the next observation). The last two steps train our network.
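As a rough illustration of these four steps, the following sketch shows one iteration of such a sequential loop in PyTorch. The function name, hyperparameters, Gymnasium-style environment API, epsilon-greedy action selection, and the omission of a separate target network are all simplifying assumptions made for this example; it is not the book's actual code.

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn.functional as F


def training_iteration(env, net, optimizer, replay_buffer: deque, state,
                       batch_size=32, gamma=0.99, epsilon=0.1, device="cpu"):
    # Steps 1-2: interact with the environment and store the transition
    if random.random() < epsilon:
        action = env.action_space.sample()
    else:
        state_t = torch.as_tensor(np.asarray([state]), dtype=torch.float32,
                                  device=device)
        action = int(net(state_t).argmax(dim=1).item())
    next_state, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
    replay_buffer.append((state, action, reward, next_state, done))
    state = env.reset()[0] if done else next_state

    # Not enough data yet: skip the training part of the iteration
    if len(replay_buffer) < batch_size:
        return state

    # Step 3: sample a random batch from the replay buffer
    batch = random.sample(replay_buffer, batch_size)
    states, actions, rewards, next_states, dones = map(np.asarray, zip(*batch))
    states_t = torch.as_tensor(states, dtype=torch.float32, device=device)
    next_states_t = torch.as_tensor(next_states, dtype=torch.float32, device=device)
    actions_t = torch.as_tensor(actions, dtype=torch.int64, device=device)
    rewards_t = torch.as_tensor(rewards, dtype=torch.float32, device=device)
    done_mask = torch.as_tensor(dones, dtype=torch.bool, device=device)

    # Step 4: train the network on the batch (simple one-step Q-learning target,
    # using the same network for bootstrapping instead of a target network)
    q_values = net(states_t).gather(1, actions_t.unsqueeze(-1)).squeeze(-1)
    with torch.no_grad():
        next_q = net(next_states_t).max(dim=1).values
        next_q[done_mask] = 0.0
        target = rewards_t + gamma * next_q

    loss = F.mse_loss(q_values, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return state
```

In this sequential form, the environment interaction (steps 1 and 2) and the training (steps 3 and 4) happen one after another on a single process, which is exactly the structure the figure below depicts and which leaves room for parallelism.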
The following illustration of the preceding steps makes the potential parallelism slightly more obvious. On the left, the training flow is shown; the training steps use the environments, the replay buffer, and our NN. Solid lines show data and code flow, while dotted lines represent usage of the NN for training and inference.
Figure 9.6: A sequential diagram of the training process
As you can see, the top two steps...