Deep Reinforcement Learning Hands-On

The first version of A3C parallelization that we'll check (outlined in Figure 2) has one main process that carries out training and several child processes that communicate with environments and gather experience to train on. For simplicity and efficiency, explicit broadcasting of neural network (NN) weights from the trainer process is not implemented. Instead of explicitly gathering and sending weights to the children, the network is shared between all processes using PyTorch's built-in capabilities, which allow us to use the same nn.Module instance, with all its weights, in different processes by calling the share_memory() method on the network after it is created. Under the hood, this method is a no-op for CUDA tensors (as GPU memory is already shared among all the host's processes) and uses shared-memory IPC in the case of CPU computation. In both cases, the method improves performance, but it limits our example to a single machine that uses a single GPU card for both training and data gathering.
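
To make the mechanism concrete, here is a minimal, hypothetical sketch of sharing a network between processes with share_memory() and torch.multiprocessing. The PolicyNet class, its layer sizes, and the worker function are illustrative assumptions, not code from the book; they only demonstrate that child processes see the very same parameter tensors as the parent.

    import torch
    import torch.nn as nn
    import torch.multiprocessing as mp


    class PolicyNet(nn.Module):
        # Small illustrative network; sizes are arbitrary assumptions.
        def __init__(self, obs_size=4, n_actions=2):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_size, 128),
                nn.ReLU(),
                nn.Linear(128, n_actions),
            )

        def forward(self, x):
            return self.net(x)


    def worker(net, rank):
        # The child process uses the same parameter tensors the trainer
        # updates, so no explicit weight broadcasting is required.
        obs = torch.randn(1, 4)  # stand-in for an environment observation
        with torch.no_grad():
            logits = net(obs)
        print(f"worker {rank}: weights shared = {net.net[0].weight.is_shared()}, "
              f"logits = {logits.squeeze().tolist()}")


    if __name__ == "__main__":
        mp.set_start_method("spawn")   # portable process start method
        net = PolicyNet()
        net.share_memory()             # CPU tensors move to shared memory; a no-op for CUDA tensors
        procs = [mp.Process(target=worker, args=(net, rank)) for rank in range(2)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()

In a real A3C setup, the workers would step their environments and push experience (or gradients) back to the trainer, while the trainer's optimizer updates the shared parameters in place; the sketch above only shows the sharing step itself.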