Accelerate Model Training with PyTorch 2.X

This section introduces the basic workflow for implementing distributed training in PyTorch and presents the components used in the process.
Generally speaking, this workflow comprises the steps illustrated in Figure 8.14:
Figure 8.14 – Basic workflow to implement distributed training in PyTorch
Let’s look at each step in more detail.
Note
The complete code shown in this section is available at https://github.com/PacktPublishing/Accelerate-Model-Training-with-PyTorch-2.X/blob/main/code/chapter08/pytorch_ddp.py.
The communication group is the logical entity used by PyTorch to define and control the distributed environment. So, the first step in coding distributed training is to initialize a communication group. This step is performed by calling the torch.distributed.init_process_group function, which every participating process must invoke before any collective communication takes place.
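For illustration only, here is a minimal sketch of this initialization; it is not necessarily the code used in the book's pytorch_ddp.py script. It assumes the script is launched with torchrun, which exports the RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT environment variables, and it assumes the NCCL backend (GPU training); the helper name init_communication_group is hypothetical:

import torch.distributed as dist

def init_communication_group():
    # Hypothetical helper. With no init_method argument,
    # init_process_group defaults to "env://" and reads the
    # RANK/WORLD_SIZE/MASTER_ADDR/MASTER_PORT variables that
    # torchrun exports for each spawned process.
    # backend="nccl" is an assumption for GPU training; use
    # "gloo" for CPU-only runs.
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    print(f"Process {rank} of {world_size} joined the communication group")

if __name__ == "__main__":
    init_communication_group()
    # ... model setup, DDP wrapping, and the training loop go here ...
    dist.destroy_process_group()

Launched with, for example, torchrun --nproc_per_node=2 script.py, each of the two processes calls init_process_group against the same rendezvous address and joins the same group, after which collective operations can be issued.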