The results shown in Table 9.2 attest that Gloo fulfills the role of communication backend for the distributed training process in PyTorch very well.
Even so, there is another communication backend option that can go even faster on Intel platforms: the Intel oneCCL collective communication library. In this section, we will learn what this library is and how to use it as a communication backend for PyTorch.
Intel oneCCL (oneAPI Collective Communications Library) is a collective communication library created and maintained by Intel. Along the lines of Gloo, oneCCL also provides collective communication primitives such as the so-called All-reduce operation, as sketched in the example below.
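To make this concrete, here is a minimal sketch of an All-reduce running on the oneCCL backend. It assumes Intel's oneccl_bindings_for_pytorch package (the torch-ccl bindings, distributed on PyPI as oneccl_bind_pt) is installed; importing it registers "ccl" as a backend name for torch.distributed. The script name and launch command are illustrative, not from the book.

```python
import os

import torch
import torch.distributed as dist

# Importing the bindings registers "ccl" as a process group backend.
import oneccl_bindings_for_pytorch  # noqa: F401


def main():
    # torchrun sets RANK and WORLD_SIZE for each process; the rendezvous
    # address falls back to localhost for a single-machine run.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")

    # Same call as with Gloo; only the backend name changes.
    dist.init_process_group(backend="ccl")

    rank = dist.get_rank()
    world = dist.get_world_size()

    # Each rank contributes a distinct tensor; All-reduce sums them in
    # place, so every rank ends up holding the same aggregated result.
    t = torch.ones(4) * (rank + 1)
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    print(f"rank {rank}/{world}: {t.tolist()}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched with, for example, torchrun --nproc_per_node=2 allreduce_ccl.py, every rank should print the same summed tensor. Note that the training code itself is unchanged relative to Gloo; swapping the backend is a one-line change in init_process_group.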
Naturally, Intel oneCCL is optimized to run on Intel platforms, though this does not necessarily mean it will not work on other platforms. We can use this library to provide collective communication among processes executing on the same machine (intraprocess communication) or the...