dmlcloud.init
- dmlcloud.init(kind='auto')
Initializes the torch.distributed framework.
For most use cases, kind=’auto’ (the default) should be sufficient.
If kind is ‘env’, the “env://” initialization method is used. See torch.distributed.init_process_group.
If kind is ‘slurm’, SLURM environment variables are used to find the IP address of the root rank.
If kind is ‘mpi’, MPI is used to exchange IP addresses.
If kind is ‘dummy’, a dummy process group with a single process is used (no distributed training). This is useful for debugging and testing.
The ‘auto’ kind tries to initialize the process group in the following order:
1. If the MASTER_PORT environment variable is set, use environment variable initialization.
2. If srun (SLURM) was used to launch this program, use SLURM's environment variables.
3. If MPI is available, use MPI to exchange IP addresses.
4. Otherwise, use a dummy process group with a single process (no distributed training).
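The fallback order above can be sketched as a small pure function. This is an illustration only, not dmlcloud's actual implementation: the name guess_kind is hypothetical, detecting an srun launch via the SLURM_PROCID variable is an assumption, and "MPI is available" is approximated here by checking whether mpi4py can be imported.

```python
import importlib.util
import os


def guess_kind(environ=None, mpi_available=None):
    """Return the kind that 'auto' would plausibly select (illustrative sketch)."""
    env = os.environ if environ is None else environ
    if mpi_available is None:
        # Assumption: MPI availability is approximated by mpi4py being importable.
        mpi_available = importlib.util.find_spec("mpi4py") is not None

    # 1. MASTER_PORT set: environment variable initialization.
    if "MASTER_PORT" in env:
        return "env"
    # 2. Assumption: an srun launch is recognized by SLURM_PROCID, which srun sets per task.
    if "SLURM_PROCID" in env:
        return "slurm"
    # 3. MPI available: exchange addresses over MPI.
    if mpi_available:
        return "mpi"
    # 4. Nothing detected: single-process dummy group.
    return "dummy"
```

Passing the environment and the MPI flag as arguments keeps the sketch easy to test without a cluster; calling guess_kind() with no arguments inspects the current process instead.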
- Parameters:
kind (str) – The kind of initialization to use. Can be one of ‘auto’, ‘dummy’, ‘slurm’, ‘mpi’, or ‘env’.
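As a usage illustration, the following sketch prepares the variables that torch.distributed's "env://" method reads (MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE) for a single local process and then calls dmlcloud.init(kind='env'). The helper names single_process_env and init_from_env and the port 29500 are hypothetical choices for this example, and dmlcloud may read further variables that are not shown here.

```python
import os


def single_process_env(port=29500):
    """Build the env:// variables for one local process (hypothetical helper)."""
    return {
        "MASTER_ADDR": "127.0.0.1",  # rank 0 runs on this machine
        "MASTER_PORT": str(port),    # arbitrary free port, chosen for the example
        "RANK": "0",                 # this process is the only rank
        "WORLD_SIZE": "1",           # one process in total
    }


def init_from_env(port=29500):
    """Export the variables, then initialize with kind='env' (requires dmlcloud)."""
    os.environ.update(single_process_env(port))
    import dmlcloud  # imported lazily so the helper above works without it

    dmlcloud.init(kind="env")
```

For quick local debugging without any environment setup, dmlcloud.init(kind='dummy') gives a single-process group directly.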