init_parallel_state
- init_parallel_state(
- tensor_model_parallel_size=1,
- pipeline_model_parallel_size=1,
- context_parallel_size=1,
Initialize distributed training and Megatron parallel state.
Sets up the distributed training environment using the NCCL backend and initializes Megatron’s parallel state with the specified parallelism configuration. This function handles device assignment, process group initialization, and parallel state setup.
- Parameters:
- Returns:
The local rank of the current process.
- Return type:
- Raises:
AssertionError – If the number of available GPUs doesn’t match the required world size (tensor_model_parallel_size * pipeline_model_parallel_size * context_parallel_size).
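The world-size requirement behind this error can be sketched as follows. This is a minimal illustration, not the library's own code: `required_world_size` and `check_world_size` are hypothetical helper names, and the available GPU count is passed in as a plain integer rather than queried from the runtime.

```python
def required_world_size(tensor_model_parallel_size=1,
                        pipeline_model_parallel_size=1,
                        context_parallel_size=1):
    """Product of the three parallelism degrees (hypothetical helper)."""
    return (tensor_model_parallel_size
            * pipeline_model_parallel_size
            * context_parallel_size)


def check_world_size(available_gpus, **parallel_sizes):
    """Raise AssertionError when the GPU count differs from the required world size."""
    required = required_world_size(**parallel_sizes)
    assert available_gpus == required, (
        f"expected {required} GPUs, found {available_gpus}"
    )
    return required
```

For example, tensor, pipeline, and context parallel sizes of 2 each require exactly 8 GPUs; any other count fails the assertion.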
Note
This function sets up environment variables for NCCL configuration and initializes the process group if not already initialized. It also verifies the context parallel rank and world size after initialization.
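The "initialize only if not already initialized" behavior described in this note might look roughly like the guard below. It is a sketch under stated assumptions: `ensure_process_group` is a hypothetical name, and the `dist` argument stands in for a module such as `torch.distributed`, which provides `is_initialized()` and `init_process_group(backend=...)`. The NCCL environment variables the function sets are not listed in this page, so they are not shown.

```python
def ensure_process_group(dist, backend="nccl"):
    """Initialize the default process group only if it is not already initialized.

    `dist` is any object exposing is_initialized() and init_process_group(),
    such as the torch.distributed module. Returns True if this call
    performed the initialization, False if a group already existed.
    """
    if dist.is_initialized():
        return False
    dist.init_process_group(backend=backend)
    return True
```

With PyTorch, this would be called as `ensure_process_group(torch.distributed)` from a process started by a launcher that provides the rendezvous environment variables, for example `torchrun`. Calling it a second time is a no-op, which makes the setup safe to invoke from code that may run after another component has already initialized the process group.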