Parallel
Context-parallel communication primitives (zigzag splits / all-to-all) shared by the mixer and conv modules above.
Initialize distributed training and Megatron parallel state.
Distribute tensor data across group ranks using a zigzag pattern.
Reconstruct the complete tensor from zigzag-distributed chunks.
Set up logging that prints to the console only from rank 0 but logs all ranks to files.
Differentiable all-to-all collective for context-parallel (CP) sequence ↔ channel redistribution.
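
The zigzag split, its reconstruction, and the sequence ↔ channel redistribution listed above can be illustrated with a single-process sketch. It is not this module's implementation: it uses plain Python lists instead of tensors and process groups, and the names `zigzag_split`, `zigzag_gather`, and `seq_to_channel` are hypothetical. It also assumes the common zigzag convention in which the sequence is cut into `2 * world_size` chunks and rank `r` holds chunks `r` and `2 * world_size - 1 - r`; the module's actual chunking may differ.

```python
from typing import List, Sequence

def zigzag_split(seq: Sequence[int], world_size: int) -> List[List[int]]:
    """Return one shard per rank (assumed zigzag convention, see lead-in)."""
    num_chunks = 2 * world_size
    if len(seq) % num_chunks != 0:
        raise ValueError("sequence length must be divisible by 2 * world_size")
    size = len(seq) // num_chunks
    chunks = [list(seq[i * size:(i + 1) * size]) for i in range(num_chunks)]
    # Rank r pairs an early chunk with a late one.
    return [chunks[r] + chunks[num_chunks - 1 - r] for r in range(world_size)]

def zigzag_gather(shards: Sequence[Sequence[int]]) -> List[int]:
    """Inverse of zigzag_split: put every half-shard back in sequence order."""
    world_size = len(shards)
    num_chunks = 2 * world_size
    size = len(shards[0]) // 2
    chunks: List[List[int]] = [[] for _ in range(num_chunks)]
    for r, shard in enumerate(shards):
        chunks[r] = list(shard[:size])
        chunks[num_chunks - 1 - r] = list(shard[size:])
    return [x for chunk in chunks for x in chunk]

def seq_to_channel(per_rank: Sequence[Sequence[Sequence[int]]]) -> List[List[List[int]]]:
    """Simulated all-to-all: from (local sequence, all channels) per rank
    to (full sequence, local channels) per rank. Rows are concatenated in
    rank order; no gradients are involved in this simulation."""
    world_size = len(per_rank)
    c_local = len(per_rank[0][0]) // world_size
    out = []
    for dst in range(world_size):
        lo, hi = dst * c_local, (dst + 1) * c_local
        out.append([list(row[lo:hi]) for src in per_rank for row in src])
    return out

shards = zigzag_split(list(range(8)), world_size=2)
restored = zigzag_gather(shards)
```

With 8 tokens and 2 ranks, rank 0 holds tokens `[0, 1, 6, 7]` and rank 1 holds `[2, 3, 4, 5]`. Under causal attention, later tokens attend to more positions than earlier ones, which is why pairing an early chunk with a late chunk is commonly used to balance work across ranks.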