When I launch Megatron-LM BERT pretraining with mpirun, I get the following error:
AssertionError: MPI world size 16 does not match torch world size 8
My mpirun arguments are --allow-run-as-root -np 16 -N 8 --hostfile ${hostfile}, that is, 16 processes in total with 8 per node. Am I missing something needed to set the torch world size correctly?
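
One possible cause is that the torch side takes its world size from environment variables (or from a per-node count) rather than from MPI. Below is a minimal sketch of mapping the per-process variables that Open MPI exports onto the names that torch.distributed's env:// initialization reads. It assumes the launcher is Open MPI and that the training script initializes torch.distributed from the environment; the helper name torch_env_from_ompi is hypothetical and is not part of Megatron-LM.

```python
import os
from typing import Dict, Mapping, Optional


def torch_env_from_ompi(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Translate Open MPI's per-process variables into torch.distributed names.

    Hypothetical helper for illustration; assumes the process was started
    by Open MPI's mpirun, which exports the OMPI_COMM_WORLD_* variables.
    """
    if env is None:
        env = os.environ
    return {
        # Total number of processes across all nodes (the -np value).
        "WORLD_SIZE": env["OMPI_COMM_WORLD_SIZE"],
        # Global rank of this process, 0 .. WORLD_SIZE - 1.
        "RANK": env["OMPI_COMM_WORLD_RANK"],
        # Rank of this process within its node, usually used to pick the GPU.
        "LOCAL_RANK": env["OMPI_COMM_WORLD_LOCAL_RANK"],
    }


if __name__ == "__main__":
    # Export the translated values before torch.distributed is initialized.
    if "OMPI_COMM_WORLD_SIZE" in os.environ:
        os.environ.update(torch_env_from_ompi())
        print({k: os.environ[k] for k in ("WORLD_SIZE", "RANK", "LOCAL_RANK")})
```

If this is the issue, the values would need to be in place before torch.distributed.init_process_group(init_method="env://") runs, and MASTER_ADDR and MASTER_PORT would still have to be set to the same address and port on every node.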