I set `hostNetwork: true` for the MPIJob, but the host IPs are not used when MPI actually runs, so the worker pods cannot communicate with each other.
I found the mapping between the physical machine IPs and the worker pod names in `/opt/kube/hosts` on the launcher pod, but there is no such mapping in the worker pods. I suspect this is why MPI is not using the new IPs.
Can anybody help me?
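For reference, a minimal sketch of the kind of MPIJob spec in question (the apiVersion may differ by operator version; the name, image, and command are placeholders, not my actual job):

```yaml
apiVersion: kubeflow.org/v1          # adjust to the API version your operator serves
kind: MPIJob
metadata:
  name: host-network-test            # placeholder name
spec:
  mpiReplicaSpecs:
    Launcher:
      replicas: 1
      template:
        spec:
          hostNetwork: true          # run in the node's network namespace
          containers:
            - name: mpi-launcher
              image: example/mpi-app:latest                        # placeholder image
              command: ["mpirun", "-np", "2", "/app/mpi-program"]  # placeholder command
    Worker:
      replicas: 2
      template:
        spec:
          hostNetwork: true          # run in the node's network namespace
          containers:
            - name: mpi-worker
              image: example/mpi-app:latest                        # placeholder image
```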
Could you confirm that the mapping file is located at `/opt/kube/hosts`? Looking at the MPI controller on the master branch, I don't think the controller mounts such a file.
Given that a pod's real IP is only assigned after the pod is scheduled, it seems quite possible that the `kube-delivery` image does this job. In that case, could you confirm the image (version/tag) of `kube-delivery`, which is the image used by the init container of the launcher pod?
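For example (the pod names below are placeholders for your actual launcher/worker pods), something like this should show the init-container image and let you compare the hostfile between the launcher and a worker:

```sh
# Print the init-container image(s) of the launcher pod
kubectl get pod <launcher-pod> -o jsonpath='{.spec.initContainers[*].image}'

# Compare the hostfile contents on the launcher and on a worker
kubectl exec <launcher-pod> -- cat /opt/kube/hosts
kubectl exec <worker-pod> -- cat /opt/kube/hosts
```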