I set `hostNetwork: true` for the MPIJob, but the host IPs are not used when MPI actually runs, so the worker pods cannot communicate with each other.
I found the mapping between the physical machine IPs and the worker pod names in `/opt/kube/hosts` on the launcher pod, but there is no such mapping in the worker pods. I suspect this is why MPI is not using the new IPs.
Can anybody help me?
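For reference, a minimal sketch of the kind of MPIJob spec in question (the apiVersion may differ by operator version; the name, image, and command are placeholders, not my actual job):

```yaml
apiVersion: kubeflow.org/v1          # adjust to the API version your operator serves
kind: MPIJob
metadata:
  name: host-network-test            # placeholder name
spec:
  mpiReplicaSpecs:
    Launcher:
      replicas: 1
      template:
        spec:
          hostNetwork: true          # run in the node's network namespace
          containers:
            - name: mpi-launcher
              image: example/mpi-app:latest                        # placeholder image
              command: ["mpirun", "-np", "2", "/app/mpi-program"]  # placeholder command
    Worker:
      replicas: 2
      template:
        spec:
          hostNetwork: true          # run in the node's network namespace
          containers:
            - name: mpi-worker
              image: example/mpi-app:latest                        # placeholder image
```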
Could you confirm that the mapping file is located at `/opt/kube/hosts`? Looking at the MPI controller on the master branch, I don't think the controller mounts such a file.
Given that a pod's real IP is only assigned after the pod is scheduled, it seems quite possible that the `kube-delivery` image does this job. In that case, could you confirm the image (version/tag) of `kube-delivery`, which is the image used by the init container of the launcher pod?
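For example (the pod names below are placeholders for your actual launcher/worker pods), something like this should show the init-container image and let you compare the hostfile between the launcher and a worker:

```sh
# Print the init-container image(s) of the launcher pod
kubectl get pod <launcher-pod> -o jsonpath='{.spec.initContainers[*].image}'

# Compare the hostfile contents on the launcher and on a worker
kubectl exec <launcher-pod> -- cat /opt/kube/hosts
kubectl exec <worker-pod> -- cat /opt/kube/hosts
```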