Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always use same IP address for module and modex #5938

Merged
merged 2 commits into from
Oct 21, 2018

Conversation

bwbarrett
Copy link
Member

Two patches that end up with simplifying the modex send so that we're not racy in a multi-IP per NIC design. We were already storing a source address in the local BTL, for use to bind() and force the source address to be known to us. With this patch, we guarantee that the address we use for bind() is the same we publish in the modex.

This should help with #5818.

Today, a btl tcp module is associated with exactly one IP
address (IPv4 or IPv6).  There's no need to reserve space
for both an IPv4 and IPv6 address in the module structure,
since the module will only be associated with one or the
other.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Simplify selection of the address to publish for a given BTL TCP
module in the module exchange code.  Rather than looping through
all IP addresses associated with a node, looking for one that
matches the kindex of a module, loop over the modules and
use the address stored in the module structure.  This also
happens to be the address that the source will use to bind()
in a connect() call, so this should eliminate any confusion
(read: bugs) when an interface has multiple IPs associated with
it.

Refs open-mpi#5818

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
@bwbarrett
Copy link
Member Author

@bosilca fixed the address family option, which was tested on a platform with mixed IPv4 and IPv6 addresses.

Ran out of time to implement the "use the address specified in the if_include / if_exclude" design. I don't think it's a huge design change (change tcp_create to take a sockaddr_storage instead of a kindex), but pulling apart the spilt_and_resolve is kind of going to suck.

@jjhursey
Copy link
Member

bot:ibm:retest

@open-mpi open-mpi deleted a comment from ibm-ompi Oct 18, 2018
@open-mpi open-mpi deleted a comment from ibm-ompi Oct 18, 2018
@open-mpi open-mpi deleted a comment from ibm-ompi Oct 18, 2018
@bosilca bosilca merged commit 6900249 into open-mpi:master Oct 21, 2018
@bwbarrett bwbarrett deleted the bugfix/tcp-multi-address branch May 8, 2020 00:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants