Skip to content

Conversation

bwbarrett
Copy link
Member

Two patches that end up with simplifying the modex send so that we're not racy in a multi-IP per NIC design. We were already storing a source address in the local BTL, for use to bind() and force the source address to be known to us. With this patch, we guarantee that the address we use for bind() is the same we publish in the modex.

This should help with #5818.

Today, a btl tcp module is associated with exactly one IP
address (IPv4 or IPv6).  There's no need to reserve space
for both an IPv4 and IPv6 address in the module structure,
since the module will only be associated with one or the
other.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Simplify selection of the address to publish for a given BTL TCP
module in the module exchange code.  Rather than looping through
all IP addresses associated with a node, looking for one that
matches the kindex of a module, loop over the modules and
use the address stored in the module structure.  This also
happens to be the address that the source will use to bind()
in a connect() call, so this should eliminate any confusion
(read: bugs) when an interface has multiple IPs associated with
it.

Refs open-mpi#5818

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
@bwbarrett
Copy link
Member Author

@bosilca fixed the address family option, which was tested on a platform with mixed IPv4 and IPv6 addresses.

Ran out of time to implement the "use the address specified in the if_include / if_exclude" design. I don't think it's a huge design change (change tcp_create to take a sockaddr_storage instead of a kindex), but pulling apart the spilt_and_resolve is kind of going to suck.

@jjhursey
Copy link
Member

bot:ibm:retest

@open-mpi open-mpi deleted a comment from ibm-ompi Oct 18, 2018
@open-mpi open-mpi deleted a comment from ibm-ompi Oct 18, 2018
@open-mpi open-mpi deleted a comment from ibm-ompi Oct 18, 2018
@bosilca bosilca merged commit 6900249 into open-mpi:master Oct 21, 2018
@bwbarrett bwbarrett deleted the bugfix/tcp-multi-address branch May 8, 2020 00:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants