TCP BTL does not support Linux virtual interfaces #160

Open
@ompiteam

Description

If you create a virtual Ethernet device in Linux (an alias such as eth0:0), the TCP BTL gets confused.

This is because the Linux kernel uses the same kernel index for both interfaces (the underlying device and its virtual alias), whereas the TCP BTL fundamentally assumes that every interface has a unique kernel index; we use that kernel index for indexing and for unique identification in modex data. This is clearly a bad assumption.
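To make the collision concrete, here is a minimal sketch (not Open MPI code) that lists the IPv4 interface labels reported by `getifaddrs()` and looks up the kernel index of each with `if_nametoindex()`. The `struct if_entry` record, `MAX_IFS`, and both helper names are hypothetical and exist only for illustration. It assumes that an alias such as eth0:0 resolves to the same kernel index as eth0, which is the behavior described above.

```c
#include <assert.h>
#include <ifaddrs.h>
#include <net/if.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

#define MAX_IFS 64  /* hypothetical upper bound for this sketch */

/* Hypothetical record; this is not an Open MPI data structure. */
struct if_entry {
    char name[IF_NAMESIZE];
    unsigned int kindex;  /* kernel index; 0 if the lookup failed */
};

/* Fill `out` with one entry per IPv4 address reported by getifaddrs().
 * Returns the number of entries stored, or -1 if getifaddrs() fails. */
static int collect_ipv4_interfaces(struct if_entry *out, int max)
{
    struct ifaddrs *list, *p;
    int n = 0;

    assert(out != NULL && max >= 0);
    if (getifaddrs(&list) != 0) {
        return -1;
    }
    for (p = list; p != NULL && n < max; p = p->ifa_next) {
        if (p->ifa_addr == NULL || p->ifa_addr->sa_family != AF_INET) {
            continue;  /* skip non-IPv4 entries */
        }
        strncpy(out[n].name, p->ifa_name, IF_NAMESIZE - 1);
        out[n].name[IF_NAMESIZE - 1] = '\0';
        out[n].kindex = if_nametoindex(p->ifa_name);
        n++;
    }
    freeifaddrs(list);
    return n;
}

/* Count entries whose kernel index duplicates that of an earlier entry. */
static int count_kindex_collisions(const struct if_entry *e, int n)
{
    int i, j, collisions = 0;

    for (i = 1; i < n; i++) {
        for (j = 0; j < i; j++) {
            if (e[i].kindex == e[j].kindex) {
                collisions++;
                break;
            }
        }
    }
    return collisions;
}
```

On a host with an alias configured, calling `collect_ipv4_interfaces()` and then `count_kindex_collisions()` on the result would be expected to report at least one collision; any code that treats the kernel index as a unique key would then conflate the two interfaces.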

I chatted with Ralph about this on the phone: we're wondering why the kernel index was used at all. Why not use the OPAL IF index? That *is* unique (within a process), and is suitable for both indexing and identification in modex data.

Ralph is going to revamp the OPAL IF interface soon, anyway (e.g., convert it from a list to an array) and will likely be removing all the kernel index stuff. This will force changing the TCP BTL to use the OPAL IF index (instead of the kernel index). This will likely solve the problem.
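As a rough illustration of why a process-local index avoids the problem, the sketch below keeps interfaces in an array and uses the array position as the identifier. This is not the OPAL IF API; `struct local_if` and all three function names are hypothetical, and the real revamped interface may look quite different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record; the real OPAL IF structures are not shown here. */
struct local_if {
    const char *name;
    unsigned int kindex;  /* kernel index; may be shared by aliases */
    int local_index;      /* process-local index; unique by construction */
};

/* Give every entry its array position as its process-local index. */
static void assign_local_indexes(struct local_if *ifs, int n)
{
    int i;

    assert(ifs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        ifs[i].local_index = i;
    }
}

/* Lookup keyed on the kernel index: returns the FIRST match, so an
 * alias that shares an index with its parent can never be found. */
static const struct local_if *find_by_kindex(const struct local_if *ifs,
                                             int n, unsigned int kindex)
{
    int i;

    for (i = 0; i < n; i++) {
        if (ifs[i].kindex == kindex) {
            return &ifs[i];
        }
    }
    return NULL;
}

/* Lookup keyed on the process-local index: a direct array access. */
static const struct local_if *find_by_local_index(const struct local_if *ifs,
                                                  int n, int local_index)
{
    if (local_index < 0 || local_index >= n) {
        return NULL;
    }
    return &ifs[local_index];
}
```

With eth0 and eth0:0 both carrying the same kernel index, `find_by_kindex()` can only ever return the eth0 entry, while `find_by_local_index()` distinguishes the two. A process-local index is only meaningful inside the process that assigned it, so each process would publish its own indexes in its modex data.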

Once we fix this, perhaps Bart at Atipa can test it for us. He ran into the issue because he has eth0:0 on his cluster head node to talk to the IPMI network. He doesn't usually run MPI jobs on the head node, but he did this once and ran into hangs/badness, and I helped diagnose the issue. :-)
