
Segmentation fault when calling ucp_listener_create using IB-enabled library on non-IB machine #6755

Closed
krakowski opened this issue May 4, 2021 · 10 comments
krakowski commented May 4, 2021

Describe the bug

Calling ucp_listener_create with an IB-enabled UCX build on a machine without InfiniBand adapters leads to a segmentation fault. It is caused by a null pointer dereference in uct_listener_create, where the cm argument is NULL. While debugging, I found that this happens because the worker's connection manager (CM) array contains a null pointer.
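To make the failure mode concrete, here is a minimal C sketch of the pattern the report describes: a per-worker array of connection-manager (CM) slots in which a slot can be NULL when its component could not be opened (for example, an RDMA CM on a host with no InfiniBand adapter), and a lookup that skips NULL slots instead of dereferencing them. The types `fake_cm_t` and `fake_worker_t` and the function `pick_listener_cm` are hypothetical stand-ins, not UCX's internal code, and this is not the fix that was merged upstream.

```c
#include <stddef.h>

/* Hypothetical stand-ins for UCX internals; these are NOT the real UCX types. */
typedef struct {
    const char *name;      /* e.g. "rdmacm" or "tcp" */
} fake_cm_t;

typedef struct {
    fake_cm_t **cms;       /* one slot per CM component; a slot may be NULL
                            * when that component failed to open (no IB HW) */
    size_t      num_cms;
} fake_worker_t;

/* Return the first usable CM, skipping NULL slots instead of dereferencing
 * them. Returns NULL when no CM is available, so the caller can report an
 * error instead of crashing. */
static fake_cm_t *pick_listener_cm(const fake_worker_t *worker)
{
    for (size_t i = 0; i < worker->num_cms; ++i) {
        if (worker->cms[i] != NULL) {
            return worker->cms[i];
        }
    }
    return NULL;
}
```

In this sketch, a caller that gets NULL back would return an error status rather than passing a null CM further down, which is the kind of behavior the reporter expected instead of a segfault.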

Disabling all IB-related features at configure time works around the problem:

./contrib/configure-release \
    --with-verbs=no \
    --with-rc=no \
    --with-ud=no \
    --with-dc=no \
    --with-mlx5-dv=no \
    --with-ib-hw-tm=no \
    --with-dm=no \
    --with-cm=no \
    --with-rdmacm=no

Steps to Reproduce

  • UCX version 1.10.0
  • Configured with contrib/configure-release

Setup and versions

  • Linux 5.4.85-1-MANJARO (x86_64)
  • rdma-core 30.0
krakowski added the Bug label May 4, 2021
krakowski (Author) commented:

This may also be the cause of issue #6244.

alinask (Contributor) commented May 4, 2021

Hi @krakowski,
can you please try setting the following environment variable on both sides (client and server) to see if it helps?
UCX_SOCKADDR_TLS_PRIORITY=tcp
(This assumes you are not setting any other UCX environment variables.)
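The same workaround can also be applied from inside an application, as long as it runs before UCX reads its configuration (we assume here that UCX picks up `UCX_*` environment variables when `ucp_config_read()` is called). The helpers `force_tcp_sockaddr_priority` and `sockaddr_priority_is_tcp` below are hypothetical, not part of UCX; they rely only on POSIX `setenv` and `getenv`. Because the workaround must be set on both client and server, each process would need to call it.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of UCX): default the sockaddr transport
 * priority to TCP. Call it before ucp_config_read()/ucp_init().
 * overwrite = 0 keeps any value the user already exported. */
static int force_tcp_sockaddr_priority(void)
{
    return setenv("UCX_SOCKADDR_TLS_PRIORITY", "tcp", 0);
}

/* Hypothetical helper: returns 1 if the variable is currently "tcp". */
static int sockaddr_priority_is_tcp(void)
{
    const char *value = getenv("UCX_SOCKADDR_TLS_PRIORITY");
    return value != NULL && strcmp(value, "tcp") == 0;
}
```

Passing `0` as the `overwrite` argument of `setenv` means a value the user exported explicitly takes precedence over the hard-coded default.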

krakowski (Author) commented:

Hi @alinask,

yes, I can confirm that setting UCX_SOCKADDR_TLS_PRIORITY=tcp makes it work.

alinask (Contributor) commented May 4, 2021

Can you please use this environment variable as a workaround with UCX v1.10?
We will fix this issue in the next UCX release.
Thanks!

abellina commented May 5, 2021

I just hit this too and was about to file an issue. @alinask, thanks for jumping on it. UCX_SOCKADDR_TLS_PRIORITY=tcp also fixes it for me.

abellina commented May 5, 2021

@alinask @yosefe do you think it's possible to get this into 1.10.1?

@petro-rudenko fyi.

alinask (Contributor) commented May 9, 2021

@abellina The fix was merged into the UCX v1.10.x branch.

alinask (Contributor) commented May 10, 2021

@krakowski @abellina Following the fix in the UCX v1.10.x branch, can we close this ticket?

krakowski (Author) commented:

From my side it can be closed. Thanks for the quick fix! 🙂

abellina commented May 10, 2021

Same for me. Thanks @alinask. I verified that a TCP-only setup with v1.10.x no longer segfaults.

alinask closed this as completed May 10, 2021