prov/efa: fi_info crash in a system with mlnx but no efa device #7805
ghost added the bug label Jun 6, 2022
@ofiwg/aws-efa-team
Looking into it.
#7806 should fix the issue. @chien-intel would you please try this patch?
PR #7806 fixed this issue. Feel free to close this issue after the PR is merged.
Thank you! Will merge after CI finishes.
wzamazon added a commit to wzamazon/libfabric-1 that referenced this issue Jun 7, 2022
This patch added a unit test for the error handling of function efa_device_construct(). This is to reproduce the GitHub issue: ofiwg#7805
Signed-off-by: Wei Zhang <wzam@amazon.com>
PR merged. I also checked that this issue only applies to the main branch, so no backport is needed. Closing ...
Thank you.
ghost closed this as completed Jun 7, 2022
jtamzn pushed a commit to jtamzn/libfabric that referenced this issue Oct 19, 2022
This patch added a unit test for the error handling of function efa_device_construct(). This is to reproduce the GitHub issue: ofiwg#7805
Signed-off-by: Wei Zhang <wzam@amazon.com>
This issue was closed.
Describe the bug
libfabric src on 54edc09, configured with debug and valgrind.
Running fi_info on a system with a Mellanox device but no EFA device produced this output:
fi_info
double free or corruption (!prev)
Aborted (core dumped)
Here is the gdb stack trace:
(gdb) bt
#0 0x00007ffff5e1437f in raise () from //lib64/libc.so.6
#1 0x00007ffff5dfedb5 in abort () from //lib64/libc.so.6
#2 0x00007ffff5e574e7 in __libc_message () from //lib64/libc.so.6
#3 0x00007ffff5e5e5ec in malloc_printerr () from //lib64/libc.so.6
#4 0x00007ffff5e6039c in _int_free () from //lib64/libc.so.6
#5 0x00007ffff4cfd3f2 in mlx5_free_context (ibctx=0x676190) at providers/mlx5/mlx5.c:1407
#6 0x00007ffff6bf08b5 in _ibv_close_device_1_1 (context=) at libibverbs/device.c:384
#7 0x00007ffff77b3ca7 in efa_device_destruct (device=0x66fb20) at prov/efa/src/efa_device.c:180
#8 0x00007ffff77b3ecd in efa_device_list_finalize () at prov/efa/src/efa_device.c:254
#9 0x00007ffff77b3e54 in efa_device_list_initialize () at prov/efa/src/efa_device.c:237
#10 0x00007ffff77c28ea in efa_prov_initialize () at prov/efa/src/efa_fabric.c:269
#11 0x00007ffff77c92dd in fi_efa_ini () at prov/efa/src/rxr/rxr_prov.c:111
#12 0x00007ffff770d6ff in fi_ini () at src/fabric.c:856
#13 0x00007ffff770e093 in fi_getinfo (version=65552, node=0x0, service=0x0, flags=0, hints=0x0, info=0x7fffffffd740) at src/fabric.c:1101
#14 0x0000000000401cc0 in run (hints=0x0, node=0x0, port=0x0, flags=0) at util/info.c:324
#15 0x0000000000402110 in main (argc=1, argv=0x7fffffffd888) at util/info.c:448
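The trace above shows the abort occurring while a verbs context is being closed from efa_device_list_finalize(), which was itself called from efa_device_list_initialize(). The thread does not state the root cause, though the referenced commits mention the error handling of efa_device_construct(). One common way this class of double free arises is a constructor that releases a resource on its error path but leaves the pointer set, so a later cleanup pass releases it again. The following is a minimal sketch of that general pattern and its usual guard. Every name in it (demo_ctx, demo_device_construct, and so on) is hypothetical, and it is not the libfabric code or the change made in PR #7806.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a device context; not an rdma-core type. */
struct demo_ctx { int id; };

/* Counts how many times a context was closed, so the demo can be checked. */
static int demo_close_calls = 0;

static struct demo_ctx *demo_open(int id)
{
    struct demo_ctx *ctx = malloc(sizeof(*ctx));
    if (ctx)
        ctx->id = id;
    return ctx;
}

static void demo_close(struct demo_ctx *ctx)
{
    demo_close_calls++;
    free(ctx);
}

/* Hypothetical device wrapper that owns one context. */
struct demo_device { struct demo_ctx *ctx; };

/* Opens a context, then fails if the device is not a supported kind. */
static int demo_device_construct(struct demo_device *dev, int id, int supported)
{
    assert(dev != NULL);
    dev->ctx = demo_open(id);
    if (!dev->ctx)
        return -1;
    if (!supported) {
        demo_close(dev->ctx);
        /* Clearing the pointer keeps a later destruct from closing it again. */
        dev->ctx = NULL;
        return -1;
    }
    return 0;
}

/* Safe to call after a failed construct because it checks for NULL. */
static void demo_device_destruct(struct demo_device *dev)
{
    if (dev->ctx) {
        demo_close(dev->ctx);
        dev->ctx = NULL;
    }
}

/* Constructs one device, always runs cleanup, and returns the close count. */
int demo_run(int supported)
{
    struct demo_device dev = { NULL };

    demo_close_calls = 0;
    (void)demo_device_construct(&dev, 0, supported);
    demo_device_destruct(&dev);
    return demo_close_calls;
}
```

Calling demo_run(0) models the unsupported-device path and demo_run(1) the normal path. In both cases the context is closed exactly once. Removing the `dev->ctx = NULL;` assignment in the error path of the sketch would make the cleanup pass close the same context a second time.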
To Reproduce
Use libfabric src on sha 54edc09, configured with debug and valgrind, and run fi_info on a system with a Mellanox device but no EFA device. Probably any verbs-capable device other than EFA will do.
Expected behavior
fi_info should display info and not crash.
Output
see description.
Environment:
Reproduced on RHEL 8.2 and 8.5 with Mellanox and rdma-core installed.