Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
prov/verbs: Add missing lock to protect SRX
When compiled with --enable-debug, the test fi_shared_ctx fails on the following assertion failure: fi_shared_ctx: prov/verbs/src/verbs_ofi.h:993: vrb_alloc_ctx: Assertion `ofi_genlock_held(progress->active_lock)' failed. And here is the associated stack: 6 0x00007ffff7a39e96 in __GI___assert_fail (assertion=0x7ffff7f2ec28 "ofi_genlock_held(progress->active_lock)", file=0x7ffff7f2ec08 "prov/verbs/src/verbs_ofi.h", line=993, function=0x7ffff7f2f338 <__PRETTY_FUNCTION__.45> "vrb_alloc_ctx") at ./assert/assert.c:101 7 0x00007ffff7e06b9b in vrb_alloc_ctx (progress=0x5555555ca1a0) at prov/verbs/src/verbs_ofi.h:993 8 0x00007ffff7e0b9fe in vrb_post_srq (srx=0x5555555ca7c0, wr=0x7fffffffe120) at prov/verbs/src/verbs_ep.c:1603 9 0x00007ffff7e0bd2d in vrb_srx_recv (ep_fid=0x5555555ca7c0, buf=0x0, len=1024, desc=0x0, src_addr=18446744073709551615, context=0x55555556b740 <rx_ctx>) at prov/verbs/src/verbs_ep.c:1649 10 0x000055555555d720 in fi_recv (context=0x55555556b740 <rx_ctx>, src_addr=<optimized out>, desc=0x0, len=1024, buf=0x0, ep=0x5555555ca7c0) at /home/sdidelot/libfabric/include/rdma/fi_endpoint.h:297 11 ft_post_rx_buf (ep=0x5555555ca7c0, size=1024, ctx=0x55555556b740 <rx_ctx>, op_buf=0x0, op_mr_desc=0x0, op_tag=0) at common/shared.c:2392 12 0x000055555555d937 in ft_post_rx (ep=<optimized out>, size=<optimized out>, ctx=<optimized out>) at common/shared.c:2400 13 0x00005555555576cf in server_connect () at functional/shared_ctx.c:502 14 run () at functional/shared_ctx.c:547 15 main (argc=<optimized out>, argv=<optimized out>) at functional/shared_ctx.c:629 The problem is that vrb_post_srq() doesn't acquire progress->active_lock when vrb_alloc_ctx() is called, which may result in a race condition if multiple threads concurrently access the same SRX queue. Signed-off-by: Sylvain Didelot <sdidelot@ddn.com>
- Loading branch information