psm cq_read thread safe? #1278
Comments
The provider is not thread safe. This is not currently checked in fi_getinfo. Will add that.
Thanks for the verification. Might be good to indicate this on the man page as well.
On another note, it doesn't look like rx_attr->comp_order = FI_ORDER_DATA is observed either. I assume when using psm I'll always need to rely on recv side completion events to ensure the full message is written?
Right, there is no ordering implication at the receive side. The provider isn't checking every field of the hints, but the output fi_info does have the fields set to the supported values. Will add the missing checks.
Have the missing checks been added to the provider? If so, I can close this unless there is another open issue here.
Yes, the checking has been added.
I assumed that you had. Thanks.
The segfault below goes away if I serialize access to my polling routine. The sockets provider has no problem with concurrent access. My hints:
fi_ctx.hints->domain_attr->mr_mode = FI_MR_BASIC;
fi_ctx.hints->domain_attr->threading = FI_THREAD_SAFE;
fi_ctx.hints->rx_attr->comp_order = FI_ORDER_STRICT;
fi_ctx.hints->ep_attr->type = FI_EP_RDM;
fi_ctx.hints->caps = FI_MSG | FI_RMA;
fi_ctx.hints->mode = FI_CONTEXT | FI_LOCAL_MR | FI_RX_CQ_DATA;
Using d2418b2
Program received signal SIGSEGV, Segmentation fault.
0x00007f1a3094d10b in psmx_cq_readfrom (cq=0x27e6d10, buf=0x7ffc50af3b90, count=0, src_addr=0x0)
at ../../../../../src/contrib/libfabric/prov/psm/src/psmx_cq.c:596
596 if (!event->error) {
(gdb) bt
#0 0x00007f1a3094d10b in psmx_cq_readfrom (cq=0x27e6d10, buf=0x7ffc50af3b90, count=0, src_addr=0x0)
#1 0x00007f1a3094d229 in psmx_cq_read (cq=0x27e6d10, buf=0x7ffc50af3b90, count=1)
#2 0x00007f1a30b9c249 in fi_cq_read (cq=0x27e6d10, buf=0x7ffc50af3b90, count=1) at ../../../src/contrib/libfabric/include/rdma/fi_eq.h:364
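For anyone hitting the same crash, the workaround mentioned above (serializing access to the polling routine) could look roughly like the following. This is only a minimal sketch, not code from the provider or from the reporter's application: `struct locked_cq`, `cq_read_fn`, `locked_cq_init`, `locked_cq_read`, and `demo_cq_read` are all hypothetical names, and the function-pointer type is just a stand-in shaped like `fi_cq_read(cq, buf, count)`. In a real program the callback would be a thin wrapper around `fi_cq_read`.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in shaped like fi_cq_read(cq, buf, count).
 * In a real program this would wrap the actual libfabric call. */
typedef ssize_t (*cq_read_fn)(void *cq, void *buf, size_t count);

/* Hypothetical helper: a completion queue handle plus the lock
 * that every polling thread must take before reading from it. */
struct locked_cq {
    pthread_mutex_t lock;
    void *cq;
    cq_read_fn read;
};

/* Returns 0 on success, or the pthread_mutex_init error code. */
int locked_cq_init(struct locked_cq *lcq, void *cq, cq_read_fn read)
{
    lcq->cq = cq;
    lcq->read = read;
    return pthread_mutex_init(&lcq->lock, NULL);
}

/* Serialized poll: only one thread at a time is inside the read
 * callback, so a non-thread-safe provider never sees concurrent
 * calls on this queue. */
ssize_t locked_cq_read(struct locked_cq *lcq, void *buf, size_t count)
{
    ssize_t ret;

    pthread_mutex_lock(&lcq->lock);
    ret = lcq->read(lcq->cq, buf, count);
    pthread_mutex_unlock(&lcq->lock);
    return ret;
}

void locked_cq_destroy(struct locked_cq *lcq)
{
    pthread_mutex_destroy(&lcq->lock);
}

/* Demo callback for illustration only (not a libfabric function):
 * treats cq as an int call counter and reports count completions. */
ssize_t demo_cq_read(void *cq, void *buf, size_t count)
{
    (void)buf;
    *(int *)cq += 1;
    return (ssize_t)count;
}
```

Each polling thread would then call `locked_cq_read` instead of reading the queue directly. The lock only covers calls made through this wrapper, so any other code path that touches the same queue would need to take the same lock.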