psm cq_read thread safe? #1278

Closed
disprosium8 opened this issue Sep 15, 2015 · 7 comments

@disprosium8
Contributor

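With the psm provider I get the segfault below in fi_cq_read when polling from multiple threads concurrently. It goes away if I serialize access to my polling routine. The sockets provider has no problem with concurrent access. My hints: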

fi_ctx.hints->domain_attr->mr_mode = FI_MR_BASIC;
fi_ctx.hints->domain_attr->threading = FI_THREAD_SAFE;
fi_ctx.hints->rx_attr->comp_order = FI_ORDER_STRICT;
fi_ctx.hints->ep_attr->type = FI_EP_RDM;
fi_ctx.hints->caps = FI_MSG | FI_RMA;
fi_ctx.hints->mode = FI_CONTEXT | FI_LOCAL_MR | FI_RX_CQ_DATA;

Using d2418b2

Program received signal SIGSEGV, Segmentation fault.
0x00007f1a3094d10b in psmx_cq_readfrom (cq=0x27e6d10, buf=0x7ffc50af3b90, count=0, src_addr=0x0)
    at ../../../../../src/contrib/libfabric/prov/psm/src/psmx_cq.c:596
596         if (!event->error) {
(gdb) bt
#0  0x00007f1a3094d10b in psmx_cq_readfrom (cq=0x27e6d10, buf=0x7ffc50af3b90, count=0, src_addr=0x0)
    at ../../../../../src/contrib/libfabric/prov/psm/src/psmx_cq.c:596
#1  0x00007f1a3094d229 in psmx_cq_read (cq=0x27e6d10, buf=0x7ffc50af3b90, count=1)
    at ../../../../../src/contrib/libfabric/prov/psm/src/psmx_cq.c:627
#2  0x00007f1a30b9c249 in fi_cq_read (cq=0x27e6d10, buf=0x7ffc50af3b90, count=1)
    at ../../../src/contrib/libfabric/include/rdma/fi_eq.h:364
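
The workaround mentioned in the report, serializing access to the polling routine, can be sketched with a pthread mutex. This is a minimal illustration under stated assumptions, not libfabric code: `struct fake_cq`, `fake_cq_read`, `serialized_cq_read`, and `drain_with_threads` are hypothetical stand-ins for a CQ whose read path has no internal locking; in a real program the call made under the lock would be `fi_cq_read`.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a CQ whose read path is NOT thread safe:
 * plain shared state with no internal locking. */
struct fake_cq {
    int pending;   /* completions waiting to be read */
    int consumed;  /* completions handed to callers */
};

/* Unsynchronized read: two racing callers could both see pending > 0
 * and corrupt the counters if this were called without a lock. */
static int fake_cq_read(struct fake_cq *cq)
{
    if (cq->pending == 0)
        return 0;          /* nothing available */
    cq->pending--;
    cq->consumed++;
    return 1;
}

/* The workaround: one lock so only one thread touches the CQ at a time. */
static pthread_mutex_t cq_lock = PTHREAD_MUTEX_INITIALIZER;

static int serialized_cq_read(struct fake_cq *cq)
{
    pthread_mutex_lock(&cq_lock);
    int n = fake_cq_read(cq);
    pthread_mutex_unlock(&cq_lock);
    return n;
}

/* Each poller thread drains the CQ until it reports nothing left. */
static void *poller(void *arg)
{
    struct fake_cq *cq = arg;
    while (serialized_cq_read(cq) > 0)
        ;  /* each successful read consumed one completion */
    return NULL;
}

/* Drain `total` preloaded completions with `nthreads` concurrent pollers
 * and return how many were consumed; with the lock this equals `total`. */
static int drain_with_threads(int nthreads, int total)
{
    struct fake_cq cq = { .pending = total, .consumed = 0 };
    pthread_t tids[16];

    assert(nthreads > 0 && nthreads <= 16);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, poller, &cq);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return cq.consumed;
}
```

A single global lock is the simplest choice; an application polling several CQs could keep one lock per CQ to avoid needless contention.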

@j-xiong
Contributor

j-xiong commented Sep 15, 2015

The psm provider is not thread safe, and fi_getinfo doesn't currently check the threading hint against that. Will add the check.
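
The missing check can be modeled as a small predicate evaluated while matching hints. This is a hypothetical sketch only: `enum thread_level`, `struct prov_caps`, and `prov_accepts_threading` are illustrative names, not libfabric's real types or the psm provider's actual code, and only the "is full thread safety required?" distinction is modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a few threading levels; the real
 * FI_THREAD_* definitions live in libfabric's headers. */
enum thread_level {
    THREAD_UNSPEC = 0,  /* caller expressed no requirement */
    THREAD_DOMAIN,
    THREAD_SAFE
};

/* Hypothetical provider capability record. */
struct prov_caps {
    bool thread_safe;   /* can the provider handle concurrent calls? */
};

/* Reject hints that demand full thread safety when the provider cannot
 * deliver it, rather than returning an fi_info that silently differs
 * from the request. A real provider would report "no match" here. */
static bool prov_accepts_threading(const struct prov_caps *prov,
                                   enum thread_level requested)
{
    if (requested == THREAD_SAFE && !prov->thread_safe)
        return false;
    return true;
}
```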

@disprosium8
Contributor Author

Thanks for confirming. It might be good to note this on the provider's man page as well.

@disprosium8
Contributor Author

On another note, it doesn't look like rx_attr->comp_order = FI_ORDER_DATA is honored either. I assume that with psm I'll always need to rely on receive-side completion events to ensure the full message has been written?

@j-xiong
Contributor

j-xiong commented Sep 15, 2015

Right, there is no ordering implication on the receive side. The provider doesn't check every field of the hints, but the returned fi_info does have those fields set to the values it actually supports. Will add the missing checks.
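
Since the returned fi_info can differ from the hints, an application can compare the returned attributes against what it asked for before relying on them. The sketch below models this for completion ordering only; `ORDER_*`, `struct rx_attrs`, and `ordering_honored` are hypothetical local stand-ins, and the flag values are illustrative, not libfabric's real `FI_ORDER_*` constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ordering flags; values are illustrative only. */
#define ORDER_NONE   UINT64_C(0)
#define ORDER_STRICT (UINT64_C(1) << 0)
#define ORDER_DATA   (UINT64_C(1) << 1)

/* Hypothetical subset of receive attributes: what the application
 * requested in its hints, or what the provider returned. */
struct rx_attrs {
    uint64_t comp_order;
};

/* Every ordering bit that was requested must also be present in the
 * returned value; extra returned bits are harmless. */
static bool ordering_honored(const struct rx_attrs *requested,
                             const struct rx_attrs *returned)
{
    return (returned->comp_order & requested->comp_order)
           == requested->comp_order;
}
```

If the check fails, as the comment above indicates it would for FI_ORDER_DATA with psm, the application falls back to waiting for the receive-side completion before reading the buffer.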

@shefty
Member

shefty commented Oct 29, 2015

Have the missing checks been added to the provider? If so, I can close this unless there is another open issue here.

@j-xiong
Contributor

j-xiong commented Oct 29, 2015

Yes, the checking has been added.

@shefty
Member

shefty commented Oct 29, 2015

I assumed that you had. Thanks.

@shefty closed this as completed Oct 29, 2015
sungeunchoi pushed a commit to sungeunchoi/libfabric that referenced this issue Mar 10, 2017
prov/gni: Clean up scalable endpoint leak

3 participants