-
Notifications
You must be signed in to change notification settings - Fork 3k
Non-blocking recv() isn't really non-blocking with lwip stack #13197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Thank you for raising this detailed GitHub issue. I am now notifying our internal issue triagers. |
cc @ARMmbed/mbed-os-ipcore |
Thanks for identifying this. I was unaware of this - I was convinced we were using non-blocking calls to lwIP. Indeed, TCP sockets are set to non-blocking. This is related to #13056, where there was confusion because I thought that sockets were created non-blocking, and that the lines in When I realised that wasn't the case, and If
Not quite following this comment - all I would hope that just replacing that (I think we could then turn off |
Alright I tried that change and everything seems to be working. Thanks!
Scratch that, it was just my misunderstanding how that part of the socket code works plus an issue with my test code causing bad performance. Everything seems to be fine with blocking sockets. |
@multiplemonomials Thanks for detailed report and @kjbracey-arm for the instant fix |
Description of defect
In my project I need to implement a system with a high update rate (2kHz) that communicates over Ethernet. So far I have sending working fine, but when I attempted receiving I ran into trouble: If there is currently no data in the receive buffer, it seems like UDPSocket::recvfrom() will block for at least 500 us!
I investigated the problem some more and I think I see what's going on. It looks like LWIPStack always creates LWIP sockets in blocking mode with a timeout of 1ms, regardless of what the Socket's blocking mode is set to. So, recv() will always wait about a millisecond for data to arrive before returning.
It seems like the best fix for this would be for LWIPStack to correctly set non-blocking sockets to be non-blocking in LWIP. Another, related issue is that blocking sockets still only use a timeout of 1, so they turn into a resource-intensive spinlock if recv() is called with a longer timeout. It looks like this could be avoided pretty easily by proxying the actual timeout value to lwip, so it will then mutex wait for the correct amount of time before returning.
Patch:
There are probably a lot of changes needed to properly handle non-blocking sockets, so I came up with a simple (but dirty) patch:
In sys_arch_mbox_fetch() in lwip_sys_arch.c, I changed:
to
Since LWIPInterface uses a timeout of 1 everywhere, this will change it to a timeout of 0, so the mutex wait will return immediately if there is no message. We need to do this here because otherwise it changes the 0 value to osWaitForever.
In my testing this brings the recv() delay down to where it should be, about 10 us. It will prevent blocking sockets from working properly though so keep that in mind. I also haven't tested it with TCP.
Target(s) affected by this defect ?
I was testing on a NUCLEO_F429ZI, but I think that this affects all targets.
Toolchain(s) (name and version) displaying this defect ?
GCC_ARM 2019 q4
What version of Mbed-os are you using (tag or sha) ?
mbed-os-5.15.0
What version(s) of tools are you using. List all that apply (E.g. mbed-cli)
I'm using my custom build system, https://github.com/USCRPL/mbed-cmake
This integrates with the MBed python scripts, and should have the same build behavior as mbed-cli.
How is this defect reproduced ?
socket.recvfrom()
.Expected behavior: With no data in the socket I expect recvfrom() to take no more than double digit numbers of microseconds to run.
Actual behavior: recvfrom() takes about 480-500us to execute.
The text was updated successfully, but these errors were encountered: