-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix rare segfault on wait_for_update #84
Fix rare segfault on wait_for_update #84
Conversation
interesting find! it's a bit surprising that it happens here. rm2fb should have NULL ptr handling added to make sure it doesn't also segfault, i wonder what's happening. EINVAL for sem_open() indicates something weird is going on. |
LGTM! |
i'm still curious about the cause here - can you add print statements to help diagnose / see what the requested shm path is? if it isn't happening in rm2fb (we would see segfaults in plato if it was, right?), it is likely better to figure out and fix the underlying cause. EINVAL is given back when the name is invalid in shm_open() |
Reverted the attempt stuff and added a lot of print debugging (commit LinusCDE@ae72c35). Built current plato (master branch) against this lib: dist.tar.gz. You can just extract this on device and run the If you wish to see the exact sysv messages sent, you can add this spy lib between: libswtfb_sysv_spy.so.xz (Source) This really seems to be something that is really timing sensitive. Because running the binary with print statements works perfectly fine. You can see for yourself by replacing the "plato" binary with this one. The only change is that I just commented out every new It is also probably noteworthy to mention, that println! on rust by default locks a mutex. So it will probably cause a slightly longer delay than |
The O_CREAT option needs the mode parameter specified. The server code did do this, but not the client side, resulting in an error only if the client actually got to creating the semaphore first.
The new commit should fix the issue for good. Seems that the Not sure why the c++ client/wrapper didn't cause this. My only guess would be that the wrapper uses Entware's libc (currently 2.27) which defaults to some correct values if they are not specified. Since we usually use a generic compiler for rust on arm, the libc crate is most likely the system one (currently 2.31) which might just behave differently when the arguments. Specifying Edit: Mixed up which libc is which version. This seems to be more like a, we removed this default behaviour for clarification. So updating it for the c++ client is definitely recommended as it might be a case of future proofing, should entware's libc update. |
Had some weird behaviour. Not sure if really needed, added the |
Found this, when using this implementation for plato:
The cause was that
sem_timedwait
was called onsem_open
's resultSEM_FAILED
. That call failed witherrno == 22
("Invalid value"). Seems that the rust code called the code too fast and and probably made more likely by more frequent wait()'s in plato due to the ui system. Plato uses a bus system for almost everything, so afaik, calls should never be happen in parallel.The implemenation was 1:1 like the reference implementation. While this didn't ever happen on the official rm2fb client, the cause might still be interesting to investigate and could possibly hint to a problem in the original impl. (cc @raisjn)
Added a way to do several attempts with delays which should provide a good enough workaround. Also turned later occuring segfault into a proper panic, should 3 attempts/10ms delay be not enough for some app.