-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RISC-V] Disable robust mutexes in PAL on Qemu #106123
Conversation
Before this change due to https://gitlab.com/qemu-project/qemu/-/issues/2446 we get lot of System.Threading.Tests/System.IO.IsolatedStorage.Tests errors like: System.Threading.Tests.MutexTests.AbandonExisting(name: "Local\\0908829282ce40db90b073ccd04b4c62", waitType: WaitAny, waitCount: 1, notAbandonedWaitIndex: 0, isNotAbandonedWaitObjectSignaled: False, abandonDuringWait: False) [FAIL] System.AggregateException : One or more errors occurred. (Not enough storage is available to process this command. : 'Local\0908829282ce40db90b073ccd04b4c62'. One or more system calls failed: pthread_mutex_init(...) == ENOTSUP/EOPNOTSUPP;) ---- System.IO.IOException : Not enough storage is available to process this command. : 'Local\0908829282ce40db90b073ccd04b4c62'. One or more system calls failed: pthread_mutex_init(...) == ENOTSUP/EOPNOTSUPP; In this patch we force PAL to use file locks which fixes System.Threading.Tests/System.IO.IsolatedStorage.Tests on Qemu.
This change means that the runtime build for QEMU is going to be different from the regular runtime build. I am not sure we want to take this path. It is different from the earlier uses of |
In alternative solution we may special-case only RISC-V and force cmake to perform robust mutex runtime checks on config time (without using hardcoded values from tryrun.cmake). Then I believe relying on Qemu shouldn't be longer needed. |
Wouldn't that mean a build produced under QEMU would always use file locks instead of pthread robust mutexes? I wonder if we should include both the pthread robust mutex and file lock implementations into the build and choose one at runtime based on the not-supported error. Then regardless of how the build is produced it would have consistent behavior when running under QEMU or not. |
Given that https://gitlab.com/qemu-project/qemu/-/issues/2424 is really "won't fix" and assuming glibc pthread_mutex_init implementation won't change much in nearest future I think answer is "yes" :)
If I understand your idea correctly it would require PAL mutex adjustment, I think at least PAL_CreateMutexW. I can explore this direction and check if that's enough or require more work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
In mutex.cpp pthread robust mutex code and file lock specific code are scattered across file. Choosing one at runtime instead compile time NAMED_MUTEX_USE_PTHREAD_MUTEX macro need more adjustment than only in InitializeProcessSharedRobustRecursiveMutex/NamedMutexSharedData. I'm not convinced it's good idea. |
RISC-V Release-CLR-QEMU: 9389 / 9390 (99.99%)
Release-CLR-QEMU.md, Release-CLR-QEMU.xml, testclr_output.tar.gz RISC-V Release-FX-QEMU: 703807 / 713575 (98.63%)
Release-FX-QEMU.md, Release-FX-QEMU.xml, testfx_output.tar.gz Build information and linksGIT: # CORE_LIBS_BUILD_CMD
runtime/build.sh --arch riscv64 --cross -c Release -s libs /p:EnableSourceLink=false
# CORE_BUILD_CMD
runtime/build.sh --arch riscv64 --cross -c Release -s clr+libs+host /p:EnableSourceLink=false
# TESTCLR_BUILD_CMD
runtime/src/tests/build.sh -riscv64 -cross -Release -priority1 -p:UseLocalAppHostPack=true
# TESTCLR_CMD
python3 riscv-CI/goci/agent/TestRunner/run.py --core_root ./coreclr.Release/Tests/Core_Root --testhost ./testhost.Release --atest ./coreclr.Release --test ./ --log_dir ./logs --timeout 2700 --log_level DEBUG
# TESTCLR_RUN
/godata/pipelines/Release-CLR-QEMU/logs/run_tests.log
cd "/_PATH_/_WITH_/_TEST_" && ROOTFS_DIR=/crossrootfs/riscv64 QEMU_LD_PREFIX=/crossrootfs/riscv64 __TestDotNetCmd=/godata/pipelines/Release-CLR-QEMU/testhost.Release/dotnet CORE_ROOT=/godata/pipelines/Release-CLR-QEMU/coreclr.Release/Tests/Core_Root /usr/bin/time -f "exec_time: %e" ./_TEST_BINARY_
# TESTFX_BUILD_CMD
runtime/build.sh --arch riscv64 --cross -c Release -rc Release -hc Release -lc Release -s libs.tests --testscope innerloop /p:EnableSourceLink=false /p:UseLocalAppHostPack=true
# TESTFX_CMD
python3 riscv-CI/goci/agent/TestRunner/run.py --corefx --testhost ./testhost.Release --atest ./corefx.Release --log_dir ./logs --timeout 6000 --memlimit 4096 --jobs 16 --log_level DEBUG
# TESTFX_RUN
/go-agent/pipelines/Release-FX-QEMU/logs/run_tests.log
cd "/_PATH_/_WITH_/_TEST_" && ROOTFS_DIR=/crossrootfs/riscv64 QEMU_LD_PREFIX=/crossrootfs/riscv64 __TestDotNetCmd=/go-agent/pipelines/Release-FX-QEMU/testhost.Release/dotnet /usr/bin/time -f "exec_time: %e" /go-agent/pipelines/Release-FX-QEMU/testhost.Release/dotnet exec xunit.console.dll _TEST_BINARY_ -nologo -nocolor -notrait category=failing
|
Yes, it would involve switching a number of compile-time checks to run-time checks such that both implementations are included in the build on Linux.
With that it sounds like a build produced under QEMU would differ in behavior from a build produced on similar non-emulated hardware, as the former would use the file lock implementation and the latter would use pthread robust mutex. Ideally, the produced build should have consistent behavior. For instance, it would be problematic to use QEMU to cross-build a patch and apply it on non-emulated hardware, if the shipped binaries were produced on non-emulated hardware. Maybe this kind of issue already exists to some degree, but perhaps it should be fixed. A safer approach may be to always use the file lock implementation based on arch, but that would be unfortunate, as it's less efficient. It's also very difficult to change the behavior between releases for compat reasons. I'm still favoring run-time checks over build-time checks. |
Thanks, now with this explanation solution based on run-time checks makes more sense to me. |
Before this change due to https://gitlab.com/qemu-project/qemu/-/issues/2446 we get lot of System.Threading.Tests/System.IO.IsolatedStorage.Tests errors like:
In this patch we force PAL to use file locks which fixes System.Threading.Tests/System.IO.IsolatedStorage.Tests on Qemu.
Part of #84834, cc @dotnet/samsung