Test failure: Regressions\coreclr\GitHub_45929\test45929\test45929.cmd #46803
Comments
I couldn't figure out the best area label to add to this issue. If you have write permissions, please help me learn by adding exactly one area label.
Also failed in runtime 20210116.4
Error messages:
Failed again in run: runtime-coreclr outerloop 20210126.8
Failed test: R2R windows arm Checked @ Windows.10.Arm64v8.Open
Error message:
Failed again in run: runtime-coreclr outerloop 20210206.2
Failed test:
Error message:
Failed again in run: runtime-coreclr r2r 20210223.1
Failed test:
Error message:
Related: #47697
I haven't seen this fail in R2R runs for the past month.
Failed again in: runtime-coreclr r2r-extra 20210807.1
Failed test:
One of the error messages:
I found https://dev.azure.com/dnceng/public/_build/results?buildId=1282759&view=ms.vss-test-web.build-test-results-tab&runId=37887884&resultId=110132&paneView=dotnet-dnceng.dnceng-build-release-tasks.helix-test-information-tab which has a failure with a crash dump. I'm looking into that now.
@jakobbotsch @AndyAyersMS Looking at the crash dump, it appears that the thread is quite close to a tail call frame. Are we aware of any GC stress issues with tail calls? I haven't been able to reproduce this error on my local hardware on Windows, so I'm having quite a bit of difficulty getting SOS to tell me which managed functions are actually on the stack. However, there is a pointer to g_sentinelTailCallFrame, and the code was recently in TailCallTls::AllocArgBuffer (although it isn't there anymore).
I'm not aware of anything. There are only a handful of open GC stress issues.
Hmm, well, from the dump I looked at, I was only able to determine that the program counter at the top of the stack had been replaced with an object reference. I couldn't identify any further details, nor any pathway by which that could happen.
Are the failing legs running with tailcall stress? Otherwise it's probably not tailcall related since there are no tail. prefixes in the test and we don't do opportunistic tailcalls on ARM32. |
- The Watson codebase manipulates the following fields on Exception in a lock-free manner when multiple threads throw the same exception:
  - _stackTrace
  - _stackTraceString
  - _remoteStackTraceString
  - _watsonBuckets
  - _ipForWatsonBuckets
- The designed behavior is that these APIs should be "mostly" correct: because they are only used in fatal shutdown scenarios, exact correctness is not required for correct program execution.
- However, some race conditions have been seen recently in testing:
  1. In some places the value is explicitly read more than once: the first read checks for NULL, and a second read fetches the actual value and uses it. If a race sets the value to NULL between the two reads, the runtime can crash. To fix this, the code paths that could lead to crashes are refactored to read the value once and carry that value to where it is needed.
  2. Because the C++ memory model generally allows the compiler to turn a single read in the source into multiple reads when it can prove that the read does not cross a lock or memory barrier, the compiler may inject extra reads where the logic naturally has only one. The fix is to use the VolatileLoadWithoutBarrier API to specify that a read happens exactly once where it might otherwise cause a problem.
- Finally, test45929 tended to fail under GC stress because it takes a very long time to run under GC stress or on some hardware. Adjust it so that it shuts down after about 2.5 minutes.
  - Do this instead of disabling the test under GC stress, as there is evidence that bugs may have been hit during GC stress runs.

Fixes dotnet#46803
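The two race patterns described above can be illustrated with a minimal C++ sketch. This is not the CoreCLR code: FakeException, LengthRacy, LengthSingleRead and StressSingleRead are hypothetical names, and a std::atomic load with std::memory_order_relaxed is used as a portable stand-in for VolatileLoadWithoutBarrier (a CoreCLR-internal helper whose definition is not shown in this thread), since a relaxed atomic load is guaranteed to happen exactly once and emits no fence.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>

// Hypothetical stand-in for an Exception field such as _stackTraceString
// that another thread may reset to nullptr at any time. The relaxed atomic
// plays the role of VolatileLoadWithoutBarrier: one load, no barrier.
struct FakeException {
    std::atomic<const std::string*> stackTraceString{nullptr};
};

// Racy pattern (issue 1): the field is loaded twice. If another thread
// stores nullptr between the check and the use, the second load returns
// nullptr and the dereference crashes. Shown for contrast only.
std::size_t LengthRacy(const FakeException& e) {
    if (e.stackTraceString.load(std::memory_order_relaxed) != nullptr)
        return e.stackTraceString.load(std::memory_order_relaxed)->size();
    return 0;
}

// Fixed pattern: load the field once into a local and use only the local.
std::size_t LengthSingleRead(const FakeException& e) {
    const std::string* s = e.stackTraceString.load(std::memory_order_relaxed);
    return s != nullptr ? s->size() : 0;
}

// Toggles the field on one thread while the calling thread reads it with
// the single-read pattern. `trace` must outlive the call; every read sees
// either 0 or trace.size(), so the sum is a multiple of trace.size().
std::size_t StressSingleRead(FakeException& e, const std::string& trace,
                             int iterations) {
    std::atomic<bool> done{false};
    std::thread writer([&] {
        for (int i = 0; i < iterations; ++i)
            e.stackTraceString.store(i % 2 ? &trace : nullptr,
                                     std::memory_order_relaxed);
        done.store(true);
    });
    std::size_t total = 0;
    while (!done.load())
        total += LengthSingleRead(e);
    writer.join();
    return total;
}
```

Only the single-read version is safe to call while another thread is toggling the field; LengthRacy is included to show the check-then-use window that the fix removes. In the runtime itself the referenced object is kept alive by the GC, which this sketch approximates by requiring trace to outlive the readers.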
#57684) * Fix stress issues around multiple threads throwing the same exceptions … Fixes #46803 * Rename as per suggestion
…same exceptions (#57959) * Fix stress issues around multiple threads throwing the same exceptions … Fixes #46803 * Rename as per suggestion Co-authored-by: David Wrighton <davidwr@microsoft.com>
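The test change in the commit (shutting test45929 down after about 2.5 minutes rather than disabling it under GC stress) amounts to bounding the run by wall-clock time instead of by iteration count. The actual test source is not shown in this thread, so the following is only a minimal C++ sketch of the deadline idea; RunUntilDeadline, its parameters and the throw/catch workload are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical time-bounded stress loop: `threads` workers repeatedly throw
// and catch an exception until `budget` has elapsed, so the run length is
// capped no matter how slow the machine or the GC mode is.
// Returns the total number of throw/catch iterations completed.
std::size_t RunUntilDeadline(int threads, std::chrono::milliseconds budget) {
    const auto deadline = std::chrono::steady_clock::now() + budget;
    std::atomic<std::size_t> iterations{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            // Check the clock once per iteration; stop once the deadline passes.
            while (std::chrono::steady_clock::now() < deadline) {
                try {
                    throw std::runtime_error("stress");
                } catch (const std::runtime_error&) {
                    iterations.fetch_add(1, std::memory_order_relaxed);
                }
            }
        });
    }
    for (auto& w : workers)
        w.join();
    return iterations.load();
}
```

A call such as RunUntilDeadline(4, std::chrono::seconds(150)) would cap a run at roughly 2.5 minutes. A steady clock is used so that adjustments to the system clock cannot extend or cut short the run.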
Job: runtime-coreclr outerloop 20210109.2
Failed test:
Regressions\coreclr\GitHub_45929\test45929\test45929.cmd
R2R windows arm Checked no_tiered_compilation @ Windows.10.Arm64v8.Open
Error message: