System.Text.RegularExpressions.Tests failing for release 3.1 #41286

Closed
Anipik opened this issue Aug 24, 2020 · 13 comments

Comments

@Anipik
Contributor

Anipik commented Aug 24, 2020

System.Text.RegularExpressions.Tests is failing consistently on the release/3.1 branch.
@joperezr did some investigation to find out whether it's related to a particular OS or infra.

It fails on a lot of different distros, and some of them show that it's a memory corruption error:

~/work/ADF109DB/w/BA0C0A13/e ~/work/ADF109DB/w/BA0C0A13/e
  Discovering: System.Text.RegularExpressions.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Text.RegularExpressions.Tests (found 191 of 209 test cases)
  Starting:    System.Text.RegularExpressions.Tests (parallel test collections = on, max threads = 2)
./RunTests.sh: line 161: 11409 Segmentation fault      (core dumped) "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.Text.RegularExpressions.Tests.runtimeconfig.json --depsfile System.Text.RegularExpressions.Tests.deps.json xunit.console.dll System.Text.RegularExpressions.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing $RSP_FILE
~/work/ADF109DB/w/BA0C0A13/e
----- end Thu Aug 20 23:45:14 UTC 2020 ----- exit code 139 ----------------------------------------------------------
exit code 139 means SIGSEGV Illegal memory access. Deref invalid pointer, overrunning buffer, stack overflow etc. Core dumped.
Waiting a few seconds for any dump to be written..
Looking around for any Linux dump..
... found no dump in /home/helixbot/work/ADF109DB/w/BA0C0A13/

https://dev.azure.com/dnceng/public/_build/results?buildId=780834&view=logs&jobId=a52227ff-668e-56e2-c853-fe96ee61d984&j=a52227ff-668e-56e2-c853-fe96ee61d984&t=60819441-8b21-5634-db97-1a5a84d92242

There are dumps available offline.

cc @joperezr @ericstj @safern @ViktorHofer

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-System.Text.RegularExpressions and untriaged labels Aug 24, 2020
@ghost

ghost commented Aug 24, 2020

Tagging subscribers to this area: @eerhardt, @pgovind
See info in area-owners.md if you want to be subscribed.

@Anipik Anipik added this to the 5.0.0 milestone Aug 24, 2020
@pgovind
Contributor

pgovind commented Aug 24, 2020

Just adding a note that setting flags in S.T.RegEx.csproj to 1) turn on XUnit logs and 2) turn off parallel unit tests did not help. The logs didn't show anything, and the failure still occurred.

@joperezr
Member

It is worth noting that this is not only happening on release/3.1. During the investigation I found very similar failures in dotnet/runtime runs as well, so whatever change is causing the memory corruption is also in dotnet/runtime.

@pgovind
Contributor

pgovind commented Aug 24, 2020

So we (@joperezr and I) spent some time looking at this, and I'm not sure we have enough actionable material here. We ran this query:

let workitems = WorkItems
| where FriendlyName == "System.Text.RegularExpressions.Tests"
| where Status == "BadExit"
| join (Jobs) on $left.JobName == $right.Name;
let workitemsLatest = workitems
| where Queued > ago(21d)
| where Repository == "dotnet/runtime"
| order by Queued;
let jobids = workitemsLatest
| where Source !contains "pr/"
| where ExitCode == "139"
| project JobName;
Files
| where JobName in (jobids)
| where WorkItemFriendlyName == "System.Text.RegularExpressions.Tests"

which resulted in 1 failing run with a core dump. When I investigated that run, it looked like many libs failed with timeouts and seg faults, which leads me to believe that RegEx is not the problem. If I delete the | where Source !contains "pr/" part of the query to include PR runs as well, I get many hits, but in every instance I inspected, RegEx fails along with other libs, never alone :/

In the release branches, no dumps are ever produced, only logs. I'm going to change the milestone to Future, and I'd like to keep this issue open to collect future failures in the hope that we can get a dump from a run where only RegEx fails.
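
The "fails alone" check described above could also be done offline once the query results are exported. A minimal Python sketch, assuming each row is reduced to a (job name, work item name, status) tuple; the sample rows, the Other.Lib.Tests name, and the jobs_where_only_target_failed helper are all hypothetical, not real Helix data or tooling:

```python
from collections import defaultdict

TARGET = "System.Text.RegularExpressions.Tests"


def jobs_where_only_target_failed(rows, target=TARGET, bad_status="BadExit"):
    """Return the jobs in which `target` is the only work item with `bad_status`."""
    failed_by_job = defaultdict(set)
    for job, work_item, status in rows:
        if status == bad_status:
            failed_by_job[job].add(work_item)
    # A job qualifies only if its set of failed work items is exactly {target}.
    return sorted(job for job, failed in failed_by_job.items() if failed == {target})


# Hypothetical sample rows (not taken from any real run).
rows = [
    ("job-a", TARGET, "BadExit"),
    ("job-a", "Other.Lib.Tests", "BadExit"),  # RegEx failed together with another lib
    ("job-b", TARGET, "BadExit"),             # RegEx failed alone
    ("job-b", "Other.Lib.Tests", "Pass"),
    ("job-c", "Other.Lib.Tests", "BadExit"),  # RegEx did not fail here
]

print(jobs_where_only_target_failed(rows))  # prints ['job-b']
```

Only jobs like job-b would be worth chasing for a dump, since the others suggest a machine-wide problem rather than a RegEx one.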

@pgovind pgovind removed the untriaged label Aug 24, 2020
@pgovind pgovind modified the milestones: 5.0.0, Future Aug 24, 2020
@ericstj
Member

ericstj commented Nov 13, 2020

Here's the latest. Most looked like the process being killed.

===========================================================================================================
~/work/9F7608FE/w/AD7C0990/e ~/work/9F7608FE/w/AD7C0990/e
  Discovering: System.Text.RegularExpressions.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Text.RegularExpressions.Tests (found 135 test cases)
  Starting:    System.Text.RegularExpressions.Tests (parallel test collections = on, max threads = 2)
./RunTests.sh: line 161:  4823 Killed                  "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.Text.RegularExpressions.Tests.runtimeconfig.json xunit.console.dll System.Text.RegularExpressions.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=nonnetcoreapptests -notrait category=nonlinuxtests -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing $RSP_FILE
~/work/9F7608FE/w/AD7C0990/e
----- end Thu Nov 12 22:30:24 UTC 2020 ----- exit code 137 ----------------------------------------------------------
exit code 137 means SIGKILL Killed eg by kill
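
For reference, both exit codes seen in this thread follow the usual shell convention of 128 plus the signal number (139 = 128 + 11, 137 = 128 + 9). A minimal Python sketch for decoding them, assuming a Linux host; describe_exit_code is a hypothetical helper, not part of RunTests.sh or the Helix tooling:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Decode a shell-style exit code, where values above 128 mean 'killed by signal'."""
    if code > 128:
        signum = code - 128
        try:
            # Signal numbers are platform-specific; this lookup assumes Linux numbering.
            name = signal.Signals(signum).name
        except ValueError:
            name = f"unknown signal {signum}"
        return f"exit code {code}: terminated by {name}"
    return f"exit code {code}: exited on its own with status {code}"


for code in (139, 137):
    print(describe_exit_code(code))
```

A SIGKILL delivered from outside the process (for example by the kernel's out-of-memory killer or by a watchdog) leaves no core dump, which would be consistent with the 137 runs producing only logs.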

One was a little different. Could it be a lead?

===========================================================================================================
/root/helix/work/workitem /root/helix/work/workitem
  Discovering: System.Text.RegularExpressions.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.Text.RegularExpressions.Tests (found 135 test cases)
  Starting:    System.Text.RegularExpressions.Tests (parallel test collections = on, max threads = 2)
   System.Text.RegularExpressions.Tests: [Long Running Test] 'System.Text.RegularExpressions.Tests.RegexMatchTests.Match_SpecialUnicodeCharacters_enUS', Elapsed: 00:02:23
[Long Running Test] 'System.Text.RegularExpressions.Tests.RegexReplaceTests.Replace', Elapsed: 00:02:21
    System.Text.RegularExpressions.Tests.RegexMatchTests.Match_SpecialUnicodeCharacters_enUS [FAIL]
      Timed out at 11/12/2020 7:33:33 PM after 60000ms waiting for remote process.
      	Process ID: 190
      	Handle: 944
      	Name: dotnet
      	MainModule: /root/helix/work/correlation/dotnet
      	StartTime: 11/12/2020 7:32:32 PM
      	TotalProcessorTime: 00:00:00.3900000
      
      Stack Trace:
        /_/src/Microsoft.DotNet.RemoteExecutor/src/RemoteInvokeHandle.cs(131,0): at Microsoft.DotNet.RemoteExecutor.RemoteInvokeHandle.Dispose(Boolean disposing)
        /_/src/Microsoft.DotNet.RemoteExecutor/src/RemoteInvokeHandle.cs(55,0): at Microsoft.DotNet.RemoteExecutor.RemoteInvokeHandle.Dispose()
        /_/src/System.Text.RegularExpressions/tests/Regex.Match.Tests.cs(799,0): at System.Text.RegularExpressions.Tests.RegexMatchTests.Match_SpecialUnicodeCharacters_enUS()
  Finished:    System.Text.RegularExpressions.Tests
=== TEST EXECUTION SUMMARY ===
   System.Text.RegularExpressions.Tests  Total: 1505, Errors: 0, Failed: 1, Skipped: 0, Time: 344.404s
/root/helix/work/workitem

@ericstj ericstj added the blocking-clean-ci and blocking-servicing labels Nov 13, 2020
@ericstj ericstj modified the milestones: Future, 6.0.0 Nov 13, 2020
@pgovind
Contributor

pgovind commented Nov 13, 2020

> Could it be a lead?

Nice! Will track this down tomorrow and disable it if it makes sense. I honestly have no idea why this would start failing in corefx suddenly. Maybe there was a globalization-related change on the CI machines (ICU vs. non-ICU). Just guessing here.

@ericstj
Member

ericstj commented Nov 13, 2020

Actually, I found we were hitting a similar thing in 5.0 that @stephentoub fixed: #2153. Maybe we could try backporting the test fix?

I'm pretty sure the test is just too expensive for CI; that one test class was doing ~58M regex matches,

Fix made was #2194

@pgovind
Contributor

pgovind commented Nov 13, 2020

Ah, that PR was made before I started with RegEx, or I would've remembered! Out of curiosity, how did you find the issue/toub's PR?

@ericstj
Member

ericstj commented Nov 13, 2020

I searched for "exit code 137". YMMV, this change touches tests which were added in 5.0, but it might also touch some that preexisted. I liked @stephentoub's technique for surveying the test workload. Would be nice to also survey memory usage, but that might not be present in the log.

@ViktorHofer
Member

ViktorHofer commented Nov 30, 2020

@ericstj are you sure that blocking-servicing is the right label? I doubt this actually blocks servicing.

@Anipik
Contributor Author

Anipik commented Nov 30, 2020

Yes, we can remove blocking-servicing and blocking-clean-ci as well. We moved the failing test to outerloop.

@Anipik Anipik removed the blocking-clean-ci and blocking-servicing labels Nov 30, 2020
@ericstj
Member

ericstj commented Nov 30, 2020

I think this can actually be closed with @pgovind's fix, assuming we're seeing successful PR runs now. I don't actually see any PRs after his change; however, his was green.

@Anipik
Contributor Author

Anipik commented Nov 30, 2020

> assuming we're seeing successful PR runs now.

Yes, we have green builds now: dotnet/corefx#43007

@Anipik Anipik closed this as completed Nov 30, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 30, 2020