System.CodeDom.Tests nullref on arm64 #53042

Closed
hoyosjs opened this issue May 20, 2021 · 41 comments · Fixed by #53510
Labels: area-VM-coreclr, blocking-clean-ci (Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms'), bug, disabled-test (The test is disabled in source code against the issue)

Comments

@hoyosjs
Member

hoyosjs commented May 20, 2021

Started this morning. See https://runfo.azurewebsites.net/search/tests/?q=started%3A%7E7+definition%3Aruntime+name%3ASystem.CodeDom.Tests.CodeAttributeArgumentCollectionTests

Runfo Tracking Issue: system.codedom.tests.codeattributeargumentcollectiontests

Build Definition Kind Run Name Console Core Dump Test Results Run Client
1159218 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1159218 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1159218 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1159218 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1159065 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1159065 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1159065 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1159065 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1158923 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1158923 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1158923 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1158923 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1158808 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1158808 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1158808 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1158808 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1158600 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1158600 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1158600 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1158600 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1158368 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1158368 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1158368 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1158368 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1158131 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1158131 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1158131 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1158131 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1157757 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1157757 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1157757 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1157757 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1157471 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1157471 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1157471 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1157471 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1157140 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1157140 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1157140 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1157140 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1157013 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1157013 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1157013 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1157013 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1156910 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1156910 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1156910 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1156910 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1156852 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1156852 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1156852 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1156852 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1156761 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1156761 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1156761 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1156761 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1156559 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1156559 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1156559 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1156559 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1156287 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1156287 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1156287 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1156287 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1155922 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1155922 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1155922 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1155922 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1155610 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1155610 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1155610 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1155610 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1155291 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1155291 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1155291 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1155291 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1155076 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1155076 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1155076 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1155076 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1154833 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1154833 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1154833 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1154833 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1154745 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1154745 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1154745 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1154745 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1154637 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1154637 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1154637 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1154637 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1154469 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1154469 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1154469 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1154469 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1154298 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1154298 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1154298 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py
1154298 runtime Rolling net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open console.log runclient.py

Displaying 100 of 248 results

Build Result Summary

Day Hit Count Week Hit Count Month Hit Count
7 59 62
@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.CodeDom untriaged New issue has not been triaged by the area owner labels May 20, 2021
@ghost

ghost commented May 20, 2021

Tagging subscribers to this area: @buyaa-n, @krwq
See info in area-owners.md if you want to be subscribed.

Issue Details

Started this morning. See https://runfo.azurewebsites.net/search/tests/?q=started%3A%7E7+definition%3Aruntime+name%3ASystem.CodeDom.Tests.CodeAttributeArgumentCollectionTests

Author: hoyosjs
Assignees: -
Labels:

area-System.CodeDom, untriaged

Milestone: -

@hoyosjs hoyosjs added blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' and removed untriaged New issue has not been triaged by the area owner labels May 21, 2021
@danmoseley
Member

Always net6.0-windows-Release-arm64-CoreCLR_release-Windows.10.Arm64.Open. No change in CodeDom code itself since May 18th.

   System.CodeDom.Tests.CodeAttributeArgumentCollectionTests.AddRange_CodeStatementArray_Works [FAIL]
      System.NullReferenceException : Object reference not set to an instance of an object.
      Stack Trace:
           at System.CodeDom.Tests.CodeCollectionTestBase`2.AddRange_TestData() in System.CodeDom.Tests.dll:token 0x6000423+0x0

The method is simply

        public static IEnumerable<object[]> AddRange_TestData()
        {
            yield return new object[] { new TItem[0] };
            yield return new object[] { new TItem[] { new TItem() } };
            yield return new object[] { new TItem[] { new TItem(), new TItem() } };
        }

where TItem is types like CodeCatchClause.

@dotnet/jit-contrib are you aware of any ARM64-specific codegen issue that might be causing random NREs? I know this was happening on Apple Silicon, but this is Windows.

@danmoseley
Member

@hoyosjs do we run on net6.0-windows-Debug-arm64-CoreCLR_release-Windows.10.Arm64.Open? If so then it smells like an optimization issue.

@hoyosjs
Member Author

hoyosjs commented May 24, 2021

Interesting part is if you look at other tests within CodeDom you see that there's linux-arm64 issues too with the same nullref on https://runfo.azurewebsites.net/search/tests/?q=started%3A~7%20definition%3Aruntime%20name%3Asystem.codedom.tests&pagenumber=1, but it's never on Debug. As for the running on debug, no. We do run checked, but I see Linux checked failures:

- template: /eng/pipelines/common/platform-matrix.yml
  parameters:
    jobTemplate: /eng/pipelines/common/templates/runtimes/run-test-job.yml
    buildConfig: checked
    platforms:
    - Linux_arm
    - windows_x86
    - windows_arm64
    helixQueueGroup: pr
    helixQueuesTemplate: /eng/pipelines/coreclr/templates/helix-queues-setup.yml
    jobParameters:
      testGroup: innerloop
      liveLibrariesBuildConfig: Release
      condition: >-
        or(
          eq(dependencies.evaluate_paths.outputs['SetPathVars_coreclr.containsChange'], true),
          eq(dependencies.evaluate_paths.outputs['SetPathVars_runtimetests.containsChange'], true),
          eq(variables['isFullMatrix'], true))

@hoyosjs
Member Author

hoyosjs commented May 24, 2021

There's no ARM64 debug.

@danmoseley
Member

OK, so we know we get failures like this in Arm64 only, on release lib+release CoreCLR on Windows and release lib+checked CoreCLR on Ubuntu (but not Alpine/musl).

But apparently not on release lib + checked CoreCLR on Alpine (musl) or Windows nor release+release on Linux, nor release|debug on checked CoreCLR on Apple silicon, since I see those also in the runtime.yml. Nor do we see these failures on any Mono configuration.

That's not much of a pattern, but it does make it even less likely that this is a libraries code issue, and more likely that it is either codegen or R2R (what do we R2R?)

I see all variations have "0x6000423+0x0" which I guess means that it's at the very first IL instruction in the method? (Why are there no PDB's deployed here to give line numbers?)
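For reference, a small sketch of how a location like "0x6000423+0x0" can be taken apart. The high byte of a .NET metadata token selects the metadata table (0x06 is MethodDef) and the low 24 bits are the row index. Treating the part after `+` as an IL offset follows the guess above and is an assumption; `FrameLocation` and `parse_location` are hypothetical names used only for illustration.

```cpp
#include <cstdint>
#include <string>

// Decoded pieces of a "token+offset" location string from the stack trace.
struct FrameLocation {
    uint8_t  table;   // metadata table id (0x06 == MethodDef)
    uint32_t rid;     // row id within that table
    uint32_t offset;  // assumed to be the IL offset inside the method
};

// Parse text such as "0x6000423+0x0"; both parts are read as hexadecimal.
FrameLocation parse_location(const std::string& text) {
    const std::size_t plus = text.find('+');
    const uint32_t token =
        static_cast<uint32_t>(std::stoul(text.substr(0, plus), nullptr, 16));
    const uint32_t offset = (plus == std::string::npos)
        ? 0u
        : static_cast<uint32_t>(std::stoul(text.substr(plus + 1), nullptr, 16));
    return FrameLocation{ static_cast<uint8_t>(token >> 24),
                          token & 0x00FFFFFFu,
                          offset };
}
```

For the trace in this issue, that gives table 0x06 (a method), row 0x423, offset 0, which is consistent with the failure being reported at the very start of AddRange_TestData.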

@hoyosjs
Member Author

hoyosjs commented May 24, 2021

We R2R the product, but I think that doesn't apply to the tests. There's 10 failures today, so let me download a workitem from today and inspect it.

@danmoseley
Member

danmoseley commented May 24, 2021

The first failure was 5/20/2021 5:24:28 PM - not sure whether this is UTC.

Nothing jumps out:

C:\git\runtime\src\libraries>git log --color --graph --pretty=format:"%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%ad) %C(bold blue)<%an>%Creset" --abbrev-commit --after="2021/05/19" --before="2021/05/21" **
* 08c801d15e0 - Implement Initialize on HashAlgorithm (#51402) (Fri May 21 18:12:18 2021 -0400) <Kevin Jones>
* e299f6a17d6 - Fix typo tihs -> this (#52398) (Sat May 22 00:03:21 2021 +0200) <Jonas Nyrup>
* e85b9f9140b - ensure we flush when a window limit is hit (#52797) (Fri May 21 14:25:01 2021 -0700) <Geoff Kizer>
* c0860776f13 - Remove unused field (#53100) (Fri May 21 12:35:08 2021 -0700) <Chris Ross>
* 5c656ca1825 - Skip flakey mobile library test crashes (#52931) (Fri May 21 15:18:14 2021 -0400) <Mitchell Hwang>
* 12a5044232b - Fixing nullability annotations on DateOnly and TimeOnly (#53088) (Fri May 21 11:22:39 2021 -0700) <Tanner Gooding>
* 43b1ce5d6b6 - Allow more efficient marshalling to IDispatch (#53065) (Fri May 21 22:26:17 2021 +0600) <Andrii Kurdiumov>
* becccb2c3c4 - Mark some APIs as unsupported on MacCatalyst (#53075) (Fri May 21 16:48:49 2021 +0300) <Maxim Lipnin>
* 8bf306c55c1 - Fix creation of platforms and sources package (#53073) (Fri May 21 15:11:41 2021 +0200) <Viktor Hofer>
* 202c9fd3463 - Big-endian fix: JsonWriterHelper (#52790) (Fri May 21 14:21:15 2021 +0200) <Ulrich Weigand>
* f96ce3a4720 - add SslStream_RandomWrites_OK test (#52682) (Fri May 21 12:01:36 2021 +0200) <Tomas Weinfurt>
* 5c80887836c - Make DSA.Create, AesCcm, AesGcm, ChaCha20Poly1305 throw PNSE on iOS (#52978) (Fri May 21 08:56:05 2021 +0200) <Filip Navara>
* d8fec8577ae - Add UnmanagedCallConvAttribute (#52869) (Thu May 20 23:07:39 2021 -0700) <Elinor Fung>
* e8c1965caf9 - Fix DataCommonEventSource EnterScope (#53043) (Fri May 21 04:45:15 2021 +0200) <Miha Zupan>
* be0a12d2196 - Resolving ILLink warnings for Microsoft.Extensions.Configuration.Binder (#52795) (Thu May 20 16:54:08 2021 -0700) <Jose Perez Rodriguez>
* 28c213fda48 - Make the lookup for getApplicationProtocol optional (#53001) (Thu May 20 19:16:57 2021 -0400) <Steve Pfister>
* 81a9e9ee1ae - Fix OperatingSystem.IsAndroidVersionAtLeast() (#53034) (Fri May 21 01:15:59 2021 +0200) <Alexander Köplinger>
* 999c9c0bf11 - Return null when Variant contains BSTR (#53030) (Fri May 21 04:34:02 2021 +0600) <Andrii Kurdiumov>
* 8b4684e8410 - StringValues.Count test null first (#52508) (Thu May 20 23:26:21 2021 +0100) <Ben Adams>
* e2f5c114c9d - Use Assembly.Load as first option to load TempAssembly. (#52429) (Thu May 20 13:56:55 2021 -0700) <Steve Molloy>
* aaaaeed8c8d - make TestBasePriorityOnWindows not fail when ruinning tests with `-low` (#53003) (Thu May 20 12:23:39 2021 -0700) <Vladimir Sadov>
* b3c10eefb1e - Enable Android arm device tests (#52935) (Thu May 20 13:44:08 2021 -0400) <Steve Pfister>
* 595c1225681 - JsonNode trimmability improvements (#52934) (Thu May 20 11:39:28 2021 -0500) <Steve Harter>
* 0d27b099bd9 - React to MSBuild Traversal and NoTargets SDK updates (#52895) (Thu May 20 08:28:04 2021 +0200) <Viktor Hofer>
* 4a782d58ac4 - Objective-C msgSend* support for pending exceptions in Release (#52849) (Wed May 19 22:03:34 2021 -0700) <Aaron Robinson>
* 84233f19eb7 - CoreLib missed Equals nullable annotations  (#52167) (Thu May 20 03:34:49 2021 +0300) <hrrrrustic>

C:\git\runtime\src\libraries>git log --color --graph --pretty=format:"%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%ad) %C(bold blue)<%an>%Creset" --abbrev-commit --after="2021/05/19" --before="2021/05/21" ../coreclr/**
* 91772b5803d - JIT: fix relop flags for peeled switch compare (#53096) (Fri May 21 12:19:25 2021 -0700) <Andy Ayers>
* 1e388cd5aa1 - Override BaseType in TypeRefTypeSystemType (#52963) (Fri May 21 20:35:06 2021 +0200) <Jakob Botsch Nielsen>
* 6854225c095 - Com trimming related work (#52940) (Fri May 21 06:12:46 2021 -0700) <Lakshan Fernando>
* dd6ff04525f - Implement 64-bit type handle histogram counts (#52898) (Fri May 21 12:24:28 2021 +0200) <Jakob Botsch Nielsen>
* d8fec8577ae - Add UnmanagedCallConvAttribute (#52869) (Thu May 20 23:07:39 2021 -0700) <Elinor Fung>
* 419765391c1 - Add a missing end of line to JITDUMP in lclmorph.cpp (#53028) (Fri May 21 06:48:30 2021 +0300) <SingleAccretion>
* 77f0728b9cc - Delete an unnecessary pessimization for x86. (#52803) (Thu May 20 20:27:02 2021 -0700) <Sergey Andreenko>
* 80c81b39ae7 - Print R2RDump Statistics in the Output File Instead of Console (#52278) (Thu May 20 19:46:20 2021 +0000) <Ivan Diaz Sanchez>
* e0eda5f17b2 - JIT: fix invocation of unboxed entry when method returns struct (#52998) (Thu May 20 11:56:11 2021 -0700) <Andy Ayers>
* 636f89d78ef - Move metadata off the executable heaps (#52912) (Thu May 20 11:02:22 2021 +0200) <Jan Vorlicek>
* 542ef8ba780 - Add native EventPipe event source generation into Mono build. (#52844) (Thu May 20 09:08:42 2021 +0200) <Johan Lorensson>
* 0d27b099bd9 - React to MSBuild Traversal and NoTargets SDK updates (#52895) (Thu May 20 08:28:04 2021 +0200) <Viktor Hofer>
* 4a782d58ac4 - Objective-C msgSend* support for pending exceptions in Release (#52849) (Wed May 19 22:03:34 2021 -0700) <Aaron Robinson>
* d49bcbe0441 - Add basic natvis visualizations for some VM types (#52853) (Wed May 19 16:43:37 2021 -0700) <Jeremy Koritzinsky>

@AndyAyersMS can you imagine how #52975 might be relevant?

edit, I mean #52998

@hoyosjs
Member Author

hoyosjs commented May 24, 2021

Looking at this, the commits before failures are: https://github.com/dotnet/runtime/commits/d81ad044fa6830f5f31f6b6e8224ebf66a3c298c

@AndyAyersMS
Member

@danmoseley Your query above ran vs 2020 and not 2021 ?

My change could be quite relevant, it involved devirtualization and enumerators.

@AndyAyersMS
Member

(ah you must have fixed it...)

@AndyAyersMS
Member

I'll take a look, but it will be a few hours.

@danmoseley
Member

@hoyosjs that log is two hours earlier than e0eda5f . Unless you have better data, I think it is too early.

Repathing - thanks Andy.

@hoyosjs
Member Author

hoyosjs commented May 24, 2021

@danmoseley https://dev.azure.com/dnceng/public/_build/results?buildId=1147916&view=results failed, and it was a rolling build from before @AndyAyersMS's change went in

@danmoseley danmoseley added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed area-System.CodeDom labels May 24, 2021
@danmoseley
Member

Oh, well so much for that theory. Still, I do believe codegen is most likely based on evidence above.

Can you find last commit that succeeded? Assuming this is consistent?

@hoyosjs
Member Author

hoyosjs commented May 24, 2021

Let me verify consistency (although it looks like it is).

@hoyosjs
Member Author

hoyosjs commented May 24, 2021

There was a single-commit window where the failures started: #52912. @janvorli do you know what might have caused this?

@danmoseley danmoseley added area-VM-coreclr and removed area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels May 25, 2021
@danmoseley
Member

@hoyosjs was there a clever way you figured that out? Kusto?

@danmoseley
Member

#53042 suggests similar NRE in some non codedom tests.

@janvorli
Member

I am looking into it.

@janvorli
Member

I have run the test 10000 times in a loop on a Windows ARM64 machine equivalent to the lab one (both checked and release versions of coreclr). I have also tried running 5 instances in parallel 1000 times, and I am unable to repro the failure.
Do we happen to have a dump from one of the failed runs? I have not found it.

@janvorli
Member

Ah, my mistake, for some weird reason, I was building the stuff from a stale main. Trying again.

@kunalspathak
Member

Alternatively, you can also use runfo to download helix payload of failing test.

@hoyosjs
Member Author

hoyosjs commented May 26, 2021

It's a nullref that gets swallowed by the test runner. That being said I downloaded the payload from a failed run and ran it locally in a loop and never got the test to fail. Worth the query on Kusto to see if it was a particular agent. Update: 32 different issues have seen the issue in a uniform manner, so agent issues are unlikely.

@janvorli
Member

Finally got it to fail after hundreds of iterations, though in a different manner (Fatal error. Internal CLR error. (0x80131506)).

@hoyosjs
Member Author

hoyosjs commented May 27, 2021

This one is becoming a pretty hot issue in CI council. It doesn't affect PRs, but we have limited visibility into CI health because this is taking down 8-9 rolling builds a day. Should we back out the change or disable the tests in the namespace?

@agocke
Member

agocke commented May 27, 2021

Let's move these tests to staging for now.

@hoyosjs hoyosjs changed the title from "System.CodeDom.Tests.CodeAttributeArgumentCollectionTests nullref on Windows arm64" to "System.CodeDom.Tests nullref on arm64" May 27, 2021
@hoyosjs
Member Author

hoyosjs commented May 27, 2021

Actually, after a bit more digging, this is also hitting Linux ARM64 runs equally hard, and those do affect PR runs. @agocke I didn't know you could move tests to staging. I thought you had to move the leg itself?

@agocke
Member

agocke commented May 27, 2021

Ugh, we need to fix that -- staging isn't very useful if we can't move over individual tests.

OK then we should skip the tests.

@janvorli
Member

@hoyosjs do you have a link for the Linux failures? To repro it, I've been running the failing tests in a loop tens of thousands of times, in 10 shells in parallel, on an arm64 Windows machine since yesterday. It has reproed just once so far, but since the null reference is handled by xunit, there is no dump and so no way to see what's wrong. So I have modified the runtime to fail fast on any null reference exception. It has been running with this modification for about 8 hours without any crash yet.

@BruceForstall
Member

@hoyosjs
Member Author

hoyosjs commented May 27, 2021

export (3).xlsx

This file is all the failures with their queues and build links.

@hoyosjs
Member Author

hoyosjs commented May 28, 2021

I got this to repro in a lab machine. The four tests that need disabling are:

AddRange_CodeStatementArray_Works
AddRange_CodeStatementCollection_Works
Ctor_CodeStatementCollection_Works
Ctor_Array_Works

@stephentoub stephentoub added disabled-test The test is disabled in source code against the issue bug labels May 28, 2021
@stephentoub stephentoub added this to the 6.0.0 milestone May 28, 2021
@janvorli
Member

It turns out my change did not cause the issue; it just exposed a bug that has been in the runtime since at least the first GitHub commit. The bug is in the void StubLinkerCPU::EmitMovConstant method. This method generates a sequence of mov reg, #immediate instructions to build a 64-bit constant out of up to four 16-bit chunks. The checks that determine when the constant is fully generated were wrong. In our case, the constant had its lowest 16 bits equal to zero, so the function generated just mov x0, #0.
See the if (!(constant & 0xFFFF)) return; and two other similar checks.

void StubLinkerCPU::EmitMovConstant(IntReg target, UINT64 constant)
{
#define WORD_MASK 0xFFFF
    // Move the 64bit constant in 4 chunks (of 16 bits).
    // MOVZ Rd, <1st word>, LSL 0
    // MOVK Rd, <2nd word>, LSL 1
    // MOVK Rd, <3nd word>, LSL 2
    // MOVK Rd, <4nd word>, LSL 3
    WORD word = (WORD) (constant & WORD_MASK);
    Emit32((DWORD)(0xD2<<24 | (4)<<21 | word<<5 | target));
    if (!(constant & 0xFFFF)) return;
    word = (WORD) ((constant>>16) & WORD_MASK);
    if (word != 0)
        Emit32((DWORD)(0xF2<<24 | (5)<<21 | word<<5 | target));
    if (!(constant & 0xFFFFFFFF)) return;
    word = (WORD) ((constant>>32) & WORD_MASK);
    if (word != 0)
        Emit32((DWORD)(0xF2<<24 | (6)<<21 | word<<5 | target));
    if (!(constant & 0xFFFFFFFFFFFF)) return;
    word = (WORD) ((constant>>48) & WORD_MASK);
    if (word != 0)
        Emit32((DWORD)(0xF2<<24 | (7)<<21 | word<<5 | target));
#undef WORD_MASK
}
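To make the failure mode concrete, here is a standalone sketch that mirrors the control flow of the snippet above and interprets the resulting MOVZ/MOVK words. It is only an illustration under stated assumptions: `emit_buggy`, `emit_fixed`, `execute` and `demo` are hypothetical names, the sample constant is made up, and `emit_fixed` shows one possible way to correct the checks (stop only when no bits remain above the chunks already handled), which is not necessarily what the actual fix does.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encodings mirror the quoted snippet: 0xD2800000 is MOVZ Xd, 0xF2800000 is MOVK Xd.
// Bits 22-21 select the 16-bit chunk (hw), bits 20-5 hold imm16, bits 4-0 the register.
static uint32_t movz(unsigned rd, uint16_t imm, unsigned hw) {
    return 0xD2800000u | (hw << 21) | (uint32_t(imm) << 5) | rd;
}
static uint32_t movk(unsigned rd, uint16_t imm, unsigned hw) {
    return 0xF2800000u | (hw << 21) | (uint32_t(imm) << 5) | rd;
}

// Same control flow as the quoted EmitMovConstant, including the faulty checks.
std::vector<uint32_t> emit_buggy(unsigned rd, uint64_t c) {
    std::vector<uint32_t> code;
    code.push_back(movz(rd, uint16_t(c & 0xFFFF), 0));
    if (!(c & 0xFFFF)) return code;            // tests the LOW bits, not the remaining ones
    uint16_t w = uint16_t((c >> 16) & 0xFFFF);
    if (w != 0) code.push_back(movk(rd, w, 1));
    if (!(c & 0xFFFFFFFF)) return code;
    w = uint16_t((c >> 32) & 0xFFFF);
    if (w != 0) code.push_back(movk(rd, w, 2));
    if (!(c & 0xFFFFFFFFFFFF)) return code;
    w = uint16_t((c >> 48) & 0xFFFF);
    if (w != 0) code.push_back(movk(rd, w, 3));
    return code;
}

// Hypothetical correction: stop only when nothing is left ABOVE the handled chunks.
std::vector<uint32_t> emit_fixed(unsigned rd, uint64_t c) {
    std::vector<uint32_t> code;
    code.push_back(movz(rd, uint16_t(c & 0xFFFF), 0));
    for (unsigned hw = 1; hw < 4; ++hw) {
        if ((c >> (16 * hw)) == 0) break;      // all higher chunks are zero
        uint16_t w = uint16_t((c >> (16 * hw)) & 0xFFFF);
        if (w != 0) code.push_back(movk(rd, w, hw));
    }
    return code;
}

// Tiny interpreter for just these two instructions; `reg` starts with stale contents.
uint64_t execute(const std::vector<uint32_t>& code, uint64_t reg) {
    for (uint32_t insn : code) {
        unsigned shift = 16 * ((insn >> 21) & 3);
        uint64_t imm = uint64_t((insn >> 5) & 0xFFFF) << shift;
        bool keep = (insn & 0x60000000u) == 0x60000000u;  // opc == 0b11 means MOVK
        reg = keep ? ((reg & ~(uint64_t(0xFFFF) << shift)) | imm) : imm;
    }
    return reg;
}

void demo() {
    const uint64_t addr = 0x00007FF612340000ull;  // made-up constant, low 16 bits zero
    assert(execute(emit_buggy(0, addr), ~0ull) == 0);      // only "mov x0, #0" is emitted
    assert(execute(emit_fixed(0, addr), ~0ull) == addr);   // full constant is built
}
```

With the quoted checks, any constant whose low 16 bits are zero collapses to zero, which would fit a null reference appearing once the affected constants happened to be 64KB-aligned.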

@janvorli
Member

I'll send out a PR with a fix soon.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label May 31, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jun 1, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jul 1, 2021