Renable running Arm64 test cases in CI #83948

kunalspathak · 2023-03-26T18:45:45Z

The original change prohibited running Arm tests on non-Arm platforms. However, we build AnyOS/AnyConfiguration tests, so we don't know which CI leg they are targeted for. For now, just remove the restriction and see whether the Arm test cases get picked up. After that, we could add a property (or use an existing one, if there is one available) to exclude the Arm test cases from non-Arm platforms.

Fixes #83947

ghost · 2023-03-26T18:45:59Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

kunalspathak · 2023-03-27T02:38:54Z

These are included now:

kunalspathak · 2023-03-27T02:39:09Z

@dotnet/jit-contrib

kunalspathak · 2023-03-27T02:39:59Z

/azp run runtime-coreclr jitstressregs

kunalspathak · 2023-03-27T02:40:15Z

/azp run runtime-coreclr jitstress2-jitstressregs

azure-pipelines · 2023-03-27T02:40:16Z

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines · 2023-03-27T02:40:25Z

Azure Pipelines successfully started running 1 pipeline(s).

tannergooding · 2023-03-27T06:34:20Z

This likely needs to include the logic to continue filtering these on irrelevant platforms. The filtering was done due to the timeouts that otherwise get hit, particularly in stress jobs.

kunalspathak · 2023-03-27T13:28:42Z

The filtering was done due to the timeouts that otherwise get hit, particularly in stress jobs

Do you know the source of timeouts? The tests already have a IsSupported to execute only on relevant platform. I know there might be many irrelevant test cases, but they should still complete fairly quickly, right? If my understanding is correct, the problem is that build-test-job is a common job triggered independent of the configuration and we are building it for "Any configuration". We do not build the binaries depending on configuration. I think we should have a follow-up PR to fix that problem, but currently the priority is to re-enable these test cases. I didn't realize that the Arm64 test cases newly added in #80297 were not being run until I found a bug and noticed that CI was still passing.

runtime/eng/pipelines/runtime.yml, line 1028 at 9b38f2a:

    - CoreClrTestBuildHost # Either osx_x64 or linux_x64

tannergooding · 2023-03-27T15:14:30Z

Do you know the source of timeouts? The tests already have a IsSupported to execute only on relevant platform

There are a lot of tests. Also most of them explicitly test that the APIs throw PlatformNotSupportedException when IsSupported returns false, so that's a lot of exceptions that get tested.

The csproj check was meant to save on CI time to ensure that we weren't building as much and that we weren't running as much, particularly given the overhead that some stress modes add to every bit of managed code that executes.

If my understanding is correct, the problem is that build-test-job is a common job triggered independent of the configuration and we are building it for "Any configuration"

Hmmm. This makes it tricky to "do the right thing". On one hand, we'll end up hurting local build times and test throughput; on the other, CI won't be testing what we actually want/need for Pri0.

The simplest thing for today would likely be to remove or comment out all the conditional exclusion logic at the build level and to run the stress tests in this PR to try to catch any timeouts caused. If we don't remove all the conditional logic, we'll still be building less than we should in some cases and not covering the right test scenarios.

That at least ensures CI is correct. It's not going to help with turnaround time for the average job which doesn't touch these, however. We could move them out to their own CI leg and only trigger them on the relevant platform, but that's "more work" overall and we can track it in a separate issue.

kunalspathak · 2023-03-27T17:11:54Z

The simplest thing for today would likely be to remove or comment out all the conditional exclusion logic at the build level

Yes, I will remove the _IncludeXarchHWIntrinsicTests too.

and to run the stress tests in this PR to try to catch any timeouts caused.

I did trigger jitstressregs and a jitstress pipeline. I will trigger gcstress too.

we can track it in a separate issue

#83980

kunalspathak · 2023-03-27T20:01:57Z

/azp run runtime-coreclr gcstress0x3-gcstress0xc

kunalspathak · 2023-03-27T20:02:08Z

/azp run runtime-coreclr jitstress

azure-pipelines · 2023-03-27T20:02:15Z

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines · 2023-03-27T20:02:19Z

Azure Pipelines successfully started running 1 pipeline(s).

trylek

LGTM, thanks Kunal! Just to clarify test priority checks that the build script refers to at line 8 and 10 should still be legal, we aren't trying to build Pri0 and Pri1 tests at the same time (there's actually no way to do it now considering they're used in different pipelines); other than that you're absolutely right that the architecture-specific checks shouldn't be put in the primary csproj scripts exactly as you described.

kunalspathak · 2023-03-27T21:18:15Z

you're absolutely right that the architecture-specific checks shouldn't be put in the primary csproj scripts exactly as you described.

What is a good way to accomplish this if we want to not skip building certain csproj files based on architecture?

trylek · 2023-03-27T21:49:26Z

As I described in a comment on the issue thread #83980, the current POR is basically to use the CLRTestTargetUnsupported property for tests applicable only to a certain OS/architecture or build/execution flavor (e.g. some tests are incompatible with GC stress, and IJW tests are currently incompatible with Crossgen2 as it doesn't support mixed managed and native code). I can easily imagine there may be previously overlooked inconsistencies in applying this rule. Please let me know if you believe there are cases that cannot leverage this technique, so that we'd need to invent a new mechanism. For instance, the new-style merged test wrappers don't yet support any equivalent of this logic, so as a temporary measure we mark all architecture/OS-specific tests as "requiring process isolation" even though no isolation is really needed; all that's needed is for the merged wrapper to stop trying to call the tests in question directly and instead fall back to the legacy way of running the test script. While not ideal, this thankfully applies to a tiny fraction of the entire test set. Once the bulk of test merging is behind us, we can revisit these and try to implement more performant solutions. That may require additional JIT work, as one of the problems we hit previously was that JITting a merged test wrapper that calls a test "for the wrong architecture" crashed the JIT on an assertion failure, even though we would ultimately have skipped executing the test.
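The distinction drawn here, between excluding a project from the build and marking it unsupported for a given target so that it is skipped when run, can be sketched as a small model. This is a hypothetical Python illustration, not the dotnet/runtime build logic: `TestProject`, `unsupported_archs`, `unsupported_under_gcstress`, `plan`, and the project names are all made up, and it assumes a project marked with CLRTestTargetUnsupported is still built and only skipped at run time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TestProject:
    """Hypothetical model of a test project and its "unsupported" conditions."""
    name: str
    # Target architectures on which the test is marked unsupported
    # (stands in for a CLRTestTargetUnsupported condition on the architecture).
    unsupported_archs: frozenset = frozenset()
    # Stands in for a test marked incompatible with GC stress.
    unsupported_under_gcstress: bool = False


def is_target_unsupported(project: TestProject, target_arch: str, gcstress: bool) -> bool:
    """True when the project should be skipped for this target/flavor."""
    if target_arch in project.unsupported_archs:
        return True
    return gcstress and project.unsupported_under_gcstress


def plan(projects, target_arch, gcstress=False):
    """Every project is built (AnyOS/AnyConfiguration); only the run step filters."""
    built = [p.name for p in projects]
    run = [p.name for p in projects
           if not is_target_unsupported(p, target_arch, gcstress)]
    return built, run


# Hypothetical project names, for illustration only.
projects = [
    TestProject("ArmIntrinsicTest", unsupported_archs=frozenset({"x86", "x64"})),
    TestProject("X86IntrinsicTest", unsupported_archs=frozenset({"arm64"})),
    TestProject("GcSensitiveTest", unsupported_under_gcstress=True),
]
built, run = plan(projects, "arm64")
```

The trade-off the thread raises is visible in `plan`: build cost is paid for every project on every leg, and only execution is trimmed.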

BruceForstall · 2023-03-27T22:42:38Z

It seems like with this change we'll run a lot more tests in the CI (some, of course, we should be running and currently aren't).

Should NumberOfStripesToUseInStress be increased to compensate?
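One way to reason about this question: if a stress run splits a test set into a fixed number of stripes, the work per stripe grows with the number of tests, so the stripe count has to grow to keep each stripe within a time budget. The sketch below assumes a simple round-robin split and a per-stripe test budget. It does not reproduce how NumberOfStripesToUseInStress is actually consumed by the build, and `stripe` and `stripes_needed` are hypothetical helpers.

```python
def stripe(tests, num_stripes):
    """Round-robin split of tests into num_stripes groups (assumed scheme)."""
    if num_stripes < 1:
        raise ValueError("num_stripes must be at least 1")
    groups = [[] for _ in range(num_stripes)]
    for i, test in enumerate(tests):
        groups[i % num_stripes].append(test)
    return groups


def stripes_needed(num_tests, max_tests_per_stripe):
    """Smallest stripe count that keeps every stripe within the budget."""
    if max_tests_per_stripe < 1:
        raise ValueError("max_tests_per_stripe must be at least 1")
    return -(-num_tests // max_tests_per_stripe)  # ceiling division
```

With an illustrative (not measured) budget of 300 tests per stripe, growing a test set from 600 to 1000 tests raises `stripes_needed` from 2 to 4.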

Also, with this change, the comments:

    <!-- We have a lot of tests here so run them in outerloop on platforms where they aren't accelerated -->
    <!-- For most of these, that just involves excluding the projects when the architecture is mismatched -->

are obsolete and can be removed.

Also,

    <!-- For Vector512, we only have a very small pool of machines with acceleration support, so they are always outerloop -->

is obsolete as well (all our Helix xarch machines support AVX-512).

tannergooding · 2023-03-27T22:50:34Z

Moved this to the related issue as I realized it'd be better to have the discussion there: #83980 (comment)

Just to clarify test priority checks that the build script refers to at line 8 and 10 should still be legal, we aren't trying to build Pri0 and Pri1 tests at the same time (there's actually no way to do it now considering they're used in different pipelines); other than that you're absolutely right that the architecture-specific checks shouldn't be put in the primary csproj scripts exactly as you described.

@trylek, the consideration here is that we have a lot of hardware intrinsic tests. All tests "can" run on any machine. However, the test pattern on mismatched hardware is validation that a PlatformNotSupportedException is thrown and this isn't important coverage to run "all the time".

For Pri1 we want to run "everything", regardless of target architecture. For Pri0 we want to run only the tests that are "possibly supported" by the underlying hardware.

So, for Pri0 on Arm64 we want to run the https://github.com/dotnet/runtime/tree/main/src/tests/JIT/HardwareIntrinsics/Arm tests. But we don't want to run the https://github.com/dotnet/runtime/tree/main/src/tests/JIT/HardwareIntrinsics/X86 tests. We want to run both for Pri1.

Conversely, for Pri0 on x86 Windows and x64 Windows/Linux/macOS we want to run https://github.com/dotnet/runtime/tree/main/src/tests/JIT/HardwareIntrinsics/X86 but don't want to run https://github.com/dotnet/runtime/tree/main/src/tests/JIT/HardwareIntrinsics/Arm. We again want to run both sets for Pri1.

Given this, is there a setup that would allow this to be achieved?
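The Pri0/Pri1 rule described above amounts to a small selection table. The sketch below is a hypothetical Python model of that rule only, not how the test build actually selects projects; it covers just the targets named in the comment (arm64, x86, x64), and `HWINTRINSIC_DIRS` and `dirs_to_run` are illustrative names.

```python
# Hardware-intrinsic test directories and the target architectures whose
# hardware can actually accelerate them (per the rule described above).
HWINTRINSIC_DIRS = {
    "src/tests/JIT/HardwareIntrinsics/Arm": {"arm64"},
    "src/tests/JIT/HardwareIntrinsics/X86": {"x86", "x64"},
}


def dirs_to_run(target_arch: str, priority: int) -> list:
    """Pri1 runs every directory; Pri0 runs only those matching the target."""
    if priority >= 1:
        return sorted(HWINTRINSIC_DIRS)
    return sorted(d for d, archs in HWINTRINSIC_DIRS.items() if target_arch in archs)
```

The open question in the thread is where such a table would live, since the AnyOS/AnyConfiguration test build does not know which CI leg it is producing binaries for; the sketch only captures the desired outcome.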

kunalspathak · 2023-03-28T02:03:17Z

We will explore the ideas in a separate PR. I will merge this one to unblock the Arm64 testing.

kunalspathak · 2023-03-28T14:33:52Z

Triggered superpmi collection to capture the Arm64 test cases.

remove _IncludeArm64HWIntrinsicTests restriction

90488ec

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 26, 2023

ghost assigned kunalspathak Mar 26, 2023

kunalspathak changed the title from "remove _IncludeArm64HWIntrinsicTests restriction" to "Renable running Arm64 test cases in CI" Mar 26, 2023

kunalspathak marked this pull request as ready for review March 27, 2023 02:39

kunalspathak mentioned this pull request Mar 27, 2023

Arm64: Implement VectorTableLookup/VectorTableLookupExtension intrinsinsic + Consecutive registers support #80297

Merged


Remove _IncludeXarchHWIntrinsicTests

dbec5e5

tannergooding approved these changes Mar 27, 2023


build-analysis bot mentioned this pull request Mar 27, 2023

CI failure STRICT_JS doesn't work with MODULARIZE or EXPORT_ES6 #83986

Closed

kunalspathak requested a review from BruceForstall March 27, 2023 20:02

BruceForstall requested review from trylek and davidwrighton March 27, 2023 20:23

trylek approved these changes Mar 27, 2023


tannergooding mentioned this pull request Mar 27, 2023

Usage of TargetArchitecture in managed coreclr test #83980

Open

Remove comments

090f611

kunalspathak merged commit 92b6788 into dotnet:main Mar 28, 2023

kunalspathak deleted the arm64-test branch March 28, 2023 14:31

ghost locked as resolved and limited conversation to collaborators Apr 27, 2023
