Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Renable running Arm64 test cases in CI #83948

Merged
merged 3 commits into from
Mar 28, 2023

Conversation

kunalspathak
Copy link
Member

The original change prohibited running Arm tests from non-Arm platform, however, we build AnyOS/AnyConfiguration tests and we don't know what the targeted CI leg is for. For now, just remove such restriction and see if it picks up Arm test cases. After that, probably, add a property (or look for a property, if there is one available) that can be used to exclude the Arm test cases from getting included in non-arm platforms.

Fixes #83947

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 26, 2023
@ghost ghost assigned kunalspathak Mar 26, 2023
@kunalspathak kunalspathak changed the title remove _IncludeArm64HWIntrinsicTests restriction Renable running Arm64 test cases in CI Mar 26, 2023
@ghost
Copy link

ghost commented Mar 26, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Issue Details

The original change prohibited running Arm tests from non-Arm platform, however, we build AnyOS/AnyConfiguration tests and we don't know what the targeted CI leg is for. For now, just remove such restriction and see if it picks up Arm test cases. After that, probably, add a property (or look for a property, if there is one available) that can be used to exclude the Arm test cases from getting included in non-arm platforms.

Fixes #83947

Author: kunalspathak
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@kunalspathak
Copy link
Member Author

These are included now:

image

@kunalspathak kunalspathak marked this pull request as ready for review March 27, 2023 02:39
@kunalspathak
Copy link
Member Author

@dotnet/jit-contrib

@kunalspathak
Copy link
Member Author

/azp run runtime-coreclr jitstressregs

@kunalspathak
Copy link
Member Author

/azp run runtime-coreclr jitstress2-jitstressregs

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

1 similar comment
@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@tannergooding
Copy link
Member

This likely needs to include the logic to continue filtering these on irrelevant platforms. The filtering was done due to the timeouts that otherwise get hit, particularly in stress jobs

@kunalspathak
Copy link
Member Author

The filtering was done due to the timeouts that otherwise get hit, particularly in stress jobs

Do you know the source of timeouts? The tests already have a IsSupported to execute only on relevant platform. I know that there might be many test cases that are irrelevant, they should still complete fairly quickly, right? If my understanding is correct, the problem is that build-test-job is a common job triggered independent of the configuration and we are building it for "Any configuration". We do not build the binaries depending on configuration. I think we should have a follow-up PR to fix that problem, but currently the priority is to re-enable these test cases. I didn't realize that the newly added Arm64 test cases in #80297 are not run until I realized that there was a bug and still the CI was passing.

- CoreClrTestBuildHost # Either osx_x64 or linux_x64

@tannergooding
Copy link
Member

Do you know the source of timeouts? The tests already have a IsSupported to execute only on relevant platform

There are a lot of tests. Also most of them explicitly test that the APIs throw PlatformNotSupportedException when IsSupported returns false, so that's a lot of exceptions that get tested.

The csproj check was meant to save on CI time to ensure that we weren't building as much and that we weren't running as much, particularly given the overhead that some stress modes add to every bit of managed code that executes.

If my understanding is correct, the problem is that build-test-job is a common job triggered independent of the configuration and we are building it for "Any configuration"

Hmmm. This makes it tricky to "do the right thing". On one hand we'll end up with hurting local build times and testing throughput and on the other CI won't be testing what we actually want/need for Pri0

The simplest thing for today would likely be to remove or comment out all the conditional exclusion logic at the build level and to run the stress tests in this PR to try to catch any timeouts caused. If we don't remove all the conditional logic, we'll still be building less than we should in some cases and not covering the right test scenarios.

That at least ensures CI is correct. It's not going to help with turnaround time for the average job which doesn't touch these, however. We could move them out to their own CI leg and only trigger them on the relevant platform, but that's "more work" overall and we can track it in a separate issue.

@kunalspathak
Copy link
Member Author

The simplest thing for today would likely be to remove or comment out all the conditional exclusion logic at the build level

Yes, I will remove the _IncludeXarchHWIntrinsicTests too.

and to run the stress tests in this PR to try to catch any timeouts caused.

I did trigger jitstressregs and a jitstress pipeline. I will trigger gcstress too.

we can track it in a separate issue

#83980

@kunalspathak
Copy link
Member Author

/azp run runtime-coreclr gcstress0x3-gcstress0xc

@kunalspathak
Copy link
Member Author

/azp run runtime-coreclr jitstress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

1 similar comment
@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@trylek trylek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks Kunal! Just to clarify test priority checks that the build script refers to at line 8 and 10 should still be legal, we aren't trying to build Pri0 and Pri1 tests at the same time (there's actually no way to do it now considering they're used in different pipelines); other than that you're absolutely right that the architecture-specific checks shouldn't be put in the primary csproj scripts exactly as you described.

@kunalspathak
Copy link
Member Author

you're absolutely right that the architecture-specific checks shouldn't be put in the primary csproj scripts exactly as you described.

What is a good way to accomplish this if we want to not skip building certain csproj files based on architecture?

@trylek
Copy link
Member

trylek commented Mar 27, 2023

As I described in a comment on the issue thread

#83980

the current POR is basically to use the CLRTestTargetUnsupported property for tests applicable to only a certain OS / architecture or build / execution flavor (e.g. some tests are incompatible with GC stress, IJW tests are currently incompatible with Crossgen2 as it doesn't support mixed managed & native code etc.). I can easily imagine there may be previously overlooked inconsistencies in application of this rule. Please let me know if you believe there are cases that cannot leverage this technique so that we'd need to invent a new mechanism. For instance, the new-style merged test wrappers don't yet support any equivalent of this logic so that as a temporary measure we mark all architecture / OS-specific tests as "requiring process isolation" even though no isolation is really needed, all that's needed is for the merged wrapper to stop trying to directly call the tests in question and instead resort to the legacy way of running the test script. While not ideal, it thankfully applies to a tiny fraction of the entire test set. Once the bulk of test merging is behind us, we can revisit these and try to implement more performant solutions. That may require additional JIT work as one of the problems we hit previously was that JITting the merged test wrapper calling a test "for the wrong architecture" crashed JIT on an assertion failure despite the fact that we would have ultimately skipped execution of the test.

@BruceForstall
Copy link
Member

It seems like with this change we'll run a lot more tests in the CI (some, of course, we should be running and currently aren't).

Should NumberOfStripesToUseInStress be increased to compensate?

Also, with this change, the comments:

    <!-- We have a lot of tests here so run them in outerloop on platforms where they aren't accelerated -->
    <!-- For most of these, that just involves excluding the projects when the architecture is mismatched -->

are obsolete and can be removed.

Also,

    <!-- For Vector512, we only have a very small pool of machines with acceleration support, so they are always outerloop -->

is also obsolete (all our Helix xarch machines support AVX-512).

@tannergooding
Copy link
Member

tannergooding commented Mar 27, 2023

Moved this to the related issue as I realized it'd be better to have the discussion there: #83980 (comment)


Just to clarify test priority checks that the build script refers to at line 8 and 10 should still be legal, we aren't trying to build Pri0 and Pri1 tests at the same time (there's actually no way to do it now considering they're used in different pipelines); other than that you're absolutely right that the architecture-specific checks shouldn't be put in the primary csproj scripts exactly as you described.

@trylek, the consideration here is that we have a lot of hardware intrinsic tests. All tests "can" run on any machine. However, the test pattern on mismatched hardware is validation that a PlatformNotSupportedException is thrown and this isn't important coverage to run "all the time".

For Pri1 we want to run "everything", regardless of target architecture. For Pri0 we want to only run the tests that are "possible supported" by the underlying hardware.

So, for Pri0 on Arm64 we want to run the https://github.com/dotnet/runtime/tree/main/src/tests/JIT/HardwareIntrinsics/Arm tests. But we don't want to run the https://github.com/dotnet/runtime/tree/main/src/tests/JIT/HardwareIntrinsics/X86 tests. We want to run both for Pri1.

Inversely for Pri0 on x86 Windows and x64 Windows/Linux/MacOS we want to run https://github.com/dotnet/runtime/tree/main/src/tests/JIT/HardwareIntrinsics/X86 but don't want to run https://github.com/dotnet/runtime/tree/main/src/tests/JIT/HardwareIntrinsics/Arm. We again want to run both sets for Pri1.

Given this, is there a setup that would allow this to be achieved?

@kunalspathak
Copy link
Member Author

We will explore the ideas in a separate PR. I will merge this one to unblock the Arm64 testing.

@kunalspathak kunalspathak merged commit 92b6788 into dotnet:main Mar 28, 2023
@kunalspathak kunalspathak deleted the arm64-test branch March 28, 2023 14:31
@kunalspathak
Copy link
Member Author

Triggered superpmi collection to capture the Arm64 test cases.

@ghost ghost locked as resolved and limited conversation to collaborators Apr 27, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Arm64 tests are not run as part of CI
4 participants