[Mono][Android][x64] JIT.HardwareIntrinsics.X86.*/Jit.Methodical.* tests failed #65840

Closed
MaximLipnin opened this issue Feb 24, 2022 · 22 comments
Labels
area-Infrastructure-mono blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms'
Milestone

Comments

@MaximLipnin
Contributor

MaximLipnin commented Feb 24, 2022

The JIT.HardwareIntrinsics.X86.*/Jit.Methodical.* tests failed on the Mono Android x64 Release @ Ubuntu.1804.Amd64.Android.29.Open CI job (Build)

An error message from the parsed output:

System.AggregateException : One or more errors occurred. (Failed to install mobile app.
Expected: True
Actual:   False) (The following constructor parameters did not have matching fixture data: _Global globalVar)
---- Failed to install mobile app.
Expected: True
Actual:   False
---- The following constructor parameters did not have matching fixture data: _Global globalVar

Occurrences 5/16-6/16:

| Date | Run | Failures | Details |
| --- | --- | --- | --- |
| 6/16 PM | 1829491 | 142 failures + 64 non-JIT failures | Mono Android x64 Release @ Ubuntu.1804.Amd64.Android.29.Open |
| 6/15 AM | 1825935 | ~191 failures | Mono Android x64 Release @ Ubuntu.1804.Amd64.Android.29.Open |
| 6/13 AM | 1820913 | ~15 failures | |
| 6/12 PM | 1820505 | ~159 failures | |
| 6/12 AM | 1820087 | ~204 failures | |
| 6/11 AM | 1819219 | ~204 failures | |
| 6/7 AM | 1810649 | ~244 failures | |
| 6/6 PM | 1809660 | ~376 failures | |
| 6/5 PM | 1808009 | ~148 failures | |
| 6/5 AM | 1807614 | ~146 failures | |
| 6/4 AM | 1806758 | ~175 failures | |
| 6/3 PM | 1805934 | ~297 failures | |
| 6/1 PM | 1801263 | ~572 failures | |
| 5/31 AM | 1797399 | ~56 failures | |
| 5/29 PM | 1795815 | ~133 failures | |
| 5/27 PM | 1793879 | 4 failures | Mono Android x64 Release @ Ubuntu.1804.Amd64.Android.29.Open |
| 5/25 PM | 1789813 | ~108 failures | |
| 5/25 AM | 1788542 | ~368 failures | |
| 5/20 PM | 1782101 | ~236 failures | |
| 5/19 PM | 1780395 | ~78 failures | Mono Android x64 Release @ Ubuntu.1804.Amd64.Android.29.Open |
| 5/19 AM | 1779269 | ~54 failures | Mono Android x64 Release @ Ubuntu.1804.Amd64.Android.29.Open |
| 5/18 PM | 1778170 | ~113 failures | Mono Android x64 Release @ Ubuntu.1804.Amd64.Android.29.Open |
| 5/17 PM | 1776043 | 6 failures | |
| 5/16 PM | 1774196 | ~39 failures | |
| 5/16 AM | 1772864 | ~21 failures | |
@dotnet-issue-labeler dotnet-issue-labeler bot added area-Infrastructure-mono untriaged New issue has not been triaged by the area owner labels Feb 24, 2022
@ghost

ghost commented Feb 24, 2022

Tagging subscribers to this area: @directhex
See info in area-owners.md if you want to be subscribed.


@MaximLipnin MaximLipnin added the blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' label Feb 28, 2022
@MaximLipnin
Contributor Author

It looks the same as #59679, which was closed as fixed.
@fanyang-mono Do you happen to know who can address it?

@fanyang-mono
Member

@MaximLipnin Did it fail on retry as well? (I know that the infrastructure will do another run if the first run fails with this type of failure.)

@MaximLipnin
Contributor Author

@MaximLipnin Did it fail on retry as well? (I know that the infrastructure will do another run if the first run fails with this type of failure.)

Hmm, the retry seems to have succeeded, but the whole test run is marked as failed anyway.

@steveisok
Member

@premun I suspect this may be infra related.

@mkhamoyan
Contributor

@premun
Member

premun commented May 20, 2022

I analyzed the linked builds and:

  1. The first is just a failing test
  2. The second failure has been fixed in xharness#889 (Categorize Android instrumentation timeouts)
  3. The third is just a failing test, I don't see a failure there actually

But if there is a "---- Failed to install mobile app." error, we retry the work item, since this is an infra failure that just happens from time to time.

@mkhamoyan is it possible runfo doesn't take into account that there are retries of work items and it searches all attempts and gives us false positives?
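The false-positive scenario described above can be illustrated with a minimal sketch. It assumes each work item attempt is available as a record with a name, an attempt number, and a pass/fail flag; the `Attempt` type and both function names are hypothetical and are not part of runfo, Helix, or XHarness. The idea is only that a search over every attempt reports work items that a later retry already fixed, while a search over final attempts does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """One execution of a work item (hypothetical record shape)."""
    work_item: str  # work item name, e.g. a test group
    attempt: int    # 1 = first run, 2 = first retry, ...
    passed: bool


def naive_failures(attempts):
    """Work items with at least one failed attempt (counts retried items too)."""
    return sorted({a.work_item for a in attempts if not a.passed})


def final_failures(attempts):
    """Work items whose highest-numbered attempt still failed."""
    latest = {}
    for a in attempts:
        current = latest.get(a.work_item)
        if current is None or a.attempt > current.attempt:
            latest[a.work_item] = a
    return sorted(name for name, a in latest.items() if not a.passed)


attempts = [
    Attempt("JIT.Methodical", 1, False),  # e.g. install failure on first run
    Attempt("JIT.Methodical", 2, True),   # retry succeeded
    Attempt("JIT.opt", 1, False),         # failed and was never retried
]
print(naive_failures(attempts))
print(final_failures(attempts))
```

Comparing the two lists shows which reported failures are only artifacts of searching superseded attempts.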

@mkhamoyan
Contributor

I'm not sure how runfo searches. Maybe @jaredpar can help?

@fanyang-mono
Member

@premun Regarding the third one, the test seems to have passed, according to this log message:
05-19 00:22:10.465 22728 22746 I DOTNET : MonoRunner finished, return-code=100
The fact that the test is marked as failed looks like an infrastructure issue to me.

@fanyang-mono
Member

But all of them show different symptoms from the one originally reported here. If the original issue is no longer happening, maybe we should close this one and open a new one to track the new issue.

@premun
Member

premun commented May 20, 2022

@fanyang-mono It seems XHarness is being killed somehow because of a timeout and doesn't return any exit code:

        [00:22:12] dbug: Exit code: 0
                         Std out:
                         INSTRUMENTATION_RESULT: return-code=100
                         INSTRUMENTATION_CODE: 100
                         
                         
                         
        [00:24:02] info: Instrumentation finished normally with exit code 100
        [00:24:05] dbug: Executing command: '/datadisks/disk1/work/A9C208EA/p/microsoft.dotnet.xharness.cli/1.0.0-prerelease.22256.2/runtimes/any/native/adb/linux/adb -s emulator-5556 logcat -d '
        [00:35:53] info: Wrote current ADB log to /datadisks/disk1/work/A9C208EA/w/A83A0903/uploads/Reports/JIT.opt/OSR/pinnedlocal/adb-logcat-net.dot.JIT_opt-net.dot.MonoRunner.log
        [00:35:57] dbug: Saving diagnostics data to '/datadisks/disk1/work/A9C208EA/w/A83A0903/e/diagnostics.json'
        
        cmdLine:/datadisks/disk1/work/A9C208EA/w/A83A0903/e/JIT/opt/OSR/pinnedlocal/pinnedlocal.sh Timed Out (timeout in milliseconds: 3600000 from variable __TestTimeout, start: 5/18/2022 11:53:30 PM, end: 5/19/2022 12:53:30 AM)

There should be a line like this:

XHarness exit code: 0

I think -100 is the default that you set and it's never overridden?
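As a rough illustration of the check described above, the sketch below scans a captured log for the instrumentation code, the `XHarness exit code:` line, and the `Timed Out` marker. The line formats are taken from the excerpt in this comment; the function names and the returned wording are assumptions made for the example, not anything the test scripts actually do.

```python
import re

# Patterns based on the log lines quoted in this thread.
INSTRUMENTATION_CODE = re.compile(r"INSTRUMENTATION_CODE:\s*(-?\d+)")
XHARNESS_EXIT = re.compile(r"XHarness exit code:\s*(-?\d+)")


def summarize_log(text):
    """Extract the exit-related facts from one work item log."""
    inst = INSTRUMENTATION_CODE.search(text)
    harness = XHARNESS_EXIT.search(text)
    return {
        "instrumentation_code": int(inst.group(1)) if inst else None,
        "xharness_exit_code": int(harness.group(1)) if harness else None,
        "timed_out": "Timed Out" in text,
    }


def diagnose(text):
    """Describe the harness outcome (hypothetical wording)."""
    summary = summarize_log(text)
    if summary["xharness_exit_code"] is None:
        if summary["timed_out"]:
            return "harness reported no exit code before the timeout"
        return "harness exit code missing"
    return f"harness exited with {summary['xharness_exit_code']}"
```

A log where the instrumentation code is present but the harness line is missing and `Timed Out` appears matches the situation quoted above: the app finished, yet the wrapper never saw an exit code.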

@fanyang-mono
Member

fanyang-mono commented May 20, 2022

Anything here suspicious?

      05-19 00:22:10.474 22718 22718 D AndroidRuntime: Shutting down VM
      05-19 00:22:10.509  1672  1672 I Zygote  : Process 22728 exited due to signal 9 (Killed)
      05-19 00:22:10.553  1916  1943 I libprocessgroup: Successfully killed process cgroup uid 10159 pid 22728 in 86ms
      05-19 00:22:10.596  1916  3158 D WificondControl: Scan result ready event
      05-19 00:22:58.063  1726  1726 E netmgr  : Failed to open QEMU pipe 'qemud:network': Invalid argument
      05-19 00:22:59.063  1731  1731 E wifi_forwarder: RemoteConnection failed to initialize: RemoteConnection failed to open pipe
      05-19 00:23:00.003  2064  2064 D KeyguardClockSwitch: Updating clock: 1223

When reading it, there seems to be a network issue.
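To make excerpts like this easier to scan, here is a small sketch that parses logcat lines in the layout shown above (date, time, PID, TID, level, tag, message) and pulls out process kills and error-level entries. The regular expression is inferred from these sample lines only, and the helper names are hypothetical; it does not decide whether any of the entries is actually the cause of the failure.

```python
import re
from typing import NamedTuple, Optional

# Layout inferred from the sample lines: "MM-DD HH:MM:SS.mmm PID TID L Tag: message"
LOGCAT_LINE = re.compile(
    r"^(?P<date>\d\d-\d\d) (?P<time>\d\d:\d\d:\d\d\.\d{3})\s+"
    r"(?P<pid>\d+)\s+(?P<tid>\d+)\s+(?P<level>[VDIWEF])\s+"
    r"(?P<tag>.*?)\s*: (?P<message>.*)$"
)
KILLED = re.compile(r"Process (\d+) exited due to signal (\d+)")


class Entry(NamedTuple):
    time: str
    pid: int
    level: str
    tag: str
    message: str


def parse_line(line: str) -> Optional[Entry]:
    """Parse one logcat line, or return None if it does not match the layout."""
    m = LOGCAT_LINE.match(line.strip())
    if m is None:
        return None
    return Entry(m["time"], int(m["pid"]), m["level"], m["tag"], m["message"])


def parse(text: str):
    """Parse every matching line of a logcat dump."""
    return [e for e in map(parse_line, text.splitlines()) if e is not None]


def killed_processes(entries):
    """Return (pid, signal) pairs for 'exited due to signal' messages."""
    found = []
    for e in entries:
        m = KILLED.search(e.message)
        if m:
            found.append((int(m.group(1)), int(m.group(2))))
    return found


def error_entries(entries):
    """Return entries logged at error or fatal level."""
    return [e for e in entries if e.level in ("E", "F")]
```

Run against the excerpt above, `killed_processes` would surface the signal 9 kill of process 22728 and `error_entries` the `netmgr` and `wifi_forwarder` lines, which can then be lined up by timestamp against the harness log.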

@premun
Member

premun commented May 20, 2022

I don't really have much experience reading these. I know that some of them, such as "Shutting down VM", are always there, but honestly I'm not sure about the others.

@karelz
Member

karelz commented Jun 16, 2022

@premun @steveisok there seem to be about 24 failed Rolling test runs in the last month (see updated top post) -- that is quite a lot, and it is causing lots of noise in the test results analysis.
Can we do something about this? Perhaps even disable the entire test suite until we understand more?
As a result, the JIT/ test failures are most likely getting ignored.

@steveisok
Member

Can we do something about this? Perhaps even disable the entire test suite until we understand more?

@fanyang-mono I'll defer to you here.

@karelz
Member

karelz commented Jun 17, 2022

Looks like in the latest run 1829491 (6/16 PM), there were also 64 non-JIT failures (GC, tracing, Regressions/coreclr, Reflection, a couple of others). So disabling JIT tests is not going to fully help -- it does indeed seem to be an infra problem.

Circling back to the original issue - the callstack detected for all these failures:

System.AggregateException : One or more errors occurred. (Failed to install mobile app. Expected: True Actual: False) (The following constructor parameters did not have matching fixture data: _Global globalVar)
---- Failed to install mobile app.
Expected: True
Actual: False
---- The following constructor parameters did not have matching fixture data: _Global globalVar

@premun is there anything that can be done for it?
Is it something that cannot be avoided? Or at least detected? (making it easier to ignore)
Also, were there retries that the build failures analysis perhaps could have detected?
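One way the "at least detected" idea could look is a signature check over failure messages, sketched below. The only signature used is the "Failed to install mobile app." text from the callstack above; the helper names and the test names in the usage example are hypothetical, and this is not how the build failure analysis is actually implemented.

```python
# Known infrastructure signatures; only the one quoted in this issue is listed.
INFRA_SIGNATURES = ("Failed to install mobile app.",)


def is_infra_failure(message: str) -> bool:
    """True if the failure message contains a known infra signature."""
    return any(signature in message for signature in INFRA_SIGNATURES)


def split_failures(failures):
    """Split {test name: message} into (infra, other) lists of test names."""
    infra, other = [], []
    for name, message in failures.items():
        (infra if is_infra_failure(message) else other).append(name)
    return infra, other


failures = {
    # Hypothetical test names, with messages shaped like the ones in this issue.
    "JIT.HardwareIntrinsics.X86.Example": (
        "System.AggregateException : One or more errors occurred. "
        "(Failed to install mobile app. Expected: True Actual: False)"
    ),
    "JIT.Methodical.Example": "Assert.Equal() Failure",
}
infra, other = split_failures(failures)
print("infra:", infra)
print("other:", other)
```

Separating the two groups would let the infra-tagged failures be set aside while the remaining ones still get looked at.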

@premun
Member

premun commented Jun 17, 2022

We had a look at this with @fanyang-mono yesterday and:

  • One fix has been waiting for 4 days in #70662 ([main] Update dependencies from dotnet/arcade). This should bring retries back, as there was a regression with an older Python version
  • One fix will happen on the CoreCLR side, improving the logs to make it clearer what's actually going on

@fanyang-mono
Member

As @premun mentioned, the recent spike of Android failures was caused by an infrastructure issue due to a recent Python upgrade, which was fixed by dotnet/arcade#9663. It should go away once the change is merged into dotnet/runtime.

@steveisok
Member

@fanyang-mono Can you check to see if this is still happening?

@steveisok steveisok removed the untriaged New issue has not been triaged by the area owner label Jul 5, 2022
@steveisok steveisok added this to the 7.0.0 milestone Jul 5, 2022
@fanyang-mono
Member

Runtime tests are passing on Android now. Closing this issue.

@ghost ghost locked as resolved and limited conversation to collaborators Aug 5, 2022