Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Install / Uninstall apk Sometimes Fails on Android #354

Closed
steveisok opened this issue Nov 5, 2020 · 26 comments
Closed

Install / Uninstall apk Sometimes Fails on Android #354

steveisok opened this issue Nov 5, 2020 · 26 comments
Assignees
Labels
android Android area bug

Comments

@steveisok
Copy link
Member

steveisok commented Nov 5, 2020

There are some runs indicated in dotnet/runtime#44306 that show either the apk could not be installed or removed. Good examples of this are in:

https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-master-0d80c385864846a9b0/JIT/console.17d80b24.log?sv=2019-07-07&se=2020-11-22T18%3A21%3A05Z&sr=c&sp=rl&sig=7poXAdaraL1BwYAj7GxDrIhZLK%2F9Uo6PZg8P022tEmM%3D

    [18:48:43] info: Attempting to remove apk 'net.dot.JIT_SIMD': 
      [18:48:43] fail: Error: Exit code: 224
                       Standard Output:
                       cmd: Failure calling service package: Broken pipe (32)

https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-43740-merge-1f95bf9c506d4bcb9e/JIT/console.8114e152.log?%3F%253F%25253Fsv%25253D2019-07-07%252526se%25253D2020-11-23T13%2525253A30%2525253A16Z%252526sr%25253Dc%252526sp%25253Drl%252526sig%25253Dyr2NQCCzTeXQKU3kTZHet06tu32Jnp1%2525252B6y3AUwdMIzM%2525253D

My guess is that adb is somehow corrupt and either resolves itself via workitem completion or an internal timeout. We should look into capturing the error state or states and try to correct it. We should try to avoid failing the tests the first time.

@akoeplinger akoeplinger added the bug label Nov 5, 2020
@akoeplinger
Copy link
Member

This is most likely the adb server process getting wrecked somehow. We might be able to fix it by running adb kill-server and adb start-server to reboot it.

@akoeplinger
Copy link
Member

/cc @MattGal

@tkapin
Copy link
Member

tkapin commented Nov 12, 2020

@MattGal do you want to look into this or you'd prefer us to investigate? Cheers!

@MattGal
Copy link
Member

MattGal commented Nov 12, 2020

I'm happy to have you all investigate... my thoughts here is if adb install fails it should perhaps time out earlier (5 minutes is the default in Xharness) and retry, requesting an infra retry and machine reboot if it fails. If you need me to be involved though I can be.

@akoeplinger
Copy link
Member

@MattGal in this case there's no timeout, adb fails immediately with the "Broken pipe" error (see the timestamps in the log).

@MattGal MattGal self-assigned this Nov 13, 2020
MattGal added a commit to MattGal/xharness that referenced this issue Nov 14, 2020
If this is insufficient there are some helix-based extra things to try.
MattGal added a commit that referenced this issue Nov 16, 2020
* Add a retry in the case of exit code 224; Seeks to address #354

If this is insufficient there are some helix-based extra things to try.

* Apply suggestions from code review

Co-authored-by: Přemek Vysoký <premek.vysoky@microsoft.com>

Co-authored-by: Přemek Vysoký <premek.vysoky@microsoft.com>
@MattGal
Copy link
Member

MattGal commented Nov 16, 2020

Merged PR that does a single retry here, and moved the issue to validate. If this doesn't make the problem go away I want to teach XHarness how to ask for reboot / retry when it detects it's running in Helix.

@steveisok
Copy link
Member Author

@MattGal I have noticed that there are other, seemingly similar failures with different exit codes.

See dotnet/runtime#42548 (comment) for the latest example.

What does xharness exit code 78 mean? I've found multiple rows in kusto with that exit code.

@premun
Copy link
Member

premun commented Nov 18, 2020

@MattGal
Copy link
Member

MattGal commented Nov 18, 2020

I'm happy to have you all investigate... my thoughts here is if adb install fails it should perhaps time out earlier (5 minutes is the default in Xharness) and retry, requesting an infra retry and machine reboot if it fails. If you need me to be involved though I can be.

This failure mode is not what I made a PR for, is not the log at the top, but it is the one I was talking about in this comment. I will investigate after I finish my inbox this morning, but my sense here is if adb uninstall fails, the emulator / device is in a broken state and this is the case where an infra retry / machine reboot makes good sense.

@mattleibow
Copy link
Member

mattleibow commented Nov 26, 2020

This may or may not be related, but it also seems to be that it tries to install to early.

I have a script that creates and boots the emulator, and then runs xharness. It has been failing on CI and locally (sometimes). However, if I boot up the emulator and then wait a bit, it installs packages just fine. It is also deceiving in that subsequent boots seem to work, but this is due to the fact that it is just restoring the emulator from a save point.

If I also wait for the boot to complete using adb shell getprop sys.boot_completed, then it also works.

dbug: Executing command: 'C:\Users\mattl\.dotnet\tools\.store\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20575.2\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20575.2\runtimes\any\native\adb\windows\adb.exe -s emulator-5554 wait-for-device'
dbug:
dbug: Executing command: 'C:\Users\mattl\.dotnet\tools\.store\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20575.2\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20575.2\runtimes\any\native\adb\windows\adb.exe -s emulator-5554 logcat -c'
dbug: Executing command: 'C:\Users\mattl\.dotnet\tools\.store\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20575.2\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20575.2\runtimes\any\native\adb\windows\adb.exe -s emulator-5554 version'
dbug: Working with Android Debug Bridge version 1.0.41
      Version 29.0.6-6198805
      Installed as C:\Users\mattl\.dotnet\tools\.store\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20575.2\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20575.2\runtimes\any\native\adb\windows\adb.exe

info: Attempting to remove apk 'com.microsoft.azure.devicetests':
dbug: Executing command: 'C:\Users\mattl\.dotnet\tools\.store\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20575.2\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20575.2\runtimes\any\native\adb\windows\adb.exe -s emulator-5554 uninstall com.microsoft.azure.devicetests'
fail: Error: Exit code: 20
      Standard Output:

      Standard Error:


info: Attempting to install C:/Projects/azure-mobile-apps-net-client/e2etests/DeviceTests.Android/bin/Debug/com.microsoft.azure.devicetests-Signed.apk:
dbug: Executing command: 'C:\Users\mattl\.dotnet\tools\.store\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20575.2\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20575.2\runtimes\any\native\adb\windows\adb.exe -s emulator-5554 install "C:/Projects/azure-mobile-apps-net-client/e2etests/DeviceTests.Android/bin/Debug/com.microsoft.azure.devicetests-Signed.apk"'
fail: Error:
      Exit code: 1
      Standard Output:
      Performing Streamed Install

      Standard Error:
      Performing Streamed Install


crit: Install failure: Test command cannot continue
XHarness exit code: 78

@akoeplinger
Copy link
Member

@mattleibow interesting, I'd have thought wait-for-device would only return once the device is ready ¯_(ツ)_/¯

I guess we can add adb shell getprop sys.boot_completed to make sure.

@akoeplinger
Copy link
Member

or maybe add -delay-adb to the emulator command line: https://androidstudio.googleblog.com/2019/05/emulator-2907-canary.html ?

btw. when you say "it has been failing on CI" you don't mean the Helix queues right? the emulator should always be running there.

@mattleibow
Copy link
Member

"it has been failing on CI" refers to the default Azure hosted bots. I have to manually download, install, create and then spin them up.

@akoeplinger
Copy link
Member

So looks like the #388 is working for fixing the issue we saw during apk installation on some of the physical machines, I saw one machine got autofixed with the new xharness by rebooting the devices:

C:\h\w\A40208F8\w\AD510979\e>dotnet exec "C:\h\w\A40208F8\p/microsoft.dotnet.xharness.cli/1.0.0-prerelease.20601.1/tools/netcoreapp3.1/any/Microsoft.DotNet.XHarness.CLI.dll" android test --app "Microsoft.Extensions.FileProviders.Physical.Tests.apk" --output-directory "C:\h\w\A40208F8\w\AD510979\uploads" --timeout 1800 -p="net.dot.Microsoft.Extensions.FileProviders.Physical.Tests" -v -i=net.dot.MonoRunner  
XHarness command issued: android test --app Microsoft.Extensions.FileProviders.Physical.Tests.apk --output-directory C:\h\w\A40208F8\w\AD510979\uploads --timeout 1800 -p=net.dot.Microsoft.Extensions.FileProviders.Physical.Tests -v -i=net.dot.MonoRunner
[05:17:02] dbug: Android Test command called: App = C:\h\w\A40208F8\w\AD510979\e\Microsoft.Extensions.FileProviders.Physical.Tests.apk
                 Instrumentation Name = net.dot.MonoRunner
[05:17:02] dbug: Output Directory:C:\h\w\A40208F8\w\AD510979\uploads
                 Timeout = 1800 seconds.
[05:17:02] dbug: Arguments to instrumentation:
[05:17:02] dbug: ADBRunner using ADB.exe supplied from C:\h\w\A40208F8\p\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20601.1\tools\netcoreapp3.1\any\..\..\..\runtimes\any\native\adb\windows\adb.exe
[05:17:02] dbug: Full resolved path:'C:\h\w\A40208F8\p\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20601.1\runtimes\any\native\adb\windows\adb.exe'
[05:17:02] info: Will attempt to run device on detected architecture: 'arm64-v8a'
[05:17:02] dbug: Executing command: 'C:\h\w\A40208F8\p\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20601.1\runtimes\any\native\adb\windows\adb.exe  start-server'
[05:17:02] dbug: 
[05:17:02] dbug: Executing command: 'C:\h\w\A40208F8\p\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20601.1\runtimes\any\native\adb\windows\adb.exe  devices -l'
[05:17:02] dbug: Evaluating output line for device serial: 017AY1S22P             device product:sargo model:Pixel_3a device:sargo transport_id:1
[05:17:02] dbug: Executing command: 'C:\h\w\A40208F8\p\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20601.1\runtimes\any\native\adb\windows\adb.exe  -s 017AY1S22P shell getprop ro.product.cpu.abi'
[05:17:03] info: Using first-found compatible device of 1 total- serial: '017AY1S22P' - Arch: arm64-v8a
[05:17:03] info: Active Android device set to serial '017AY1S22P'
[05:17:03] info: Waiting for device to be available (max 5 minutes)
[05:17:03] dbug: Executing command: 'C:\h\w\A40208F8\p\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20601.1\runtimes\any\native\adb\windows\adb.exe -s 017AY1S22P wait-for-device'
[05:17:03] dbug: 
[05:17:03] dbug: Executing command: 'C:\h\w\A40208F8\p\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20601.1\runtimes\any\native\adb\windows\adb.exe -s 017AY1S22P shell getprop sys.boot_completed'
[05:17:03] dbug: Executing command: 'C:\h\w\A40208F8\p\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20601.1\runtimes\any\native\adb\windows\adb.exe -s 017AY1S22P logcat -c'
[05:17:03] dbug: Executing command: 'C:\h\w\A40208F8\p\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20601.1\runtimes\any\native\adb\windows\adb.exe -s 017AY1S22P version'
[05:17:03] dbug: Working with Android Debug Bridge version 1.0.41
                 Version 29.0.6-6198805
                 Installed as C:\h\w\A40208F8\p\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20601.1\runtimes\any\native\adb\windows\adb.exe
                 
[05:17:03] info: Attempting to remove apk 'net.dot.Microsoft.Extensions.FileProviders.Physical.Tests': 
[05:17:03] dbug: Executing command: 'C:\h\w\A40208F8\p\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20601.1\runtimes\any\native\adb\windows\adb.exe -s 017AY1S22P uninstall net.dot.Microsoft.Extensions.FileProviders.Physical.Tests'
[05:17:03] info: APK 'net.dot.Microsoft.Extensions.FileProviders.Physical.Tests' not on device.
[05:17:03] info: Attempting to install C:\h\w\A40208F8\w\AD510979\e\Microsoft.Extensions.FileProviders.Physical.Tests.apk: 
[05:17:03] dbug: Executing command: 'C:\h\w\A40208F8\p\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20601.1\runtimes\any\native\adb\windows\adb.exe -s 017AY1S22P install "C:\h\w\A40208F8\w\AD510979\e\Microsoft.Extensions.FileProviders.Physical.Tests.apk"'
[05:17:06] warn: It seems the package installation cache may be full on the device.  We'll try to reboot it before trying one more time.
                 Output:Exit code: 1
                 Standard Output:
                 Performing Streamed Install
                 
                 Standard Error:
                 adb: failed to install C:\h\w\A40208F8\w\AD510979\e\Microsoft.Extensions.FileProviders.Physical.Tests.apk: Failure [INSTALL_FAILED_INSUFFICIENT_STORAGE: Scanning Failed.: Package net.dot.Microsoft.Extensions.FileProviders.Physical.Tests could not be assigned a valid UID]
                 
                 
[05:17:06] dbug: Executing command: 'C:\h\w\A40208F8\p\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20601.1\runtimes\any\native\adb\windows\adb.exe -s 017AY1S22P reboot'
[05:17:10] info: Waiting for device to be available (max 5 minutes)
[05:17:10] dbug: Executing command: 'C:\h\w\A40208F8\p\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20601.1\runtimes\any\native\adb\windows\adb.exe -s 017AY1S22P wait-for-device'
[05:17:19] dbug: 
[05:17:19] dbug: Executing command: 'C:\h\w\A40208F8\p\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20601.1\runtimes\any\native\adb\windows\adb.exe -s 017AY1S22P shell getprop sys.boot_completed'
[05:17:19] dbug: Executing command: 'C:\h\w\A40208F8\p\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20601.1\runtimes\any\native\adb\windows\adb.exe -s 017AY1S22P shell getprop sys.boot_completed'
[05:17:19] dbug: sys.boot_completed = ''
[05:17:29] dbug: Executing command: 'C:\h\w\A40208F8\p\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20601.1\runtimes\any\native\adb\windows\adb.exe -s 017AY1S22P shell getprop sys.boot_completed'
[05:17:29] dbug: sys.boot_completed = '1'
[05:17:39] dbug: Executing command: 'C:\h\w\A40208F8\p\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20601.1\runtimes\any\native\adb\windows\adb.exe -s 017AY1S22P install "C:\h\w\A40208F8\w\AD510979\e\Microsoft.Extensions.FileProviders.Physical.Tests.apk"'
[05:17:43] info: Successfully installed C:\h\w\A40208F8\w\AD510979\e\Microsoft.Extensions.FileProviders.Physical.Tests.apk.
[05:17:43] info: Killing all running processes for 'net.dot.Microsoft.Extensions.FileProviders.Physical.Tests': 
[05:17:43] dbug: Executing command: 'C:\h\w\A40208F8\p\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20601.1\runtimes\any\native\adb\windows\adb.exe -s 017AY1S22P shell am kill --user all net.dot.Microsoft.Extensions.FileProviders.Physical.Tests'
[05:17:43] dbug: Success!
                 
[05:17:43] info: Starting instrumentation class 'net.dot.MonoRunner' on net.dot.Microsoft.Extensions.FileProviders.Physical.Tests
[05:17:43] dbug: Raw command: 'shell am instrument  -w net.dot.Microsoft.Extensions.FileProviders.Physical.Tests/net.dot.MonoRunner'
[05:17:43] dbug: Executing command: 'C:\h\w\A40208F8\p\microsoft.dotnet.xharness.cli\1.0.0-prerelease.20601.1\runtimes\any\native\adb\windows\adb.exe -s 017AY1S22P shell am instrument  -w net.dot.Microsoft.Extensions.FileProviders.Physical.Tests/net.dot.MonoRunner'
[05:18:03] info: Running instrumentation class net.dot.MonoRunner took 20.1445738 seconds
[05:18:03] dbug: Exit code: 0
                 Standard Output:
                 INSTRUMENTATION_RESULT: return-code=0
                 INSTRUMENTATION_RESULT: test-results-path=/storage/emulated/0/Android/data/net.dot.Microsoft.Extensions.FileProviders.Physical.Tests/files/Documents/testResults.xml
                 INSTRUMENTATION_CODE: 0

@mattleibow that fix also includes your suggestion of using sys.boot_completed for ensuring the device is booted so please try xharness >= 1.0.0-prerelease.20601.1.

@premun
Copy link
Member

premun commented Dec 3, 2020

Looks like XHarness 20601.1 which should contain the fix from #354 is still failing:
https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-45486-merge-3c2f97f16f154752a9/Microsoft.Extensions.Configuration.EnvironmentVariables.Tests/console.74bfb490.log?sv=2019-07-07&se=2020-12-23T10:26:30Z&sr=c&sp=rl&sig=ImcNQ%2BZX5lvEc%2FHKcQ1cjkfhiIu5go%2F%2F6dcPcPw2aWw%3D

[02:29:12] fail: Error:
                 Exit code: 1
                 Standard Output:
                 Performing Streamed Install
                 
                 Standard Error:
                 adb: failed to install C:\h\w\AB7309B6\w\BD3C0A3B\e\Microsoft.Extensions.Configuration.EnvironmentVariables.Tests.apk: cmd: Failure calling service package: Broken pipe (32)
                 
                 
[02:29:12] crit: Install failure: Test command cannot continue
XHarness exit code: 78

@akoeplinger
Copy link
Member

We're checking for the exit code 224 for the "Broken pipe" error, looks like it can happen with exit code 1 as well :D

@MattGal
Copy link
Member

MattGal commented Dec 3, 2020

Bummer, I am doing other Xharness tweaks now and will fold this in.

@mattleibow
Copy link
Member

I don't think the current delay for booting is long enough on the DevOps agents. I get some more greens, but I still am having the same errors. I am adding back my 10 min wait and will confirm.

However, maybe this can be configurable via the CLI so that we never have to worry. If it is a really slow machine, then we can bump the number.

@MattGal
Copy link
Member

MattGal commented Dec 3, 2020

I don't think the current delay for booting is long enough on the DevOps agents. I get some more greens, but I still am having the same errors. I am adding back my 10 min wait and will confirm.

However, maybe this can be configurable via the CLI so that we never have to worry. If it is a really slow machine, then we can bump the number.

Can you clarify what you mean by the devops agents? I'm unaware of any Android emulators on the Azure DevOps agents, this could be an unusual use case.

https://github.com/dotnet/xharness/pull/392/files is up for this newest broken pipe issue.

@mattleibow
Copy link
Member

mattleibow commented Dec 3, 2020

I install them myself using the android sdk manager and then use avd to create the emulators. It very well could be that they are not accelerated, but it does now work after extending the delay.

@MattGal
Copy link
Member

MattGal commented Dec 3, 2020

Thanks for the details. In the case of most of the "Broken pipe" and "stuck offline" machines I've seen they stayed in this state until acted on externally (like restarting the adb server for the former and restarting the emulator / machine for the latter). If you have more suggestions about how to be reliable here I'd love to hear them (or see PRs)

@mattleibow
Copy link
Member

With booting, I have seen some emulators (mostly the API 30) take about 240 seconds to boot up properly.

@akoeplinger
Copy link
Member

Yeah I think waiting up to 5mins for sys.boot_completed is fine as a default and ideally it'd be overridable.

@MattGal
Copy link
Member

MattGal commented Dec 4, 2020

I'm doing PR feedbacks after lunch, will stick that in.

@mattleibow
Copy link
Member

I see there is a timeout arg for ios, maybe check to see if there is something we can do with that for consistency.

@MattGal
Copy link
Member

MattGal commented Jan 29, 2021

I believe this is merged and hasn't been a problem, closing (feel free to reactivate)

@MattGal MattGal closed this as completed Jan 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
android Android area bug
Projects
None yet
Development

No branches or pull requests

6 participants