Install / Uninstall apk Sometimes Fails on Android #354
This is most likely the adb server process getting wrecked somehow. We might be able to fix it by running
/cc @MattGal
@MattGal do you want to look into this, or would you prefer us to investigate? Cheers!
I'm happy to have you all investigate. My thought here is that if adb install fails, it should perhaps time out earlier (5 minutes is the default in XHarness) and retry, requesting an infra retry and machine reboot if it still fails. If you need me to be involved, though, I can be.
@MattGal in this case there's no timeout: adb fails immediately with the "Broken pipe" error (see the timestamps in the log).
* Add a retry in the case of exit code 224; seeks to address #354. If this is insufficient, there are some Helix-based extra things to try.
* Apply suggestions from code review

Co-authored-by: Přemek Vysoký <premek.vysoky@microsoft.com>
Merged a PR that does a single retry here, and moved the issue to Validate. If this doesn't make the problem go away, I want to teach XHarness how to ask for a reboot / retry when it detects it's running in Helix.
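For anyone following along, the single-retry policy can be sketched roughly like this. It is a minimal Python illustration, not the XHarness code (XHarness is C#). `run_with_retry`, `subprocess_runner` and the injectable runner are hypothetical names, and the only input taken from this thread is that exit code 224 is the transient failure worth retrying once.

```python
import subprocess
from typing import Callable, FrozenSet, Sequence, Tuple

# Hypothetical runner type: takes an argv list, returns (exit_code, combined_output).
Runner = Callable[[Sequence[str]], Tuple[int, str]]


def subprocess_runner(argv: Sequence[str]) -> Tuple[int, str]:
    """Run a real command and return its exit code plus stdout and stderr combined."""
    proc = subprocess.run(list(argv), capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def run_with_retry(
    run: Runner,
    argv: Sequence[str],
    retry_exit_codes: FrozenSet[int] = frozenset({224}),
    retries: int = 1,
) -> Tuple[int, str]:
    """Run argv; while it fails with a retryable exit code, rerun it up to `retries` more times."""
    exit_code, output = run(argv)
    attempts_left = retries
    while exit_code in retry_exit_codes and attempts_left > 0:
        attempts_left -= 1
        exit_code, output = run(argv)
    return exit_code, output
```

Usage would be something like `run_with_retry(subprocess_runner, ["adb", "install", "path/to/app.apk"])`. Passing the runner in keeps the retry logic testable without a device attached.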
@MattGal I have noticed that there are other, seemingly similar failures with different exit codes. See dotnet/runtime#42548 (comment) for the latest example. What does XHarness exit code 78 mean? I've found multiple rows in Kusto with that exit code.
@steveisok 78 seems to be
This failure mode is not the one my PR addressed, nor the one in the log at the top, but it is the one I was talking about in this comment. I will investigate after I finish my inbox this morning, but my sense is that if adb uninstall fails, the emulator / device is in a broken state, and this is the case where an infra retry / machine reboot makes good sense.
This may or may not be related, but it also seems that it tries to install too early. I have a script that creates and boots the emulator and then runs XHarness. It has been failing on CI and (sometimes) locally. However, if I boot up the emulator and then wait a bit, it installs packages just fine. It is also deceiving, in that subsequent boots seem to work, but only because the emulator is being restored from a save point. If I also wait for the boot to complete using
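The exact command in that comment got cut off. One common way to wait for boot (an assumption here, not necessarily what that script does) is to poll the `sys.boot_completed` system property via `adb shell getprop` until it reads `1`. Below is a minimal Python sketch with a bounded wait. `wait_for_boot`, `adb_boot_completed` and the 300-second default are hypothetical.

```python
import subprocess
import time
from typing import Callable


def adb_boot_completed() -> str:
    """Read sys.boot_completed from the connected device; empty if not set yet or adb fails."""
    proc = subprocess.run(
        ["adb", "shell", "getprop", "sys.boot_completed"],
        capture_output=True,
        text=True,
    )
    return proc.stdout.strip()


def wait_for_boot(
    read_prop: Callable[[], str] = adb_boot_completed,
    timeout_s: float = 300.0,  # assumed default; slow emulators may need more
    poll_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the property reads "1" (True) or the deadline passes (False)."""
    deadline = clock() + timeout_s
    while True:
        if read_prop() == "1":
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

Running `adb wait-for-device` before calling this avoids polling while the device is not visible to adb at all. The clock and sleep parameters exist only so the loop can be tested without real delays.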
@mattleibow interesting, I'd have thought I guess we can add
or maybe add
Btw, when you say "it has been failing on CI", you don't mean the Helix queues, right? The emulator should always be running there.
"it has been failing on CI" refers to the default Azure-hosted agents. I have to manually download, install and create the emulators, and then spin them up.
So it looks like #388 is fixing the issue we saw during APK installation on some of the physical machines. I saw one machine get auto-fixed by the new XHarness rebooting the devices:
@mattleibow that fix also includes your suggestion of using
Looks like XHarness
We're checking for exit code 224 for the "Broken pipe" error, but it looks like it can happen with exit code 1 as well :D
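Since the exit code alone doesn't reliably identify this failure, one option is to key off the "Broken pipe" text in adb's output and restart the adb server before retrying. That matches the external fix described later in this thread. Below is a minimal Python sketch under those assumptions. `install_apk`, `is_broken_pipe` and `restart_adb_server` are hypothetical names, and this is not the change made in any XHarness PR.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical runner type: takes an argv list, returns (exit_code, combined_output).
Runner = Callable[[Sequence[str]], Tuple[int, str]]


def is_broken_pipe(exit_code: int, output: str) -> bool:
    """Treat any failing run whose output mentions "Broken pipe" as the transient adb failure."""
    return exit_code != 0 and "broken pipe" in output.lower()


def restart_adb_server(run: Runner) -> None:
    """Restart the adb server using the standard kill-server / start-server commands."""
    run(["adb", "kill-server"])
    run(["adb", "start-server"])


def install_apk(run: Runner, apk_path: str, retries: int = 1) -> Tuple[int, str]:
    """Install an APK; on a "Broken pipe" failure, restart the adb server and retry."""
    argv = ["adb", "install", apk_path]
    exit_code, output = run(argv)
    for _ in range(retries):
        if not is_broken_pipe(exit_code, output):
            break
        restart_adb_server(run)
        exit_code, output = run(argv)
    return exit_code, output
```

Matching on output text is more fragile than matching on an exit code, but here it catches both the 224 and the exit code 1 variants.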
Bummer. I am doing other XHarness tweaks now and will fold this in.
I don't think the current boot delay is long enough on the DevOps agents. I get some more green runs, but I am still seeing the same errors. I am adding back my 10-minute wait and will confirm. However, maybe this could be made configurable via the CLI so that we never have to worry: if it is a really slow machine, we can bump the number.
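A configurable boot timeout could look something like the following. This is a hypothetical Python/argparse sketch. `--boot-timeout` is not an existing XHarness option, and the 300-second default is an assumption.

```python
import argparse
from typing import Optional, Sequence

DEFAULT_BOOT_TIMEOUT_S = 300  # assumed default; slow, unaccelerated emulators may need more


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse a hypothetical --boot-timeout option, rejecting non-positive values."""
    parser = argparse.ArgumentParser(description="Hypothetical emulator test runner")
    parser.add_argument(
        "--boot-timeout",
        type=int,
        default=DEFAULT_BOOT_TIMEOUT_S,
        help="seconds to wait for the emulator to finish booting before giving up",
    )
    args = parser.parse_args(argv)
    if args.boot_timeout <= 0:
        # parser.error prints the message and exits with status 2.
        parser.error("--boot-timeout must be a positive number of seconds")
    return args
```

On a slow agent, you would pass something like `--boot-timeout 600` instead of hard-coding a longer sleep in the wrapper script.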
Can you clarify what you mean by the DevOps agents? I'm not aware of any Android emulators on the Azure DevOps agents, so this could be an unusual use case. https://github.com/dotnet/xharness/pull/392/files is up for this newest "Broken pipe" issue.
I install them myself using the Android SDK manager and then use avd to create the emulators. It may very well be that they are not accelerated, but it does work now after extending the delay.
Thanks for the details. In most of the "Broken pipe" and "stuck offline" cases I've seen, the machines stayed in that state until acted on externally (restarting the adb server for the former, and restarting the emulator / machine for the latter). If you have more suggestions about how to make this reliable, I'd love to hear them (or see PRs).
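The mapping described here can be encoded as a small classifier that a runner consults before choosing between a local fix and an infra retry. A broken pipe maps to restarting the adb server. A device stuck offline maps to rebooting the emulator or machine. Below is a minimal Python sketch. `Remedy` and `choose_remedy` are hypothetical, and the `device_state` argument is assumed to come from the state column of `adb devices` (for example `device` or `offline`).

```python
from enum import Enum


class Remedy(Enum):
    """Hypothetical actions a runner could take after a failed adb operation."""
    NONE = "none"
    RESTART_ADB_SERVER = "restart adb server"
    REBOOT_DEVICE_OR_MACHINE = "reboot emulator/device or machine"
    FAIL = "fail the run"


def choose_remedy(exit_code: int, output: str, device_state: str) -> Remedy:
    """Map an observed failure to the external action that cleared it in this thread."""
    if exit_code == 0:
        return Remedy.NONE
    # An offline device takes priority: restarting adb alone did not clear this state.
    if device_state == "offline":
        return Remedy.REBOOT_DEVICE_OR_MACHINE
    if "broken pipe" in output.lower():
        return Remedy.RESTART_ADB_SERVER
    # Unknown failures are surfaced rather than retried blindly.
    return Remedy.FAIL
```

Keeping the classification separate from the retry loop makes it easy to add new error signatures as they show up in logs.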
With booting, I have seen some emulators (mostly API 30) take about 240 seconds to boot up properly.
Yeah, I think waiting up to 5 mins for
I'm addressing PR feedback after lunch and will stick that in.
I see there is a timeout arg for iOS; maybe check whether we can do something similar here for consistency.
I believe this is merged and hasn't been a problem; closing (feel free to reactivate).
There are some runs indicated in dotnet/runtime#44306 that show the APK either could not be installed or could not be removed. Good examples of this are in:
https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-master-0d80c385864846a9b0/JIT/console.17d80b24.log?sv=2019-07-07&se=2020-11-22T18%3A21%3A05Z&sr=c&sp=rl&sig=7poXAdaraL1BwYAj7GxDrIhZLK%2F9Uo6PZg8P022tEmM%3D
https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-43740-merge-1f95bf9c506d4bcb9e/JIT/console.8114e152.log?%3F%253F%25253Fsv%25253D2019-07-07%252526se%25253D2020-11-23T13%2525253A30%2525253A16Z%252526sr%25253Dc%252526sp%25253Drl%252526sig%25253Dyr2NQCCzTeXQKU3kTZHet06tu32Jnp1%2525252B6y3AUwdMIzM%2525253D
My guess is that adb somehow gets into a corrupt state and then either recovers when the work item completes or hits an internal timeout. We should look into capturing the error state (or states) and try to correct it, so that we avoid failing the tests on the first attempt.