Runner action hangs after killing emulator with stop: not implemented #385

Open
ericswpark opened this issue Mar 18, 2024 · 25 comments

@ericswpark

When running my GitHub Actions workflow, the emulator runner action hangs after the emulator is killed, with the following log output:


Terminate Emulator
  /usr/local/lib/android/sdk/platform-tools/adb -s emulator-5554 emu kill
  OK: killing emulator, bye bye
  OK
  INFO    | Wait for emulator (pid 4126) 20 seconds to shutdown gracefully before kill;you can set environment variable ANDROID_EMULATOR_WAIT_TIME_BEFORE_KILL(in seconds) to change the default value (20 seconds)
  INFO    | Discarding the changed state: command-line flag
  WARNING | Discarding the changed state (command-line flag).
  ERROR   | stop: Not implemented

@grodin

grodin commented Mar 22, 2024

This seems to be related to #381. I debugged my workflow run with tmate: as in #381, it had two crashpad_handler processes running, and the logs were the same as in this issue.

Terminating the two crashpad_handler processes with SIGTERM allowed the android-emulator-runner step to complete.

@ericswpark
Author

Seems like there should be a step at the end that kill -9s all the crashpad_handler processes.

Either that or disable crashpad_handler from running in the first place (I'm guessing it's some sort of error reporting mechanism from Google to report errors with the Android emulator?)

@grodin

grodin commented Mar 25, 2024

So far I've managed to find out that crashpad_handler is the daemon part of Crashpad, a crash reporter.

I've done some digging in the emulator source repo. It seems the emulator uses Crashpad to report crashes back to Google, so we're not going to be able to prevent the crashpad_handler processes from getting started.

Killing them after the emulator has shut down seems like a reasonable workaround, but we should do that with SIGTERM first! Going straight to kill -KILL is a bit, er, overkill. I can't immediately think of any harm, given that the VM the action is running in will be thrown away soon, but it's generally not recommended to use SIGKILL unless it's necessary.

I'm fairly certain that this will need to be done as part of the action though.
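A minimal sketch of the SIGTERM-first cleanup described above, written as a shell function that takes PIDs. The name `stop_gracefully` and the once-per-second polling are illustrative assumptions, not part of android-emulator-runner: it sends SIGTERM, checks with `kill -0` whether the processes are gone, and sends SIGKILL only after the grace period runs out.

```shell
#!/usr/bin/env bash
# Sketch: ask processes to exit with SIGTERM first and escalate to SIGKILL
# only for whatever is still running after a grace period.
# Usage: stop_gracefully GRACE_SECONDS [PID...]
stop_gracefully() {
  local grace=$1 waited=0 alive pid
  shift
  if [ "$#" -eq 0 ]; then
    return 0                  # nothing to stop (e.g. pgrep matched nothing)
  fi

  kill -TERM "$@" 2>/dev/null || true   # polite request to every PID

  while [ "$waited" -lt "$grace" ]; do
    alive=0
    for pid in "$@"; do
      if kill -0 "$pid" 2>/dev/null; then
        alive=1               # at least one PID still exists
      fi
    done
    if [ "$alive" -eq 0 ]; then
      return 0                # everything exited after SIGTERM
    fi
    sleep 1
    waited=$((waited + 1))
  done

  # Grace period is over: force-kill the list. kill fails for PIDs
  # that have already exited, and that failure is ignored.
  kill -KILL "$@" 2>/dev/null || true
}
```

For example, `stop_gracefully 10 $(pgrep -f '[c]rashpad_handler')`. The `-f` flag matches against the full command line; on Linux, the process name that `pgrep`/`pkill` compare without `-f` is truncated to 15 characters, so a 16-character name like `crashpad_handler` may match nothing. The brackets in the pattern keep `pgrep` from matching a shell whose own command line contains the literal string.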

I suspect, but haven't confirmed, that the root cause is this: when the emulator is told to shut down, some Node.js code sends a signal to the emulator but then waits for the whole process tree to end, not just the emulator process. However, if the emulator doesn't pass signals on to its child processes, they won't receive any signal telling them to quit, so the waiting will go on forever.

If I'm correct (I'll try to find out this week), trying to kill the extra processes outside of the action won't work, since that code will be waiting for the action to finish before it gets a chance to run. It's effectively a classic deadlock.
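The hypothesis above is not confirmed in this thread, but one common way a "wait for everything to finish" step hangs is easy to reproduce in plain shell: whoever reads a child's output only sees end-of-file once every process holding the write end of the pipe has exited, including background helpers the child left behind. The functions below are purely illustrative (`sleep` stands in for a process like crashpad_handler) and are not the action's code.

```shell
#!/usr/bin/env bash
# Illustration only, not android-emulator-runner code: a reader that waits
# for end-of-file on a child's output keeps waiting as long as ANY process
# still holds the write end of that pipe, even after the child has exited.

# Exits right away, but leaves a background helper (sleep stands in for
# something like crashpad_handler) that inherits its stdout.
leaky_child() {
  sleep 3 &
  echo "child done"
}

# Same child, but the helper's stdout/stderr are detached from the pipe.
detached_child() {
  sleep 3 >/dev/null 2>&1 &
  echo "child done"
}

# Print how many whole seconds pass before the reader (cat) sees EOF
# on the output of the given command.
seconds_to_capture() {
  local start
  start=$(date +%s)
  "$@" | cat >/dev/null
  echo $(( $(date +%s) - start ))
}
```

`seconds_to_capture leaky_child` should print roughly 3, because `cat` keeps waiting for the orphaned `sleep`; `seconds_to_capture detached_child` prints roughly 0.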

@benszedlmayer

benszedlmayer commented Jun 18, 2024

Are there any updates here? My log output is exactly the same as in the original issue: it completes all processes and then just hangs forever. Trying to kill the crashpad process with kill -9 $(pgrep -f crashpad_handler) fails.

@troZee

troZee commented Jun 19, 2024

I have a similar issue here: https://github.com/callstack/react-native-pager-view/actions/runs/9576519647/job/26403174451?pr=829 . Does anyone know how to fix it?

@limpbrains

@mustalk

mustalk commented Jul 17, 2024

For anyone facing this issue, I've discovered that setting the environment variable ANDROID_EMULATOR_WAIT_TIME_BEFORE_KILL: 60 helps sometimes, but it's inconsistent.

In my case, the issue seemed to be affected by the emulator's saved snapshot, so not saving the AVD snapshot should help as well.

If none of the previous solutions work, as @grodin mentioned regarding the crashpad_handler, you can use the following steps to terminate the processes:

- name: Kill crashpad_handler processes
  if: always()
  run: |
    pkill -SIGTERM crashpad_handler || true
    sleep 5
    pkill -SIGKILL crashpad_handler || true

This should definitely stop the hang issue.

@ericswpark
Author

@mustalk are you sure that step will work? My understanding is that the previous step will hang, so the step that kills crashpad_handler will never run.

@mustalk

mustalk commented Jul 18, 2024

@ericswpark that's what I thought at first too, but to my surprise it did execute, even without if: always(), at least in my setup. Give it a try.

@fernando-jascovich

I was having this issue while running the Android emulator manually (I'm not using android-emulator-runner), and I came here looking for answers.
I later found a solution: send SIGSTOP to the Android emulator's qemu process. For example:

# XXXXXXX is the PID of the Android SDK qemu-system process
kill -STOP XXXXXXX

With that, snapshot generation and crashpad_handler are handled as expected and the emulator exits successfully.

@ashishb

ashishb commented Jul 21, 2024

@mustalk your suggestion didn't work for me https://github.com/ashishb/adb-enhanced/actions/runs/10024919828/job/27707518728?pr=246, it is stuck at the emulator execution step for me

Strangely, it only impacts API 26 and 29 though for me.

@Bhuvanaarkala07

Bhuvanaarkala07 commented Jul 24, 2024

Hi Team,

We are also facing a similar issue.
It was working 2 days ago, but it suddenly started failing with the error below:
[Screenshot 2024-07-24 at 11 34 52 PM]

This is the workflow we are using:

runs-on: macos-13
timeout-minutes: 25
steps:
  - name: Checkout the code
    uses: actions/checkout@v4

  - name: set up JDK 17
    uses: actions/setup-java@v4
    with:
      distribution: 'temurin'
      java-version: 17

  - name: Gradle cache
    uses: gradle/gradle-build-action@v3

  - name: AVD cache
    uses: actions/cache@v4
    id: avd-cache
    with:
      path: |
        ~/.android/avd/*
        ~/.android/adb*
      key: avd-29

  - name: create AVD and generate snapshot for caching
    if: steps.avd-cache.outputs.cache-hit != 'true'
    uses: reactivecircus/android-emulator-runner@v2
    env:
      ANDROID_EMULATOR_WAIT_TIME_BEFORE_KILL: 60
    with:
      api-level: 29
      force-avd-creation: false
      emulator-options: -no-window -gpu swiftshader_indirect -noaudio -no-boot-anim -no-metrics -camera-back none
      disable-animations: false
      script: echo "Generated AVD snapshot for caching."

  - name: Run espresso tests
    uses: reactivecircus/android-emulator-runner@v2
    with:
      api-level: 29
      avd-name: test
      force-avd-creation: false
      emulator-options: -no-snapshot-save -no-window -gpu swiftshader_indirect -noaudio -no-boot-anim -no-metrics -camera-back none
      disable-animations: true
      script: ./gradlew connectedMockDebugAndroidTest

Can someone suggest what is wrong here?

@ashishb

ashishb commented Jul 28, 2024

@Bhuvanaarkala07

We are already using runs-on: macos-13, but we are still seeing the error above.

@Braggiouy

I am facing the same issue. The step does not terminate the emulator and stays stuck. I tried @mustalk's suggestion, but the workflow never reaches the step that kills crashpad_handler.

    - name: Set up the Android emulator and run tests
      uses: reactivecircus/android-emulator-runner@v2
      with:
        api-level: 33
        target: google_apis_playstore
        arch: x86_64
        emulator-boot-timeout: 600
        disable-animations: true
        script: ./scripts/run-tests.sh

I added process termination commands to the custom script ./scripts/run-tests.sh, but still no success. This script runs the Appium tests that I have integrated.

In addition, I have:

  • ANDROID_EMULATOR_WAIT_TIME_BEFORE_KILL: 60
  • And my job has the following values:
    runs-on: ubuntu-latest
    uses: reactivecircus/android-emulator-runner@v2

Context and Background

Emulator
Running Android emulators on GitHub Actions can be challenging due to the lack of KVM support on ubuntu-latest.

macOS vs. Ubuntu
The macos-latest runner includes pre-installed Android SDKs and better support for Android emulation. However, macos-latest is more expensive than ubuntu-latest. Setting up a self-hosted macOS runner might be a cost-effective solution, but I would like to try the Ubuntu image first, if possible.

Hardware Acceleration
Starting on February 23, 2023, GitHub Actions users can leverage hardware acceleration on larger Linux runners, significantly improving Android emulator performance. This requires adding the runner user to the KVM user group:

- name: Enable KVM group perms
  run: |
    echo 'KERNEL=="kvm", GROUP="kvm", MODE="0666", OPTIONS+="static_node=kvm"' | sudo tee /etc/udev/rules.d/99-kvm4all.rules
    sudo udevadm control --reload-rules
    sudo udevadm trigger --name-match=kvm
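One way to confirm that the udev rule took effect before the emulator starts is to check that the runner user can open `/dev/kvm` for reading and writing, which is what `MODE="0666"` grants. A small sketch, where `kvm_status` is a hypothetical helper that takes the device path as an optional argument (defaulting to `/dev/kvm`):

```shell
#!/usr/bin/env bash
# Sketch: report whether the current user can read and write the KVM
# device, which the udev rule above opens up with MODE="0666".
# The optional argument exists only so the check can target another path.
kvm_status() {
  local dev=${1:-/dev/kvm}
  if [ -e "$dev" ] && [ -r "$dev" ] && [ -w "$dev" ]; then
    echo "available"
  else
    echo "unavailable"
  fi
}
```

Running `kvm_status` in a step right after the udev commands prints `available` or `unavailable`.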

Questions:

Any suggestions or guidance on resolving this issue would be greatly appreciated. Specifically, I need help ensuring the emulator terminates properly, and the workflow can proceed without getting stuck.

I read earlier that this could be fixed by using the macos-latest runner. Is there any possibility to fix this by using the ubuntu-latest one?

@vaind

vaind commented Aug 1, 2024

I've faced the same issue and was able to resolve it by making sure the appium instance I created in the test script gets shut down properly by the end of the script (cc @Braggiouy).

You can check with pgrep -f appium before and after your script runs.
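The same before/after check can be generalized into a small helper that snapshots the process table around a command and prints the PIDs that appeared in between and are still running, such as a forgotten background Appium server. `leftover_pids` is hypothetical; because it diffs the whole process table, unrelated processes started at the same time also show up, so treat its output as a hint rather than proof.

```shell
#!/usr/bin/env bash
# Sketch: run a command, then print the PIDs of processes that appeared
# while it ran and are still alive afterwards (e.g. a background server
# the command forgot to stop). It diffs the whole process table, so
# unrelated processes started at the same time are reported too.
# Usage: leftover_pids COMMAND [ARGS...]
leftover_pids() {
  local before after status=0 pid
  before=$(mktemp)
  after=$(mktemp)

  ps -A -o pid= | tr -d ' ' | LC_ALL=C sort > "$before"
  "$@" || status=$?
  ps -A -o pid= | tr -d ' ' | LC_ALL=C sort > "$after"

  # comm -13 keeps lines that are only in the "after" snapshot; the
  # ps -p check drops short-lived ones (such as the ps above) that exited.
  LC_ALL=C comm -13 "$before" "$after" | while read -r pid; do
    if ps -p "$pid" >/dev/null 2>&1; then
      echo "$pid"
    fi
  done

  rm -f "$before" "$after"
  return "$status"
}
```

For example, `leftover_pids ./scripts/run-tests.sh`, then `ps -o pid,command -p <PID>` on each reported PID to see what it is.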

@Braggiouy

Thanks a million @vaind. That was indeed my issue. It seems the Appium instance was still running in the background, which kept the emulator from shutting down properly. No need to manually kill crashpad_handler. Good catch!

jamesarich added a commit to jamesarich/Meshtastic-Android that referenced this issue Aug 31, 2024
@pyricau

pyricau commented Sep 9, 2024

I ran into a similar issue and seemingly fixed it by removing the step for AVD caching and setting force-avd-creation: true. This is obviously not ideal but it seems to get rid of the flakiness.

@cipolleschi

cipolleschi commented Sep 24, 2024

Same issue here, from React Native.
Everything was working fine. Then I started adding a new dimension to our matrix, and now it keeps hanging.

I tried all the solutions proposed here:

  • kill the crashpad_handler processes ==> does nothing
  • kill the qemu-system process ==> the Terminate emulator command hangs with no logs (there is no emulator anymore)
  • moved to macos executor ==> does nothing
  • increased timeouts ==> does nothing
  • set force-avd-creation: true ==> does nothing

I'm now trying to give more power to the machine and to the emulator to see if it helps.

Weirdly, on main it is still working fine.

@ychescale9 I'm sorry to ping you directly, but do you mind having a look at this or rerouting it to the right person?
I might be able to help, if you need, but I'd need some guidance in the codebase.

@ychescale9
Member

Does it fail with all API levels? And did it just start failing recently? Chances are a newer version of the emulator binary or system image introduced new issues. If that's the case, pinning emulator-build to an older version might help.

@cipolleschi

Thanks! I think I found the issue: I have a local server that the emulator connects to over WebSockets. If I tear down the server, the emulator shuts down properly. I think the action is not able to shut down the emulator while something is still connected to it.

@ochkarik05

ochkarik05 commented Oct 19, 2024

For me it only happens when the AVD cache is enabled and the cache already exists before the run. After removing the cache, the whole pipeline passes the first time; on the second push it always fails. This holds for API <= 29. It never fails for API 34, and I haven't tested API levels between 29 and 34 yet.

Can anybody confirm that having a cache can somehow cause the issue?

@truong-pham-dang

@Braggiouy How did you manage to kill the Appium instance? I also have Appium tests integrated. I use:

kill -9 $(pgrep -f appium)

right after the Gradle command that runs the Appium tests. But the workflow never reaches that line: it gets stuck in the Gradle command, and the job hangs indefinitely.

@Braggiouy

In my solution, I use an external script that runs alongside reactivecircus/android-emulator-runner@v2.

This script is responsible for starting Appium, running the tests, and then killing the Appium process afterward.

As an example, this is how I implemented those steps:

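# Start Appium Server in the background and remember its PID
appium &
APPIUM_PID=$!
sleep 10

# Run Android Tests and keep their exit status
yarn test:android
TEST_STATUS=$?

# Shut down Appium Server, then report the test result
# (otherwise the script's status would be that of kill, hiding test failures)
kill "$APPIUM_PID"
exit "$TEST_STATUS"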

Explanation:
In my approach, I explicitly control the starting and stopping of Appium. First, I start Appium in the background and capture its process ID (PID). This way, I can track the specific Appium instance while the tests run. Once the tests are finished, I use the stored PID to kill that exact instance of Appium, ensuring it doesn’t interfere with any other processes.

This avoids the issue with the method you mentioned, where the kill -9 $(pgrep -f appium) command might terminate Appium too early or fail if the Gradle process is still running.
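The fixed `sleep 10` in the script above assumes Appium is ready within ten seconds. A sketch of a polling alternative, where `wait_for` is a hypothetical helper that retries a command once per second until it succeeds or a timeout expires:

```shell
#!/usr/bin/env bash
# Sketch: retry a command once per second until it succeeds, or give up
# after TIMEOUT seconds. Returns 0 on success and 1 on timeout.
# Usage: wait_for TIMEOUT COMMAND [ARGS...]
wait_for() {
  local timeout=$1 waited=0
  shift
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "wait_for: gave up after ${timeout}s: $*" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

For example, `wait_for 60 curl -sf http://127.0.0.1:4723/status` in place of `sleep 10`, assuming Appium listens on its default port 4723 with the Appium 2 base path (Appium 1.x servers default to the `/wd/hub` base path).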

Hope this helps!

@grodin

grodin commented Nov 21, 2024

Just for completeness sake, I've worked around this by adding the following line to the end of the script run by the action:

killall -INT crashpad_handler || true

The || true is just so that if something gets fixed somewhere and there aren't any crashpad_handler processes jamming things up, that won't fail the job.

It seems to fix things for me, so far.
