Fix issue when running shards on Android (#1853) #1998

Open
wants to merge 3 commits into base: main


tokou
Contributor

@tokou tokou commented Sep 1, 2024

Proposed changes

override fun deviceInfo(): DeviceInfo {
    return runDeviceCall {
        val response = blockingStubWithTimeout.deviceInfo(deviceInfoRequest {})

The issue seems to originate from simultaneous calls to blockingStubWithTimeout.deviceInfo in AndroidDriver#deviceInfo(), reached via Maestro#deviceInfo(), which cause the following error:

Stacktrace "java.net.ConnectException: Connection refused"
io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:275)
	at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:256)
	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:169)
	at maestro_android.MaestroDriverGrpc$MaestroDriverBlockingStub.deviceInfo(MaestroDriverGrpc.java:634)
	at maestro.drivers.AndroidDriver$deviceInfo$1.invoke(AndroidDriver.kt:182)
	at maestro.drivers.AndroidDriver$deviceInfo$1.invoke(AndroidDriver.kt:181)
	at maestro.drivers.AndroidDriver.runDeviceCall(AndroidDriver.kt:1070)
	at maestro.drivers.AndroidDriver.deviceInfo(AndroidDriver.kt:181)
	at maestro.Maestro.fetchDeviceInfo(Maestro.kt:63)
	at maestro.Maestro.access$fetchDeviceInfo(Maestro.kt:39)
	at maestro.Maestro$cachedDeviceInfo$2.invoke(Maestro.kt:47)
	at maestro.Maestro$cachedDeviceInfo$2.invoke(Maestro.kt:46)
	at kotlin.SynchronizedLazyImpl.getValue(LazyJVM.kt:74)
	at maestro.Maestro.getCachedDeviceInfo(Maestro.kt:46)
	at maestro.Maestro.deviceInfo(Maestro.kt:57)
	at maestro.orchestra.Orchestra.initJsEngine(Orchestra.kt:217)
	at maestro.orchestra.Orchestra.runFlow(Orchestra.kt:107)
	at maestro.cli.runner.TestSuiteInteractor.runFlow(TestSuiteInteractor.kt:250)
	at maestro.cli.runner.TestSuiteInteractor.runTestSuite(TestSuiteInteractor.kt:82)
	at maestro.cli.command.TestCommand$handleSessions$1$1$results$1$1$1.invoke(TestCommand.kt:280)
	at maestro.cli.command.TestCommand$handleSessions$1$1$results$1$1$1.invoke(TestCommand.kt:258)
	at maestro.cli.session.MaestroSessionManager.newSession(MaestroSessionManager.kt:101)
	at maestro.cli.session.MaestroSessionManager.newSession$default(MaestroSessionManager.kt:54)
	at maestro.cli.command.TestCommand$handleSessions$1$1$results$1$1.invokeSuspend(TestCommand.kt:258)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104)
	at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:111)
	at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:99)
	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:585)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:802)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:706)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:693)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/[0:0:0:0:0:0:0:1]:7078
Caused by: java.net.ConnectException: Connection refused
	at java.base/sun.nio.ch.Net.pollConnect(Native Method)
	at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:672)
	at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:973)
	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337)
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:1589)

Even though cachedDeviceInfo is lazy, which means its initialization is synchronized, the issue seems to happen because all the calls come from the same thread.

private val cachedDeviceInfo by lazy {
    fetchDeviceInfo()
}
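As a side note on the lazy delegate above, here is a minimal Kotlin sketch of what its default synchronization does and does not cover. Holder and countFetches are hypothetical names used only for illustration; they are not part of Maestro. By default, by lazy uses LazyThreadSafetyMode.SYNCHRONIZED, so the initializer runs once per instance even under concurrent access, but each instance holds its own lock. Whether that distinction plays a role in this PR is not established by the stack trace.

```kotlin
import java.util.concurrent.atomic.AtomicInteger
import kotlin.concurrent.thread

// Hypothetical stand-in for a class that caches a value with `by lazy`.
class Holder(private val fetchCount: AtomicInteger) {
    // Default mode is LazyThreadSafetyMode.SYNCHRONIZED.
    val cached: String by lazy {
        fetchCount.incrementAndGet() // count how often the initializer runs
        "device-info"
    }
}

// Reads `cached` from 4 threads per holder and returns the initializer run count.
fun countFetches(holders: List<Holder>, counter: AtomicInteger): Int {
    val threads = holders.flatMap { holder ->
        List(4) { thread { holder.cached } }
    }
    threads.forEach { it.join() }
    return counter.get()
}

fun main() {
    val single = AtomicInteger()
    println(countFetches(listOf(Holder(single)), single)) // prints 1

    val shared = AtomicInteger()
    println(countFetches(listOf(Holder(shared), Holder(shared)), shared)) // prints 2
}
```

The second call shows that two instances each run their own initializer, so lazy alone does not serialize work across instances.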

I added a semaphore in AndroidDriver#runDeviceCall to ensure that only one call runs at a time.

private fun <T> runDeviceCall(call: () -> T): T {
    return try {
        call()

This seems to have fixed the issue. From now on, calls to runDeviceCall will no longer run concurrently.
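The idea of guarding a call wrapper with a single-permit semaphore can be sketched as follows. This is an illustration under assumptions, not the code from this PR: SerializedCaller and maxConcurrentCalls are hypothetical names, and the excerpt does not show whether the PR's semaphore is held per driver instance or shared between drivers.

```kotlin
import java.util.concurrent.Semaphore
import java.util.concurrent.atomic.AtomicInteger
import kotlin.concurrent.thread

// Hypothetical wrapper; names are illustrative and not taken from the PR diff.
class SerializedCaller {
    private val permit = Semaphore(1) // one permit: at most one call in flight

    fun <T> runDeviceCall(call: () -> T): T {
        permit.acquire()
        try {
            return call()
        } finally {
            permit.release() // always release, even if call() throws
        }
    }
}

// Runs `workers` threads through one caller and reports the peak overlap observed.
fun maxConcurrentCalls(workers: Int): Int {
    val caller = SerializedCaller()
    val inFlight = AtomicInteger()
    val maxSeen = AtomicInteger()
    val threads = List(workers) {
        thread {
            caller.runDeviceCall {
                val now = inFlight.incrementAndGet()
                maxSeen.accumulateAndGet(now) { a, b -> maxOf(a, b) }
                Thread.sleep(10) // simulate a slow device call
                inFlight.decrementAndGet()
            }
        }
    }
    threads.forEach { it.join() }
    return maxSeen.get()
}

fun main() {
    println(maxConcurrentCalls(4)) // prints 1
}
```

A semaphore only serializes callers that share it, so in a sketch like this every thread must go through the same SerializedCaller for the guarantee to hold.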

Testing

  • Removed the workaround by deleting the Thread.sleep and rebuilding
  • Launched 2 emulators
  • Ran maestro test --shards 2 flows/ against a folder with around 7 successful flows
  • Reproduced the issue in well over half of the runs
  • Identified the origin of the issue and added the semaphore
  • Could not reproduce the issue after running multiple times

Issues fixed

Linked #1867
Fixes #1853

@tokou tokou changed the title from "Fix deadlock when running shards on Android (#1853)" to "Fix issue when running shards on Android (#1853)" on Sep 1, 2024
@bartekpacia
Contributor

Hey @tokou, it seems that the test never starts on CI. Any idea?

@tokou
Contributor Author

tokou commented Sep 3, 2024

@bartekpacia No. Maybe just retrigger it?

@bartekpacia
Contributor

I did retrigger; unfortunately, the result is the same.

@tokou tokou force-pushed the bug-android-deadlock branch from f32c913 to 52126d0 on September 4, 2024 19:41
@tokou
Contributor Author

tokou commented Sep 4, 2024

@bartekpacia I need some help here. I couldn't repro the issue and don't have access to more details from the CI. So I don't really know what to do next.

@bartekpacia
Contributor

Hey, unfortunately I don't have any ideas either (aside from capturing as much log output as possible with adb logcat into a file, exposing that file as a workflow artifact, and trying to go from there).

Successfully merging this pull request may close these issues.

Unable to run parallel/sharded tests on Android
2 participants