
Outbound flow control bugfix #61

Merged: marcoferrer merged 25 commits from outbound-flow-control-bugfix into master on Aug 22, 2019

Conversation

marcoferrer (Owner):

No description provided.

-            requestIter.next(),
-            requestIter.next(),
+            requestIter.next().also { requestIter.hasNext() },
+            requestIter.next().also { requestIter.hasNext() },
marcoferrer (Owner, Author):
This was required after updating to coroutines 1.3.0-RC. Every call to next() must be preceded by a call to hasNext().
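A tiny illustration of that contract (nextOrNull is a hypothetical helper, not part of this change):

import kotlinx.coroutines.channels.ChannelIterator

// Under coroutines 1.3.0-RC, ChannelIterator.next() is only valid after
// hasNext() has returned true, so every manual next() is paired with a hasNext().
suspend fun <T : Any> ChannelIterator<T>.nextOrNull(): T? =
    if (hasNext()) next() else null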

    .consumeAsFlow()
    .collect {
        responseChannel.send(HelloReply.newBuilder().setMessage("Reply: ${it.name}").build())
    }
marcoferrer (Owner, Author):
The mapTo operator on Channel is now deprecated. Without this refactor, the test would fail.
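For context, a small sketch of the migration pattern, assuming the HelloRequest/HelloReply example proto types used in these tests and a hypothetical echoRequests helper: the deprecated ReceiveChannel operators are replaced by consuming the channel as a Flow.

import kotlinx.coroutines.channels.ReceiveChannel
import kotlinx.coroutines.channels.SendChannel
import kotlinx.coroutines.flow.*

suspend fun echoRequests(
    requestChannel: ReceiveChannel<HelloRequest>,
    responseChannel: SendChannel<HelloReply>
) {
    requestChannel
        .consumeAsFlow()  // consumes the channel exactly once, as a Flow
        .collect { request ->
            // Per-message work happens in collect instead of a channel operator.
            responseChannel.send(
                HelloReply.newBuilder().setMessage("Reply: ${request.name}").build()
            )
        }
}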

@@ -74,10 +75,10 @@ class ClientStreamingBackPressureTests {

@Test
fun `Client send suspends until server invokes receive`() {
lateinit var serverRequestChannel: ReceiveChannel<HelloRequest>
marcoferrer (Owner, Author):
This change prevents race conditions between the serverRequestChannel var being populated and the test assertions running.
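For illustration only, one generic way to avoid that kind of race (a hypothetical CompletableDeferred handoff, not the mechanism used in the actual test):

import kotlinx.coroutines.CompletableDeferred
import kotlinx.coroutines.channels.ReceiveChannel

// The server handler publishes its request channel as soon as it exists, and the
// test suspends until it has been published instead of racing a lateinit var.
class ServerChannelHandoff<T> {
    private val deferred = CompletableDeferred<ReceiveChannel<T>>()

    // Called from the server handler once the request channel is available.
    fun publish(channel: ReceiveChannel<T>) { deferred.complete(channel) }

    // Called from the test; suspends until the channel has been published.
    suspend fun await(): ReceiveChannel<T> = deferred.await()
}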

@@ -332,4 +343,58 @@ class ClientCallBidiStreamingTests {
assert(responseChannel.isClosedForReceive) { "Response channel should be closed for receive" }
}

@Test
fun `High throughput call succeeds`() {
marcoferrer (Owner, Author):
This test was introduced as part of the discussion with @chris-blacker in #59.

@@ -149,7 +160,7 @@ class ClientCallBidiStreamingTests {
requestChannel.close()
}

-        responseChannel.map { it.message }.toList()
+        responseChannel.consumeAsFlow().map { it.message }.toList()
marcoferrer (Owner, Author):
The map operator on Channel was deprecated.

@@ -55,6 +62,10 @@ class ClientCallBidiStreamingTests {
@[Rule JvmField]
var grpcServerRule = GrpcServerRule().directExecutor()

@[Rule JvmField]
public val timeout = CoroutinesTimeout.seconds(30)
marcoferrer (Owner, Author):
This allows us to debug coroutine deadlocks in the unit tests.

@@ -42,40 +43,51 @@ internal fun <T> CallStreamObserver<*>.applyInboundFlowControl(
}
}

private typealias MessageHandler = suspend ActorScope<*>.() -> Unit

internal fun <T> CoroutineScope.applyOutboundFlowControl(
marcoferrer (Owner, Author):
The outbound flow control handler has been refactored. It no longer spawns multiple jobs when applying backpressure, and it can properly handle superfluous invocations of the onReady handler Runnable.
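A minimal sketch of that single-worker approach, using current kotlinx.coroutines APIs (applyOutboundFlowControlSketch and its parameters are illustrative names, not the code in this PR); completion and error handling are omitted:

import io.grpc.stub.CallStreamObserver
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.ObsoleteCoroutinesApi
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.channels.ReceiveChannel
import kotlinx.coroutines.channels.actor

// One actor owns the outbound loop, so repeated onReady callbacks only enqueue a
// conflated wake-up signal instead of launching a new job each time.
@ObsoleteCoroutinesApi
fun <T : Any> CoroutineScope.applyOutboundFlowControlSketch(
    streamObserver: CallStreamObserver<T>,
    sourceChannel: ReceiveChannel<T>
) {
    val worker = actor<Unit>(capacity = Channel.CONFLATED) {
        for (signal in channel) {
            // Drain messages only while the transport is ready; suspending on
            // receive is what applies backpressure to the producer.
            while (streamObserver.isReady) {
                val next = sourceChannel.receiveCatching().getOrNull() ?: return@actor
                streamObserver.onNext(next)
            }
        }
    }
    // Superfluous invocations are harmless: CONFLATED keeps at most one pending signal.
    streamObserver.setOnReadyHandler { worker.trySend(Unit) }
}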

@marcoferrer (Owner, Author):

Looks like the failing test in CI could be unrelated to the code change. It's possible it has something to do with the dependency version bump.

com.github.[secure].krotoplus.coroutines.server.ServerCallServerStreamingTests > Server responds with error when scope cancelled exceptionally FAILED
    java.lang.IllegalStateException
        Caused by: java.lang.reflect.InvocationTargetException
            Caused by: java.lang.UnsatisfiedLinkError

Tests seem consistently stable locally. @chris-blacker Could you verify your use case against this branch?
I'll work on sorting out the UnsatisfiedLinkError in the meantime.

}

val messageHandlerActor = actor<MessageHandler>(
capacity = Channel.UNLIMITED,
@blachris commented (Aug 5, 2019):
A limited capacity didn't work? I am wondering if it's possible that the channel becomes a memory leak if jobs are added faster than the worker consumes them.

marcoferrer (Owner, Author):
That's a good catch. I must've changed it while debugging. I'll test it with the value reverted to CONFLATED.

This implementation is based on the native gRPC util StreamObservers.copyWithFlowControl(), which ensured each invocation of the onReadyHandler always ran (source).

I am using CONFLATED and have no issues.

blachris previously approved these changes (Aug 5, 2019)

@blachris left a comment:
Good solution. I tested the new flow control in my setup and the issue has been fixed.

capacity = Channel.UNLIMITED,
context = Dispatchers.Unconfined + CoroutineExceptionHandler { _, e ->
streamObserver.completeSafely(e)
targetChannel.close(e)

I think this is missing cancellation of the targetChannel in some situations. I tried canceling it here, but this code was not executed when I expected.
I have seen a client bidi call's send hang when the call is canceled on another thread, because the channel is never canceled. This happened when the call was canceled while a send was pending on the call's rendezvous outboundChannel. I think the rpc scope then somehow cleans up without canceling the channel, so the send hangs.
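Roughly that hang as a self-contained coroutine example (no gRPC involved; names are illustrative): a send on a rendezvous channel suspends forever once its consumer is gone, unless something cancels or closes the channel.

import kotlinx.coroutines.*
import kotlinx.coroutines.channels.Channel

fun main() = runBlocking {
    val outbound = Channel<Int>()                 // rendezvous, like the call's outboundChannel
    val consumer = launch { outbound.receive() }  // consumes exactly one message, then stops
    outbound.send(1)                              // paired with the receive above
    consumer.join()                               // consumer is gone, but the channel is still open
    // Without someone calling outbound.cancel() or close(), this send suspends forever:
    val sent = withTimeoutOrNull(500) { outbound.send(2) }
    println(if (sent == null) "send hung: channel was never cancelled" else "sent")
}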

for (handler in channel) {
    if (isCompleted.get()) break
    handler(this)
}

Adding this here reduced the problem but didn't eliminate it. I guess there must be other paths for the scope to cancel.

catch (ex: CancellationException) {
    targetChannel.cancel(ex)
    throw ex
}

marcoferrer (Owner, Author):
So scope cancellation can also be propagated from its parent under normal coroutine usage. This case was covered before because executing a new launch on a cancelled scope would take care of it. It's hard to reproduce, but I'm trying a few things now.

@marcoferrer (Owner, Author):

Regarding the UnsatisfiedLinkError, it looks like the source of the issue is related to using the kotlinx-coroutines-debug artifact coupled with mockk. Details are outlined here. I'll temporarily comment out the dependency.

}

@Test
fun `Client cancellation cancels server rpc scope`() {
marcoferrer (Owner, Author):
@chris-blacker I think I was able to reproduce your hanging client issue with this test. It turns out that the outbound message handler wasn't bound to normal channel cancellation. That's been fixed in the latest changes. Since the actor wasn't bound, client coroutines would hang because one of their children would never complete.
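A rough sketch of that binding idea (bindWorkerToChannel and both parameters are illustrative names, not the exact fix): tying the worker's channel to the call channel's closure means a cancelled call can never leave a child coroutine, and therefore the client's send, suspended forever.

import kotlinx.coroutines.channels.SendChannel

fun <T, H> bindWorkerToChannel(
    targetChannel: SendChannel<T>,   // the call's outbound channel
    workerChannel: SendChannel<H>    // the message-handler actor's channel
) {
    targetChannel.invokeOnClose { cause ->
        // Whether the channel closed normally or exceptionally, stop the worker too.
        workerChannel.close(cause)
    }
}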

@codecov-io commented (Aug 12, 2019):

Codecov Report

Merging #61 into master will increase coverage by 0.42%.
The diff coverage is 85.71%.

@@             Coverage Diff              @@
##             master      #61      +/-   ##
============================================
+ Coverage     85.66%   86.09%   +0.42%     
  Complexity       19       19              
============================================
  Files            15       15              
  Lines           279      302      +23     
  Branches         42       48       +6     
============================================
+ Hits            239      260      +21     
- Misses           13       15       +2     
  Partials         27       27
Impacted Files                                           Coverage Δ               Complexity Δ
...oferrer/krotoplus/coroutines/server/ServerCalls.kt    81.7% <100%> (-0.8%)     0 <0> (ø)
...otoplus/coroutines/client/ClientBidiCallChannel.kt    100% <100%> (ø)          0 <0> (ø)
...s/coroutines/client/ClientResponseStreamChannel.kt    85.71% <50%> (-14.29%)   9 <1> (ø)
...rcoferrer/krotoplus/coroutines/call/FlowControl.kt    83.33% <84%> (+8.33%)    0 <0> (ø)
...utines/call/FlowControlledInboundStreamObserver.kt    79.16% <0%> (+4.16%)     0% <0%> (ø)

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8a643a4...4375f58.

@blachris:

Thanks for the changes. I think they go in the right direction but the hang-on-bidi-call-close issue is not fully resolved for me. I will investigate and give you more information and/or a failing test.

@marcoferrer (Owner, Author) commented (Aug 13, 2019):

Using the coroutines debug probes might shed some light on what is stuck in suspension in your use case.
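A minimal sketch of that suggestion, assuming the kotlinx-coroutines-debug artifact is on the test classpath (the hanging receive below is just a stand-in for the real scenario):

import kotlinx.coroutines.*
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.debug.DebugProbes

fun main() = runBlocking {
    DebugProbes.install()
    val stuck = launch {
        // Stand-in for the hanging bidi call: a receive on a channel no one sends to.
        Channel<Int>().receive()
    }
    delay(500)                     // give the scenario time to reach its hung state
    DebugProbes.dumpCoroutines()   // prints each live coroutine with its state and creation stack trace
    stuck.cancel()
    DebugProbes.uninstall()
}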

@blachris:

I tried, but I cannot reproduce the hang-on-bidi-call-close issue outside of one specific stress scenario, and in that scenario it only happens during shutdown, so I can work around it. You can see my attempt here, but the test passes fine.
I would say the issue may be related to my application and that this branch of Kroto+ works fine. How do you feel about advancing the branch to a release version eventually?

blachris previously approved these changes (Aug 21, 2019)
@marcoferrer (Owner, Author):

That's good to hear. I can work on getting this merged and into a release today. There are a few spots that need a little cleanup first.

@marcoferrer force-pushed the outbound-flow-control-bugfix branch 2 times, most recently from fbb77bd to 9bfb977 on August 22, 2019 02:24
@marcoferrer force-pushed the outbound-flow-control-bugfix branch from 9bfb977 to 9bcd734 on August 22, 2019 02:34
@marcoferrer merged commit 0454f13 into master on Aug 22, 2019
@marcoferrer deleted the outbound-flow-control-bugfix branch on August 23, 2019 00:28