Closed
7 changes: 4 additions & 3 deletions core/src/main/scala/org/apache/spark/rpc/netty/Outbox.scala
@@ -53,6 +53,7 @@ private[netty] class Outbox(nettyEnv: NettyRpcEnv, val address: RpcAddress) {
*/
@GuardedBy("this")
private var draining = false
+ private val outboxStoppedEx = new SparkException("Message is dropped because Outbox is stopped")

/**
* Send a message. If there is no active connection, cache it and launch a new connection. If
@@ -68,7 +69,7 @@ private[netty] class Outbox(nettyEnv: NettyRpcEnv, val address: RpcAddress) {
}
}
if (dropped) {
- message.callback.onFailure(new SparkException("Message is dropped because Outbox is stopped"))
+ message.callback.onFailure(outboxStoppedEx)
} else {
drainOutbox()
}
@@ -160,7 +161,7 @@ private[netty] class Outbox(nettyEnv: NettyRpcEnv, val address: RpcAddress) {
}

/**
- * Stop [[Inbox]] and notify the waiting messages with the cause.
+ * Stop [[Outbox]] and notify the waiting messages with the cause.
*/
private def handleNetworkFailure(e: Throwable): Unit = {
synchronized {
@@ -215,7 +216,7 @@ private[netty] class Outbox(nettyEnv: NettyRpcEnv, val address: RpcAddress) {
// update messages and it's safe to just drain the queue.
var message = messages.poll()
Contributor Author

The exception can be created here, once before the loop rather than inside it, which still keeps a usable stack trace.

Contributor

But why? What's being optimized here? It's such a rare occurrence that it's better to pay the allocation cost when it happens than waste the memory to keep the singleton around.

Contributor Author

In that case the exception is no longer a singleton: it is created inside handleNetworkFailure().
Network failures should be rare, but when one happens there may be a non-trivial number of messages to notify.

Contributor

I doubt there will be many messages in the outbox at a time. This code mostly fixes a race during initialization of the RPC channel; after that, the outbox is mostly a bypass. #9210 even bypasses the outbox altogether in a lot of cases.

Contributor Author

Thanks for providing richer background information.

while (message != null) {
- message.callback.onFailure(new SparkException("Message is dropped because Outbox is stopped"))
+ message.callback.onFailure(outboxStoppedEx)
message = messages.poll()
}
}
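
For illustration only, here is a minimal sketch of the alternative raised in the review thread: allocating the exception once per drain, just before the loop, instead of once per message or once per Outbox. It is not the code in this PR. `OutboxDrainSketch`, `Callback`, `Message`, and `failPending` are hypothetical names, and a plain `RuntimeException` stands in for `SparkException` so the snippet has no Spark dependency.

```scala
import java.util.Queue

object OutboxDrainSketch {
  // Hypothetical stand-in for the callback carried by an outbox message.
  trait Callback {
    def onFailure(e: Throwable): Unit
  }

  // Hypothetical stand-in for an outbox message.
  final case class Message(callback: Callback)

  // Drains the queue and fails every pending message with one shared
  // exception. The exception is allocated once per call, before the loop,
  // so its stack trace is captured here rather than at class construction.
  // Returns the number of messages that were notified.
  def failPending(messages: Queue[Message]): Int = {
    // Assumption: RuntimeException replaces SparkException in this sketch.
    val stopped = new RuntimeException("Message is dropped because Outbox is stopped")
    var notified = 0
    var message = messages.poll()
    while (message != null) {
      message.callback.onFailure(stopped)
      notified += 1
      message = messages.poll()
    }
    notified
  }
}
```

In this shape every callback receives the same exception object, so a callback that mutates it (for example by adding a suppressed exception) affects what the others see. Whether the saved allocations matter depends on how many messages are queued when the failure occurs, which is the point debated in the thread.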