Fix Transport Stopped Exception #48930

henningandersen · 2019-11-11T08:28:59Z

When a node shuts down, TransportService moves to the stopped state and
then closes connections. If a request was sent in between, an exception
was thrown that was not retried in replication actions. Now throw a
wrapped NodeClosedException instead, which is correctly handled in
replication actions. Fixed other usages too.

Relates #42612
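The behavior described above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: every class name below (`NodeClosedSketch`, `SendFailedSketch`, `TransportServiceSketch`, `ReplicationRetrySketch`) is a hypothetical stand-in, and the one-level unwrapping in `nodeIsClosing` is an assumption about how a retry check might look.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// All names below are hypothetical stand-ins, not the real Elasticsearch classes.

/** Signals that the local node is shutting down. */
class NodeClosedSketch extends RuntimeException {
    NodeClosedSketch(String nodeName) {
        super("node closed: " + nodeName);
    }
}

/** Transport-level wrapper that carries the real reason a send failed. */
class SendFailedSketch extends RuntimeException {
    SendFailedSketch(String action, Throwable cause) {
        super("failed to send [" + action + "]", cause);
    }
}

class TransportServiceSketch {
    private final String localNode;
    private volatile boolean stopped;

    TransportServiceSketch(String localNode) {
        this.localNode = localNode;
    }

    void stop() {
        stopped = true;
    }

    /** Fails with a wrapped node-closed exception once the service is stopped. */
    void sendRequest(String action, Runnable onSuccess, Consumer<Exception> onFailure) {
        if (stopped) {
            onFailure.accept(new SendFailedSketch(action, new NodeClosedSketch(localNode)));
            return;
        }
        onSuccess.run();
    }
}

class ReplicationRetrySketch {
    /** Peels the transport wrapper (if any) and checks for the node-closed type. */
    static boolean nodeIsClosing(Throwable failure) {
        Throwable cause = failure;
        if (failure instanceof SendFailedSketch && failure.getCause() != null) {
            cause = failure.getCause();
        }
        return cause instanceof NodeClosedSketch;
    }
}

class TransportStoppedDemo {
    public static void main(String[] args) {
        TransportServiceSketch transport = new TransportServiceSketch("node-1");
        AtomicReference<Exception> failure = new AtomicReference<>();

        transport.stop();
        transport.sendRequest("internal:cluster/shard/failure", () -> {}, failure::set);

        // The caller sees one well-known exception type instead of a message string.
        System.out.println(ReplicationRetrySketch.nodeIsClosing(failure.get()));
    }
}
```

The point of the sketch is that the caller decides on an exception type rather than on a hardcoded `Transport is stopped` message.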

elasticmachine · 2019-11-11T08:29:01Z

Pinging @elastic/es-distributed (:Distributed/CRUD)

original-brownbear

Looks good, just a few random comments :)

server/src/main/java/org/elasticsearch/indices/cluster/IndicesClusterStateService.java

original-brownbear · 2019-11-11T11:05:56Z

server/src/main/java/org/elasticsearch/action/support/replication/ReplicationOperation.java

-        final boolean nodeIsClosing =
-                cause instanceof NodeClosedException
-                        || ExceptionsHelper.isTransportStoppedForAction(cause, "internal:cluster/shard/failure");
+        final boolean nodeIsClosing = cause instanceof NodeClosedException;


Maybe use the ExceptionsHelper.unwrap(e, NodeClosedException.class) != null pattern here too to be a little more resilient to future changes (and present unexpected paths that get us here ...?)?

I thought about this too, but ended up keeping this as is, since the two unwrap methods differ: unwrapCause only unwraps ElasticsearchWrapperException, whereas unwrap unwraps all causes. That could matter if we end up unwrapping a deep exception from another host. Do you think we should try to follow your suggestion anyway?

Nah, let's not do that. It def. doesn't look impossible for that exception from another node to make it here (since it's not trivial to tell, the suggestion is pointless in the first place :)).
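The difference between the two unwrapping styles discussed in this thread can be sketched as follows. This is a simplified illustration under stated assumptions, not the real `ExceptionsHelper` code: `WrapperMarker` stands in for `ElasticsearchWrapperException`, and the other class names and the loop guard of 10 are invented for the example.

```java
// Hypothetical marker interface, standing in for ElasticsearchWrapperException.
interface WrapperMarker {}

/** A wrapper layer that is safe to peel off. */
class WrapperExample extends RuntimeException implements WrapperMarker {
    WrapperExample(Throwable cause) {
        super(cause);
    }
}

/** An ordinary exception (for example, one relayed from another node); not a wrapper. */
class RemoteFailureExample extends RuntimeException {
    RemoteFailureExample(Throwable cause) {
        super(cause);
    }
}

class NodeClosedExample extends RuntimeException {}

class UnwrapSketch {
    /** Peels only layers that carry the wrapper marker, then stops. */
    static Throwable unwrapCause(Throwable t) {
        Throwable result = t;
        int guard = 0; // protects against cyclic cause chains
        while (result instanceof WrapperMarker && result.getCause() != null && guard++ < 10) {
            result = result.getCause();
        }
        return result;
    }

    /** Walks the whole cause chain and returns the first match, or null. */
    static <T extends Throwable> T unwrap(Throwable t, Class<T> type) {
        int guard = 0;
        for (Throwable cur = t; cur != null && guard++ < 10; cur = cur.getCause()) {
            if (type.isInstance(cur)) {
                return type.cast(cur);
            }
        }
        return null;
    }
}

class UnwrapDemo {
    public static void main(String[] args) {
        // Local failure: the wrapper sits directly on top of the node-closed exception.
        Throwable local = new WrapperExample(new NodeClosedExample());
        // Deep failure: a non-wrapper exception sits between the two.
        Throwable deep = new WrapperExample(new RemoteFailureExample(new NodeClosedExample()));

        System.out.println(UnwrapSketch.unwrapCause(local) instanceof NodeClosedExample);
        System.out.println(UnwrapSketch.unwrapCause(deep) instanceof NodeClosedExample);
        System.out.println(UnwrapSketch.unwrap(deep, NodeClosedExample.class) != null);
    }
}
```

In this sketch the marker-only style matches just the locally wrapped case, while the full-chain walk also matches a node-closed exception buried under unrelated layers, which is the behavior the thread decides not to adopt.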

.../elasticsearch/action/support/replication/TransportReplicationActionRetryOnClosedNodeIT.java

server/src/test/java/org/elasticsearch/indices/cluster/IndicesClusterStateServiceTests.java

.../elasticsearch/action/support/replication/TransportReplicationActionRetryOnClosedNodeIT.java

original-brownbear

LGTM :)

original-brownbear · 2019-11-12T07:55:44Z

Nah, let's not do that. It def. doesn't look impossible for that exception from another node to make it here (since it's not trivial to tell, the suggestion is pointless in the first place :)).

ywelsch

I like the simplification of exception handling in this PR a lot. I've left one comment, looking good o.w.
AFAICS, this exception is only bubbled up locally on a node, so we don't need to care about BWC (i.e. be able to detect the old TransportException bubbling up from other nodes).

server/src/main/java/org/elasticsearch/transport/TransportService.java

Tim-Brooks

LGTM

henningandersen · 2019-11-12T21:00:45Z

@elasticmachine run elasticsearch-ci/bwc
@elasticmachine run elasticsearch-ci/default-distro

henningandersen · 2019-11-13T07:21:23Z

@elasticmachine run elasticsearch-ci/packaging-sample-matrix

ywelsch

LGTM

henningandersen added the >bug, :Distributed Indexing/CRUD, v8.0.0, and v7.6.0 labels Nov 11, 2019

henningandersen added 2 commits November 11, 2019 10:00

Removed LocalTransportException and fixes

62af945

Turns out new exception was unnecessary. The `Transport is stopped` messages were hardcoded in a few more places, fixed those too.

Checkstyle fixes

3641614

henningandersen requested review from Tim-Brooks and ywelsch November 11, 2019 09:58

original-brownbear reviewed Nov 11, 2019

Armin comments

4e2cc6d

original-brownbear approved these changes Nov 12, 2019

ywelsch reviewed Nov 12, 2019

server/src/main/java/org/elasticsearch/transport/TransportService.java (outdated, resolved)

henningandersen added 2 commits November 12, 2019 17:55

Fix node in exception.

c6c741f

Merge remote-tracking branch 'origin/master' into fix_transport_stopped_exception

78f3b3c

Tim-Brooks approved these changes Nov 12, 2019

ywelsch approved these changes Nov 13, 2019

henningandersen merged commit fbaf8c4 into elastic:master Nov 13, 2019

henningandersen added the backport pending label Nov 13, 2019

henningandersen mentioned this pull request Nov 13, 2019

Fix Transport Stopped Exception (#48930) #49035

Merged

henningandersen removed the backport pending label Nov 13, 2019

This was referenced Feb 3, 2020

[meta] 7.6 release elastic/elasticsearch-net#4340

Closed

[meta] 7.6 release elastic/elasticsearch-net#4341

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
