Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throw AssertionError when no master #38432

Conversation

DaveCTurner
Copy link
Contributor

Today we throw a fatal RuntimeException if an exception occurs in
getMasterName(), and this includes the case where there is currently no
master. However, sometimes we call this method inside an assertBusy() in
order to allow for a cluster that is in the process of stabilising and electing
a master. The trouble is that assertBusy() only retries on an
AssertionError and not on a general RuntimeException, so the lack of a
master is immediately fatal.

This commit fixes the issue by asserting there is a master, triggering a retry
if there is not.

Fixes #38331.

Today we throw a fatal `RuntimeException` if an exception occurs in
`getMasterName()`, and this includes the case where there is currently no
master. However, sometimes we call this method inside an `assertBusy()` in
order to allow for a cluster that is in the process of stabilising and electing
a master. The trouble is that `assertBusy()` only retries on an
`AssertionError` and not on a general `RuntimeException`, so the lack of a
master is immediately fatal.

This commit fixes the issue by asserting there is a master, triggering a retry
if there is not.

Fixes elastic#38331
@DaveCTurner DaveCTurner added >test Issues or PRs that are addressing/adding tests v7.0.0 :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels Feb 5, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@DaveCTurner
Copy link
Contributor Author

@elasticmachine please run elasticsearch-ci/packaging-sample

@DaveCTurner DaveCTurner merged commit b7ab521 into elastic:master Feb 5, 2019
@DaveCTurner DaveCTurner deleted the 2019-02-05-fix-testElectOnlyBetweenMasterNodes branch February 5, 2019 17:11
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Feb 5, 2019
* master: (23 commits)
  Lift retention lease expiration to index shard (elastic#38380)
  Make Ccr recovery file chunk size configurable (elastic#38370)
  Prevent CCR recovery from missing documents (elastic#38237)
  re-enables awaitsfixed datemath tests (elastic#38376)
  Types removal fix FullClusterRestartIT warnings (elastic#38445)
  Make sure to reject mappings with type _doc when include_type_name is false. (elastic#38270)
  Updates the grok patterns to be consistent with logstash (elastic#27181)
  Ignore type-removal warnings in XPackRestTestHelper (elastic#38431)
  testHlrcFromXContent() should respect assertToXContentEquivalence() (elastic#38232)
  add basic REST test for geohash_grid (elastic#37996)
  Remove DiscoveryPlugin#getDiscoveryTypes (elastic#38414)
  Fix the clock resolution to millis in GetWatchResponseTests (elastic#38405)
  Throw AssertionError when no master (elastic#38432)
  `if_seq_no` and `if_primary_term` parameters aren't wired correctly in REST Client's CRUD API (elastic#38411)
  Enable CronEvalToolTest.testEnsureDateIsShownInRootLocale (elastic#38394)
  Fix failures in BulkProcessorIT#testGlobalParametersAndBulkProcessor. (elastic#38129)
  SQL: Implement CURRENT_DATE (elastic#38175)
  Mute testReadRequestsReturnLatestMappingVersion (elastic#38438)
  [ML] Report index unavailable instead of waiting for lazy node (elastic#38423)
  Update Rollup Caps to allow unknown fields (elastic#38339)
  ...
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request May 23, 2019
Today in `SpecificMasterNodesIT` we assert the name of the master and throw a
NPE if there is no master. This doesn't work within an `assertBusy()` because
the NPE triggers an immediate failure rather than the desired retry. This
commit addresses this by first asserting that the master is non-null.

Fixes elastic#38331
Relates elastic#38432
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request May 23, 2019
Today the `TransportClusterStateAction` ignores the state passed by the
`TransportMasterNodeAction` and obtains its state from the cluster applier.
This might be inconsistent, showing a different node as the master or maybe
even having no master.

This change adjusts the action to use the passed-in state directly, and adds
tests showing that the state returned is consistent with our expectations even
if there is a concurrent master failover.

Fixes elastic#38331
Relates elastic#38432
pull bot pushed a commit to fabriziofortino/elasticsearch that referenced this pull request May 24, 2019
Today the `TransportClusterStateAction` ignores the state passed by the
`TransportMasterNodeAction` and obtains its state from the cluster applier.
This might be inconsistent, showing a different node as the master or maybe
even having no master.

This change adjusts the action to use the passed-in state directly, and adds
tests showing that the state returned is consistent with our expectations even
if there is a concurrent master failover.

Fixes elastic#38331
Relates elastic#38432
DaveCTurner added a commit that referenced this pull request May 24, 2019
Today the `TransportClusterStateAction` ignores the state passed by the
`TransportMasterNodeAction` and obtains its state from the cluster applier.
This might be inconsistent, showing a different node as the master or maybe
even having no master.

This change adjusts the action to use the passed-in state directly, and adds
tests showing that the state returned is consistent with our expectations even
if there is a concurrent master failover.

Fixes #38331
Relates #38432
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
Today the `TransportClusterStateAction` ignores the state passed by the
`TransportMasterNodeAction` and obtains its state from the cluster applier.
This might be inconsistent, showing a different node as the master or maybe
even having no master.

This change adjusts the action to use the passed-in state directly, and adds
tests showing that the state returned is consistent with our expectations even
if there is a concurrent master failover.

Fixes elastic#38331
Relates elastic#38432
DaveCTurner added a commit that referenced this pull request Jul 17, 2019
Today the `TransportClusterStateAction` ignores the state passed by the
`TransportMasterNodeAction` and obtains its state from the cluster applier.
This might be inconsistent, showing a different node as the master or maybe
even having no master.

This change adjusts the action to use the passed-in state directly, and adds
tests showing that the state returned is consistent with our expectations even
if there is a concurrent master failover.

Fixes #38331
Relates #38432
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >test Issues or PRs that are addressing/adding tests v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CI] SpecificMasterNodesIT#testElectOnlyBetweenMasterNodes fails
5 participants