Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fold node metadata into new node storage #50741

Merged

Conversation

ywelsch
Copy link
Contributor

@ywelsch ywelsch commented Jan 8, 2020

Moves node metadata to uses the new storage mechanism (see #48701) as the authoritative source.

@ywelsch ywelsch added >non-issue :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels Jan 8, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Cluster Coordination)

@ywelsch ywelsch added the WIP label Jan 8, 2020
@ywelsch
Copy link
Contributor Author

ywelsch commented Jan 8, 2020

@elasticmachine run elasticsearch-ci/2 (:x-pack:test:idp-fixture:composeUp infra failure)

@ywelsch ywelsch removed the WIP label Jan 8, 2020
@ywelsch ywelsch requested a review from DaveCTurner January 8, 2020 15:53
@ywelsch
Copy link
Contributor Author

ywelsch commented Jan 8, 2020

@DaveCTurner This is ready for review :-)

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some slightly wonky logic in PersistedClusterStateService#nodeMetaData but otherwise this looks fine. Bit of a pain we have to preserve the old-style format for downgrade protection - maybe it's ok to remove that in the future? v9? v10?

} else {
assert nodeIds.equals(Collections.singleton(metaData.nodeId())) : nodeIds + " doesn't match " + metaData;
metaData = metaData.upgradeToCurrentVersion();
// load legacy metadata
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we have to keep the node metadata around to prevent invalid downgrades, we may as well validate that it's consistent with the new-format metadata whether metaData is set or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would mean reading the node metadata even though it might not be required.
I would prefer to keep as is (but can change it if you feel strongly about this), as I would like for the Lucene index to be the only authoritative source. If for example we were to have a tool that changes the node id, it would have to update both Lucene and the old-style node meta here, and doing so atomically would be difficult.

@ywelsch ywelsch requested a review from DaveCTurner January 8, 2020 17:16
Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ywelsch ywelsch merged commit c8bfe3d into elastic:reduce-metadata-writes-master Jan 8, 2020
ywelsch added a commit that referenced this pull request Jan 13, 2020
Today we split the on-disk cluster metadata across many files: one file for the metadata of each
index, plus one file for the global metadata and another for the manifest. Most metadata updates
only touch a few of these files, but some must write them all. If a node holds a large number of
indices then it's possible its disks are not fast enough to process a complete metadata update before timing out. In severe cases affecting master-eligible nodes this can prevent an election
from succeeding.

This commit uses Lucene as a metadata storage for the cluster state, and is a squashed version
of the following PRs that were targeting a feature branch:


* Introduce Lucene-based metadata persistence (#48733)

This commit introduces `LucenePersistedState` which master-eligible nodes
can use to persist the cluster metadata in a Lucene index rather than in
many separate files.

Relates #48701

* Remove per-index metadata without assigned shards (#49234)

Today on master-eligible nodes we maintain per-index metadata files for every
index. However, we also keep this metadata in the `LucenePersistedState`, and
only use the per-index metadata files for importing dangling indices. However
there is no point in importing a dangling index without any shard data, so we
do not need to maintain these extra files any more.

This commit removes per-index metadata files from nodes which do not hold any
shards of those indices.

Relates #48701

* Use Lucene exclusively for metadata storage (#50144)

This moves metadata persistence to Lucene for all node types. It also reenables BWC and adds
an interoperability layer for upgrades from prior versions.

This commit disables a number of tests related to dangling indices and command-line tools.
Those will be addressed in follow-ups.

Relates #48701

* Add command-line tool support for Lucene-based metadata storage (#50179)

Adds command-line tool support (unsafe-bootstrap, detach-cluster, repurpose, & shard
commands) for the Lucene-based metadata storage.

Relates #48701

* Use single directory for metadata (#50639)

Earlier PRs for #48701 introduced a separate directory for the cluster state. This is not needed
though, and introduces an additional unnecessary cognitive burden to the users.

Co-Authored-By: David Turner <david.turner@elastic.co>

* Add async dangling indices support (#50642)

Adds support for writing out dangling indices in an asynchronous way. Also provides an option to
avoid writing out dangling indices at all.

Relates #48701

* Fold node metadata into new node storage (#50741)

Moves node metadata to uses the new storage mechanism (see #48701) as the authoritative source.

* Write CS asynchronously on data-only nodes (#50782)

Writes cluster states out asynchronously on data-only nodes. The main reason for writing out
the cluster state at all is so that the data-only nodes can snap into a cluster, that they can do a
bit of bootstrap validation and so that the shard recovery tools work.
Cluster states that are written asynchronously have their voting configuration adapted to a non
existing configuration so that these nodes cannot mistakenly become master even if their node
role is changed back and forth.

Relates #48701

* Remove persistent cluster settings tool (#50694)

Adds the elasticsearch-node remove-settings tool to remove persistent settings from the on
disk cluster state in case where it contains incompatible settings that prevent the cluster from
forming.

Relates #48701

* Make cluster state writer resilient to disk issues (#50805)

Adds handling to make the cluster state writer resilient to disk issues. Relates to #48701

* Omit writing global metadata if no change (#50901)

Uses the same optimization for the new cluster state storage layer as the old one, writing global
metadata only when changed. Avoids writing out the global metadata if none of the persistent
fields changed. Speeds up server:integTest by ~10%.

Relates #48701

* DanglingIndicesIT should ensure node removed first (#50896)

These tests occasionally failed because the deletion was submitted before the
restarting node was removed from the cluster, causing the deletion not to be
fully acked. This commit fixes this by checking the restarting node has been
removed from the cluster.

Co-authored-by: David Turner <david.turner@elastic.co>
ywelsch added a commit to ywelsch/elasticsearch that referenced this pull request Jan 13, 2020
Today we split the on-disk cluster metadata across many files: one file for the metadata of each
index, plus one file for the global metadata and another for the manifest. Most metadata updates
only touch a few of these files, but some must write them all. If a node holds a large number of
indices then it's possible its disks are not fast enough to process a complete metadata update before timing out. In severe cases affecting master-eligible nodes this can prevent an election
from succeeding.

This commit uses Lucene as a metadata storage for the cluster state, and is a squashed version
of the following PRs that were targeting a feature branch:

* Introduce Lucene-based metadata persistence (elastic#48733)

This commit introduces `LucenePersistedState` which master-eligible nodes
can use to persist the cluster metadata in a Lucene index rather than in
many separate files.

Relates elastic#48701

* Remove per-index metadata without assigned shards (elastic#49234)

Today on master-eligible nodes we maintain per-index metadata files for every
index. However, we also keep this metadata in the `LucenePersistedState`, and
only use the per-index metadata files for importing dangling indices. However
there is no point in importing a dangling index without any shard data, so we
do not need to maintain these extra files any more.

This commit removes per-index metadata files from nodes which do not hold any
shards of those indices.

Relates elastic#48701

* Use Lucene exclusively for metadata storage (elastic#50144)

This moves metadata persistence to Lucene for all node types. It also reenables BWC and adds
an interoperability layer for upgrades from prior versions.

This commit disables a number of tests related to dangling indices and command-line tools.
Those will be addressed in follow-ups.

Relates elastic#48701

* Add command-line tool support for Lucene-based metadata storage (elastic#50179)

Adds command-line tool support (unsafe-bootstrap, detach-cluster, repurpose, & shard
commands) for the Lucene-based metadata storage.

Relates elastic#48701

* Use single directory for metadata (elastic#50639)

Earlier PRs for elastic#48701 introduced a separate directory for the cluster state. This is not needed
though, and introduces an additional unnecessary cognitive burden to the users.

Co-Authored-By: David Turner <david.turner@elastic.co>

* Add async dangling indices support (elastic#50642)

Adds support for writing out dangling indices in an asynchronous way. Also provides an option to
avoid writing out dangling indices at all.

Relates elastic#48701

* Fold node metadata into new node storage (elastic#50741)

Moves node metadata to uses the new storage mechanism (see elastic#48701) as the authoritative source.

* Write CS asynchronously on data-only nodes (elastic#50782)

Writes cluster states out asynchronously on data-only nodes. The main reason for writing out
the cluster state at all is so that the data-only nodes can snap into a cluster, that they can do a
bit of bootstrap validation and so that the shard recovery tools work.
Cluster states that are written asynchronously have their voting configuration adapted to a non
existing configuration so that these nodes cannot mistakenly become master even if their node
role is changed back and forth.

Relates elastic#48701

* Remove persistent cluster settings tool (elastic#50694)

Adds the elasticsearch-node remove-settings tool to remove persistent settings from the on
disk cluster state in case where it contains incompatible settings that prevent the cluster from
forming.

Relates elastic#48701

* Make cluster state writer resilient to disk issues (elastic#50805)

Adds handling to make the cluster state writer resilient to disk issues. Relates to elastic#48701

* Omit writing global metadata if no change (elastic#50901)

Uses the same optimization for the new cluster state storage layer as the old one, writing global
metadata only when changed. Avoids writing out the global metadata if none of the persistent
fields changed. Speeds up server:integTest by ~10%.

Relates elastic#48701

* DanglingIndicesIT should ensure node removed first (elastic#50896)

These tests occasionally failed because the deletion was submitted before the
restarting node was removed from the cluster, causing the deletion not to be
fully acked. This commit fixes this by checking the restarting node has been
removed from the cluster.

Co-authored-by: David Turner <david.turner@elastic.co>
ywelsch added a commit that referenced this pull request Jan 14, 2020
* Move metadata storage to Lucene (#50907)

Today we split the on-disk cluster metadata across many files: one file for the metadata of each
index, plus one file for the global metadata and another for the manifest. Most metadata updates
only touch a few of these files, but some must write them all. If a node holds a large number of
indices then it's possible its disks are not fast enough to process a complete metadata update before timing out. In severe cases affecting master-eligible nodes this can prevent an election
from succeeding.

This commit uses Lucene as a metadata storage for the cluster state, and is a squashed version
of the following PRs that were targeting a feature branch:

* Introduce Lucene-based metadata persistence (#48733)

This commit introduces `LucenePersistedState` which master-eligible nodes
can use to persist the cluster metadata in a Lucene index rather than in
many separate files.

Relates #48701

* Remove per-index metadata without assigned shards (#49234)

Today on master-eligible nodes we maintain per-index metadata files for every
index. However, we also keep this metadata in the `LucenePersistedState`, and
only use the per-index metadata files for importing dangling indices. However
there is no point in importing a dangling index without any shard data, so we
do not need to maintain these extra files any more.

This commit removes per-index metadata files from nodes which do not hold any
shards of those indices.

Relates #48701

* Use Lucene exclusively for metadata storage (#50144)

This moves metadata persistence to Lucene for all node types. It also reenables BWC and adds
an interoperability layer for upgrades from prior versions.

This commit disables a number of tests related to dangling indices and command-line tools.
Those will be addressed in follow-ups.

Relates #48701

* Add command-line tool support for Lucene-based metadata storage (#50179)

Adds command-line tool support (unsafe-bootstrap, detach-cluster, repurpose, & shard
commands) for the Lucene-based metadata storage.

Relates #48701

* Use single directory for metadata (#50639)

Earlier PRs for #48701 introduced a separate directory for the cluster state. This is not needed
though, and introduces an additional unnecessary cognitive burden to the users.

Co-Authored-By: David Turner <david.turner@elastic.co>

* Add async dangling indices support (#50642)

Adds support for writing out dangling indices in an asynchronous way. Also provides an option to
avoid writing out dangling indices at all.

Relates #48701

* Fold node metadata into new node storage (#50741)

Moves node metadata to uses the new storage mechanism (see #48701) as the authoritative source.

* Write CS asynchronously on data-only nodes (#50782)

Writes cluster states out asynchronously on data-only nodes. The main reason for writing out
the cluster state at all is so that the data-only nodes can snap into a cluster, that they can do a
bit of bootstrap validation and so that the shard recovery tools work.
Cluster states that are written asynchronously have their voting configuration adapted to a non
existing configuration so that these nodes cannot mistakenly become master even if their node
role is changed back and forth.

Relates #48701

* Remove persistent cluster settings tool (#50694)

Adds the elasticsearch-node remove-settings tool to remove persistent settings from the on
disk cluster state in case where it contains incompatible settings that prevent the cluster from
forming.

Relates #48701

* Make cluster state writer resilient to disk issues (#50805)

Adds handling to make the cluster state writer resilient to disk issues. Relates to #48701

* Omit writing global metadata if no change (#50901)

Uses the same optimization for the new cluster state storage layer as the old one, writing global
metadata only when changed. Avoids writing out the global metadata if none of the persistent
fields changed. Speeds up server:integTest by ~10%.

Relates #48701

* DanglingIndicesIT should ensure node removed first (#50896)

These tests occasionally failed because the deletion was submitted before the
restarting node was removed from the cluster, causing the deletion not to be
fully acked. This commit fixes this by checking the restarting node has been
removed from the cluster.

Co-authored-by: David Turner <david.turner@elastic.co>

* fix tests

Co-authored-by: David Turner <david.turner@elastic.co>
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020
Today we split the on-disk cluster metadata across many files: one file for the metadata of each
index, plus one file for the global metadata and another for the manifest. Most metadata updates
only touch a few of these files, but some must write them all. If a node holds a large number of
indices then it's possible its disks are not fast enough to process a complete metadata update before timing out. In severe cases affecting master-eligible nodes this can prevent an election
from succeeding.

This commit uses Lucene as a metadata storage for the cluster state, and is a squashed version
of the following PRs that were targeting a feature branch:


* Introduce Lucene-based metadata persistence (elastic#48733)

This commit introduces `LucenePersistedState` which master-eligible nodes
can use to persist the cluster metadata in a Lucene index rather than in
many separate files.

Relates elastic#48701

* Remove per-index metadata without assigned shards (elastic#49234)

Today on master-eligible nodes we maintain per-index metadata files for every
index. However, we also keep this metadata in the `LucenePersistedState`, and
only use the per-index metadata files for importing dangling indices. However
there is no point in importing a dangling index without any shard data, so we
do not need to maintain these extra files any more.

This commit removes per-index metadata files from nodes which do not hold any
shards of those indices.

Relates elastic#48701

* Use Lucene exclusively for metadata storage (elastic#50144)

This moves metadata persistence to Lucene for all node types. It also reenables BWC and adds
an interoperability layer for upgrades from prior versions.

This commit disables a number of tests related to dangling indices and command-line tools.
Those will be addressed in follow-ups.

Relates elastic#48701

* Add command-line tool support for Lucene-based metadata storage (elastic#50179)

Adds command-line tool support (unsafe-bootstrap, detach-cluster, repurpose, & shard
commands) for the Lucene-based metadata storage.

Relates elastic#48701

* Use single directory for metadata (elastic#50639)

Earlier PRs for elastic#48701 introduced a separate directory for the cluster state. This is not needed
though, and introduces an additional unnecessary cognitive burden to the users.

Co-Authored-By: David Turner <david.turner@elastic.co>

* Add async dangling indices support (elastic#50642)

Adds support for writing out dangling indices in an asynchronous way. Also provides an option to
avoid writing out dangling indices at all.

Relates elastic#48701

* Fold node metadata into new node storage (elastic#50741)

Moves node metadata to uses the new storage mechanism (see elastic#48701) as the authoritative source.

* Write CS asynchronously on data-only nodes (elastic#50782)

Writes cluster states out asynchronously on data-only nodes. The main reason for writing out
the cluster state at all is so that the data-only nodes can snap into a cluster, that they can do a
bit of bootstrap validation and so that the shard recovery tools work.
Cluster states that are written asynchronously have their voting configuration adapted to a non
existing configuration so that these nodes cannot mistakenly become master even if their node
role is changed back and forth.

Relates elastic#48701

* Remove persistent cluster settings tool (elastic#50694)

Adds the elasticsearch-node remove-settings tool to remove persistent settings from the on
disk cluster state in case where it contains incompatible settings that prevent the cluster from
forming.

Relates elastic#48701

* Make cluster state writer resilient to disk issues (elastic#50805)

Adds handling to make the cluster state writer resilient to disk issues. Relates to elastic#48701

* Omit writing global metadata if no change (elastic#50901)

Uses the same optimization for the new cluster state storage layer as the old one, writing global
metadata only when changed. Avoids writing out the global metadata if none of the persistent
fields changed. Speeds up server:integTest by ~10%.

Relates elastic#48701

* DanglingIndicesIT should ensure node removed first (elastic#50896)

These tests occasionally failed because the deletion was submitted before the
restarting node was removed from the cluster, causing the deletion not to be
fully acked. This commit fixes this by checking the restarting node has been
removed from the cluster.

Co-authored-by: David Turner <david.turner@elastic.co>
mkleen added a commit to crate/crate that referenced this pull request Apr 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >non-issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants