Add data stream support to CCR #61993
Conversation
This commit adds data stream support to CCR's auto following by making the following changes:

* When the auto follow coordinator iterates over the candidate indices to follow, it also checks whether the index is part of a data stream; if the name of the data stream also matches the auto follow pattern then the index will be auto followed.
* When following an index, the put follow api also checks whether that index is part of a data stream and, if so, also replicates the data stream definition to the local cluster.
* In order for the follow index api to determine whether an index is part of a data stream, the cluster state api was modified to also fetch the data stream definition of the cluster state when only the state of specific indices is queried.

When a data stream is auto followed, only new backing indices are auto followed. This is in line with how time based index patterns are replicated today. This means that the data stream isn't copied 1 to 1 into the local cluster. The local cluster's data stream definition contains the same name, timestamp field and generation, but the list of backing indices may be different (depending on when the data stream was auto followed).

Closes elastic#56259
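For illustration (cluster and pattern names here are hypothetical), an auto-follow pattern like the one below would, with this change, also match backing indices whose parent data stream name matches the pattern, and the data stream definition would be replicated along with them:

```
PUT /_ccr/auto_follow/logs-pattern
{
  "remote_cluster" : "leader",
  "leader_index_patterns" : ["logs-*"]
}
```

If a data stream named `logs-foo` exists on the leader, its new backing indices are followed and a matching `logs-foo` data stream definition is created in the local cluster.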
Thanks @martijnvg (and sorry for the delay. I was offline for the last several days). I think this approach should work. I will do a detailed review when this PR becomes formal. I think we can make some optimizations for replicating data stream indices, such as not creating follow-tasks for backing indices. But let's do it in a follow-up after this PR.
Thanks @dnhatn, I will make this PR ready for a detailed review.
Pinging @elastic/es-core-features (:Core/Features/Data streams)
Pinging @elastic/es-distributed (:Distributed/CCR)
@elasticmachine update branch |
Thanks @martijnvg. I wonder if we should allow using a data stream directly in a put-follow request instead of its backing index. We also need better validation, as an auto-follow pattern or a put-follow request with a different follower index name does not work.
x-pack/plugin/ccr/qa/multi-cluster/src/test/java/org/elasticsearch/xpack/ccr/AutoFollowIT.java
```
restoreService.restoreSnapshot(restoreRequest, delegatelistener);
} else {
    String followerIndexName = request.getFollowerIndex();
    BiConsumer<ClusterState, Metadata.Builder> updater = (currentState, mdBuilder) -> {
```
Can we make this change directly in `RestoreService`? A `SnapshotInfo` from `CcrRepository` should contain the data streams from the leader cluster?
Let me see if that works. If so, then I agree that is a cleaner change.
The `RestoreService` currently only restores the data stream metadata if a data stream or an entire snapshot needs to be restored. In the case that a specific backing index is restored, the associated data stream isn't restored. The put follow api only 'restores' one backing index at a time, and if this logic were included in `RestoreService` then I think we would always need to load and check the global metadata state in order to determine whether an index is a backing index of a data stream (when #61525 is merged, some backing indices may not follow the backing index naming scheme).
I think we would need to introduce an additional parameter on `RestoreSnapshotRequest` to indicate that we intend to restore a backing index and that therefore the associated data stream should be restored as well (or the existing local data stream should be updated). Otherwise regular restore operations would always need to load the global metadata state in order to determine whether an index is a backing index of a data stream. Do you think that this is still an improvement over the current logic?
Ok, let's keep the current implementation. We can look into the improvement once data streams become more mature.
A data stream is a moving target. Indices get added to it and removed from it. From that perspective I think the put follow api shouldn't follow a data stream, since what this api does is centered around an index.
I think I don't fully understand. What validation are you referring to specifically?
+1. We should have a better error message to tell users to use auto-follow when they use put-follow to replicate data streams.
A put follow that targets a data stream should return a client error.
Gotcha. This shouldn't be possible.
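To make the rejected case concrete, a put follow request that names a data stream rather than one of its backing indices might look like this (cluster and stream names are hypothetical). Per the discussion above, such a request should be rejected with a client error rather than silently following the data stream:

```
PUT /logs-foo_copy/_ccr/follow
{
  "remote_cluster" : "leader",
  "leader_index" : "logs-foo"
}
```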
Left two small questions; otherwise looking good. Thanks Martijn!
```
if (localDataStream == null) {
    // The data stream and the backing indices have been created and validated in the remote cluster,
    // just copying the data stream is in this case safe.
    return new DataStream(remoteDataStream.getName(), remoteDataStream.getTimeStampField(),
```
I think we can't use the generation from the remoteDataStream as it will break the assertion in DataStream's constructor. For example, here we are following .ds-1 and the data stream generation on the remote is already 2.
That assertion will be removed: https://github.com/elastic/elasticsearch/pull/61525/files#diff-c1a5087d37ad81d4b2339167eadc38bcee234a4f7d646b7b4b193ae665075334L60
This is to support migrating a write alias with indices to a data stream. The assertion was added as a sanity check initially, because backing indices were created exclusively by Elasticsearch.
Ok
```
// (string sorting works because of the backing index naming scheme)
backingIndices.sort(Comparator.comparing(Index::getName));

return new DataStream(localDataStream.getName(), localDataStream.getTimeStampField(), backingIndices,
```
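The comment in the snippet above relies on the zero-padded rollover suffix. Here is a standalone sketch (hypothetical class; plain strings stand in for `Index` objects) of why plain string sorting yields generation order:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BackingIndexSort {
    // Lexicographic sorting recovers the generation order of backing indices
    // because the rollover suffix is zero-padded (-000001, -000002, ...).
    static List<String> sortBacking(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    public static void main(String[] args) {
        List<String> backing = List.of(
            ".ds-logs-000002",
            ".ds-logs-000010",
            ".ds-logs-000001");
        // Prints the indices in generation order:
        // [.ds-logs-000001, .ds-logs-000002, .ds-logs-000010]
        System.out.println(sortBacking(backing));
    }
}
```

Note that without the zero padding, `-10` would sort before `-2`, which is why the naming scheme matters here.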
What if the generation of the local datastream is higher than the remote?
The generation isn't really used yet, so there isn't a real consequence when the local data stream's generation is higher than the leader's.
This would happen if the local data stream were rolled over independently. This is possible, but shouldn't really happen, because that would create an index that doesn't follow anything and will remain empty and unused. The rollover should only occur in the remote cluster.
So I think we should try to prevent data streams from being rolled over when a data stream is actually just following another data stream. What I think we can do for this is add a flag to a data stream that marks it as replicated; the rollover api would then disallow rolling over a replicated data stream. When unfollowing that data stream, the replicated flag would be unset. What do you think about this?
> So I think we should try to prevent data streams from being rolled over when that data stream is actually just following another data stream. What I think we can do for this is add a flag to a data stream that it is replicated and a rollover would disallow rolling over a replicated data stream. When unfollowing that data stream, the replicate flag would be unset. What do you think about this?
+1. This makes a lot of sense.
Martijn will implement this in a follow up.
Yes, in a followup PR I will add an additional field to the `DataStream` class that indicates that it is a replicated data stream. Then validation will be added to the rollover api that prevents rolling over a replicated data stream.
LGTM. Thanks @martijnvg.
Backport of #61993 to 7.x branch. This commit adds data stream support to CCR's auto following by making the following changes:

* When the auto follow coordinator iterates over the candidate indices to follow, it also checks whether the index is part of a data stream; if the name of the data stream also matches the auto follow pattern then the index will be auto followed.
* When following an index, the put follow api also checks whether that index is part of a data stream and, if so, also replicates the data stream definition to the local cluster.
* In order for the follow index api to determine whether an index is part of a data stream, the cluster state api was modified to also fetch the data stream definition of the cluster state when only the state of specific indices is queried.

When a data stream is auto followed, only new backing indices are auto followed. This is in line with how time based index patterns are replicated today. This means that the data stream isn't copied 1 to 1 into the local cluster. The local cluster's data stream definition contains the same name, timestamp field and generation, but the list of backing indices may be different (depending on when the data stream was auto followed).

Closes #56259
When a data stream is being auto followed, a rollover in the local cluster can break auto following: if the local cluster performs a rollover it creates a new write index, and if the remote cluster later rolls over as well, that new write index can't be replicated, because it has the same name as the write index in the local cluster, which was created earlier. If a data stream is managed by ccr, then the local cluster should not roll over those data streams. The data stream should be rolled over in the remote cluster and that change should replicate to the local cluster. Performing a rollover in the local cluster is an operation that the data stream support in ccr should prevent.

To protect against rolling over a replicated data stream, this PR adds a replicated field to the DataStream class. The rollover api will fail with an error in case the targeted data stream is a replicated data stream. When the put follow api creates a data stream in the local cluster, the replicated flag is set to true.

There should be a way to turn a replicated data stream into a regular data stream, for example during disaster recovery. The newly added api in this PR (the promote data stream api) does that. After a replicated data stream is promoted to a regular data stream, the local data stream can be rolled over, so that the new write index is no longer a follower index. Also, if the put follow api then attempts to update this data stream (for example to attempt to resume auto following), that will fail, because the data stream is no longer a replicated data stream.

Today with time based indices behind an alias, the is_write_index property isn't replicated from the remote cluster to the local cluster, so when attempting to rollover the alias in the local cluster the rollover fails, because the alias doesn't have a write index. The added replicated field in the DataStream class and the added validation achieve the same kind of protection, but in a more robust way.

A followup from #61993.
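As a sketch of the disaster-recovery flow described above (the data stream name is hypothetical), promoting a replicated data stream to a regular one uses the new promote api, after which a local rollover is allowed again:

```
POST /_data_stream/_promote/logs-mysql-error_copy

POST /logs-mysql-error_copy/_rollover
```

Before promotion, the second request would be rejected because the data stream is still marked as replicated.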
Backporting #64710 to the 7.x branch.
This commit fixes the situation where a user wants to use CCR to replicate indices that are part of a data stream while renaming the data stream. For example, assume a user has an auto-follow request that looks like this:

```
PUT /_ccr/auto_follow/my-auto-follow-pattern
{
  "remote_cluster" : "other-cluster",
  "leader_index_patterns" : ["logs-*"],
  "follow_index_pattern" : "{{leader_index}}_copy"
}
```

And then the data stream `logs-mysql-error` was created, creating the backing index `.ds-logs-mysql-error-2022-07-29-000001`. Prior to this commit, replicating this data stream meant that the backing index would be renamed to `.ds-logs-mysql-error-2022-07-29-000001_copy` and the data stream would *not* be renamed. This caused a check to trip in `TransportPutLifecycleAction` asserting that a backing index was not renamed for a data stream during following.

After this commit, there are a couple of changes:

First, the data stream will also be renamed. This means that `logs-mysql-error` becomes `logs-mysql-error_copy` when created on the follower cluster. Because of the way that CCR works, this means we need to support renaming a data stream for a regular "create follower" request, so a new parameter has been added: `data_stream_name`. It works like this:

```
PUT /mynewindex/_ccr/follow
{
  "remote_cluster": "other-cluster",
  "leader_index": "myotherindex",
  "data_stream_name": "new_ds"
}
```

Second, the backing index for a data stream must be renamed in a way that does not break the parsing of a data stream backing pattern. Whereas previously the index `.ds-logs-mysql-error-2022-07-29-000001` would be renamed to `.ds-logs-mysql-error-2022-07-29-000001_copy` (an illegal name since it doesn't end with the rollover digits), after this commit it will be renamed to `.ds-logs-mysql-error_copy-2022-07-29-000001` to match the renamed data stream.

This means that for the given `follow_index_pattern` of `{{leader_index}}_copy` the index changes look like:

| Leader Cluster | Follower Cluster |
|----------------|------------------|
| `logs-mysql-error` (data stream) | `logs-mysql-error_copy` (data stream) |
| `.ds-logs-mysql-error-2022-07-29-000001` | `.ds-logs-mysql-error_copy-2022-07-29-000001` |

Which internally means the auto-follow request turned into the create follower request of:

```
PUT /.ds-logs-mysql-error_copy-2022-07-29-000001/_ccr/follow
{
  "remote_cluster": "other-cluster",
  "leader_index": ".ds-logs-mysql-error-2022-07-29-000001",
  "data_stream_name": "logs-mysql-error_copy"
}
```

Relates to elastic#84940 (cherry-picked the commit for a test)
Relates to elastic#61993 (where data stream support was first introduced for CCR)
Resolves elastic#81751
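The backing-index renaming rule described above can be sketched as a small standalone helper (hypothetical class and method names; the real rename happens inside CCR's follow logic): the data-stream portion of the name is replaced while the date and zero-padded rollover generation are preserved, so the result still parses as a backing index name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackingIndexRename {
    // Renames the data-stream portion of a backing index name, keeping the
    // "-yyyy-MM-dd-NNNNNN" date/generation suffix intact.
    static String renameBackingIndex(String backingIndex, String oldDs, String newDs) {
        Pattern p = Pattern.compile(
            "^\\.ds-" + Pattern.quote(oldDs) + "-(\\d{4}-\\d{2}-\\d{2}-\\d{6})$");
        Matcher m = p.matcher(backingIndex);
        if (m.matches() == false) {
            throw new IllegalArgumentException(
                "not a backing index of " + oldDs + ": " + backingIndex);
        }
        return ".ds-" + newDs + "-" + m.group(1);
    }

    public static void main(String[] args) {
        // Prints: .ds-logs-mysql-error_copy-2022-07-29-000001
        System.out.println(renameBackingIndex(
            ".ds-logs-mysql-error-2022-07-29-000001",
            "logs-mysql-error",
            "logs-mysql-error_copy"));
    }
}
```

Appending `_copy` after the generation digits instead, as the pre-fix behavior did, would produce a name the backing index parser rejects, which is exactly the bug this commit addresses.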
(cherry picked from commit 3420be0)
This is a draft PR, so that the overall approach can be validated before adding more tests.