-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DRAFT] Create Snapshot checkpointing on Cluster Manager #14704
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
❌ Gradle check result for 41d77a2: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
assert assertShardsConsistent(this.source, this.state, this.indices, this.shards, this.clones); | ||
this.centralSnap = centralSnap; | ||
if (!centralSnap) { | ||
assert assertShardsConsistent(this.source, this.state, this.indices, this.shards, this.clones); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this assertion not valid in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, can you please name centralSnap
better and update PR description on what you intend to do in this PR for early feedback?
return; | ||
} | ||
try { | ||
logger.info( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why info?
ActionListener.wrap(snapshot -> addListener(snapshot, ActionListener.map(listener, Tuple::v2)), listener::onFailure) | ||
); | ||
if (isCentralSnap) { | ||
createSnapshotCentralized( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets name this better. centralized
doesn't really convey its intent clearly.
final String repositoryName = request.repository(); | ||
final String snapshotName = indexNameExpressionResolver.resolveDateMathExpression(request.snapshot()); | ||
|
||
createSnapshotValidations(request, listener); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
createSnapshotValidations
to validate...
@Override | ||
public void clusterStateProcessed(String source, ClusterState oldState, final ClusterState newState) { | ||
try { | ||
logger.info("snapshot [{}] started", snapshot); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: switch to trace?
@Override | ||
public void onResponse(Collection<Void> result) { | ||
// Final step after all shards are processed | ||
logger.info( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: switch to trace?
final String repositoryName = request.repository(); | ||
final String snapshotName = indexNameExpressionResolver.resolveDateMathExpression(request.snapshot()); | ||
validate(repositoryName, snapshotName); | ||
// TODO: create snapshot UUID in CreateSnapshotRequest and make this operation idempotent to cleanly deal with transport layer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of questions:
- If this is not idempotent then your in-memory map of maintaining status of create snapshot operations is also not serving any purpose now. Is this correct?
- After you create UUIDs at REST layer, in what scenarios do you see create snapshot being invoked multiple times for same UUIDs?
This PR is stalled because it has been open for 30 days with no activity. |
Description
[Describe what this change achieves]
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.