feat(1-1-restore): validates if source and target clusters nodes are equal #4230
base: 1-1-restore-feature-branch
Conversation
Force-pushed 37409c0 to 7e9f877
Force-pushed 022de10 to d7ad144
The validation process looks way cleaner now:)
A few comments.
One is about changing the logic: iterate over target nodes instead of locations. This is the most important one for me.
pkg/service/one2onerestore/model.go (outdated)

```diff
@@ -28,10 +28,17 @@ type nodeMapping struct {
 type node struct {
 	DC   string `json:"dc"`
-	Rack string `json:"rack"`
+	Rack string `json:"rack_id"`
```
It's a rack name, not an ID.
Example: https://github.com/scylladb/scylladb/blob/master/conf/cassandra-rackdc.properties
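For illustration, the linked file uses plain human-readable names; a minimal excerpt (values are made up):

```
# cassandra-rackdc.properties (illustrative values)
dc=dc1
rack=rack1
```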
But now it shows that this PR is actually changing it from rack_id to rack (instead of not touching it at all).
I guess that's because of having multiple PRs rebased on one another.
It only shows changes in pkg/command/one2onerestore, because I've renamed rack_id to rack in the scope of this PR.
pkg/service/one2onerestore/worker.go (outdated)

```go
	logger log.Logger
}

// getManifestsAndHosts checks that each host in target cluster should have an access to at least one target location and fetches manifest info.
```
This comment is a bit confusing.
Each node must have access to the location where its SSTables are stored.
1-1 restore provides a mapping between the source node and the target node.
The location defines the DC. If the DC is empty, then it means (or should mean) that all DCs are there.
See https://manager.docs.scylladb.com/stable/backup#meta
The method must check that all nodes that are expected to restore a particular DC have access to the location which keeps the SSTables of this DC.
The nodes mapping for 1-1 restore defines which node is going to restore data from which DC.
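A minimal sketch of that check, with purely illustrative types and callbacks (not the PR's actual API): it walks the user-provided node mappings and verifies that each target node can reach the location holding its source DC's SSTables.

```go
// Sketch only: names and signatures are illustrative, not the PR's code.
package sketch

import "fmt"

type location struct{ Provider, Bucket string }

type node struct {
	DC     string
	HostID string
}

type nodeMapping struct {
	Source node
	Target node
}

// checkTargetNodesLocationAccess verifies, for every target node, that it can
// access the location holding the SSTables of its mapped source node's DC.
func checkTargetNodesLocationAccess(
	mappings []nodeMapping,
	dcToLocation map[string]location, // which location keeps a given source DC
	hostHasAccess func(targetHostID string, loc location) error, // e.g. a list probe via the agent
) error {
	for _, m := range mappings {
		loc, ok := dcToLocation[m.Source.DC]
		if !ok {
			return fmt.Errorf("no backup location found for source DC %q", m.Source.DC)
		}
		if err := hostHasAccess(m.Target.HostID, loc); err != nil {
			return fmt.Errorf("target node %s cannot access location %+v: %w", m.Target.HostID, loc, err)
		}
	}
	return nil
}
```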
Do you mean that we should require the user to specify the full location <-> DC mapping with the --location flag and not calculate it on our own? I was under a different impression when writing the comment about ignoring location DC (resolved).
Calculating it on our own is a little bit more complicated, but it's more convenient for the user (similar to providing the nodes mapping, which can also be calculated by SM). At the end of the day, somebody needs to calculate it, and SM has all the necessary information.
I think that making 1-1-restore easy to use is important in general, but it does not need to happen in the first iteration targeting mainly Scylla Cloud as a user.
On the other hand, the code matching locations to DCs on the SM side is already part of this PR, so does it make sense to delete it? What do you guys think?
The 1-1 restore includes nodes-mapping. It explicitly tells which target node is expected to download what set of SSTables. The goal is to find which location keeps the SSTables of the source node and ensure that the target node can access this location.
The current comment says: each host in target cluster should have an access to at least one target location
This is not true; we must ensure that each target host has access to the exact location that stores the SSTables for the mapped source node.

Do you mean that we should require the user to specify the full location <-> DC mapping with the --location flag and not calculate it on our own? I was under a different impression when writing the #4230 (comment) about ignoring location DC (resolved).

We ARE requiring the user to specify nodes-mapping. A node is part of one of the DCs. Scylla Manager must understand which location (from the given locations) keeps the DC that owns the source node.
To understand the full location <-> DC mapping, or rather the location <-> nodes mapping, it's enough to check the meta directory https://manager.docs.scylladb.com/stable/backup#meta from all backup locations.
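As a rough illustration of that idea, assuming the documented meta layout where each manifest path embeds a dc/<name> segment (path parsing here is deliberately simplified and not the PR's code):

```go
// Sketch only: assumes manifest paths under the meta dir embed "dc/<name>",
// as in the layout described in the backup docs linked above.
package sketch

import "strings"

// dcsInLocation extracts the DCs whose manifests live under one backup
// location, given the manifest paths listed from its meta directory.
func dcsInLocation(manifestPaths []string) map[string]struct{} {
	dcs := make(map[string]struct{})
	for _, p := range manifestPaths {
		parts := strings.Split(p, "/")
		for i := 0; i+1 < len(parts); i++ {
			if parts[i] == "dc" {
				dcs[parts[i+1]] = struct{}{}
				break
			}
		}
	}
	return dcs
}

// buildDCToLocation inverts per-location listings into a DC -> location map.
func buildDCToLocation(listings map[string][]string) map[string]string {
	dcToLoc := make(map[string]string)
	for loc, paths := range listings {
		for dc := range dcsInLocation(paths) {
			dcToLoc[dc] = loc
		}
	}
	return dcToLoc
}
```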
I think that making 1-1-restore easy to use is important in general, but it does not need to happen in the first iteration targeting mainly Scylla Cloud as a user.
On the other hand, the code matching locations to DCs on the SM side is already part of this PR, so does it make sense to delete it? What do you guys think?

We don't care much about the ease of use for the end user. There is still automation required to deliver the cloned cluster. This automation creates the output which must involve the mapping -> .
We care about the compatibility with the automation tool.
OK, this method is just expected to get the list of hosts from the target cluster and the list of manifests from the snapshot.
It adds a simple validation to compare whether the number of target hosts is equal to the number of manifests.
It's fine.
But still... I think the comment is misleading.
getManifestsAndHosts -> getAllSnapshotManifestsAndAllTargetHosts would be more meaningful for me.
pkg/service/one2onerestore/worker.go (outdated)

```go
	if len(allManifests) != nodesWithAccessCount || len(allManifests) != len(nodeStatus) {
		return nil, nil, fmt.Errorf("manifest count (%d) != target nodes (%d)", len(allManifests), len(nodesCountSet.List()))
	}

	return allManifests, nodesToHosts(nodeStatus), nil
}
```
This logic will be completely different when you iterate over nodes instead of locations.
After ensuring that the node has access to the location, you just download the corresponding manifest to check details like the number of shards and the token ring.
After ensuring that the node has access to the location, you just download the corresponding manifest to check details like the number of shards and the token ring.

That's exactly what is happening inside the validateCluster method.
A few words in general about my implementation and the reasoning behind it:
In a nutshell, we have a source cluster (backup) whose nodes are represented by manifest files in the backup location(s), and a target cluster whose nodes are the actual live nodes of the cluster we want to restore data to.
- Get source cluster nodes and target cluster nodes (getManifestsAndHosts).
- Collect the information needed to compare source and target cluster nodes (collectNodeValidationInfo) (here I iterate over nodes).
- Compare them using the node mapping as the rule for how to match a source cluster node with a target cluster node (checkOne2OneRestoreCompatibility).
If validation has passed successfully, then I can use the node info from step 1 further in the code, as I know it's valid for 1-1-restore and each node has an exact match.
Keeping the logic this way gives us the ability to keep the validation logic in one place, without spreading it across other parts of the code.
This logic will be completely different when you iterate over nodes instead of locations.

Here is how I see this logic:
- Find the node's DC by looking at the node mappings.
- Find the corresponding location by checking location.DC against the DC from step 1.
- Download the manifest content. Two steps actually: list the dir and then download, because the manifest path contains the taskID which we don't know (or do we?).
- Collect additional node info (token ring, etc.).
- Compare the node info with the manifest content.
Without getting into the details, the main difference between the two solutions is that in the first we collect all the info and then do the validation, while in the second we validate nodes one by one (a rough sketch of the per-node flow is below).
For me it's more or less the same, but if you prefer the second over the first, let me know and I'll change this PR.
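A rough sketch of that per-node flow, reusing the illustrative nodeMapping/location types from the earlier sketch; the callbacks stand in for listing/downloading manifests and querying the Scylla API and are not the PR's actual helpers.

```go
// Sketch only: a per-node validation loop, not the PR's implementation.
package sketch

import (
	"context"
	"fmt"
)

type nodeInfo struct {
	ShardCount int
	Tokens     []int64
}

// validateNodesOneByOne iterates over target nodes (via the mappings) instead
// of locations: resolve the source DC, pick its location, fetch the manifest,
// collect live node info, and compare the two, failing on the first mismatch.
func validateNodesOneByOne(
	ctx context.Context,
	mappings []nodeMapping,
	locationForDC func(dc string) (location, error),
	downloadManifest func(ctx context.Context, loc location, sourceHostID string) (nodeInfo, error),
	collectTargetInfo func(ctx context.Context, targetHostID string) (nodeInfo, error),
) error {
	for _, m := range mappings {
		loc, err := locationForDC(m.Source.DC)
		if err != nil {
			return err
		}
		want, err := downloadManifest(ctx, loc, m.Source.HostID)
		if err != nil {
			return err
		}
		got, err := collectTargetInfo(ctx, m.Target.HostID)
		if err != nil {
			return err
		}
		if want.ShardCount != got.ShardCount || len(want.Tokens) != len(got.Tokens) {
			return fmt.Errorf("target node %s is not compatible with source node %s", m.Target.HostID, m.Source.HostID)
		}
	}
	return nil
}
```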
Two steps actually: list the dir and then download, because the manifest path contains the taskID which we don't know (or do we?).

We don't know it from the start, but it can be established just once at the beginning (by listing manifests) and then reused. However, we can't escape listing all manifest dirs, as it's needed to verify that the manifest count equals the node count.
In general I like the idea of creating a per-node worker, as it provides a more concurrent design.
In the context of the validation, such a worker could be initialized with all the basic information about the backup and target nodes, like:
- target and backup host ID
- target IP
- target and backup cluster ID
- DC
- backup task ID
- snapshot tag
- backup location
If we assume that we get the full location <-> DC mapping from the user, the only missing piece is the task ID, but it can be established just once by the main workflow and passed to worker initialization (the main workflow would also need to validate the manifest count, but it wouldn't need to read any manifest).
But even if we don't get such a mapping, we can first calculate it, and then move to the per-node worker.
The only concern is that it might be more difficult to support this approach if we would like to make the nodes mapping optional for the sake of the user's convenience. Then, we couldn't provide the backup DC and host ID from the start, but they would need to be filled in some other way.
With this design, there are two benefits (a parallelization sketch follows below):
- easier parallelization of the validation workflow - all steps like reading the manifest, fetching target info, validating, etc. could be sequential in the context of a single node, but we could just use parallel.Run on all of them without any additional effort. The current implementation also does the time-consuming things in parallel, but it requires more code handling the parallelism.
- easier body functions - there would be no need to handle slices or maps of some validation info and think about whether we can be sure that they are in sync at a given point. Validation functions would work just on the per-node context with a single manifest and a single target node info.
EDIT: Note that such a design would probably fit very well into the later restore data stages as well.
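A minimal sketch of the per-node fan-out, reusing the illustrative nodeMapping type from the earlier sketches. errgroup is used purely for illustration; the actual code would more likely use the repo's parallel.Run helper, whose exact signature isn't shown here.

```go
// Sketch only: per-node validation fan-out; validateOne carries the whole
// per-node context (manifest, target info), so no shared slices/maps are needed.
package sketch

import (
	"context"

	"golang.org/x/sync/errgroup"
)

func validateNodesInParallel(
	ctx context.Context,
	mappings []nodeMapping,
	validateOne func(ctx context.Context, m nodeMapping) error,
) error {
	g, ctx := errgroup.WithContext(ctx)
	for _, m := range mappings {
		m := m // capture loop variable (needed before Go 1.22 loop-var semantics)
		g.Go(func() error { return validateOne(ctx, m) })
	}
	return g.Wait()
}
```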
But I also agree that the current implementation does its job, so this is more about personal preference. If the points from the previous comment didn't convince you, I'm not against the current state of the implementation.
@VAveryanov8 are you checking somewhere in the code if the target node has access to the expected location? I mean the location that keeps the SSTables that are expected to be downloaded to this node?
It should be part of the validation stage.
Otherwise, it may happen that you go to the copy stage, but one of the target nodes is not able to get its SSTables.
In general I like the idea of creating a per-node worker, as it provides a more concurrent design. [...]
EDIT: Note that such a design would probably fit very well into the later restore data stages as well.
That's the idea of 1-1 restore: do it per node in parallel. After all SSTables are copied to the corresponding node, it's then a matter of calling refresh for all tables that are included in the backup location in this snapshot for the current node.
It should/must be done in parallel per node.
@VAveryanov8 @Michal-Leszczynski let's sync on Monday. It may be a faster way of discussing it.
I put it in the design doc -> There must be one worker / goroutine per destination node copying and refreshing SSTables from the corresponding node.
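As a sketch of that per-destination-node worker, again reusing the illustrative nodeMapping type: copySSTables and refreshTable are hypothetical placeholders for the agent copy step and the Scylla "load new SSTables" (refresh) call; this is not the design doc's code.

```go
// Sketch only: one goroutine per destination node, copy then refresh.
package sketch

import (
	"context"

	"golang.org/x/sync/errgroup"
)

type table struct{ Keyspace, Name string }

// restorePerNode spawns one worker per destination node: copy all of the mapped
// source node's SSTables, then refresh every restored table on that node.
func restorePerNode(
	ctx context.Context,
	mappings []nodeMapping,
	tables []table, // tables included in the snapshot
	copySSTables func(ctx context.Context, m nodeMapping) error,
	refreshTable func(ctx context.Context, targetHostID string, t table) error,
) error {
	g, ctx := errgroup.WithContext(ctx)
	for _, m := range mappings {
		m := m
		g.Go(func() error {
			if err := copySSTables(ctx, m); err != nil {
				return err
			}
			for _, t := range tables {
				if err := refreshTable(ctx, m.Target.HostID, t); err != nil {
					return err
				}
			}
			return nil
		})
	}
	return g.Wait()
}
```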
```go
func TestMapTargetHostToSource(t *testing.T) {
	testCases := []struct {
		name string

		nodeMappings []nodeMapping
		targetHosts  []Host
		expected     map[string]Host
		expectedErr  bool
	}{
		{
			name:         "All hosts have mappings",
			nodeMappings: []nodeMapping{
```
These tests are OK.
But you should verify it in a bit broader way.
Please keep validateClusters as the method you test. Most likely it will require using integration tests instead of these ones.
This introduces a new CLI command for the fast vnode restore procedure. Fixes: #4200
This is the result of running `make docs`.
This renames fastrestore to 1-1 restore and also makes it a subcommand of the restore cmd.
…s uuid This removes 1-1-restore options that are not needed at the moment (we can add them later if needed). Also changes the source-cluster-id type to uuid.UUID.
…equal This adds a validation stage for 1-1-restore. The logic is as follows: - Collect node information for the source cluster from backup manifests - Collect node information for the target cluster from the Scylla API - Apply node mappings to the source cluster nodes - Compare each source node with its corresponding target node. Fixes: #4201
This changes the following parts of the validation process: - Moves path.Join("backup", string(MetaDirKind)) to the `backupspec` pkg - Moves getManifestContext to worker_manifest - Adds SourceClusterID validation to getManifestInfo - Simplifies how node info is collected by leveraging node mappings (maps manifests to nodes by host ID) - Replaces the LocationInfo struct with manifests and hosts - Sorts node tokens
This replaces CPUCount with ShardCount for cluster comparison. Fixes a typo in a function name.
This adds integration tests for the 1-1-restore validation stage.
This adds 1-1-restore integration tests to GitHub Actions.
Force-pushed 45298e2 to 9139cf1
Somehow lost a few files during rebase :D
These are the only concerns I have right now:
Other than that, it's fine.
👍
This introduces the following changes: 1. Uses location.DC for deciding which node to use for the initial location access 2. Changes the validation stage to check nodes one by one in parallel (instead of collecting slices and then comparing them) 3. Updates unit and integration tests.
Force-pushed 5983f40 to 647111d
@Michal-Leszczynski @karol-kokoszka I've made changes to the validation stage according to your comments, so please take a look one more time - it should be easier now :) Additionally, I updated the unit and integration tests.
This adds a validation stage for 1-1-restore. The logic is as follows:
- Collect node information for the source cluster from backup manifests
- Collect node information for the target cluster from the Scylla API
- Apply node mappings to the source cluster nodes
- Compare each source node with its corresponding target node
Fixes: #4201