Added a fix for generating the target partition path if base and part… #213

Closed
wants to merge 3 commits into from

patduin
Contributor

@patduin patduin commented Feb 15, 2021

fixes #212

…ition location are not matching in the source
@patduin patduin requested a review from a team February 15, 2021 15:22
ReplicaLocationManager locationManager) {
try {
return locationManager.getPartitionLocation(sourcePartition);
} catch (CircusTrainException e) {
Contributor Author

This is thrown when the base location doesn't match the partition location. Strangely enough, the actual data copy doesn't have a problem with this, so this logic fixes it by generating the partition name (instead of taking the name from the source partition folder).
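
To make the fallback described above concrete, here is a minimal sketch. It is not the code from this PR: the class `PartitionNameSketch` and both helper methods are hypothetical names, locations are handled as plain strings, and no escaping of special characters is done (real Hive partition names escape some characters). It only illustrates the idea of using the source folder's sub-path when the partition sits under the base location and generating a `key=value` name otherwise.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names come from circus-train.
class PartitionNameSketch {

  // Builds "key1=value1/key2=value2" from partition column names and values.
  // Assumption: values need no escaping.
  static String generatePartitionName(List<String> keys, List<String> values) {
    if (keys.size() != values.size()) {
      throw new IllegalArgumentException("keys and values must have the same size");
    }
    List<String> parts = new ArrayList<>();
    for (int i = 0; i < keys.size(); i++) {
      parts.add(keys.get(i) + "=" + values.get(i));
    }
    return String.join("/", parts);
  }

  // Returns the partition's sub-path relative to the base location, or a
  // generated partition name when the partition is not under the base location.
  static String targetSubPath(String baseLocation, String partitionLocation,
      List<String> keys, List<String> values) {
    String base = baseLocation.endsWith("/") ? baseLocation : baseLocation + "/";
    if (partitionLocation.startsWith(base)) {
      return partitionLocation.substring(base.length());
    }
    return generatePartitionName(keys, values);
  }

  public static void main(String[] args) {
    List<String> keys = Arrays.asList("dt");
    // Partition folder sits under the base location: the folder name is kept.
    System.out.println(targetSubPath("s3://bucket-x/table",
        "s3://bucket-x/table/dt=2021-02-15", keys, Arrays.asList("2021-02-15")));
    // Partition folder lives elsewhere: the name is generated instead.
    System.out.println(targetSubPath("s3://bucket-x/table",
        "s3://bucket-y/z/some-folder", keys, Arrays.asList("2021-02-16")));
  }
}
```

Whether a generated name like this matches the path the copier actually writes to is a separate question that the sketch does not answer.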

Contributor Author

Hmm, I'm actually a bit concerned this might not work correctly with how this is implemented: https://github.com/HotelsDotCom/circus-train/blob/main/circus-train-core/src/main/java/com/hotels/bdp/circustrain/core/source/HdfsSnapshotLocationManager.java#L124

I need to find out how the different copiers generate the target paths.

Contributor

@barnharts4 barnharts4 left a comment

Looks good!

Contributor Author

patduin commented Feb 16, 2021

Gonna close this; it is not going to work correctly for some use cases (e.g. partition x is in bucket x and partition y is in bucket /y/z), and it might lead to us not getting a notification and the data being in a different location than advertised in the Metastore. I'll update the issue with my findings and rewrite it to better capture the requirements. Unfortunately that's a lot more work; there is no quick fix imo.

@patduin patduin closed this Feb 16, 2021
Successfully merging this pull request may close these issues.

Update of metadata during a replication fails when base path is not matched in all partitions
2 participants