Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for kart upgrade so it handles complicated repos #449

Merged
merged 3 commits into from
Jun 24, 2021
Merged

Conversation

olsen232
Copy link
Collaborator

Most mirrored repos are complicated - user repos, where they exist at all, are simpler.

A complicated repo might have meta-items that are non-standard eg "generated-pks.json" - these are considered "non-standard" as they don't describe the state of the data itself, or even metadata about the data such as title, description, or schema. Instead, they are extra metadata that the data-source itself maintains so that it can accurately provide the actual data / metadata. As such, they currently don't show up in diffs.

A complicated repo might have attachments, which are currently mostly hidden from users - but still need to be in the mirrors to future proof the mirrors since we can't keep changing the mirrored history, and we will eventually support them.

#448

@olsen232 olsen232 requested a review from craigds June 24, 2021 00:46
Comment on lines +163 to +165
attachment_names = [
obj.name for obj in replacing_dataset.tree if obj.type_str == "blob"
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this look deeper? it only looks for blobs which are immediate children of the dataset tree. I feel like we're going to miss something here if we ever add more complex hierarchies of 'attachments'.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now these are the only attachments which exist in V2 repos, and they really only exist in repos which we created (not user repos) and the set of all V2 repos is not supposed to grow very much (or at least, not with new types of V2 repos which we haven't yet seen).

There is another type of hypothetical attachment which won't currently be preserved and which we can't really do anything about, at least not with the current structure - attachments which are not attached to any particular dataset, but are at some other path. But again, these don't exist either. So, not too motivated to make this more complicated for no real benefit

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay 👍

@@ -53,8 +60,13 @@ def get_meta_item(self, name, missing_ok=True):
return next(iter(metadata_xml))["text/xml"]
return result

def attachment_items(self):
attachments = [obj for obj in self.tree if obj.type_str == "blob"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(same again here)

@olsen232 olsen232 merged commit dfdb555 into master Jun 24, 2021
@olsen232 olsen232 deleted the upgrade-fix branch June 24, 2021 01:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants