-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for kart upgrade so it handles complicated repos #449
Conversation
attachment_names = [ | ||
obj.name for obj in replacing_dataset.tree if obj.type_str == "blob" | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this look deeper? it only looks for blobs which are immediate children of the dataset tree. I feel like we're going to miss something here if we ever add more complex hierarchies of 'attachments'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now these are the only attachments which exist in V2 repos, and they really only exist in repos which we created (not user repos) and the set of all V2 repos is not supposed to grow very much (or at least, not with new types of V2 repos which we haven't yet seen).
There is another type of hypothetical attachment which won't currently be preserved and which we can't really do anything about, at least not with the current structure - attachments which are not attached to any particular dataset, but are at some other path. But again, these don't exist either. So, not too motivated to make this more complicated for no real benefit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay 👍
@@ -53,8 +60,13 @@ def get_meta_item(self, name, missing_ok=True): | |||
return next(iter(metadata_xml))["text/xml"] | |||
return result | |||
|
|||
def attachment_items(self): | |||
attachments = [obj for obj in self.tree if obj.type_str == "blob"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(same again here)
Most mirrored repos are complicated - user repos, where they exist at all, are simpler.
A complicated repo might have meta-items that are non-standard eg "generated-pks.json" - these are considered "non-standard" as they don't describe the state of the data itself, or even metadata about the data such as title, description, or schema. Instead, they are extra metadata that the data-source itself maintains so that it can accurately provide the actual data / metadata. As such, they currently don't show up in diffs.
A complicated repo might have attachments, which are currently mostly hidden from users - but still need to be in the mirrors to future proof the mirrors since we can't keep changing the mirrored history, and we will eventually support them.
#448