👌 IMPROVE: Legacy tar file archive migration performance #5275
Tar files do not allow performant single file streaming, and so the file is fully extracted to disk, before streaming to the new (zip file) archive.
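A rough sketch of the extract-then-stream approach described above, using only the standard library. It is not the code from this PR: the function name `tar_to_zip` and its arguments are hypothetical, and the real migration writes to the AiiDA archive format rather than a plain zip file.

```python
import tarfile
import tempfile
import zipfile
from pathlib import Path


def tar_to_zip(tar_path: Path, zip_path: Path) -> int:
    """Extract a tar file fully to a temporary directory, then write each
    extracted file into a new zip file. Returns the number of files written.

    Hypothetical helper for illustration only.
    """
    count = 0
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        with tarfile.open(tar_path, 'r:*') as tar:
            # Use the safer 'data' extraction filter where this Python supports it.
            kwargs = {'filter': 'data'} if hasattr(tarfile, 'data_filter') else {}
            # One sequential pass over the tar, instead of seeking per member.
            tar.extractall(root, **kwargs)
        with zipfile.ZipFile(zip_path, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
            for subpath in sorted(root.glob('**/*')):
                if subpath.is_file():
                    # Store paths relative to the extraction root.
                    arcname = subpath.relative_to(root).as_posix()
                    zf.write(subpath, arcname=arcname)
                    count += 1
    return count
```

The trade-off is temporary disk usage for the extracted content, in exchange for reading the tar only once.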
Codecov Report
@@            Coverage Diff             @@
##           develop    #5275     +/-   ##
===========================================
+ Coverage    82.00%   82.01%   +0.01%
===========================================
  Files          533      533
  Lines        38230    38243      +13
===========================================
+ Hits         31348    31362      +14
+ Misses        6882     6881       -1
@chrisjsewell Thanks for the quick fix. I tested it on the tar archive and it works great! Just a minor request from my side. As for the base_parts change, I have to admit I am not familiar with this part, so maybe @ramirezfranciscof can take another look at this?
aiida/tools/archive/implementations/sqlite/migrations/legacy_to_new.py
  with get_progress_reporter()(desc='Converting repo', total=length) as progress:
      for subpath in path.glob('**/*'):
          progress.update()
-         parts = subpath.parts
+         parts = subpath.parts[base_parts:]
I have no clue what this change is for, but I think that is just because I am missing the big picture of how the migration works.
If we are inside an archive, then the path is always relative to the archive (base_parts = 0), but if we are inside a file system then the path may be absolute, and we want to get back to the relative path.
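To illustrate the slicing, here is a small sketch. The helper `relative_parts` and the example paths are made up, and computing base_parts as the number of parts in the extraction directory is an assumption about how the value is obtained, not something confirmed in this thread.

```python
from pathlib import PurePosixPath


def relative_parts(subpath: PurePosixPath, base_parts: int) -> tuple:
    """Drop the leading `base_parts` components so the result is relative
    to the archive root. Hypothetical helper mirroring the diff above."""
    return subpath.parts[base_parts:]


# Inside an archive the path is already relative, so nothing is dropped.
inside_archive = PurePosixPath('nodes/ab/cd/file.txt')
print(relative_parts(inside_archive, 0))

# On a file system the path may be absolute (example directory is made up).
base = PurePosixPath('/tmp/extracted')
on_disk = base / 'nodes/ab/cd/file.txt'
# Assumption: base_parts is the number of parts in the base directory.
print(relative_parts(on_disk, len(base.parts)))
```

Both calls give the same tuple of parts, which is what lets the rest of the loop treat the two cases identically.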
…o_new.py Co-authored-by: Jason.Yu <jusong.yeu@gmail.com>
Ok, as it works I'm gonna merge 😄
Cool! Thanks @chrisjsewell
Cheers!