-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fetch tiles from S3, not just from the remote #921
Conversation
) | ||
click.echo("Running fetch with --dry-run:") | ||
if urls: | ||
click.echo(f" Found {_blobs(len(urls))} blobs to fetch from specific URLs") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
click.echo(f" Found {_blobs(len(urls))} blobs to fetch from specific URLs") | |
click.echo(f" Found {_blobs(len(urls))} to fetch from specific URLs") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
if dry_run: | ||
dry_run_output.append(f"{lfs_oid} ({pointer_blob.hex})") | ||
if url: | ||
dry_run_output.append(f"{lfs_oid} ({pointer_blob.hex})\n⮑ {url}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the purpose of dry-run mode here? Are we using/parsing it anywhere?
this extra newline appears to make the output quite a bit harder to parse. Not sure if we're parsing it but probably better to put it on the same line and use a →
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
kart/point_cloud/metadata_util.py
Outdated
@@ -197,19 +197,24 @@ def extract_pc_tile_metadata(pc_tile_path, oid_and_size=None): | |||
else: | |||
oid, size = get_hash_and_size_of_file(pc_tile_path) | |||
|
|||
name = Path(pc_tile_path).name | |||
url = str(pc_tile_path) if str(pc_tile_path).startswith("s3://") else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's pc_tile_path
if not a str
?
If it's a Path
there's probably a bug here:
>>> str(Path('s3://foo'))
's3:/foo'
If it's always a str
then you can drop the str()
calls here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm this is not actually a bug - it accepts Path paths, or str paths, but URLs it only accepts as strings, and it behaves sensibly with any of those. But the code is looking a bit sloppy and it's not well documented.
It still accepts those three things but at least now that's documented, and I just convert them all to str() and the start of the function to make the code less crap
Description
A working copy checkout of a dataset backed by S3 will have to fetch from S3, not the remote - there may not even be a remote.
This adds code for fetching using S3, and adds to
kart lfs+ fetch
which is used internally to fetch missing tiles when needed for checkout.Still TODO is to add more configuration for when a dataset should or should not be checked out, making it possible to work with datasets backed by S3 without checking them out.
Related links:
#905
Checklist: