Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fetch tiles from S3, not just from the remote #921

Merged
merged 2 commits into from
Oct 12, 2023
Merged

Fetch tiles from S3, not just from the remote #921

merged 2 commits into from
Oct 12, 2023

Conversation

olsen232
Copy link
Collaborator

Description

A working copy checkout of a dataset backed by S3 will have to fetch from S3, not the remote - there may not even be a remote.
This adds code for fetching using S3, and adds to kart lfs+ fetch which is used internally to fetch missing tiles when needed for checkout.
Still TODO is to add more configuration for when a dataset should or should not be checked out, making it possible to work with datasets backed by S3 without checking them out.

Related links:

#905

Checklist:

  • Have you reviewed your own change?
  • Have you included test(s)?
  • Have you updated the changelog?

@olsen232 olsen232 requested review from craigds and rcoup October 10, 2023 01:16
)
click.echo("Running fetch with --dry-run:")
if urls:
click.echo(f" Found {_blobs(len(urls))} blobs to fetch from specific URLs")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
click.echo(f" Found {_blobs(len(urls))} blobs to fetch from specific URLs")
click.echo(f" Found {_blobs(len(urls))} to fetch from specific URLs")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if dry_run:
dry_run_output.append(f"{lfs_oid} ({pointer_blob.hex})")
if url:
dry_run_output.append(f"{lfs_oid} ({pointer_blob.hex})\n⮑ {url}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the purpose of dry-run mode here? Are we using/parsing it anywhere?

this extra newline appears to make the output quite a bit harder to parse. Not sure if we're parsing it but probably better to put it on the same line and use a

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

kart/lfs_commands/__init__.py Outdated Show resolved Hide resolved
@@ -197,19 +197,24 @@ def extract_pc_tile_metadata(pc_tile_path, oid_and_size=None):
else:
oid, size = get_hash_and_size_of_file(pc_tile_path)

name = Path(pc_tile_path).name
url = str(pc_tile_path) if str(pc_tile_path).startswith("s3://") else None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's pc_tile_path if not a str?

If it's a Path there's probably a bug here:

>>> str(Path('s3://foo'))
's3:/foo'

If it's always a str then you can drop the str() calls here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm this is not actually a bug - it accepts Path paths, or str paths, but URLs it only accepts as strings, and it behaves sensibly with any of those. But the code is looking a bit sloppy and it's not well documented.
It still accepts those three things but at least now that's documented, and I just convert them all to str() and the start of the function to make the code less crap

kart/raster/metadata_util.py Outdated Show resolved Hide resolved
kart/s3_util.py Outdated Show resolved Hide resolved
@olsen232 olsen232 merged commit 985f921 into master Oct 12, 2023
30 of 32 checks passed
@olsen232 olsen232 deleted the s3-fetch branch October 12, 2023 03:32
olsen232 added a commit that referenced this pull request Oct 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants