Fetch tiles from S3, not just from the remote #921

olsen232 · 2023-10-10T01:16:36Z

Description

A working copy checkout of a dataset backed by S3 has to fetch tiles from S3, not from the remote; there may not even be a remote.
This adds code for fetching from S3 and hooks it into kart lfs+ fetch, which is used internally to fetch missing tiles when they are needed for checkout.
Still TODO: add more configuration for when a dataset should or should not be checked out, so that it is possible to work with S3-backed datasets without checking them out.
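
To illustrate the split described above, here is a minimal sketch of how pointers might be divided between "fetch from a specific URL" and "fetch from the remote". It is not the code in this PR: `split_s3_url`, `partition_by_source`, and the `(lfs_oid, url)` input shape are hypothetical names chosen for the example.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split 's3://bucket/key' into (bucket, key); raise ValueError otherwise."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URL: {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def partition_by_source(pointers):
    """Divide (lfs_oid, url_or_None) pairs by where the blob would come from.

    Hypothetical helper: blobs with a URL are fetched from that URL,
    the rest fall back to the remote (if there is one).
    """
    from_urls, from_remote = {}, []
    for lfs_oid, url in pointers:
        if url:
            from_urls[lfs_oid] = url
        else:
            from_remote.append(lfs_oid)
    return from_urls, from_remote
```

A dataset with no remote at all would simply produce an empty `from_remote` list, as long as every pointer carries a URL.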

Checklist:

Have you reviewed your own change?
Have you included test(s)?
Have you updated the changelog?

craigds · 2023-10-11T23:42:34Z

kart/lfs_commands/__init__.py

-        )
+        click.echo("Running fetch with --dry-run:")
+        if urls:
+            click.echo(f"  Found {_blobs(len(urls))} blobs to fetch from specific URLs")


Suggested change

-  click.echo(f"  Found {_blobs(len(urls))} blobs to fetch from specific URLs")
+  click.echo(f"  Found {_blobs(len(urls))} to fetch from specific URLs")
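
The suggestion makes sense if `_blobs` already returns the count together with the noun, which would make the original message read "N blobs blobs". The body of `_blobs` is not shown in this conversation, so the following is only a sketch under that assumption; `blobs_label` and `found_message` are hypothetical stand-ins.

```python
def blobs_label(count):
    """Hypothetical stand-in for _blobs: the count and the noun together."""
    return f"{count} blob" if count == 1 else f"{count} blobs"


def found_message(count):
    # The noun comes from blobs_label, so it is not repeated here.
    return f"  Found {blobs_label(count)} to fetch from specific URLs"
```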

craigds · 2023-10-11T23:46:57Z

kart/lfs_commands/__init__.py

        if dry_run:
-            dry_run_output.append(f"{lfs_oid} ({pointer_blob.hex})")
+            if url:
+                dry_run_output.append(f"{lfs_oid} ({pointer_blob.hex})\n⮑  {url}")


What's the purpose of dry-run mode here? Are we using or parsing its output anywhere?

This extra newline appears to make the output quite a bit harder to parse. Not sure if we're parsing it, but it's probably better to put the URL on the same line and use a →
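
A minimal sketch of the one-line format suggested here, so that each blob stays on a single line and the output can be processed line by line. `format_dry_run_line` is a hypothetical helper, not a function from the PR.

```python
def format_dry_run_line(lfs_oid, pointer_hex, url=None):
    """Return one line of dry-run output, appending the URL after an arrow if present."""
    line = f"{lfs_oid} ({pointer_hex})"
    if url:
        line += f" → {url}"
    return line
```

With this shape, splitting the output on newlines yields exactly one record per blob, whether or not a URL is attached.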


craigds · 2023-10-11T23:55:20Z

kart/point_cloud/metadata_util.py

@@ -197,19 +197,24 @@ def extract_pc_tile_metadata(pc_tile_path, oid_and_size=None):
    else:
        oid, size = get_hash_and_size_of_file(pc_tile_path)

+    name = Path(pc_tile_path).name
+    url = str(pc_tile_path) if str(pc_tile_path).startswith("s3://") else None


what's pc_tile_path if not a str?

If it's a Path there's probably a bug here:

>>> str(Path('s3://foo'))
's3:/foo'

If it's always a str then you can drop the str() calls here

Hmm, this is not actually a bug: it accepts Path paths or str paths, but it only accepts URLs as strings, and it behaves sensibly with any of those. But the code was looking a bit sloppy and wasn't well documented.
It still accepts those three things, but at least that's now documented, and I just convert them all with str() at the start of the function to make the code less crap.
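
The approach described in this reply can be sketched roughly as follows: convert the argument to `str` once, then derive the name and URL from that string. This is an illustration, not the merged code; `name_and_url` is a hypothetical helper, and `PurePosixPath` is used instead of `Path` only so the example behaves the same on every platform.

```python
from pathlib import PurePosixPath


def name_and_url(tile_path):
    """Accept a str path, a path object, or a str URL; return (name, url_or_None).

    URLs must be passed as str: wrapping 's3://...' in a path object
    collapses the double slash, so the s3:// prefix is lost.
    """
    tile_path = str(tile_path)  # normalise once, up front
    url = tile_path if tile_path.startswith("s3://") else None
    name = PurePosixPath(tile_path).name
    return name, url
```

Taking the final path component for the name is still safe for URLs, because only the last component is used.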

kart/raster/metadata_util.py

kart/s3_util.py


Fetch tiles from S3, not just from the remote

1adac79

olsen232 requested review from craigds and rcoup October 10, 2023 01:16

craigds approved these changes Oct 12, 2023


Fetch S3 tiles: address review comments

2edd89c

olsen232 merged commit 985f921 into master Oct 12, 2023
30 of 32 checks passed

olsen232 deleted the s3-fetch branch October 12, 2023 03:32

olsen232 added a commit that referenced this pull request Oct 19, 2023

kart lfs+ fetch --dry-run output on one line only

2f66e97

as requested in review #921
