@wjones127
Contributor

@wjones127 wjones127 commented Aug 27, 2024

This introduces ManifestNamingScheme with a V1 and V2 variant. V1 is the existing naming scheme. V2 uses a scheme optimized for object storage listing mechanisms, making looking up the latest manifest constant time.
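To illustrate why a naming scheme can make the latest-manifest lookup constant time, here is a minimal Python sketch. It assumes the V2 file name is the version subtracted from the maximum u64 value and zero-padded to 20 digits; that layout is an assumption for illustration and is not spelled out in this description. The helper names (`v1_name`, `v2_name`, `parse_v2`) are hypothetical.

```python
U64_MAX = 2**64 - 1


def v1_name(version: int) -> str:
    # V1-style name: the plain version number.
    return f"{version}.manifest"


def v2_name(version: int) -> str:
    # Assumed V2-style name: inverted and zero-padded, so that a
    # lexically ascending listing returns the newest version first.
    return f"{U64_MAX - version:020d}.manifest"


def parse_v2(name: str) -> int:
    # Recover the version from an assumed V2-style name.
    return U64_MAX - int(name.split(".")[0])


versions = [1, 8, 10, 125]

# Object stores return keys in lexical order, modeled here with sorted().
v1_listing = sorted(v1_name(v) for v in versions)
v2_listing = sorted(v2_name(v) for v in versions)

# With the assumed V2 names, the first listed key is the latest version.
latest = parse_v2(v2_listing[0])
print(latest)
```

Under this assumption, a reader only needs the first key of the listing, whereas the V1 listing has to be scanned in full because its order does not follow the version numbers.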

On S3, this makes lance.dataset() take only 125 ms, regardless of how many versions of the dataset exist. Previously, this time grew linearly with the number of versions.

We also provide a method migrate_manifest_paths_v2. Because this method alters the manifest path scheme, and agreement on the scheme is critical for the write path, it's important that it is not run while any read or write operations are happening on that table. That's why this migration can't happen in the background like some other migrations.

Closes #2790

@github-actions github-actions bot added the enhancement (New feature or request) and python labels Aug 27, 2024
@wjones127
Contributor Author

Benchmark results

[Screenshot of benchmark results, 2024-08-30]

Benchmark scripts

Running benchmark:

import lance
import pyarrow as pa
import pyarrow.fs as fs
import timeit

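def bench_manifest_paths(use_v2, uri):
    """Time lance.dataset(uri) against an existing dataset.

    use_v2 is only consumed by the commented-out one-time setup below.
    """
    # One-time setup that creates a dataset with ~10,000 versions:
    # print("doing setup")
    # data = pa.table({'a': range(1)})
    # dataset = lance.write_dataset(data, uri, enable_v2_manifest_paths=use_v2)
    # for i in range(10_000):
    #     dataset.delete("false")
    # print("dataset now is on version: {}".format(dataset.version))

    # Count the manifest files currently stored under _versions.
    s3, path = fs.FileSystem.from_uri(uri)
    infos = s3.get_file_info(fs.FileSelector(path + '/_versions', recursive=True))
    print("number of versions: {}".format(len(infos)))

    # Report the average wall-clock time to open the dataset.
    iters = 20
    total_time = timeit.timeit(lambda: lance.dataset(uri), number=iters)
    print(total_time / iters)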


uri = "s3://lance-performance-testing/test_v2_manifests"
# uri = "test_ds"
# aws s3 cp --recursive ./test_ds s3://lance-performance-testing/test_v2_manifests
# aws s3 cp --recursive ./test_ds_v1 s3://lance-performance-testing/test_v2_manifests_v1

bench_manifest_paths(True, uri)
bench_manifest_paths(False, uri + "_v1")

Reducing number of files:

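import argparse

import boto3
import pyarrow.fs as fs

# Parse the target file count once, before looping over the datasets.
parser = argparse.ArgumentParser()
parser.add_argument('target_files', type=int)
args = parser.parse_args()

uris = [
     "s3://lance-performance-testing/test_v2_manifests",
     "s3://lance-performance-testing/test_v2_manifests_v1"
]

s3_client = boto3.client('s3')

for uri in uris:
    s3, path = fs.FileSystem.from_uri(uri)
    infos = s3.get_file_info(fs.FileSelector(path + '/_versions', recursive=True))

    # Never go negative if fewer files exist than the target.
    to_delete = max(len(infos) - args.target_files, 0)
    print(to_delete)

    # info.path is "<bucket>/<key>"; strip the bucket to get the object key.
    objects = [{'Key': info.path.split("/", maxsplit=1)[1]} for info in infos[:to_delete]]
    print(objects[:10])
    # DeleteObjects accepts at most 1000 keys per request, so delete in batches.
    for start in range(0, len(objects), 1000):
        s3_client.delete_objects(
            Bucket='lance-performance-testing',
            Delete={'Objects': objects[start:start + 1000]},
        )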

@github-actions github-actions bot added the java label Aug 30, 2024
@wjones127 wjones127 changed the title from "feat: faster manifest lookup" to "feat: constant-time manifest lookup on object stores" Aug 30, 2024
@wjones127 wjones127 force-pushed the feat/fast-manifest-lookup branch from 2a551c3 to 9d23023 Compare August 30, 2024 21:46
@codecov-commenter

codecov-commenter commented Aug 30, 2024

Codecov Report

Attention: Patch coverage is 77.98354% with 107 lines in your changes missing coverage. Please review.

Project coverage is 77.95%. Comparing base (9c42903) to head (c033028).

Files with missing lines                                 Patch %   Lines
rust/lance-table/src/io/commit.rs                        71.42%    60 Missing and 8 partials ⚠️
...ust/lance-table/src/io/commit/external_manifest.rs    63.75%    16 Missing and 13 partials ⚠️
rust/lance/src/dataset.rs                                94.01%    7 Missing ⚠️
java/core/lance-jni/src/blocking_dataset.rs              0.00%     1 Missing ⚠️
rust/lance-io/src/object_store.rs                        90.00%    1 Missing ⚠️
rust/lance/src/index.rs                                  66.66%    1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main    #2798    +/-   ##
========================================
  Coverage   77.94%   77.95%            
========================================
  Files         229      229            
  Lines       70147    70539   +392     
  Branches    70147    70539   +392     
========================================
+ Hits        54679    54987   +308     
- Misses      12393    12466    +73     
- Partials     3075     3086    +11     
Flag Coverage Δ
unittests 77.95% <77.98%> (+<0.01%) ⬆️


@wjones127 wjones127 marked this pull request as ready for review August 30, 2024 22:46
Member

@westonpace westonpace left a comment


Looks well thought out. Does this scheme work on local filesystems? For some reason I thought it did not. I ask because it is not clear from the comments that the V2 scheme should only be used on local filesystems.

let version = scheme
    .parse_version(meta.location.filename().unwrap())
    .unwrap();
if version > current_version {
Member


In theory this must be true according to our understanding of object stores, yes?

Contributor Author


10.manifest will be before 8.manifest, so not necessarily. It would be if we had zero-padded the numbers.
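The ordering point above can be checked with a few lines of Python. This is only an illustration of lexical string ordering, using `sorted()` as a stand-in for an object store listing; the 20-digit padding width is an arbitrary choice for the example.

```python
names = ["8.manifest", "9.manifest", "10.manifest"]

# Lexical order compares character by character, so "1" < "8" puts
# 10.manifest ahead of 8.manifest.
plain = sorted(names)

# Zero-padding the numbers makes lexical order match numeric order.
padded = sorted(f"{int(n.split('.')[0]):020d}.manifest" for n in names)

# Without padding, the latest version has to be found by parsing every name.
latest = max(int(n.split(".")[0]) for n in names)

print(plain)
print(padded)
print(latest)
```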

}
});

let first = valid_manifests.next().await.transpose()?;
Member


So in the case of v2 we only look at the first result. Is there value in some kind of debug_assert that looks at the remaining results and ensures our understanding of object stores is correct? E.g. what happens if some new object store (e.g. r2, digital ocean, etc.) decides to use a different convention? Would we catch it?

Contributor Author


Added a sanity check that looks at the first 1k results to see if they are ordered. This is the page size in object store list operations. I've verified there is no latency impact on S3.
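A check along these lines could be sketched in Python as follows. This is not the code added in the PR; the function name `first_page_is_descending` is hypothetical, and it assumes the listing has already been parsed into version numbers and that a correct V2 listing yields them newest first.

```python
from itertools import islice


def first_page_is_descending(versions, page_size=1000):
    # Only inspect the first page of results, so the check does not
    # force additional list requests.
    head = list(islice(versions, page_size))
    # Each version must be strictly greater than the one listed after it.
    return all(a > b for a, b in zip(head, head[1:]))


print(first_page_is_descending([5, 4, 3, 2, 1]))
print(first_page_is_descending([3, 5, 4]))
```

Because it stops after `page_size` items, the check also works on a lazy iterator without consuming the whole listing.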

}

#[tokio::test]
async fn test_v2_manifest_path_create() {
Member


Do you want a test creating a v2 manifest using commit instead of write?

@wjones127
Contributor Author

> Looks well thought out. Does this scheme work on local filesystems? For some reason I thought it did not. I ask because it is not clear from the comments that the V2 scheme should only be used on local filesystems.

It doesn't make much of a difference on local filesystems. Local filesystems have to use a special code branch that makes them list the entire directory every time, instead of relying upon the lexical ordering.
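For comparison, the full-listing approach described for local filesystems might look roughly like this in Python. It is a simplified sketch, not the Lance implementation: the function name is hypothetical and it uses V1-style `N.manifest` names to keep the parsing short.

```python
import os
import tempfile


def latest_version_by_full_listing(versions_dir):
    # os.listdir returns entries in arbitrary order, so every name must
    # be parsed and the maximum tracked explicitly.
    latest = None
    for name in os.listdir(versions_dir):
        stem, ext = os.path.splitext(name)
        if ext != ".manifest" or not stem.isdigit():
            continue
        version = int(stem)
        if latest is None or version > latest:
            latest = version
    return latest


with tempfile.TemporaryDirectory() as tmp:
    for v in (8, 9, 10):
        open(os.path.join(tmp, f"{v}.manifest"), "w").close()
    print(latest_version_by_full_listing(tmp))
```

The cost grows with the number of files in the directory regardless of how the files are named, which is why the naming scheme matters less here.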

@wjones127 wjones127 force-pushed the feat/fast-manifest-lookup branch from b02dbe1 to 9ba0fca Compare September 4, 2024 20:52
@wjones127 wjones127 merged commit 3334116 into lance-format:main Sep 5, 2024
wjones127 added a commit to lancedb/lancedb that referenced this pull request Sep 9, 2024
The new V2 manifest path scheme makes discovering the latest version of
a table constant time on object stores, regardless of the number of
versions in the table. See benchmarks in the PR here:
lance-format/lance#2798

Closes #1583
wjones127 pushed a commit that referenced this pull request Jan 14, 2026
BREAKING CHANGE: defaults new datasets to use V2 manifest path naming
scheme. This makes these datasets unreadable for versions of Lance
library prior to v0.17.0 (released September 2024).

This default improves performance on object storage. See the original PR
(#2798) for details.

Close #5634

Labels: enhancement, java, python

Linked issue: Constant-time manifest lookup on object storage

3 participants