fix: fix storage options for dataset builder #3156

Merged (2 commits) on Dec 4, 2024

chenkovsky
Contributor

No description provided.

@github-actions github-actions bot added the bug Something isn't working label Nov 22, 2024
Contributor

@wjones127 wjones127 left a comment

This fix looks reasonable, but we don't have a test for it, which worries me.

Do you know why this test doesn't catch it?

@pytest.mark.integration
def test_s3_ddb_distributed_commit(s3_bucket: str, ddb_table: str):
    table_name = uuid.uuid4().hex
    table_dir = f"s3+ddb://{s3_bucket}/{table_name}?ddbTableName={ddb_table}"
    schema = pa.schema([pa.field("a", pa.int64())])
    fragments = write_fragments(
        pa.table({"a": pa.array(range(1024))}),
        f"s3+ddb://{s3_bucket}/distributed_commit?ddbTableName={ddb_table}",
        storage_options=CONFIG,
    )
    operation = lance.LanceOperation.Overwrite(schema, fragments)
    ds = lance.LanceDataset.commit(table_dir, operation, storage_options=CONFIG)
    assert ds.count_rows() == 1024

Would it catch this if we added an append part to that test?

codecov-commenter commented Nov 22, 2024

Codecov Report

Attention: Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.

Project coverage is 78.64%. Comparing base (1d3b204) to head (1123e5e).
Report is 22 commits behind head on main.

Files with missing lines                    Patch %   Lines
rust/lance/src/dataset/fragment/write.rs    0.00%     6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3156      +/-   ##
==========================================
+ Coverage   77.95%   78.64%   +0.68%     
==========================================
  Files         242      243       +1     
  Lines       81904    82860     +956     
  Branches    81904    82860     +956     
==========================================
+ Hits        63848    65162    +1314     
- Misses      14890    14919      +29     
+ Partials     3166     2779     -387     
Flag        Coverage Δ
unittests   78.64% <0.00%> (+0.68%) ⬆️

@chenkovsky
Contributor Author

@wjones127 could you please review it again?

Comment on lines 369 to 375
@pytest.mark.integration
def test_append(s3_bucket: str):
    storage_options = copy.deepcopy(CONFIG)
    table = pa.table({"a": [1, 2], "b": ["a", "b"]})
    lance.fragment.LanceFragment.create(
        f"s3://{s3_bucket}/test_append.lance", table, storage_options=storage_options
    )
Contributor

How does this test append? There is no existing dataset at s3://{s3_bucket}/test_append.lance, is there?

Contributor Author

Correct, there is no existing dataset. Before it checks for the dataset, the object store first checks the bucket name. Without this patch, that check throws a bucket-not-found exception.
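
A toy model may make this ordering easier to follow. It is not Lance's code: `open_store`, `create_fragment`, `forward_options`, `BUCKETS_BY_ENDPOINT`, and the `test-bucket` name are all hypothetical, and it assumes the bug was that storage options were not forwarded, so the bucket lookup went to the default endpoint instead of the configured one.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for two S3-compatible endpoints. Only the local
# endpoint (as in CONFIG["aws_endpoint"]) knows about the test bucket.
DEFAULT_ENDPOINT = "https://s3.amazonaws.com"
BUCKETS_BY_ENDPOINT = {
    "http://localhost:9000": {"test-bucket"},
    DEFAULT_ENDPOINT: set(),
}


class BucketNotFound(Exception):
    """Raised when the bucket does not exist at the resolved endpoint."""


def open_store(uri, storage_options=None):
    """Resolve the bucket for `uri` before any dataset path is inspected."""
    options = storage_options or {}
    endpoint = options.get("aws_endpoint", DEFAULT_ENDPOINT)
    parsed = urlparse(uri)
    bucket = parsed.netloc
    if bucket not in BUCKETS_BY_ENDPOINT.get(endpoint, set()):
        raise BucketNotFound(f"bucket {bucket!r} not found at {endpoint}")
    # Whether a dataset exists at this path is never checked in this sketch;
    # the bucket lookup above is the step that fails first.
    return endpoint, bucket, parsed.path.lstrip("/")


def create_fragment(uri, storage_options=None, forward_options=True):
    """Toy writer; forward_options=False models the assumed pre-patch bug."""
    return open_store(uri, storage_options if forward_options else None)
```

In this model, `create_fragment("s3://test-bucket/test_append.lance", {"aws_endpoint": "http://localhost:9000"})` succeeds even though no dataset exists at that path, while the same call with `forward_options=False` raises `BucketNotFound`. That is why a create against an empty location can still exercise the storage-options path.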

Comment on lines 27 to 34
CONFIG = {
    "allow_http": "true",
    "aws_access_key_id": "ACCESSKEY",
    "aws_secret_access_key": "SECRETKEY",
    "aws_endpoint": "http://localhost:9000",
    "dynamodb_endpoint": "http://localhost:8000",
    "aws_region": "us-west-2",
}
Contributor

Please don't put this test in test_fragment.py. I'd like to keep all the integration tests in lance/python/python/tests/test_s3_ddb.py

Contributor Author

Moved.

@wjones127 wjones127 merged commit 574b7d0 into lancedb:main Dec 4, 2024
24 of 26 checks passed
Labels: bug, python