fix: fix storage options for dataset builder #3156
This fix looks reasonable, but we don't have a test for it, which worries me.
Do you know why this test doesn't catch it?
lance/python/python/tests/test_s3_ddb.py
Lines 230 to 243 in d79e870
@pytest.mark.integration
def test_s3_ddb_distributed_commit(s3_bucket: str, ddb_table: str):
    table_name = uuid.uuid4().hex
    table_dir = f"s3+ddb://{s3_bucket}/{table_name}?ddbTableName={ddb_table}"
    schema = pa.schema([pa.field("a", pa.int64())])
    fragments = write_fragments(
        pa.table({"a": pa.array(range(1024))}),
        f"s3+ddb://{s3_bucket}/distributed_commit?ddbTableName={ddb_table}",
        storage_options=CONFIG,
    )
    operation = lance.LanceOperation.Overwrite(schema, fragments)
    ds = lance.LanceDataset.commit(table_dir, operation, storage_options=CONFIG)
    assert ds.count_rows() == 1024
Would it catch this if we added an append part to that test?
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #3156 +/- ##
==========================================
+ Coverage 77.95% 78.64% +0.68%
==========================================
Files 242 243 +1
Lines 81904 82860 +956
Branches 81904 82860 +956
==========================================
+ Hits 63848 65162 +1314
- Misses 14890 14919 +29
+ Partials 3166 2779 -387
@wjones127 could you please review it again?
python/python/tests/test_fragment.py
Outdated
@pytest.mark.integration
def test_append(s3_bucket: str):
    storage_options = copy.deepcopy(CONFIG)
    table = pa.table({"a": [1, 2], "b": ["a", "b"]})
    lance.fragment.LanceFragment.create(
        f"s3://{s3_bucket}/test_append.lance", table, storage_options=storage_options
    )
How does this test append? There is no existing dataset at s3://{s3_bucket}/test_append.lance, is there?
Right, there is no existing dataset. Before it checks for the dataset, the object store first checks the bucket name. Without this patch, that check throws a bucket-not-found exception.
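To make the failure mode described above concrete, here is a toy model in plain Python. It is not Lance's or the object store's actual code: `ENDPOINTS`, `open_store`, `create_fragment`, the `forward_options` flag, and the bucket name `lance-integtest` are all hypothetical, and it assumes the bug amounts to the caller's storage options (in particular `aws_endpoint`) not reaching the store, so the bucket is looked up at the wrong endpoint.

```python
class BucketNotFound(Exception):
    """Raised when the toy store cannot see the requested bucket."""


# Hypothetical stand-in for two S3-compatible endpoints and the buckets they host.
DEFAULT_ENDPOINT = "https://s3.amazonaws.com"
ENDPOINTS = {
    DEFAULT_ENDPOINT: set(),
    "http://localhost:9000": {"lance-integtest"},
}


def open_store(bucket, storage_options=None):
    """Resolve the endpoint from the options, then check that the bucket exists there."""
    options = storage_options or {}
    endpoint = options.get("aws_endpoint", DEFAULT_ENDPOINT)
    if bucket not in ENDPOINTS[endpoint]:
        raise BucketNotFound(f"bucket {bucket!r} not found at {endpoint}")
    return endpoint


def create_fragment(bucket, storage_options=None, forward_options=True):
    """Toy write path; forward_options=False mimics dropping the caller's options."""
    passed = storage_options if forward_options else None
    # The bucket check happens before any dataset lookup, so it fails
    # even though no dataset exists yet at the target path.
    return open_store(bucket, passed)
```

In this sketch, calling `create_fragment("lance-integtest", {"aws_endpoint": "http://localhost:9000"})` succeeds, while the same call with `forward_options=False` raises `BucketNotFound`. Under that assumption, a test that only creates a fragment can still exercise the fix without an existing dataset.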
python/python/tests/test_fragment.py
Outdated
CONFIG = {
    "allow_http": "true",
    "aws_access_key_id": "ACCESSKEY",
    "aws_secret_access_key": "SECRETKEY",
    "aws_endpoint": "http://localhost:9000",
    "dynamodb_endpoint": "http://localhost:8000",
    "aws_region": "us-west-2",
}
Please don't put this test in test_fragment.py. I'd like to keep all the integration tests in lance/python/python/tests/test_s3_ddb.py
moved.