BSE-4358: Python S3 Table support #96
if warehouse is None:
    raise IcebergError(
        "`warehouse` parameter required in connection string"
    )
table_loc = gen_table_loc(catalog_type, warehouse, schema, table)  # type: ignore
table_loc = ""
AFAIK we don't need this anymore, since we now create a transaction to get the write location.
Isn't this occasionally used for reads? Can we test with the E2E tests? It's usually the different catalogs that exhibit this behavior.
I didn't think so, but I'll check locally since the E2E tests are broken right now.
OK, I tried to get the E2E tests running locally and couldn't, even on main. Hadoop did pass, and it's the only catalog we don't have unit test coverage for, so I think we're good.
Ok
@@ -87,7 +87,7 @@ scipy = "*"
 scikit-learn = "1.4.*"
 matplotlib = "<=3.8.2"
 # IO
-boto3 = "*"
+boto3 = ">=1.35.74"
We could restrict this version requirement to the test dependencies, but I doubt it's an issue.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@           Coverage Diff           @@
##             main      #96   +/-   ##
=======================================
  Coverage        ?   77.86%
=======================================
  Files           ?      160
  Lines           ?    62064
  Branches        ?     8769
=======================================
  Hits            ?    48326
  Misses          ?    11617
  Partials        ?     2121
Overall LGTM @IsaacWarren
docs/docs/iceberg/read_write.md
Outdated
@@ -103,6 +103,11 @@ The following catalogs are supported:
    - Parameter `token` or `credential` is required for authentication and should be retrieved from the REST catalog provider.
    - E.g. `iceberg+rest` or `iceberg+rest://<rest-uri>?warehouse=<warehouse>&token=<token>`

- S3 Tables
    - Connection string must be of the form `iceberg+arn:aws:s3tables:<region>:<account_number>:<bucket>`
From the example in the test, isn't it more like this?
iceberg+arn:aws:s3tables:<region>:<account_number>:bucket/<bucket>
Good catch, thank you
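For reference, the corrected form from this thread can be sketched as a small builder and parser. This is only an illustration: `build_s3_tables_conn_str` and `parse_s3_tables_conn_str` are hypothetical helpers, not part of Bodo, and the character classes in the pattern are assumptions rather than AWS's official ARN grammar.

```python
import re

# Form discussed in this thread:
#   iceberg+arn:aws:s3tables:<region>:<account_number>:bucket/<bucket>
# The character classes are illustrative assumptions, not AWS's ARN grammar.
_S3_TABLES_CONN_RE = re.compile(
    r"^iceberg\+arn:aws:s3tables:"
    r"(?P<region>[a-z0-9-]+):"
    r"(?P<account>\d{12}):"
    r"bucket/(?P<bucket>[a-z0-9][a-z0-9-]*)$"
)


def build_s3_tables_conn_str(region: str, account_number: str, bucket: str) -> str:
    """Assemble a connection string in the corrected form (hypothetical helper)."""
    return f"iceberg+arn:aws:s3tables:{region}:{account_number}:bucket/{bucket}"


def parse_s3_tables_conn_str(conn_str: str) -> dict:
    """Split a connection string into its parts, or raise ValueError."""
    match = _S3_TABLES_CONN_RE.match(conn_str)
    if match is None:
        raise ValueError(f"not an S3 Tables connection string: {conn_str!r}")
    return match.groupdict()
```

A string written in the originally documented form, without the `bucket/` segment, fails to parse under this sketch, which is the mismatch the review comment points out.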
LGTM, thanks @IsaacWarren !
bodo/tests/test_s3_tables_iceberg.py
Outdated
@temp_env_override({"AWS_REGION": "us-east-2"})
def test_read_implicit_pruning(memory_leak_check):
    """
    Test reading an Iceberg table from Snowflake with Bodo
Suggested change:
-    Test reading an Iceberg table from Snowflake with Bodo
+    Test reading an Iceberg table from S3 Tables with Bodo
?
Thanks!
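As background on the decorator used in the test above, here is a rough sketch of how a temporary environment override can work. `temp_env` is a hypothetical stand-in written for illustration; it is not Bodo's `temp_env_override` and may differ from it in behavior.

```python
import os
from contextlib import contextmanager


@contextmanager
def temp_env(overrides):
    """Temporarily set environment variables, restoring old values on exit.

    Hypothetical helper: a value of None removes the variable for the duration.
    """
    # Remember the previous values (None means the variable was unset).
    saved = {key: os.environ.get(key) for key in overrides}
    try:
        for key, value in overrides.items():
            if value is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = value
        yield
    finally:
        # Restore the environment even if the wrapped code raised.
        for key, old in saved.items():
            if old is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old


# Objects returned by a @contextmanager function also work as decorators.
@temp_env({"AWS_REGION": "us-east-2"})
def read_region():
    return os.environ["AWS_REGION"]
```

Restoring in a `finally` block keeps one test's region setting from leaking into the next test, even when the test body fails.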
Changes included in this PR
Add support for writing to and reading from Iceberg S3 Tables from Python. BodoSQL support will come in the next PR.
Testing strategy
Unit tests
User facing changes
Support for I/O with Iceberg S3 Tables.
Checklist
[run CI]
in your commit message.