Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BSE-4358: Python S3 Table support #96

Merged
merged 14 commits into from
Jan 2, 2025
Merged

BSE-4358: Python S3 Table support #96

merged 14 commits into from
Jan 2, 2025

Conversation

IsaacWarren
Copy link
Contributor

@IsaacWarren IsaacWarren commented Dec 30, 2024

Changes included in this PR

Add support for writing to/reading from iceberg s3 tables from python, BodoSQL support will be the next PR

Testing strategy

Unit tests

User facing changes

Support for io with iceberg s3 tables

Checklist

  • Pipelines passed before requesting review. To run CI you must include [run CI] in your commit message.
  • I am familiar with the Contributing Guide
  • I have installed + ran pre-commit hooks.

@IsaacWarren IsaacWarren changed the title BSE-4358L: S3 Table support BSE-4358: Python S3 Table support Dec 30, 2024
Comment on lines -116 to +115

if warehouse is None:
raise IcebergError(
"`warehouse` parameter required in connection string"
)
table_loc = gen_table_loc(catalog_type, warehouse, schema, table) # type: ignore
table_loc = ""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this anymore as AFAIK since we now create a transaction to get the write location

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not used for read occasionally? Can we test with the E2E tests, cause its different catalogs that usually exhibit this behavior

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't think so but will check locally since the e2e tests are broken right now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I tried to get the e2e tests running locally and couldn't even on main. Hadoop did pass which is the only one we don't have unittest coverage for so I think we're good

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

@@ -87,7 +87,7 @@ scipy = "*"
scikit-learn = "1.4.*"
matplotlib = "<=3.8.2"
# IO
boto3 = "*"
boto3 = ">=1.35.74"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could confine this to a testing dependency version requirement but I doubt it's an issue

@IsaacWarren IsaacWarren requested review from srilman and scott-routledge2 and removed request for srilman December 30, 2024 21:08
@IsaacWarren IsaacWarren marked this pull request as ready for review December 30, 2024 21:08
Copy link

codecov bot commented Dec 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (main@f410638). Learn more about missing BASE report.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #96   +/-   ##
=======================================
  Coverage        ?   77.86%           
=======================================
  Files           ?      160           
  Lines           ?    62064           
  Branches        ?     8769           
=======================================
  Hits            ?    48326           
  Misses          ?    11617           
  Partials        ?     2121           

Copy link
Contributor

@srilman srilman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM @IsaacWarren

@@ -103,6 +103,11 @@ The following catalogs are supported:
- Parameter `token` or `credential` is required for authentication and should be retrieved from the REST catalog provider.
- E.g. `iceberg+rest` or `iceberg+rest://<rest-uri>?warehouse=<warehouse>&token=<token>`

- S3 Tables
- Connection string must be of the form `iceberg+arn:aws:s3tables:<region>:<account_number>:<bucket>`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the example in the test, isn't it more like

iceberg+arn:aws:s3tables:<region>:<account_number>:bucket/<bucket>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thank you

Comment on lines -116 to +115

if warehouse is None:
raise IcebergError(
"`warehouse` parameter required in connection string"
)
table_loc = gen_table_loc(catalog_type, warehouse, schema, table) # type: ignore
table_loc = ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not used for read occasionally? Can we test with the E2E tests, cause its different catalogs that usually exhibit this behavior

Copy link
Contributor

@scott-routledge2 scott-routledge2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @IsaacWarren !

@temp_env_override({"AWS_REGION": "us-east-2"})
def test_read_implicit_pruning(memory_leak_check):
"""
Test reading an Iceberg table from Snowflake with Bodo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Test reading an Iceberg table from Snowflake with Bodo
Test reading an Iceberg table from S3 Tables with Bodo
``` ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@IsaacWarren IsaacWarren merged commit 972ae15 into main Jan 2, 2025
48 of 49 checks passed
@IsaacWarren IsaacWarren deleted the isaac/s3tables branch January 2, 2025 20:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants