feat: harmonize and simplify storage configuration (delta-io#1052)
# Description
Recently we moved some of our storage configuration upstream to the object_store crate, which now accepts it as a property bag of string keys and values. This allows us to simplify our configuration handling here and make S3 configuration consistent with Azure and GCP.
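A "property bag" here is a flat mapping of string keys to string values, which object_store parses into typed config-key enums such as `AmazonS3ConfigKey`. As a rough illustration only, the hypothetical Python helper below normalizes such a bag into canonical keys. The alias table is an assumption for the sketch, not the crate's actual key list.

```python
from __future__ import annotations

# Hypothetical alias table: canonical key -> accepted spellings (lowercase).
# Illustrative only; the real list lives on object_store's AmazonS3ConfigKey.
S3_KEY_ALIASES = {
    "aws_access_key_id": ("aws_access_key_id", "access_key_id"),
    "aws_secret_access_key": ("aws_secret_access_key", "secret_access_key"),
    "aws_region": ("aws_region", "region"),
    "aws_endpoint": ("aws_endpoint", "endpoint"),
}

# Invert once so every accepted spelling maps to its canonical key.
_LOOKUP = {alias: canon for canon, aliases in S3_KEY_ALIASES.items() for alias in aliases}


def parse_s3_options(options: dict[str, str]) -> dict[str, str]:
    """Normalize a flat property bag into canonical S3 config keys."""
    parsed: dict[str, str] = {}
    for key, value in options.items():
        # Keys are matched case-insensitively, so AWS_REGION and region both work.
        canon = _LOOKUP.get(key.lower())
        if canon is None:
            raise ValueError(f"unknown S3 option: {key!r}")
        parsed[canon] = value
    return parsed
```

Because the same flat `dict[str, str]` shape works for every backend, the Python `storage_options` parameter can stay a plain dictionary while the backend-specific parsing happens in one place.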

As a follow-up, I think it would be great to migrate dynamodb_lock to the official SDKs as well, and then see what we still need from the S3 storage options.

# Related Issue(s)

closes delta-io#999

# Documentation


Co-authored-by: Will Jones <willjones127@gmail.com>
2 people authored and chitralverma committed Mar 17, 2023
1 parent f52f58a commit 99703ab
Showing 10 changed files with 469 additions and 799 deletions.
7 changes: 4 additions & 3 deletions python/docs/source/usage.rst
@@ -29,7 +29,7 @@ To load the current version, use the constructor:
>>> dt = DeltaTable("../rust/tests/data/delta-0.2.0")
Depending on your storage backend, you could use the ``storage_options`` parameter to provide some configuration.
-Configuration is defined for specific backends - `s3 options`_, `azure options`_.
+Configuration is defined for specific backends - `s3 options`_, `azure options`_, `gcs options`_.

.. code-block:: python
@@ -70,8 +70,9 @@ For AWS Glue catalog, use AWS environment variables to authenticate.
>>> dt.to_pyarrow_table().to_pydict()
{'id': [5, 7, 9, 5, 6, 7, 8, 9]}
-.. _`s3 options`: https://github.com/delta-io/delta-rs/blob/17999d24a58fb4c98c6280b9e57842c346b4603a/rust/src/builder.rs#L423-L491
-.. _`azure options`: https://github.com/delta-io/delta-rs/blob/17999d24a58fb4c98c6280b9e57842c346b4603a/rust/src/builder.rs#L524-L539
+.. _`s3 options`: https://docs.rs/object_store/latest/object_store/aws/enum.AmazonS3ConfigKey.html#variants
+.. _`azure options`: https://docs.rs/object_store/latest/object_store/azure/enum.AzureConfigKey.html#variants
+.. _`gcs options`: https://docs.rs/object_store/latest/object_store/gcp/enum.GoogleConfigKey.html#variants

Custom Storage Backends
~~~~~~~~~~~~~~~~~~~~~~~
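The docs now point each backend at the matching object_store config-key enum. The hypothetical Python helper below picks which of the three linked pages applies to a given table URI. The scheme table is an assumption: `s3://` and `az://` appear in the tests in this change, `gs://` for GCS is assumed, and the helper is not part of the deltalake package.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Hypothetical: URI scheme -> object_store config-key enum documenting the
# accepted storage_options keys. "gs" for GCS is an assumption.
SCHEME_TO_CONFIG_ENUM = {
    "s3": "AmazonS3ConfigKey",
    "az": "AzureConfigKey",
    "gs": "GoogleConfigKey",
}


def config_enum_for(table_uri: str) -> str | None:
    """Return the config-key enum name for a table URI, or None for local paths."""
    scheme = urlparse(table_uri).scheme.lower()
    if scheme in ("", "file"):
        return None  # local paths do not go through a cloud backend
    try:
        return SCHEME_TO_CONFIG_ENUM[scheme]
    except KeyError:
        raise ValueError(f"no storage backend known for scheme {scheme!r}") from None
```

The keys accepted in `storage_options` for that backend are then listed per variant on the corresponding enum page.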
10 changes: 2 additions & 8 deletions python/tests/test_fs.py
@@ -31,7 +31,8 @@ def test_read_files(s3_localstack):
@pytest.mark.s3
@pytest.mark.integration
@pytest.mark.timeout(timeout=15, method="thread")
-def test_s3_authenticated_read_write(s3_localstack_creds):
+def test_s3_authenticated_read_write(s3_localstack_creds, monkeypatch):
+monkeypatch.setenv("AWS_DEFAULT_REGION", "us-east-1")
# Create unauthenticated handler
storage_handler = DeltaStorageHandler(
"s3://deltars/",
@@ -184,13 +185,6 @@ def test_roundtrip_azure_env(azurite_env_vars, sample_data: pa.Table):
def test_roundtrip_azure_direct(azurite_creds, sample_data: pa.Table):
table_path = "az://deltars/roundtrip2"

-# Fails without any creds
-with pytest.raises(PyDeltaTableError):
-anon_storage_options = {
-key: value for key, value in azurite_creds.items() if "ACCOUNT" not in key
-}
-write_deltalake(table_path, sample_data, storage_options=anon_storage_options)
-
# Can pass storage_options in directly
write_deltalake(table_path, sample_data, storage_options=azurite_creds)
dt = DeltaTable(table_path, storage_options=azurite_creds)
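The `monkeypatch.setenv("AWS_DEFAULT_REGION", ...)` line added to `test_s3_authenticated_read_write` suggests the S3 backend now expects a region to be resolvable; object_store's `AmazonS3Builder::from_env` reads `AWS_DEFAULT_REGION`, and its builder reports a missing region as an error. The helper below is a hypothetical sketch of that fallback order (explicit option first, then the environment), not delta-rs or object_store code.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def resolve_region(options: Mapping[str, str], env: Mapping[str, str] | None = None) -> str:
    """Hypothetical: an explicit option wins, else AWS_DEFAULT_REGION, else error."""
    env = os.environ if env is None else env
    # Explicit storage option takes precedence over the environment.
    for key in ("aws_region", "region"):
        if key in options:
            return options[key]
    region = env.get("AWS_DEFAULT_REGION")
    if region:
        return region
    raise ValueError("no AWS region configured")
```

Passing `env` explicitly keeps the helper testable without touching the process environment; `monkeypatch.setenv` achieves the same isolation in pytest by undoing the change after the test.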
2 changes: 1 addition & 1 deletion rust/Cargo.toml
@@ -23,7 +23,7 @@ log = "0"
libc = ">=0.2.90, <1"
num-bigint = "0.4"
num-traits = "0.2.15"
-object_store = "0.5.2"
+object_store = "0.5.3"
once_cell = "1.16.0"
parking_lot = "0.12"
parquet = { version = "28", features = ["async"], optional = true }
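In Cargo, a bare version string such as `object_store = "0.5.3"` is a caret requirement; for a 0.x crate it means `>=0.5.3, <0.6.0`, so this bump raises the minimum patch release without admitting 0.6. The hypothetical Python helper below sketches that rule for plain `MAJOR.MINOR.PATCH` strings only (no pre-release tags or partial versions).

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def caret_allows(requirement: str, version: str) -> bool:
    """Does Cargo's default (caret) requirement admit this version?"""
    req = parse_version(requirement)
    ver = parse_version(version)
    if ver < req:
        return False
    major, minor, patch = req
    # The left-most non-zero component of the requirement must match exactly.
    if major > 0:
        return ver[0] == major
    if minor > 0:
        return ver[0] == 0 and ver[1] == minor
    return ver == (0, 0, patch)
```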