Support get_backend_for_uri_with_options in Read Path #689
Comments
Thanks for reporting this! Our current API is problematic in that we can't retrieve the storage backend once it's been created in the DeltaTable. We'll need to refactor a bit, probably changing the storage backend to use an [...]. For now, the workaround is to explicitly pass a configured PyArrow filesystem:
from deltalake import DeltaTable
from pyarrow.fs import S3FileSystem
storage_options = {
    "AWS_ACCESS_KEY_ID": ACCESS_KEY,
    "AWS_SECRET_ACCESS_KEY": SECRET_KEY,
    "AWS_ENDPOINT_URL": ENDPOINT_URL,
    "AWS_REGION": "us-east-1",
}
dt = DeltaTable("s3://warehouse/tpcds_sf1000_dlt/item/", storage_options=storage_options)
pa_fs = S3FileSystem(
    access_key=storage_options["AWS_ACCESS_KEY_ID"],
    secret_key=storage_options["AWS_SECRET_ACCESS_KEY"],
    endpoint_override=storage_options["AWS_ENDPOINT_URL"],
    region=storage_options["AWS_REGION"],
)
print(dt.pyarrow_schema())  # SUCCESS
print(dt.to_pandas(filesystem=pa_fs).head())  # This *should* work
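As a rough sketch of the mapping this workaround relies on: the helper below is hypothetical (`to_pyarrow_s3_kwargs` is not part of deltalake or PyArrow) and only translates the `AWS_*` keys used above into the keyword names that `pyarrow.fs.S3FileSystem` accepts. The endpoint value is a placeholder.

```python
# Hypothetical helper, not part of deltalake or PyArrow: maps the AWS_* keys
# used in storage_options to pyarrow.fs.S3FileSystem keyword arguments.
_OPTION_TO_KWARG = {
    "AWS_ACCESS_KEY_ID": "access_key",
    "AWS_SECRET_ACCESS_KEY": "secret_key",
    "AWS_ENDPOINT_URL": "endpoint_override",
    "AWS_REGION": "region",
}


def to_pyarrow_s3_kwargs(storage_options):
    """Return S3FileSystem keyword arguments for the options that are set."""
    return {
        kwarg: storage_options[key]
        for key, kwarg in _OPTION_TO_KWARG.items()
        if key in storage_options
    }


example_kwargs = to_pyarrow_s3_kwargs({
    "AWS_ENDPOINT_URL": "http://localhost:9000",  # placeholder endpoint
    "AWS_REGION": "us-east-1",
})
# With PyArrow installed: pa_fs = pyarrow.fs.S3FileSystem(**example_kwargs)
```

Keeping the mapping in one place means the DeltaTable and the PyArrow filesystem are configured from the same dictionary, so the endpoint cannot silently differ between the two.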
Thanks @wjones127 for the quick response. This helps me get a little further, but I think I then run afoul of differing path expectations. After making the change, I get the following error:
I think the problem comes from here: https://github.com/apache/arrow/blob/master/cpp/src/arrow/filesystem/s3fs.cc#L364, which expects a "bucket/key" path without a URI scheme (i.e., no "s3://"), but I could be wrong. Am I still missing something, or is this maybe just not quite functional yet? If there are any other issues tracking the work needed, I'd be happy to take a look and see if I can contribute.
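To illustrate the path mismatch suspected here, a minimal sketch of turning an "s3://" URI into the "bucket/key" form. `to_bucket_key_path` is a hypothetical helper written for this example; it is not how delta-rs or PyArrow handle paths internally.

```python
from urllib.parse import urlparse


def to_bucket_key_path(uri):
    """Convert 's3://bucket/key' to 'bucket/key'; leave other paths unchanged."""
    parsed = urlparse(uri)
    if parsed.scheme == "s3":
        # netloc holds the bucket name, path holds '/key...'
        return parsed.netloc + parsed.path
    return uri


stripped = to_bucket_key_path("s3://warehouse/tpcds_sf1000_dlt/item/")
unchanged = to_bucket_key_path("warehouse/tpcds_sf1000_dlt/item/")
```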
Darn. I guess that workaround doesn't work. Thanks for reporting that failure too. We don't yet have integration testing with object stores, but that's high priority in the next few months.
Fixing #696 should make the workaround function. Fixing #690 should make this work without the workaround. You are welcome to submit a PR on either!
@joshuarobinson with the release of 0.6.1 we now reuse the same backend store in the read path whenever possible. Could you check if that works for you?
Hi, I'm reopening this as I've tested against 0.6.1 and I'm still seeing the same behavior, both with and without manually specifying the pyarrow filesystem. I ran "pip install deltalake==0.6.1" and can confirm that I can read the schema successfully, but not the data files. In particular, the exception seems to happen when I try to convert a pyarrow dataset to a pyarrow table. If I manually specify the S3FileSystem, the error is
but if I let it "auto-resolve" to the storage_options I originally used when loading the Delta Table, the error is different:
Is it possible the problem is with the paths? I know that pyarrow.fs.S3FileSystem expects paths without the s3:// prefix (because it's assumed, I guess). Sorry to be a bother @roeap, but can you confirm that the fixes in 0.6.1 should address this issue? If they should, is it possible to get a pointer to an example or test case?
Hi @roeap, I've just tried again with 0.6.2 and the issue is now fixed! FYI, manually specifying the pyarrow filesystem doesn't work, but letting Delta take care of it behind the scenes does work (and is the better option, IMO).
Description
Goal: I am trying to read a Delta Table from a non-AWS S3 object store (like MinIO). I seem to be able to read the table metadata, but when reading data I get an error that seems to indicate that my endpoint_override is being ignored and requests are sent to AWS instead.
My code looks like this:
In looking through the code, I wonder if the line here (delta-rs/python/src/lib.rs, line 485 in d10b428) needs to use the "with_options" variant in order to pass through my configs?
Use Case
Delta on S3-compatible object stores like MinIO, Swift, FlashBlade, Vast, ECS, etc.
Related Issue(s)