read_deltalake on Unity Catalog Table from Databricks has invalid region configuration #2903
@kevinzwang:
Yeah, the delta-rs SDK is very dumb about regions -- we have to provide it with exactly the right region, otherwise it will freak out. Reading the code, we take the IOConfig (and hence the region) provided by Unity Catalog:
My guess is that Unity doesn't give us the right region, or, more likely, doesn't give us a region at all. We might need to corner-case this 😬
It looks like we currently ignore the given io_config. We'd like to get a fix for this, but in the meantime @lukaskratoch what you can do is add the region to the table's io config:

cfg = unity.load_table('test_catalog.test_schema.test_table')
cfg.io_config = IOConfig(s3=cfg.io_config.s3.replace(region_name='eu-central-1'))
cfg_df = daft.read_deltalake(cfg)  # should work now
Initially I was getting this error: UnityCatalogTable has to be set to frozen=False (I just overwrote it in my local copy of the library). After that, I am getting this error:
@lukaskratoch what version of deltalake are you using? As for a solution to the io config thing without having to modify the library, you could perhaps extract the table_uri and the io_config and pass those in manually:

cfg = unity.load_table('test_catalog.test_schema.test_table')
io_config = IOConfig(s3=cfg.io_config.s3.replace(region_name='eu-central-1'))
daft.read_deltalake(cfg.table_uri, io_config=io_config)
@kevinzwang, your recommended solution did the trick for me. I had run into the same issue that @lukaskratoch ran into, and while searching to see whether an issue had been logged for it, I found this one. @kevinzwang, this is the piece of code that works for me; it uses the Databricks SDK.
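The snippet itself was not captured in this thread, so here is a minimal hedged sketch of that approach, with placeholder workspace, token, table, and credential values: look up the table's storage location via the Databricks SDK WorkspaceClient, then pass an explicit region and your own credentials to Daft.

import daft
from daft.io import IOConfig, S3Config
from databricks.sdk import WorkspaceClient

# Placeholder workspace URL and personal access token -- substitute your own.
w = WorkspaceClient(host="https://<workspace>.cloud.databricks.com", token="<databricks-pat>")

# Look up the table's external storage location (e.g. s3://bucket/path/to/table).
table_info = w.tables.get(full_name="test_catalog.test_schema.test_table")

# Build the IO config manually with the bucket's actual region and your own AWS credentials.
io_config = IOConfig(
    s3=S3Config(
        region_name="eu-central-1",
        key_id="<aws-access-key-id>",
        access_key="<aws-secret-access-key>",
        session_token="<aws-session-token>",
    )
)

df = daft.read_deltalake(table_info.storage_location, io_config=io_config)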
Previously I was on deltalake==0.19.1; today I updated to deltalake==0.20.1 and I am still getting the same connection timeout "DispatchFailure" error.
@lukaskratoch @anilmenon14 Thank you for the information! I am taking a look into these issues and hope to have an update for you soon.
Adding an Azure perspective here: this is what I had to do to be able to read from the table. It seems like credential vending just isn't working for Azure, since it is still trying S3.
It still gives me the output of:
Nothing I tried with the s3 config part seemed to do anything; I had to pass in my own credentials entirely. Even when I added a custom s3_config and set it to southcentralus, it would still tell me us-east-1. Note: passing in my own Azure credentials does work and I can read the table. I just can't get my token to do credential vending.
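As a rough illustration only (the exact snippet was not posted here), bypassing credential vending on Azure and passing your own credentials might look like the following; the storage account, container, path, and account key are placeholders:

import daft
from daft.io import IOConfig, AzureConfig

# Placeholder storage account and account key -- Unity credential vending is bypassed entirely.
io_config = IOConfig(
    azure=AzureConfig(
        storage_account="mystorageaccount",
        access_key="<azure-storage-account-key>",
    )
)

df = daft.read_deltalake(
    "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/path/to/table",
    io_config=io_config,
)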
Hi @jordandakota,
Yes, when we first built this integration the Unity Catalog only vended S3 credentials 😛 I think Unity has made some progress since then, but we probably do need to upgrade the Python SDK to get the new updated API spec. WRT regions, we will have to play around with the API a little to figure out what Databricks' implementation of Unity is returning to us.
We do this already when we read from S3, but delta-rs does not, so its metadata reads fail. This is especially an issue for Unity Catalog tables, where the region is not specified anywhere. Tested locally and it works, but I'm unsure how to cover this in unit tests since it is specific to Unity + AWS behavior. Fix for #2903
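This is not Daft's actual implementation (the fix lives in Daft's Rust I/O layer), but as a minimal sketch of the general mechanism: S3 reports a bucket's region in the x-amz-bucket-region response header of a HEAD request against the bucket's virtual-hosted URL, even for unauthenticated requests.

from typing import Optional
import urllib.error
import urllib.request

def detect_bucket_region(bucket: str) -> Optional[str]:
    # HEAD the bucket's virtual-hosted endpoint; S3 includes the bucket's region
    # in the x-amz-bucket-region header even on 301/403 responses.
    req = urllib.request.Request(f"https://{bucket}.s3.amazonaws.com", method="HEAD")
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.headers.get("x-amz-bucket-region")
    except urllib.error.HTTPError as err:
        return err.headers.get("x-amz-bucket-region")

print(detect_bucket_region("my-bucket"))  # e.g. "eu-central-1"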
v0.3.9 has just been released, which should fix the S3 region issue! I am closing this issue, but @lukaskratoch @anilmenon14, please try it out, and if you still run into issues, feel free to reopen it. @jordandakota, your Azure issue was also addressed in the latest release. If you have trouble with it, please create a new issue to report it.
@kevinzwang, Thanks for the update on the S3 region issue fix. I can confirm it works fine, but with a small issue. It appears
As for the Azure issue, this appears to be fixed too, but it needs a bit of a workaround which I am hoping we can avoid in the future. What does not work for Azure:
Error:
What works for Azure now:
@jordandakota, when you have a chance, could you test as well to see whether you get this behavior in Azure?
I did test and arrived at the same workaround, just slightly differently. Reporting in now as confirmed.
Hi @kevinzwang, as @anilmenon14 mentioned, I too got the same error for Azure. Replacing the storage account name solved the issue for me. Please take note of this or create a new issue, as this is still a workaround. Works in Azure:
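The exact snippet was not captured in this thread, but a hypothetical illustration of that workaround (placeholder endpoint, token, table, and storage account names, and assuming AzureConfig exposes a replace() method like the S3Config used earlier in this thread) could look like:

import daft
from daft.io import IOConfig
from daft.unity_catalog import UnityCatalog

# Placeholder workspace endpoint and personal access token.
unity = UnityCatalog(endpoint="https://<workspace>.azuredatabricks.net", token="<databricks-pat>")
table = unity.load_table("test_catalog.test_schema.test_table")

# Override the storage account on the vended Azure config before reading.
azure_cfg = table.io_config.azure.replace(storage_account="mystorageaccount")
df = daft.read_deltalake(table.table_uri, io_config=IOConfig(azure=azure_cfg))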
@anilmenon14 @g-kannan I will take a look at these issues, thank you for reporting them. Tracking them in a new issue: #3142
I am trying to read a table stored in Unity Catalog (external data access enabled) in Databricks and I am getting "OSError: Generic S3 error: Received redirect without LOCATION, this normally indicates an incorrectly configured region", even though the region is explicitly defined in io_config:
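(The original snippet was not captured in this scrape; the following is a hypothetical reconstruction of the setup being described, with placeholder endpoint, token, and table names, based on the calls visible in the output and traceback below.)

import daft
from daft.io import IOConfig, S3Config
from daft.unity_catalog import UnityCatalog

# Placeholder workspace endpoint and personal access token.
unity = UnityCatalog(endpoint="https://<workspace>.cloud.databricks.com", token="<databricks-pat>")

print(unity.list_catalogs())
print(unity.list_schemas("test_catalog"))
print(unity.list_tables("test_catalog.test_schema"))

# The region is set explicitly, yet the read still fails with the redirect error below.
io_config = IOConfig(s3=S3Config(region_name="eu-central-1"))
cfg = unity.load_table("test_catalog.test_schema.test_table")
cfg_df = daft.read_deltalake(cfg, io_config=io_config)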
And I am getting this output:
[...catalogs...]
[...schemas...]
[...tables...]
With this error:
S3 Credentials not provided or found when making client for us-east-1! Reverting to Anonymous mode. the credential provider was not enabled
[2024-09-23T13:10:00Z WARN aws_config::imds::region] failed to load region from IMDS err=failed to load IMDS session token: dispatch failure: io error: error trying to connect: tcp connect error: Connection refused (os error 111): tcp connect error: Connection refused (os error 111): Connection refused (os error 111) (FailedToLoadToken(FailedToLoadToken { source: DispatchFailure(DispatchFailure { source: ConnectorError { kind: Io, source: hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 111, kind: ConnectionRefused, message: "Connection refused" })), connection: Unknown } }) }))
[2024-09-23T13:10:00Z WARN aws_config::imds::region] failed to load region from IMDS err=failed to load IMDS session token: dispatch failure: io error: error trying to connect: tcp connect error: Connection refused (os error 111): tcp connect error: Connection refused (os error 111): Connection refused (os error 111) (FailedToLoadToken(FailedToLoadToken { source: DispatchFailure(DispatchFailure { source: ConnectorError { kind: Io, source: hyper::Error(Connect, ConnectError("tcp connect error", Os { code: 111, kind: ConnectionRefused, message: "Connection refused" })), connection: Unknown } }) }))
Traceback (most recent call last):
  File "poc_uc_daft.py", line 29, in <module>
    cfg_df = daft.read_deltalake(cfg, io_config=io_config)
  File "/c/Users/xxx/Projects/xxx/venv/lib/python3.8/site-packages/daft/api_annotations.py", line 39, in _wrap
    return timed_func(*args, **kwargs)
  File "/c/Users/xxx/Projects/xxx/venv/lib/python3.8/site-packages/daft/analytics.py", line 228, in tracked_fn
    result = fn(*args, **kwargs)
  File "/c/Users/xxx/Projects/xxx/venv/lib/python3.8/site-packages/daft/io/_deltalake.py", line 74, in read_deltalake
    delta_lake_operator = DeltaLakeScanOperator(table_uri, storage_config=storage_config)
  File "/c/Users/xxx/Projects/xxx/venv/lib/python3.8/site-packages/daft/delta_lake/delta_lake_scan.py", line 63, in __init__
    self._table = DeltaTable(
  File "/c/Users/xxx/Projects/xxx/venv/lib/python3.8/site-packages/deltalake/table.py", line 380, in __init__
    self._table = RawDeltaTable(
OSError: Generic S3 error: Received redirect without LOCATION, this normally indicates an incorrectly configured region
Am I doing something wrong, or is this a bug? Is there a workaround?
Could it be related to this issue from two days ago? #2879