Python: write_deltalake to ADLS Gen2 issue #1456

Closed
Dammi87 opened this issue Jun 13, 2023 · 3 comments
Labels
bug Something isn't working

Dammi87 commented Jun 13, 2023

Environment

Windows 10
Python 3.10.11

Delta-rs version:
deltalake 0.9.0
pyarrow 12.0.0
numpy 1.24.3

Binding:
N/A

Environment:

  • Cloud provider: Azure
  • OS: Windows
  • Other:

Bug

What happened:
When I try to write a Delta table to Azure Data Lake Storage Gen2, an error occurs that relates to the provided storage options. I've verified that the account key is definitely correct. Here is the code snippet that causes the issue:

from deltalake import write_deltalake

fs = {
 'account_name':"mdieuwcoldpathcpdl",
 'account_key': "***"
}

write_deltalake('https://mdieuwcoldpathcpdl.dfs.core.windows.net/delta/bar', data=table, storage_options=fs)

And the stack trace is:

---------------------------------------------------------------------------
PyDeltaTableError                         Traceback (most recent call last)
Cell In[14], line 9
      1 from deltalake import write_deltalake
      3 fs = {
      5  'account_name':"***",
      6  'account_key': "***"
      7 }
----> 9 write_deltalake('https://***.dfs.core.windows.net/delta/bar', data=table, storage_options=fs)

File c:\devops\md-coldpath-preprocessing\.conda\lib\site-packages\deltalake\writer.py:147, in write_deltalake(table_or_uri, data, schema, partition_by, filesystem, mode, file_options, max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, max_rows_per_group, name, description, configuration, overwrite_schema, storage_options, partition_filters)
    144     else:
    145         data, schema = delta_arrow_schema_from_pandas(data)
--> 147 table, table_uri = try_get_table_and_table_uri(table_or_uri, storage_options)
    149 # We need to write against the latest table version
    150 if table:

File c:\devops\md-coldpath-preprocessing\.conda\lib\site-packages\deltalake\writer.py:392, in try_get_table_and_table_uri(table_or_uri, storage_options)
    389     raise ValueError("table_or_uri must be a str, Path or DeltaTable")
    391 if isinstance(table_or_uri, (str, Path)):
--> 392     table = try_get_deltatable(table_or_uri, storage_options)
    393     table_uri = str(table_or_uri)
    394 else:

File c:\devops\md-coldpath-preprocessing\.conda\lib\site-packages\deltalake\writer.py:405, in try_get_deltatable(table_uri, storage_options)
    401 def try_get_deltatable(
    402     table_uri: Union[str, Path], storage_options: Optional[Dict[str, str]]
    403 ) -> Optional[DeltaTable]:
    404     try:
--> 405         return DeltaTable(table_uri, storage_options=storage_options)
    406     except PyDeltaTableError as err:
    407         # TODO: There has got to be a better way...
    408         if "Not a Delta table" in str(err):

File c:\devops\md-coldpath-preprocessing\.conda\lib\site-packages\deltalake\table.py:122, in DeltaTable.__init__(self, table_uri, version, storage_options, without_files)
    109 """
    110 Create the Delta Table from a path with an optional version.
    111 Multiple StorageBackends are currently supported: AWS S3, Azure Data Lake Storage Gen2, Google Cloud Storage (GCS) and local URI.
   (...)
    119                       DeltaTable will be loaded with a significant memory reduction.
    120 """
    121 self._storage_options = storage_options
--> 122 self._table = RawDeltaTable(
    123     str(table_uri),
    124     version=version,
    125     storage_options=storage_options,
    126     without_files=without_files,
    127 )
    128 self._metadata = Metadata(self._table)

PyDeltaTableError: Failed to read delta log object: Generic MicrosoftAzure error: Container name must be specified

I also tried adding a container_name key to the fs dictionary, with the same result.

What you expected to happen:
I expected a delta table to be created.

How to reproduce it:

More details:

Dammi87 added the bug label Jun 13, 2023
roeap (Collaborator) commented Jun 13, 2023

This could be an issue with our URL parsing.

In the meantime, could you try "az://mdieuwcoldpathcpdl/delta/bar" as the table URL?
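For reference, a minimal sketch of a scheme-based table URL, assuming the URL convention of the object_store crate that delta-rs uses for Azure: in an az:// or abfs:// URL the host part is read as the container name, and the storage account comes from account_name in storage_options. The helper names, the environment variable names, and the container "delta" with path "bar" (taken from the original https URL) are assumptions for illustration; none of the helpers are part of deltalake.

```python
import os


def azure_table_uri(container: str, path: str, scheme: str = "az") -> str:
    """Build a hypothetical scheme-based URI such as az://<container>/<path>."""
    # Strip stray slashes so we never emit something like "az://delta//bar".
    return f"{scheme}://{container.strip('/')}/{path.strip('/')}"


def azure_storage_options(account_name: str, account_key: str) -> dict[str, str]:
    """Storage options using the same keys as the original report."""
    return {"account_name": account_name, "account_key": account_key}


def write_example() -> None:
    """Write a small placeholder table; needs deltalake, pyarrow and real credentials."""
    # Imported lazily so the helpers above can be used without deltalake installed.
    import pyarrow as pa
    from deltalake import write_deltalake

    table = pa.table({"id": [1, 2, 3]})  # placeholder data
    options = azure_storage_options(
        os.environ["AZURE_STORAGE_ACCOUNT_NAME"],  # assumed variable names
        os.environ["AZURE_STORAGE_ACCOUNT_KEY"],
    )
    write_deltalake(azure_table_uri("delta", "bar"), table, storage_options=options)
```

With the two environment variables set, calling write_example() attempts the write against az://delta/bar. Keeping the credentials out of the URL and in storage_options mirrors how the original snippet passed them.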

Dammi87 (Author) commented Jun 13, 2023

@roeap Your comment put me on the right path, at least. Thank you! :)
It turns out the only thing needed was a path like 'abfs://delta/foo'; it seems the container name is injected via the account name.
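The working path puts the container (delta) in the URL host, while the storage account is supplied separately through account_name in storage_options. Below is a hedged sketch of a hypothetical helper, not part of deltalake, that rewrites an https://<account>.dfs.core.windows.net/<container>/<path> URL like the one in the original report into that abfs:// form. It assumes the first path segment of the https URL is the container.

```python
from urllib.parse import urlparse

DFS_SUFFIX = ".dfs.core.windows.net"


def https_to_abfs(url: str) -> tuple[str, str]:
    """Split an ADLS Gen2 https URL into (account_name, abfs://<container>/<path>)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if parsed.scheme != "https" or not host.endswith(DFS_SUFFIX):
        raise ValueError(f"not an ADLS Gen2 https URL: {url!r}")
    account = host[: -len(DFS_SUFFIX)]
    # First path segment is the container (file system); the rest is the table path.
    container, _, path = parsed.path.lstrip("/").partition("/")
    if not container:
        raise ValueError(f"no container in URL: {url!r}")
    uri = f"abfs://{container}/{path}" if path else f"abfs://{container}"
    return account, uri
```

For example, https_to_abfs("https://myaccount.dfs.core.windows.net/delta/bar") returns ("myaccount", "abfs://delta/bar"), where "myaccount" is a placeholder. The account name then goes into storage_options alongside account_key.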

Should I keep this open as a bug, perhaps to track documentation improvements?

rtyler (Member) commented Sep 20, 2023

I'm going to close this out. Our documentation can always use improvement 😆, but I believe the bug here has been addressed.

rtyler closed this as completed Sep 20, 2023