python deltalake library to_pandas not supporting latest delta protocol version #885

Closed
cady17d opened this issue Oct 14, 2022 · 2 comments

Comments

@cady17d

cady17d commented Oct 14, 2022

Python release version: deltalake==0.6.2

The following code snippet works for tables on older Delta protocol versions:
Line1: dt = DeltaTable(deltaPathName)
Line2: df = dt.to_pandas()

But for tables on the latest protocol, ProtocolVersions(min_reader_version=2, min_writer_version=5), the to_pandas conversion returns null/None for every column.

image

Storage backend: ADLS Gen2

For some other tables, executing Line1 produces a warning:

image

For some other tables, executing Line2 throws an error:

  File "/home/ankit/SICDPDataAnalyticsPipeline/deltalakeservices/.venv/lib/python3.8/site-packages/deltalake/table.py", line 334, in to_pyarrow_table
    return self.to_pyarrow_dataset(
  File "pyarrow/_dataset.pyx", line 331, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 2577, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
pyo3_runtime.PanicException: dispatch dropped without returning error

When will support for the latest Delta protocol version be added?

@wjones127
Collaborator

Oh well first, it's definitely a bug that it doesn't error on reader protocol version 2. I'll create a separate issue for that, and we'll consider this issue about supporting the higher reader protocol version.

In Python, we probably won't support reading those tables until late next year. Supporting column mapping is going to require a major refactor of the implementation.
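
For context on why column mapping matters here: in the Delta protocol, a table with column mapping enabled stores its Parquet data under physical column names, which are recorded in each schema field's metadata under the key `delta.columnMapping.physicalName`. A reader that looks up the logical names would find no matching columns, which is likely consistent with the all-null output reported above. The following is only a rough sketch of the lookup, assuming a `schemaString` taken from the table's `metaData` action. The sample schema and the helper name `physical_to_logical` are made up for illustration, nested fields are ignored, and this is not how deltalake implements it.

```python
import json


def physical_to_logical(schema_string: str) -> dict:
    """Map physical Parquet column names to logical names (top-level fields only)."""
    schema = json.loads(schema_string)
    mapping = {}
    for field in schema["fields"]:
        metadata = field.get("metadata") or {}
        # Fall back to the logical name when no physical name is recorded.
        physical = metadata.get("delta.columnMapping.physicalName", field["name"])
        mapping[physical] = field["name"]
    return mapping


# Hypothetical schemaString, for illustration only.
sample_schema = json.dumps({
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": True,
         "metadata": {"delta.columnMapping.id": 1,
                      "delta.columnMapping.physicalName": "col-a1"}},
        {"name": "name", "type": "string", "nullable": True, "metadata": {}},
    ],
})

print(physical_to_logical(sample_schema))
```

A reader that supports column mapping would need to apply a mapping like this (including nested fields) when projecting Parquet columns, which gives a sense of why this touches much of the read path.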

And for higher delta writer protocols, I don't think we have any particular timeline for that. Supporting more operations (upsert, merge) is more important to us than the higher protocol versions. That being said, if someone wanted to implement the support we would definitely take PRs for it.

@wjones127
Collaborator

I'm closing this since we will now error if the reader version is > 1.

Support for reader version 2 is being tracked in #930
And support for reader version 3 is being tracked in #1094
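
For anyone on an older release that does not yet raise this error, a caller-side guard could look roughly like the sketch below. It assumes `DeltaTable.protocol()` returns an object with a `min_reader_version` attribute, which matches the `ProtocolVersions(min_reader_version=2, min_writer_version=5)` output shown in the report. The names `check_reader_version` and `UnsupportedReaderVersion` are hypothetical and not part of deltalake.

```python
MAX_SUPPORTED_READER_VERSION = 1  # assumption, based on the comment above


class UnsupportedReaderVersion(Exception):
    """Raised when a table needs a newer reader protocol than we can handle."""


def check_reader_version(table, max_supported=MAX_SUPPORTED_READER_VERSION):
    """Fail fast instead of silently reading nulls from an unsupported table."""
    protocol = table.protocol()
    if protocol.min_reader_version > max_supported:
        raise UnsupportedReaderVersion(
            f"table requires reader version {protocol.min_reader_version}, "
            f"but only versions <= {max_supported} are supported"
        )
    return protocol


# Intended usage with the snippet from the report (not executed here):
#   from deltalake import DeltaTable
#   dt = DeltaTable(deltaPathName)
#   check_reader_version(dt)
#   df = dt.to_pandas()
```

The guard only needs an object exposing `protocol()`, so it can be exercised with a small stub in tests without touching real storage.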
