error reading iceberg table #2603

Closed
djouallah opened this issue Feb 6, 2024 · 5 comments · Fixed by #2810
Labels
bug Something isn't working support User-driven support

Comments

@djouallah

djouallah commented Feb 6, 2024

image

data can be read fine using duckdb

@djouallah djouallah added the bug Something isn't working label Feb 6, 2024
@universalmind303 universalmind303 added blocked Not actionable due to a blocker support User-driven support priority-high ⛰️ and removed blocked Not actionable due to a blocker labels Feb 6, 2024
@vrongmeal
Contributor

This seems like an issue with v1 format.

@tychoish
Contributor

tychoish commented Mar 6, 2024

I believe that this was addressed in #2718, which was released yesterday in 0.9.1.

@djouallah let us know if this works or if you still have an issue or if you have any example datasets (or ways of producing data) that you think we should have integration tests for.

@djouallah
Author

getting new errors
ExecutionException: External error: Data is invalid: Failed to read table metadata: missing field snapshot-log at line 81 column 1

you can generate iceberg tables locally using pyiceberg
https://colab.research.google.com/drive/1EjffJO75-8Rj4V0MGKUsoFHDOGgicKgK?usp=sharing
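To narrow down a "missing field" error like the one above, it can help to check which top-level keys the metadata file actually contains. The following is a minimal sketch using only the Python standard library; the helper name `missing_metadata_fields` and the default list of keys to check are illustrative assumptions, not part of glaredb or pyiceberg.

```python
import json
from pathlib import Path


def missing_metadata_fields(metadata_path, expected=("format-version", "snapshot-log")):
    """Return the keys from `expected` that are absent at the top level
    of an Iceberg *.metadata.json file.

    Hypothetical helper: the default `expected` tuple is only an example
    and should be adjusted to whatever fields the reader complains about.
    """
    with Path(metadata_path).open() as f:
        metadata = json.load(f)
    return [key for key in expected if key not in metadata]
```

Calling it on the metadata file written by pyiceberg shows whether `snapshot-log` is simply absent from the file, which would point at the reader treating an absent field as required.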

@vrongmeal
Contributor

Thanks! I'll definitely take a look at this on priority. Once we have a variety of datasets, we should be able to resolve most of the incompatibility issues with v1.

@tychoish
Contributor

tychoish commented Mar 6, 2024

Just collecting some notes after looking at the notebook you posted @djouallah:

glaredb.sql(""" select * from
iceberg_scan('/content/warehouse/default.db/taxi_dataset/metadata/00001-c13c72f3-6082-444c-9256-bea980ff7e0e.metadata.json') """)

And the error:

ExecutionException: External error: Failed to canonicalize path "/content/warehouse/default.db/taxi_dataset/metadata/00001-c13c72f3-6082-444c-9256-bea980ff7e0e.metadata.json/metadata/version-hint.text": Not a directory (os error 20)

It seems that both glaredb and duckdb expect to be pointed at the top-level directory that contains the iceberg table, in this case /content/warehouse/default.db/taxi_dataset/, but when I do that I get an error:

 Failed to canonicalize path "/content/warehouse/default.db/taxi_dataset/metadata/version-hint.text": No such file or directory (os error 2)

DuckDB has the same error (it's looking for the same file), so I imagine that there's something unexpected about this dataset or the way it's saved. My inspection of the pyiceberg api has not turned up anything useful yet, but I will look back into it.
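A quick way to see what the table directory contains, compared with what the error messages say is being looked up, is sketched below. It assumes only the layout visible in the errors above (a `metadata/` directory under the table root, with `version-hint.text` expected inside it); the function name `inspect_table_dir` is hypothetical.

```python
from pathlib import Path


def inspect_table_dir(table_root):
    """Report whether metadata/version-hint.text exists under `table_root`
    and list any *.metadata.json files found next to where it would be.

    Hypothetical diagnostic helper; it only inspects the filesystem and
    does not call glaredb, duckdb, or pyiceberg.
    """
    metadata_dir = Path(table_root) / "metadata"
    hint = metadata_dir / "version-hint.text"
    if metadata_dir.is_dir():
        metadata_files = sorted(p.name for p in metadata_dir.glob("*.metadata.json"))
    else:
        metadata_files = []
    return {"has_version_hint": hint.is_file(), "metadata_files": metadata_files}
```

For the notebook's table this would be called as `inspect_table_dir("/content/warehouse/default.db/taxi_dataset/")`. If `has_version_hint` is false while `metadata_files` is non-empty, the table was written without the hint file that the error shows being looked up.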

vrongmeal added a commit that referenced this issue Apr 11, 2024
Fixes #2603

---------

Signed-off-by: Vaibhav <vrongmeal@gmail.com>
4 participants