fix: Iceberg fixes for reading table metadata #2810

vrongmeal · 2024-03-21T14:04:20Z

tychoish · 2024-03-26T23:50:51Z

What more do we need here? (I think there's some more test coverage that we could use here?)

vrongmeal · 2024-03-27T10:41:18Z

> What more do we need here? (I think there's some more test coverage that we could use here?)

Added test coverage. There is an issue, though: if we overwrite the existing data (say, by running generate_pyiceberg.py again without deleting the existing data), GlareDB returns twice the number of rows, whereas according to Iceberg the data is correctly overwritten.
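One plausible way a reader can end up with doubled rows after an overwrite is sketched below. The Iceberg table spec gives each manifest entry a status (0 = EXISTING, 1 = ADDED, 2 = DELETED), and a scan is expected to skip DELETED entries. The `ManifestEntry` class, file paths, and row counts here are hypothetical stand-ins for illustration. This is not GlareDB's actual code, and the thread does not confirm that this was the root cause.

```python
from dataclasses import dataclass
from enum import IntEnum


class EntryStatus(IntEnum):
    """Manifest entry status codes from the Iceberg table spec."""
    EXISTING = 0
    ADDED = 1
    DELETED = 2


@dataclass(frozen=True)
class ManifestEntry:
    # Hypothetical, heavily simplified stand-in for a real manifest entry.
    status: EntryStatus
    file_path: str
    record_count: int


def live_data_files(entries):
    """Keep only entries whose data files still belong to the table."""
    return [e for e in entries if e.status != EntryStatus.DELETED]


def row_count(entries):
    """Sum the record counts of the given entries."""
    return sum(e.record_count for e in entries)


# Assumed shape of the current snapshot's entries after the generator
# script has been run twice: the first run's file is marked DELETED and
# the second run's file is ADDED.
entries_after_overwrite = [
    ManifestEntry(EntryStatus.DELETED, "data/run1.parquet", 1000),
    ManifestEntry(EntryStatus.ADDED, "data/run2.parquet", 1000),
]

# Ignoring the status counts both files; filtering counts only the live one.
naive_rows = row_count(entries_after_overwrite)
correct_rows = row_count(live_data_files(entries_after_overwrite))
```

With these made-up entries, the unfiltered count is twice the filtered one, which matches the symptom described above.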

tychoish · 2024-03-29T12:52:20Z

testdata/generate_pyiceberg.py

+How to run this script:
+======================
+
+$ python3 -m venv venv
+$ source venv/bin/activate
+$ pip install "pyiceberg[pyarrow,sql-sqlite]"
+$ pip install botocore
+$ python ./testdata/generate_pyiceberg.py


I think the data set ends up being pretty big. While I don't think it really matters, it'd be good to avoid committing lots of test data to the repo just because, when we have other options (e.g. uploading the data to the test bucket, or generating it as part of a fixture in one of the pytests).

vrongmeal · 2024-03-29

Yeah, thinking the same thing.

Signed-off-by: Vaibhav <vrongmeal@gmail.com>

vrongmeal · 2024-04-10T09:54:57Z

@tychoish I made the required changes for pytest.

Signed-off-by: Vaibhav <vrongmeal@gmail.com>

tychoish

Looks great; one concern about the size of the test data.

The current test failure is orthogonal to this PR and I can push through it.

Is there anything that needs to be done here?

tychoish · 2024-04-10T14:07:02Z

testdata/iceberg/source_data/yellow_tripdata_2023-01.parquet

45 MB feels big for a test file.

vrongmeal · 2024-04-10

Yeah, I'll upload it to GCS. We have a script to handle big data files.

vrongmeal · 2024-04-11

It was a simple HTTP URL, so I added it to the fixture to download the data.
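A minimal sketch of a download-on-demand helper that such a fixture could call, assuming the source file is reachable at a plain URL. The name `fetch_test_data` and its caching behaviour are illustrative assumptions, not the code merged in this PR.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_test_data(url: str, dest: Path) -> Path:
    """Download `url` to `dest` unless `dest` already exists; return `dest`."""
    if dest.exists():
        # Reuse a previous download so repeated test runs stay fast.
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so an interrupted download
    # never leaves a truncated file at the final path.
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    tmp.replace(dest)
    return dest
```

In pytest, a session-scoped fixture could call this helper with a directory obtained from `tmp_path_factory.mktemp(...)`, so the large Parquet file is fetched once per test session rather than committed to the repository.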

Signed-off-by: Vaibhav <vrongmeal@gmail.com>

vrongmeal force-pushed the vrongmeal/iceberg branch from 3ca6c31 to 56a21c0 Compare March 25, 2024 14:05

vrongmeal changed the title from "[WIP] Iceberg stuff" to "fix: Iceberg fixes for reading table metadata" Mar 25, 2024

vrongmeal force-pushed the vrongmeal/iceberg branch from ca9eba9 to b4f7e5e Compare March 27, 2024 10:35

vrongmeal marked this pull request as ready for review March 29, 2024 09:33

tychoish reviewed Mar 29, 2024

vrongmeal added 4 commits April 10, 2024 10:48

fix: Iceberg fixes for reading table metadata

d802628

Signed-off-by: Vaibhav <vrongmeal@gmail.com>

add pyiceberg test

4782d08

Signed-off-by: Vaibhav <vrongmeal@gmail.com>

handle deleted entries

2cdf99a

Signed-off-by: Vaibhav <vrongmeal@gmail.com>

convert test to pytest

c7bdfaa

Signed-off-by: Vaibhav <vrongmeal@gmail.com>

vrongmeal force-pushed the vrongmeal/iceberg branch from 5e411ec to c7bdfaa Compare April 10, 2024 09:24

vrongmeal added 2 commits April 10, 2024 15:19

fix iceberg scan

0fb2465

Signed-off-by: Vaibhav <vrongmeal@gmail.com>

fix test

2a89856

Signed-off-by: Vaibhav <vrongmeal@gmail.com>

vrongmeal requested a review from tychoish April 10, 2024 09:55

psycopg2 without binary

8227f7a

Signed-off-by: Vaibhav <vrongmeal@gmail.com>

tychoish reviewed Apr 10, 2024

vrongmeal added 2 commits April 11, 2024 15:06

download data during test

fb4e945

Signed-off-by: Vaibhav <vrongmeal@gmail.com>

Merge branch 'main' into vrongmeal/iceberg

700d447

tychoish approved these changes Apr 11, 2024

vrongmeal merged commit 15cd5dc into main Apr 11, 2024
26 checks passed

vrongmeal deleted the vrongmeal/iceberg branch April 11, 2024 13:22
