Describe the bug
The parquet SQL benchmarks no longer run cleanly; in particular, the following query returns an error:
```sql
select string_optional from t where dict_10_required = 'prefix#1' and dict_1000_required = 'prefix#1';
```
```
Parquet argument error: Parquet error: 'block_size' must be a multiple of 128, got 90") for files: [PartitionedFile { file_meta: FileMeta { sized_file: SizedFile { path: "/tmp/parquet_query_sql20TObt.parquet", size: 201093448 }, last_modified: Some(2022-03-10T12:17:51.953953953Z) }, partition_values: [] }]
```
I suspected this was related to apache/arrow-rs#1284, which was included in the 9.1 release of arrow, but rolling back to before that upgrade only alters the error message.
It is unclear at this stage whether the problem is that the encoder is writing gibberish, or whether a bug has been introduced in the decoder. Either way, if it is an upstream bug, we should have caught it upstream in arrow-rs.
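One way to narrow this down may be to check which encodings the writer recorded in the file footer for the affected column; if the footer already advertises an unexpected encoding, the encoder side is the more likely culprit. Below is a minimal sketch using the parquet crate; the file path is just the temporary file from the error above, and the exact API calls are written from memory, so treat this as an assumption rather than a verified repro.

```rust
use std::fs::File;

use parquet::file::reader::{FileReader, SerializedFileReader};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Path taken from the error message above; substitute the actual benchmark file.
    let file = File::open("/tmp/parquet_query_sql20TObt.parquet")?;
    let reader = SerializedFileReader::new(file)?;

    // Print the encodings recorded in the footer for the problem column in every row group.
    for (i, row_group) in reader.metadata().row_groups().iter().enumerate() {
        for column in row_group.columns() {
            if column.column_path().string() == "string_optional" {
                println!("row group {}: encodings {:?}", i, column.encodings());
            }
        }
    }
    Ok(())
}
```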
Unfortunately, my go-to approach of cross-checking with alternative tools has not yet borne fruit. I guess I need to work out how to get Spark running...
```python
>>> import pyarrow.parquet as pq
>>> pq.read_table('/home/raphael/Downloads/borked.parquet', columns=['string_optional'])
OSError: Not yet implemented: Unsupported encoding.

>>> import duckdb
>>> duckdb.query(f"select string_optional from '/home/raphael/Downloads/borked.parquet'").fetchall()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: Unsupported page encoding
```
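Since both readers reject the file with an unsupported-encoding error, it may also be worth dumping the page headers for the column with the low-level parquet reader (no arrow layer involved) to see exactly which encoding ended up in the pages. A rough sketch; the path and the schema lookup are assumptions based on the benchmark's schema, and the APIs are not verified against the exact parquet version in use:

```rust
use std::fs::File;

use parquet::column::page::PageReader;
use parquet::file::reader::{FileReader, RowGroupReader, SerializedFileReader};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Assumed path: the temporary file produced by the benchmark run.
    let file = File::open("/tmp/parquet_query_sql20TObt.parquet")?;
    let reader = SerializedFileReader::new(file)?;

    // Find the leaf index of the string_optional column in the schema.
    let schema = reader.metadata().file_metadata().schema_descr();
    let col_idx = (0..schema.num_columns())
        .find(|&i| schema.column(i).path().string() == "string_optional")
        .expect("string_optional column not found");

    // Walk the pages of the first row group and print what each page header claims.
    let row_group = reader.get_row_group(0)?;
    let mut pages = row_group.get_column_page_reader(col_idx)?;
    while let Some(page) = pages.get_next_page()? {
        println!(
            "{:?}: encoding={:?}, num_values={}",
            page.page_type(),
            page.encoding(),
            page.num_values()
        );
    }
    Ok(())
}
```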
To Reproduce
Run the SQL benchmarks
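For reference, I believe this means the parquet SQL benchmark in the datafusion crate; assuming the bench target is still named parquet_query_sql, the invocation would be roughly:

```shell
# Assumed bench target name; adjust if it differs in your checkout.
cargo bench -p datafusion --bench parquet_query_sql
```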
Expected behavior
They run without errors
Additional context
There is a broader question of whether we should be running this benchmark suite as part of a nightly CI job; this potentially relates to #1377.