Zed can't read Parquet output from DuckDB #4527

Closed
philrz opened this issue Apr 21, 2023 · 1 comment · Fixed by #4547

philrz commented Apr 21, 2023

Repro is with Zed commit 0a375f8. The test data originally came from a community user who reported brimdata/zui#2751 and brimdata/zui#2754.

Start with the attached test data imdb.csv.

$ zq -version
Version: v1.7.0-28-g0a375f88

$ zq -z 'count()' imdb.csv 
347(uint64)

A user imports it into DuckDB as a table and then exports that table as Parquet.

$ duckdb
v0.7.1 b00b93f0b1
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.

D create table imdb as select * from 'imdb.csv';
D copy imdb to 'output.parquet' (FORMAT PARQUET);

Zed cannot read the Parquet that was produced.

$ zq -i parquet output.parquet 
output.parquet: *parquet.PageHeader field 15 read error: don't know what type: 13

However, tools like https://parquetreader.com seem able to read it without complaint.
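
As a quick cross-check alongside third-party readers, one could confirm that the file is at least framed like Parquet. The following is a minimal sketch, assuming Python is available; `parquet_framing` is a hypothetical helper name and not part of Zed or DuckDB. It only checks the `PAR1` magic at both ends of the file and the footer length, so a pass says nothing about whether the page headers themselves decode.

```python
import struct
from pathlib import Path

MAGIC = b"PAR1"  # 4-byte marker at both ends of an unencrypted Parquet file


def parquet_framing(path):
    """Return the footer (file metadata) length in bytes if the file is
    framed like Parquet; raise ValueError describing what looks wrong.

    Hypothetical helper for triage only; it does not decode any metadata.
    """
    data = Path(path).read_bytes()
    # Smallest possible layout: leading magic + 4-byte footer length + trailing magic.
    if len(data) < 12:
        raise ValueError("file too short to be Parquet")
    if data[:4] != MAGIC:
        raise ValueError("missing leading PAR1 magic")
    if data[-4:] != MAGIC:
        raise ValueError("missing trailing PAR1 magic")
    # The 4 bytes before the trailing magic hold the footer length, little-endian.
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    if footer_len > len(data) - 12:
        raise ValueError("footer length exceeds file size")
    return footer_len
```

Calling `parquet_framing("output.parquet")` on the file from the repro would return the footer length if the framing is intact, which would suggest the problem lies in how the reader decodes the contents rather than in a truncated or mislabeled file.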

I know we've suspected that following through on #4278 may allow us to cover more Parquet dialects, so I'll mark this issue as dependent on that one just in case that alone fixes it.

philrz added the bug and community labels Apr 21, 2023
nwt added a commit that referenced this issue Apr 25, 2023
Reading and writing are much faster with it than with
github.com/fraugster/parquet-go.  Its only apparent drawback is that it
offers no easy way to support Zed's duration and float16 types, and
writing a value containing either produces a cryptic error.

    $ echo '{a:1.(float16)}' | zq -f parquet -
    parquetio: unsupported type: not implemented yet

Closes #764, closes #4278, and closes #4527.
nwt closed this as completed in deea4a4 Apr 27, 2023
nwt closed this as completed in #4547 Apr 27, 2023

philrz commented Apr 27, 2023

Verified in Zed commit deea4a4.

The Parquet file that DuckDB outputs is now readable.

$ zq -version
Version: v1.7.0-50-gdeea4a47

$ zq -i parquet output.parquet
{name:"Allendale Elementary School",school_rating:5.,size:851.,reduced_lunch:10.,state_percentile_16:90.2,state_percentile_15:95.8,stu_teach_ratio:15.7,school_type:"Public",avg_score_15:89.4,avg_score_16:85.2,full_time_teachers:54.,percent_black:2.9,percent_white:85.5,percent_asian:1.6,percent_hispanic:5.6}
{name:"Anderson Elementary",school_rating:2.,size:412.,reduced_lunch:71.,state_percentile_16:32.8,state_percentile_15:37.3,stu_teach_ratio:12.8,school_type:"Public",avg_score_15:43.,avg_score_16:38.3,full_time_teachers:32.,percent_black:3.9,percent_white:86.7,percent_asian:1.,percent_hispanic:4.9}
{name:"Avoca Elementary",school_rating:4.,size:482.,reduced_lunch:43.,state_percentile_16:78.4,state_percentile_15:83.6,stu_teach_ratio:16.6,school_type:"Public",avg_score_15:75.7,avg_score_16:73.,full_time_teachers:29.,percent_black:1.,percent_white:91.5,percent_asian:1.2,percent_hispanic:4.4}
{name:"Bailey Middle",school_rating:0.,size:394.,reduced_lunch:91.,state_percentile_16:1.6,state_percentile_15:1.,stu_teach_ratio:13.1,school_type:"Public Magnet",avg_score_15:2.1,avg_score_16:4.4,full_time_teachers:30.,percent_black:80.7,percent_white:11.7,percent_asian:2.3,percent_hispanic:4.3}
{name:"Barfield Elementary",school_rating:4.,size:948.,reduced_lunch:26.,state_percentile_16:85.3,state_percentile_15:89.2,stu_teach_ratio:14.8,school_type:"Public",avg_score_15:81.3,avg_score_16:79.6,full_time_teachers:64.,percent_black:11.8,percent_white:71.2,percent_asian:7.1,percent_hispanic:6.}
...

Thanks @nwt!
