Support writing lists in the arrow csv writer #4502

Closed
Tracked by #4460
mvanschellebeeck opened this issue Dec 4, 2022 · 2 comments
Labels
enhancement New feature or request sqllogictest SQL Logic Tests (.slt)

Comments

@mvanschellebeeck
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Add support to the arrow CSV writer for writing lists.

Currently, an sqllogictest of the form:

query T
SELECT array_agg(c13) FROM (SELECT * FROM aggregate_test_100 ORDER BY c13 LIMIT 2) test
----
[0VVIHzxWtNOFLtnhjHEKjXaJOSLJfm0keZ5G8BffGwgF2RwQD59TFzMStxCB]

will fail with
thread 'main' panicked at 'called Result::unwrap() on an Err value: CsvError("CSV Writer does not support List(Field { name: \"item\", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) data type")', datafusion/core/tests/sqllogictests/src/main.rs:134:33

Source of failure in arrow-csv: https://github.com/apache/arrow-rs/blob/master/arrow-csv/src/writer.rs#L228

I'm not sure whether this makes sense to implement upstream (in arrow-csv) or as part of DataFusion's test harness.

@mvanschellebeeck added the enhancement label Dec 4, 2022
@xudong963 added the sqllogictest label Dec 4, 2022
@tustvold
Contributor

tustvold commented Dec 7, 2022

This would be an upstream issue in arrow-rs if we wish to add support for this, and I would be happy to review a PR adding support for it.

That being said, I'm not sure how nested data would be encoded in CSV. I had understood the format to be strictly tabular. The docs for Arrow's Python implementation appear to suggest that only primitive types are supported, although I haven't tested this.

FWIW arrow-rs supports reading/writing nested JSON
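
To make the encoding question concrete: one option that keeps the output tabular would be to flatten each list into a single field and quote it. The following is only a sketch in plain Rust using the standard library, not anything arrow-csv provides. The helper names (`encode_list_cell`, `quote_csv_field`, `list_to_csv_field`), the bracket syntax, and the `NULL` marker are all assumptions chosen for illustration; the brackets simply mirror the expected output in the test above.

```rust
// Hypothetical helpers, not part of arrow-csv: flatten one list value into a
// single CSV field so the output stays strictly tabular.

/// Render a list of nullable strings as `[a, b, NULL]`.
/// The bracket syntax and the `NULL` marker are assumptions for illustration.
fn encode_list_cell(items: &[Option<&str>]) -> String {
    let rendered: Vec<&str> = items.iter().map(|item| item.unwrap_or("NULL")).collect();
    format!("[{}]", rendered.join(", "))
}

/// Quote a field when it contains a comma, a double quote, or a line break,
/// doubling any embedded double quotes.
fn quote_csv_field(field: &str) -> String {
    if field.chars().any(|c| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", field.replace('"', "\"\""))
    } else {
        field.to_string()
    }
}

/// Encode a whole list value as one CSV field.
fn list_to_csv_field(items: &[Option<&str>]) -> String {
    quote_csv_field(&encode_list_cell(items))
}
```

With these helpers, `list_to_csv_field(&[Some("a"), Some("b")])` yields the quoted field `"[a, b]"`, while a single-element list such as `[x]` needs no quoting. The obvious drawback is that a reader cannot tell a flattened list from a string that happens to look like one, which is part of why the format question is not trivial.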

@melgenek
Contributor

Sqllogictest doesn't use the CSV writer anymore (see #4578).
The test from this issue now passes: https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/tests/sqllogictests/test_files/aggregate.slt#L953-L957.

@xudong963 @alamb @tustvold I guess this ticket can be closed.

@tustvold closed this as not planned Jan 24, 2023