Parquet Fuzz Tests #1053
Comments
After thinking about this for a week, I'm inclined to start with Arrow Python/Hypothesis and Python Parquet tests, then gradually add Proptest. AWS Labs has the best proptest examples. Zooming out a bit more, DataFusion needs to be integrated into the squirrel and sqlancer cross-SQL-engine tests. We can use sqlsmith to reduce large queries. We also want to be like AWS Redshift, where you write a query in Python/SQL and it emits Rust code that gets compiled and sent to worker nodes. It seems we might need thin-lto even on dev builds to reduce false positives: https://github.com/awslabs/rust-smt-ir/blob/551565ea5e97f502269d74d189e2e2c1e6b52f40/Cargo.toml#L11
FYI I'm experimenting with extending the existing fuzz tests to support nulls, dictionaries, etc.
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Whilst working on #1037 I've introduced bugs that have then been caught by the arrow array benchmarks.
It would therefore appear that these tests are exercising code paths not found in the other tests, and we could therefore increase the test coverage by including some variant of them.
Describe the solution you'd like
A set of fuzz tests that create various types of PageIterator with multiple column chunks, and multiple pages per column chunk. This can likely reuse much of the fuzz plumbing found in the arrow_array_reader benchmarks.
The tests would then use the ArrayReader abstractions to read this data and verify it is what was written.
Describe alternatives you've considered
We could not add fuzz tests, but there would be an increased likelihood of regressions.
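To make the proposed solution concrete, here is a minimal, std-only sketch of what such a test could look like. Everything in it is a hypothetical stand-in: `Page`, `ColumnChunk`, `BatchReader`, `XorShift64` and `run_fuzz` are illustrative names and are not types or functions from the parquet crate. A real test would build an actual PageIterator and read it through the ArrayReader abstractions. The sketch only shows the structure: generate random column chunks with random pages (including nulls), read them back in randomly sized batches that cross page and chunk boundaries, and check that the result matches what was written.

```rust
// Hypothetical stand-ins only; nothing here is the parquet crate's API.

/// Small deterministic PRNG (xorshift64) so a failing seed can be replayed.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must not start from an all-zero state.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Returns a value in `lo..=hi`.
    fn range(&mut self, lo: usize, hi: usize) -> usize {
        lo + (self.next_u64() % (hi - lo + 1) as u64) as usize
    }
}

/// A page of nullable values; `None` models a null.
type Page = Vec<Option<i32>>;
/// A column chunk is a sequence of pages.
type ColumnChunk = Vec<Page>;

/// Generates 1 to 4 column chunks, each with 1 to 5 pages of 0 to 16 values.
fn random_chunks(rng: &mut XorShift64) -> Vec<ColumnChunk> {
    let mut chunks = Vec::new();
    for _ in 0..rng.range(1, 4) {
        let mut chunk = Vec::new();
        for _ in 0..rng.range(1, 5) {
            let mut page = Vec::new();
            for _ in 0..rng.range(0, 16) {
                // Roughly one value in four is null.
                let is_null = rng.next_u64() % 4 == 0;
                page.push(if is_null { None } else { Some(rng.next_u64() as i32) });
            }
            chunk.push(page);
        }
        chunks.push(chunk);
    }
    chunks
}

/// Hypothetical reader that returns up to `batch_size` values per call,
/// crossing page and column chunk boundaries as needed.
struct BatchReader {
    pages: std::vec::IntoIter<Page>,
    current: std::vec::IntoIter<Option<i32>>,
}

impl BatchReader {
    fn new(chunks: Vec<ColumnChunk>) -> Self {
        let pages: Vec<Page> = chunks.into_iter().flatten().collect();
        BatchReader { pages: pages.into_iter(), current: Vec::new().into_iter() }
    }

    fn next_batch(&mut self, batch_size: usize) -> Vec<Option<i32>> {
        let mut out = Vec::with_capacity(batch_size);
        while out.len() < batch_size {
            match self.current.next() {
                Some(value) => out.push(value),
                // Current page exhausted: move to the next one, or stop.
                None => match self.pages.next() {
                    Some(page) => self.current = page.into_iter(),
                    None => break,
                },
            }
        }
        out
    }
}

/// One fuzz round: write random data, read it back, compare.
fn fuzz_round(seed: u64) -> bool {
    let mut rng = XorShift64::new(seed);
    let chunks = random_chunks(&mut rng);

    let mut expected: Vec<Option<i32>> = Vec::new();
    for chunk in &chunks {
        for page in chunk {
            expected.extend_from_slice(page);
        }
    }

    let mut reader = BatchReader::new(chunks);
    let mut actual: Vec<Option<i32>> = Vec::new();
    loop {
        // Random batch sizes so batches rarely align with page boundaries.
        let batch = reader.next_batch(rng.range(1, 8));
        if batch.is_empty() {
            break;
        }
        actual.extend(batch);
    }
    actual == expected
}

/// Runs `rounds` seeds and returns the first failing seed, if any.
fn run_fuzz(rounds: u64) -> Result<(), u64> {
    for seed in 1..=rounds {
        if !fuzz_round(seed) {
            return Err(seed);
        }
    }
    Ok(())
}
```

Calling something like `run_fuzz(1000)` from a `#[test]` function and asserting that it returns `Ok(())` would give a reproducible failure, since the returned seed can be passed straight back to `fuzz_round`.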