[FEA] parquet and orc corner case tests #5462
Labels
reliability
Features to improve reliability or bugs that severely impact the reliability of the plugin
test
Only impacts tests
We have run into a number of places recently where there are corner cases with old parquet data, or odd mixtures of things that are causing issues.
The goal of this issue is to find corner cases to test for Parquet and ORC. This will likely require us to understand the file formats themselves and to write out data in ways that Spark cannot, similar to #5445.
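For hand-crafting files, it helps to recall the Parquet framing: a file starts and ends with the 4-byte magic `PAR1`, and the 4 bytes before the trailing magic hold the little-endian length of the Thrift-encoded footer. Below is a minimal sketch in Python (standard library only) that locates the footer in a byte buffer. The helper name `read_footer_bounds` is hypothetical and not part of any library, and the sketch does not decode the footer itself.

```python
import struct
from typing import Tuple

PARQUET_MAGIC = b"PAR1"


def read_footer_bounds(data: bytes) -> Tuple[int, int]:
    """Return (start, end) byte offsets of the Thrift footer in a Parquet buffer.

    Hypothetical helper for test tooling; it only checks the framing.
    """
    # Smallest possible layout: leading magic + footer length + trailing magic.
    if len(data) < 12:
        raise ValueError("too short to be a Parquet file")
    if data[:4] != PARQUET_MAGIC or data[-4:] != PARQUET_MAGIC:
        raise ValueError("missing PAR1 magic")
    # The footer length is a 4-byte little-endian unsigned int before the trailing magic.
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    end = len(data) - 8
    start = end - footer_len
    if start < 4:
        raise ValueError("footer length exceeds file size")
    return start, end
```

A tool built on this could slice out the footer bytes, alter them, and write the file back to produce inputs that Spark itself would never write.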
We should also look closely at schema evolution: what happens when new files with a modified schema are added, for example when a column moves from an int to a long? What does the CPU do, and how do we handle it? We have implemented some of this for Parquet, but ORC is still really lacking (#135).
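One way to make the schema evolution cases systematic is to enumerate every (written type, read type) pair and turn each into a test. The sketch below does this for integral widening only. The type list and the name `widening_pairs` are assumptions for illustration, and the sketch makes no claim about which pairs Spark or the plugin actually supports; each pair is a candidate whose CPU behavior would need to be recorded first.

```python
from itertools import combinations

# Assumed ordering of integral types from narrowest to widest (illustrative only).
INTEGRAL_TYPES = ["byte", "short", "int", "long"]


def widening_pairs(types=INTEGRAL_TYPES):
    """Return (written_type, read_type) pairs where the read type is strictly wider."""
    # combinations() keeps input order, so the first element is always the narrower type.
    return list(combinations(types, 2))
```

Each pair could then parametrize a test that writes a file with the first type and reads it back with a schema that uses the second, for both Parquet and ORC.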
We should also look at features such as Parquet storing the data in a different file from the footer. Does anyone use this? If so, does Spark work with it?
To be clear not all of this work needs to be done in one issue. We can split this up into multiple issues, and if we find bugs we need to make sure to file those bugs against us.