PARQUET-378: Add thoroughly parquet test encodings #274
spena wants to merge 1 commit into apache:master from spena:parquet-378
I think we should use a bigger value for this. I bumped it up to 16k and the tests work just fine. 8k or 16k is a setting more like what we would see in real data, and you can still fit several pages in each row group.
I tested this locally after removing the fix for PARQUET-246 and it caught the error. I'm +1 once the minor comments are addressed. Thanks @spena!
@rdblue I committed another patch addressing your feedback. I had to force-push because of some problems I ran into. In this patch, I added the following:
I ran the test without the PARQUET-246 fix, and it caught the bug as well.
@rdblue How can I re-run the tests to see if this patch passes?
Rebase on the current master and force-push. Thanks @spena! I'll review it after.
@rdblue These tests add about 10 minutes to the unit-test run. Would you like me to reduce that time, or is it OK as is?
Ouch. Any way we can get that down a bit? Adding 10 minutes to the build is pretty ugly. In that case I'd say we should add these as integration tests using the failsafe plugin.
A new test case TestTypeEncodings is added that tests v1 and v2 encodings for all supported column types. This test case spans many pages and row groups, and reads each page individually from first-to-last and from last-to-first.
@rdblue The tests now run with 'mvn verify' as integration tests. I checked Travis CI, and all tests now run in ~17 min; the last run that included these encoding tests took 21 min. There may be other tests adding time, but I think this one is good for now.
This looks good to me.
@julienledem sure, thanks for pinging me on this.
Merged. Thanks, Sergio!
A new test case TestTypeEncodings is added that tests v1 and v2 encodings for all supported column types. This test case spans many pages and row groups, and reads each page individually from first-to-last and from last-to-first.
Author: Sergio Pena <sergio.pena@cloudera.com>
Closes apache#274 from spena/parquet-378 and squashes the following commits:
b35c339 [Sergio Pena] PARQUET-378: Add thoroughly parquet test encodings
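To illustrate the read pattern described in the commit message (each page read individually, first-to-last and then last-to-first), here is a minimal, self-contained sketch. It does not use the Parquet API and is not the code from TestTypeEncodings: the class `PageOrderSketch`, its methods, and the toy delta encoding are all hypothetical stand-ins. The assumption behind it is that reading pages out of order is a cheap way to expose a decoder that wrongly relies on state left over from a previously read page.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; a toy delta encoding stands in for real Parquet page encodings. */
final class PageOrderSketch {

    /** Splits values into pages of at most pageSize entries; each page is delta-encoded on its own. */
    public static List<int[]> writePages(int[] values, int pageSize) {
        List<int[]> pages = new ArrayList<>();
        for (int start = 0; start < values.length; start += pageSize) {
            int end = Math.min(start + pageSize, values.length);
            int[] page = new int[end - start];
            page[0] = values[start]; // first value stored as-is, so the page is self-contained
            for (int i = 1; i < page.length; i++) {
                page[i] = values[start + i] - values[start + i - 1];
            }
            pages.add(page);
        }
        return pages;
    }

    /** Decodes a single page without looking at any other page. */
    public static int[] readPage(int[] page) {
        int[] out = new int[page.length];
        out[0] = page[0];
        for (int i = 1; i < page.length; i++) {
            out[i] = out[i - 1] + page[i];
        }
        return out;
    }

    /** Compares one decoded page against the slice of the input it was built from. */
    private static boolean pageMatches(int[] values, int pageSize, int pageIndex, int[] decoded) {
        int start = pageIndex * pageSize;
        for (int i = 0; i < decoded.length; i++) {
            if (values[start + i] != decoded[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads every page individually, first-to-last, then again last-to-first. */
    public static boolean verifyBothDirections(int[] values, int pageSize) {
        List<int[]> pages = writePages(values, pageSize);
        for (int i = 0; i < pages.size(); i++) {
            if (!pageMatches(values, pageSize, i, readPage(pages.get(i)))) {
                return false;
            }
        }
        for (int i = pages.size() - 1; i >= 0; i--) {
            if (!pageMatches(values, pageSize, i, readPage(pages.get(i)))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] values = new int[1000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i * 7;
        }
        System.out.println(verifyBothDirections(values, 64));
    }
}
```

In this sketch every page carries its own first value, so both passes succeed. A decoder that instead took its starting value from the previously read page would typically still pass the forward pass and fail the reverse one.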