Skip to content

PARQUET-378: Add thoroughly parquet test encodings#274

Closed
spena wants to merge 1 commit intoapache:masterfrom
spena:parquet-378
Closed

PARQUET-378: Add thoroughly parquet test encodings#274
spena wants to merge 1 commit intoapache:masterfrom
spena:parquet-378

Conversation

@spena
Copy link

@spena spena commented Sep 23, 2015

A new test case TestTypeEncodings is added that test v1 and v2 encodings for all
supported column types. This test case spans many pages and row groups, and reads
each page individually from first-to-last and from last-to-first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use a bigger value for this. I bumped it up to 16k and the tests work just fine. 8k or 16k is a setting more like what we would see in real data, and you can still fit several pages in each row group.

@rdblue
Copy link
Contributor

rdblue commented Sep 24, 2015

I tested this locally after removing the fix for PARQUET-246 and it caught the error. I'm +1 once the minor comments are addressed. Thanks @spena!

@spena
Copy link
Author

spena commented Sep 25, 2015

@rdblue I committed another patch addressing your feedback. I had to force the push because of problems I had.

In this patch, I added the following:

  • testTypesEncodingsWithDictionary
  • Increment page size from 64B to 16K
  • Reduced row group size from 256K to 128K so that booleans can span more pages
  • Increment record count to 2 million to span more pages
  • Test every value of WriterVersion.values() in case a new one is added in the future

I run the test without the PARQUET-246 fix, and it catch the bug as well.

@rdblue
Copy link
Contributor

rdblue commented Sep 25, 2015

Thanks @spena! The tests failed, but both were due to #269. I'll commit this soon.

@spena
Copy link
Author

spena commented Oct 23, 2015

@rdblue How can I re-run the tests to see if this patch passes?

@rdblue
Copy link
Contributor

rdblue commented Oct 23, 2015

Rebase on the current master and force-push. Thanks @spena! I'll review it after.

@spena
Copy link
Author

spena commented Oct 26, 2015

@rdblue These tests added ~10 min more time to unit-testing. Would you like to lower the time spent on unit-testing, or is that ok?

@rdblue
Copy link
Contributor

rdblue commented Oct 26, 2015

Ouch. Any way we can get that down a bit? Adding 10 minutes to the build is pretty ugly. In that case I'd say we should add these as integration tests using the failsafe plugin.

A new test case TestTypeEncodings is added that test v1 and v2 encodings for all
supported column types. This test case spans many pages and row groups, and reads
each page individually from first-to-last and from last-to-first.
@spena
Copy link
Author

spena commented Nov 2, 2015

@rdblue Now the tests run with 'mvn verify' as integration tests. Although, I checked travis-ci, and all tests run in ~17min, Last time with this encodings test were 21min. Maybe there are other tests adding more time, but this one is good for now I think.

@julienledem
Copy link
Member

This looks good to me.
@rdblue did you want to merge?

@rdblue
Copy link
Contributor

rdblue commented Nov 17, 2015

@julienledem sure, thanks for pinging me on this.

@asfgit asfgit closed this in efafa61 Nov 19, 2015
@rdblue
Copy link
Contributor

rdblue commented Nov 19, 2015

Merged. Thanks, Sergio!

piyushnarang pushed a commit to piyushnarang/parquet-mr that referenced this pull request Jun 15, 2016
A new test case TestTypeEncodings is added that test v1 and v2 encodings for all
supported column types. This test case spans many pages and row groups, and reads
each page individually from first-to-last and from last-to-first.

Author: Sergio Pena <sergio.pena@cloudera.com>

Closes apache#274 from spena/parquet-378 and squashes the following commits:

b35c339 [Sergio Pena] PARQUET-378: Add thoroughly parquet test encodings
rdblue pushed a commit to rdblue/parquet-mr that referenced this pull request Jul 13, 2016
A new test case TestTypeEncodings is added that test v1 and v2 encodings for all
supported column types. This test case spans many pages and row groups, and reads
each page individually from first-to-last and from last-to-first.

Author: Sergio Pena <sergio.pena@cloudera.com>

Closes apache#274 from spena/parquet-378 and squashes the following commits:

b35c339 [Sergio Pena] PARQUET-378: Add thoroughly parquet test encodings
rdblue pushed a commit to rdblue/parquet-mr that referenced this pull request Jan 6, 2017
A new test case TestTypeEncodings is added that test v1 and v2 encodings for all
supported column types. This test case spans many pages and row groups, and reads
each page individually from first-to-last and from last-to-first.

Author: Sergio Pena <sergio.pena@cloudera.com>

Closes apache#274 from spena/parquet-378 and squashes the following commits:

b35c339 [Sergio Pena] PARQUET-378: Add thoroughly parquet test encodings
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants