Skip to content

Conversation

@saucam
Copy link

@saucam saucam commented Jan 9, 2015

         1. Statistics object should be marked non empty in case null values are written
         2. Keep a boolean in the object to identify presence of non-null values

still to add test cases, verified with spark-sql scenario mentioned in PARQUET-136

…re not null

             2. Statistics object should be marked non empty in case null values are written
             3. Keep a boolean in the object to identify presence of non-null values
@isnotinvain
Copy link
Contributor

Thanks for the PR!

Can this be accomplished by only making a single change to the super class, something like this:
https://github.com/isnotinvain/incubator-parquet-mr/compare/alexlevenson/PARQUET-161

I don't see why it needs to be in each subclass / why we need an extra variable to track this, if we can just make isEmpty() accurate we should be fine everywhere else right? Am I missing something?

Thanks again!
Alex

@saucam
Copy link
Author

saucam commented Jan 10, 2015

Hello Alex,

I tried precisely the change you have mentioned in your pull request. The problem comes in BinaryStatistics.java , because when we are writing null value, we falsify the isEmpty() check , so when the next valid value is to be written,

public void updateStats(Binary value) {

gets called, which calls

updateStats(value, value);

at which point , min/max have not been initialized, and we again get NPE.

Thats why we need this 3 state information in the BinaryStatistics.java , i.e empty column, no valid value column (or all nulls column) and lastly at-least 1 valid value column.

@isnotinvain
Copy link
Contributor

@saucam But we can still track that in the super class right?
EG, in the PR I sent you, we could just change this line:
if (this.isEmpty()) { in BinaryStatistics (and others) to if (!this.hasNonNullValue()) { right?

@julienledem
Copy link
Member

@saucam: Is this PR still applicable after #99?
If not please close it.

@saucam saucam closed this Jan 30, 2015
@saucam
Copy link
Author

saucam commented Jan 30, 2015

done !

@saucam saucam deleted the write_nulls branch January 30, 2015 01:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants