
@RussellSpitzer (Member) commented Apr 19, 2021

Previously, the boxed value of containsNaN would be null if the table
was made before NaN stats were implemented, which leads to an NPE when it
is unboxed. To fix this, we keep the boxed Boolean and report "true" when it
is null, since we do not know whether the partition contains any NaN values.
In addition, we move this check below the "allValuesNull" check, in case that
check would have let us skip the manifest because all of its values are null.

Fixes #2492
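The failure mode and the fix described above can be sketched in plain Java. This is a minimal sketch, not the actual ManifestFileUtil code: `SummaryStub` and `NaNCheck` are hypothetical stand-ins, assuming only that `containsNaN` is exposed as a nullable boxed `Boolean`.

```java
// Hypothetical stand-in for a partition field summary (not Iceberg's interface).
// containsNaN is null when the manifest was written before NaN stats existed.
final class SummaryStub {
  private final Boolean containsNaN;

  SummaryStub(Boolean containsNaN) {
    this.containsNaN = containsNaN;
  }

  Boolean containsNaN() {
    return containsNaN;
  }
}

// Hypothetical helper contrasting the unsafe unboxing with the null-tolerant form.
final class NaNCheck {
  // Unsafe: assigning a null Boolean to a primitive boolean auto-unboxes it
  // and throws NullPointerException.
  static boolean canContainNaNUnsafe(SummaryStub summary) {
    boolean containsNaN = summary.containsNaN();
    return containsNaN;
  }

  // Null-tolerant: keep the boxed value and treat "unknown" (null) as
  // "may contain NaN", so the manifest is never wrongly skipped.
  static boolean canContainNaN(SummaryStub summary) {
    Boolean containsNaN = summary.containsNaN();
    return containsNaN == null ? true : containsNaN;
  }
}
```

Treating an unknown value as `true` is the conservative choice for pruning: at worst an extra manifest is read, but a manifest that could match is never skipped.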

@yyanyy (Contributor) left a comment

Thanks for raising the PR!

// if lower bound is null, then there are no non-null values
if (lowerBound == null) {
  // the value is non-null, so it cannot match
  return false;
}
Contributor

If the input is NaN and the file was written with the new code (where a NaN flips containsNaN to true without touching the upper/lower bounds), I think we would return false here, which wouldn't be correct?
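To make this concern concrete, here is a minimal sketch of an equality check, assuming a simplified summary whose bounds are plain `Double`s rather than serialized buffers. `NumericSummary`, `EqCheck` and both method names are hypothetical; this is not Iceberg's evaluator. With a file written by the new code, a partition holding only NaNs has `containsNaN = true` but no bounds, so checking `lowerBound` first rejects it even when the literal is NaN.

```java
// Hypothetical, simplified partition summary (not Iceberg's interface).
final class NumericSummary {
  final boolean containsNull;
  final Boolean containsNaN; // null for manifests written before NaN stats
  final Double lowerBound;   // null when no non-null, non-NaN value was seen
  final Double upperBound;

  NumericSummary(boolean containsNull, Boolean containsNaN, Double lowerBound, Double upperBound) {
    this.containsNull = containsNull;
    this.containsNaN = containsNaN;
    this.lowerBound = lowerBound;
    this.upperBound = upperBound;
  }
}

final class EqCheck {
  // Order questioned in review: the missing-bound check runs first and
  // treats "no lower bound" as "no non-null values", which drops NaN-only partitions.
  static boolean canContainEqBoundsFirst(NumericSummary s, double value) {
    if (s.lowerBound == null) {
      return false;
    }
    if (Double.isNaN(value)) {
      return s.containsNaN == null || s.containsNaN;
    }
    return s.lowerBound <= value && value <= s.upperBound;
  }

  // NaN literals are decided by containsNaN before any bound is inspected;
  // unknown (null) containsNaN is treated as "may contain NaN".
  static boolean canContainEqNaNFirst(NumericSummary s, double value) {
    if (Double.isNaN(value)) {
      return s.containsNaN == null || s.containsNaN;
    }
    if (s.lowerBound == null) {
      return false; // only nulls and/or NaNs remain; neither equals a non-NaN value
    }
    return s.lowerBound <= value && value <= s.upperBound;
  }
}
```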

Contributor

I think @yyanyy is correct. It was my bad for suggesting it.

Member Author

OK, I think I understand the issue here.

Previously we would report "null" for the lowerBound when we meant "NaN", because we weren't recording NaNs. This means that if we see a missing lowerBound, we could have a mix of NaN and null values.

Member Author

Fixed!

@RussellSpitzer force-pushed the FixUnboxingNPEManifestUtil branch from 3d4c691 to 5c1e019 on April 19, 2021 22:24
@yyanyy (Contributor) left a comment

Thank you for the quick fix! I guess the remaining item is the edge case we discussed in #2492. I honestly am not sure whether we should handle it, and am fine either way. Update (also posted in #2492): I think the current change looks good enough, and we shouldn't alter the current behavior.


if (NaNUtil.isNaN(value)) {
  // before: return containsNaN;
  return containsNaN == null ? true : containsNaN;
}
Contributor

I'd consider doing this in the constructor to avoid evaluating the ternary in every canContain call.

this.containsNaN = summary.containsNaN() == null || summary.containsNaN();

Then this will also be a one line change. I'll leave this up to you, @RussellSpitzer @yyanyy.
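The two styles behave identically; the only difference is where the null check runs. A minimal sketch, assuming a hypothetical `FieldSummaryStub` in place of the real partition summary (the evaluator class names are hypothetical too):

```java
// Hypothetical stand-in for a partition field summary (not Iceberg's interface).
final class FieldSummaryStub {
  private final Boolean containsNaN; // null when written before NaN stats existed

  FieldSummaryStub(Boolean containsNaN) {
    this.containsNaN = containsNaN;
  }

  Boolean containsNaN() {
    return containsNaN;
  }
}

// Style kept in the PR: store the boxed value and resolve null on every call.
final class PerCallEvaluator {
  private final Boolean containsNaN;

  PerCallEvaluator(FieldSummaryStub summary) {
    this.containsNaN = summary.containsNaN();
  }

  boolean canContainNaN() {
    return containsNaN == null ? true : containsNaN;
  }
}

// Style suggested in review: resolve null once, in the constructor.
final class ConstructorEvaluator {
  private final boolean containsNaN;

  ConstructorEvaluator(FieldSummaryStub summary) {
    // || short-circuits, so the Boolean is only unboxed when it is non-null
    this.containsNaN = summary.containsNaN() == null || summary.containsNaN();
  }

  boolean canContainNaN() {
    return containsNaN;
  }
}
```

The constructor form pays for the null check once per evaluator, while the per-call form keeps the "null means unknown" handling visible where the value is used; that is the trade-off weighed in this thread.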

Member Author

I'd rather keep it explicit

@aokolnychyi (Contributor)

LGTM. I'd probably do the check once in the constructor but I am not sure it is a big deal.

@RussellSpitzer merged commit b0e33dc into apache:master on Apr 20, 2021
@RussellSpitzer deleted the FixUnboxingNPEManifestUtil branch April 20, 2021 20:20
stevenzwu pushed a commit to stevenzwu/iceberg that referenced this pull request on Jul 28, 2021 (apache#2495)

* Core: Fix NPE caused by Unboxing a Null in ManifestFileUtil

Previously the boxed value of containsNaN would be null if the table
was made before NaN stats were implemented. This leads to an NPE when
it is unboxed. To fix this, we report "true" when the value is null, since we
do not know whether the partition contains any NaN values.