-
Notifications
You must be signed in to change notification settings - Fork 3k
API: required nested fields within optional structs can produce null #13804
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@dejangvozdenac thanks for reporting and fixing the issue. I agree that can you add unit test in Spark module to cover the scenario you described? |
api/src/test/java/org/apache/iceberg/expressions/TestBoundReferenceProducesNull.java
Outdated
Show resolved
Hide resolved
api/src/test/java/org/apache/iceberg/expressions/TestBoundReferenceProducesNull.java
Outdated
Show resolved
Hide resolved
api/src/test/java/org/apache/iceberg/expressions/TestBoundReferenceProducesNull.java
Outdated
Show resolved
Hide resolved
api/src/test/java/org/apache/iceberg/expressions/TestBoundReferenceProducesNull.java
Outdated
Show resolved
Hide resolved
api/src/main/java/org/apache/iceberg/expressions/BoundReference.java
Outdated
Show resolved
Hide resolved
parquet/src/test/java/org/apache/iceberg/parquet/TestBloomRowGroupFilter.java
Outdated
Show resolved
Hide resolved
|
thanks for the review @stevenzwu, I appreciate it! I addressed all the comments, let me know if anything further is needed. |
spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/sql/TestFilterPushDown.java
Outdated
Show resolved
Hide resolved
api/src/test/java/org/apache/iceberg/expressions/TestBoundReference.java
Show resolved
Hide resolved
stevenzwu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. We will need 2 or 3 more committers' approval, since this modifies critical api code.
awesome, thanks for the quick review @stevenzwu. what's the usually process here? should I tag 2-3 people in who have reviewed code under api or do you have suggestions? |
spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/sql/TestFilterPushDown.java
Outdated
Show resolved
Hide resolved
spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/sql/TestSelect.java
Outdated
Show resolved
Hide resolved
spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/sql/TestSelect.java
Outdated
Show resolved
Hide resolved
b489986 to
535c8a1
Compare
|
Hey @pvary , mind taking a look? @stevenzwu mentioned we need at least 3 reviewers on this. |
|
Sorry wrong tag, meant to tag @nastra |
|
@nastra might be busy. @pvary or @singhpk234 mind taking a look? |
|
@stevenzwu we have three approvals now, would this be ready to merge? |
|
hey @stevenzwu @pvary @nastra @huaxingao , gentle ping on this |
|
@nastra you want to take another look before we merge this? |
…ce nulls This partially reverts some changes around the `Accessor` API that were introduced by apache#13804 and uses a Schema visitor to detect whether any of the parent fields of a nested required field are optional. This info is then used when IS_NULL / NOT_NULL is evaluated
…ce nulls This partially reverts some changes around the `Accessor` API that were introduced by apache#13804 and uses a Schema visitor to detect whether any of the parent fields of a nested required field are optional. This info is then used when IS_NULL / NOT_NULL is evaluated
…ce nulls This partially reverts some changes around the `Accessor` API that were introduced by apache#13804 and uses a Schema visitor to detect whether any of the parent fields of a nested required field are optional. This info is then used when IS_NULL / NOT_NULL is evaluated
…ce nulls (#14270) * API: Detect whether required fields nested within optionals can produce nulls This partially reverts some changes around the `Accessor` API that were introduced by #13804 and uses a Schema visitor to detect whether any of the parent fields of a nested required field are optional. This info is then used when IS_NULL / NOT_NULL is evaluated * only check parent fields on IS_NULL/NOT_NULL
…pache#13804) (cherry picked from commit c05f6c9)
…ce nulls (apache#14270) * API: Detect whether required fields nested within optionals can produce nulls This partially reverts some changes around the `Accessor` API that were introduced by apache#13804 and uses a Schema visitor to detect whether any of the parent fields of a nested required field are optional. This info is then used when IS_NULL / NOT_NULL is evaluated * only check parent fields on IS_NULL/NOT_NULL (cherry picked from commit a473b1c)
…ce nulls (#14270) (#14512) * API: Detect whether required fields nested within optionals can produce nulls This partially reverts some changes around the `Accessor` API that were introduced by #13804 and uses a Schema visitor to detect whether any of the parent fields of a nested required field are optional. This info is then used when IS_NULL / NOT_NULL is evaluated * only check parent fields on IS_NULL/NOT_NULL (cherry picked from commit a473b1c) Co-authored-by: Eduard Tudenhoefner <etudenhoefner@gmail.com>
|
@dejangvozdenac Thank you for fixing this issue! I've verified that this change resolves the correctness issue in Trino. A regression test is included in trinodb/trino#26640. |
producesNull check is used when binding isNull and notNull predicate. Currently, the logic states that a field can produce null if and only if that field is optional. This is not true, however, in the case of required fields nested within optional structs. The field itself can produce nulls if the parent struct is null despite it being required.
I'm able to reproduce this case in Trino and Spark by creating the following schema and adding rows to it:
address.street is null for row 0, but trino/spark using iceberg api disagree:
You can see that when Trino reads the entire file, it correctly determines the row:
This also leads to unexpected behavior where null or not null check behaves differently based on the binding order:
After this change, iceberg can find the null row:
Closes #13328 (and relatedly trinodb/trino#20511)