-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Spark 4.0: Support recursive delegate unwrapping to find ExtendedParser in parser chains #13625
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi @ajantha-bhat , could you please take a look or help review this PR when convenient? Thank you! |
14416e9 to
bd8f3dc
Compare
|
@amogh-jahagirdar @manuzhang Hi! Could you please help review this PR when convenient? Thank you! |
|
@majian1998 Please target Spark 4.0 first and add a UT if possible. |
Thanks for your suggestion! I've updated to target Spark 4.0 and added a UT. |
Could you please review again? Thanks! @manuzhang |
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
|
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
|
|
||
| static Object getDelegate(Object parser) { | ||
| try { | ||
| for (String methodName : new String[] {"delegate", "getDelegate"}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use getDeclaredFields to iterate over all fields instead of assuming it's always delegate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion! I've updated the code to iterate over all fields
|
Paimon and Kyuubi integration with iceberg may encounter this issue: |
…er in parser chains
…er in parser chains
Yes, I also encountered this issue when integrating with Paimon. This PR should fix the problem. |
…er in parser chains
…er in parser chains
0af9301 to
9a49797
Compare
|
It looks like this is a common issue with mixed formats. Could you help me continue reviewing this PR? @manuzhang |
|
@huaxingao could you please help review this PR? |
|
This approach looks reasonable to me. Also cc @RussellSpitzer @amogh-jahagirdar |
| targetField.set(sessionState, parser); | ||
| } | ||
|
|
||
| public static class WrapperParser implements ParserInterface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be private if not used elsewhere.
…er in parser chains
…er in parser chains
spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/ExtendedParser.java
Outdated
Show resolved
Hide resolved
…er in parser chains
spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/TestExtendedParser.java
Show resolved
Hide resolved
spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/TestExtendedParser.java
Show resolved
Hide resolved
|
@majian1998 can we also add a test where the delegate lives in a superclass? |
…er in parser chains
Of course! I’ve added a test to cover superClass field lookup. |
…er in parser chains
huaxingao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
cc @manuzhang |
spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/ExtendedParser.java
Outdated
Show resolved
Hide resolved
spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/TestExtendedParser.java
Outdated
Show resolved
Hide resolved
…er in parser chains
…er in parser chains
| } | ||
|
|
||
| private static class WrapperParser extends AbstractSqlParser { | ||
| private final ParserInterface delegate; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have more than one levels of superclasses and also more than one fields in each class?
…er in parser chains
|
Thanks @majian1998 for the PR! Thanks @manuzhang for the review! |
|
@manuzhang @huaxingao Thanks! Super excited to contribute to Iceberg for the first time. I’ll apply this fix to Spark 3.4 and 3.5 as well. |
Spark SQL employs a chain of parsers to convert SQL text into logical plans for execution. Many extensions (such as Iceberg, Paimon, etc.) implement their own SQL parsers by wrapping the underlying parser with a delegate. This design allows each extension to support custom syntax, while passing unhandled SQL to the next parser in the chain.
Previously, the Spark/Iceberg logic only checked the outer-most parser instance to determine if it was an ExtendedParser. However, in environments where multiple extensions are stacked (for example, when the Paimon parser delegates to the Iceberg parser), this check fails because the top-level parser is no longer an instance of ExtendedParser. Consequently, features that rely on Iceberg's parser capabilities (such as the custom parseSortOrder logic) would not function correctly in these scenarios.
This PR improves the detection logic by recursively unwrapping delegate parsers until an ExtendedParser instance is found. This change ensures compatibility across multiple parser extensions and improves robustness when integrating with complex Spark SQL extension chains.
Related issue: #8004
I believe this change addresses the above issue.