@majian1998
Contributor

@majian1998 majian1998 commented Jul 22, 2025

Spark SQL employs a chain of parsers to convert SQL text into logical plans for execution. Many extensions (such as Iceberg, Paimon, etc.) implement their own SQL parsers by wrapping the underlying parser with a delegate. This design allows each extension to support custom syntax, while passing unhandled SQL to the next parser in the chain.

Previously, the Spark/Iceberg logic only checked the outermost parser instance to determine whether it was an ExtendedParser. However, in environments where multiple extensions are stacked (for example, when the Paimon parser delegates to the Iceberg parser), this check fails because the top-level parser is no longer an instance of ExtendedParser. Consequently, features that rely on Iceberg's parser capabilities (such as the custom parseSortOrder logic) do not function correctly in these scenarios.

This PR improves the detection logic by recursively unwrapping delegate parsers until an ExtendedParser instance is found. This change ensures compatibility across multiple parser extensions and improves robustness when integrating with complex Spark SQL extension chains.
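
For illustration only, the unwrapping idea can be sketched in standalone Java. Every type below (`SqlParser`, `ExtendedSqlParser`, `IcebergLikeParser`, `OuterParser`, `ParserChains`) is a hypothetical stand-in for Spark's ParserInterface and Iceberg's ExtendedParser, not a class touched by this PR, and the sketch assumes each wrapper keeps the next parser in a field named `delegate`:

```java
import java.lang.reflect.Field;

// Hypothetical stand-ins for Spark's ParserInterface and Iceberg's ExtendedParser.
interface SqlParser {}

interface ExtendedSqlParser extends SqlParser {}

// Plays the role of the Iceberg parser that the lookup should find.
class IcebergLikeParser implements ExtendedSqlParser {}

// Plays the role of another extension's parser that wraps the next parser in the chain.
class OuterParser implements SqlParser {
  private final SqlParser delegate;

  OuterParser(SqlParser delegate) {
    this.delegate = delegate;
  }
}

class ParserChains {
  // Follows "delegate" fields until an ExtendedSqlParser is found; returns null otherwise.
  static ExtendedSqlParser findExtended(Object parser) {
    Object current = parser;
    // Bound the walk so a cyclic chain cannot loop forever (32 is an arbitrary limit).
    for (int depth = 0; current != null && depth < 32; depth++) {
      if (current instanceof ExtendedSqlParser) {
        return (ExtendedSqlParser) current;
      }
      current = readDelegate(current);
    }
    return null;
  }

  // Reads a field named "delegate" declared directly on the object's class, or null if absent.
  private static Object readDelegate(Object parser) {
    try {
      Field field = parser.getClass().getDeclaredField("delegate");
      field.setAccessible(true);
      return field.get(parser);
    } catch (ReflectiveOperationException e) {
      return null;
    }
  }
}
```

With these stand-ins, `ParserChains.findExtended(new OuterParser(new OuterParser(new IcebergLikeParser())))` returns the innermost parser, whereas a check on the outermost instance alone would not find it.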

Related issue: #8004

I believe this change addresses the above issue.

@majian1998
Contributor Author

Hi @ajantha-bhat , could you please take a look or help review this PR when convenient? Thank you!

@majian1998 majian1998 force-pushed the fix-parser branch 2 times, most recently from 14416e9 to bd8f3dc July 23, 2025 03:14
@majian1998
Contributor Author

@amogh-jahagirdar @manuzhang Hi! Could you please help review this PR when convenient? Thank you!

@manuzhang
Member

@majian1998 Please target Spark 4.0 first and add a UT if possible.

@majian1998 majian1998 changed the title from "Spark: Support recursive delegate unwrapping to find ExtendedParser in parser chains" to "Spark 4.0: Support recursive delegate unwrapping to find ExtendedParser in parser chains" Jul 28, 2025
@majian1998
Contributor Author

> @majian1998 Please target Spark 4.0 first and add a UT if possible.

Thanks for your suggestion! I've updated to target Spark 4.0 and added a UT.

@majian1998
Contributor Author

> @majian1998 Please target Spark 4.0 first and add a UT if possible.

Could you please review again? Thanks! @manuzhang

@github-actions

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Aug 30, 2025
@github-actions

github-actions bot commented Sep 7, 2025

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Sep 7, 2025
@manuzhang manuzhang reopened this Sep 7, 2025
@github-actions github-actions bot removed the stale label Sep 8, 2025

static Object getDelegate(Object parser) {
  try {
    for (String methodName : new String[] {"delegate", "getDelegate"}) {
Member

Should we use getDeclaredFields to iterate over all fields instead of assuming it's always delegate?

Contributor Author

Thanks for the suggestion! I've updated the code to iterate over all fields.
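
As a rough sketch of what scanning fields instead of relying on a fixed name could look like (standalone Java; `SqlParser`, `LeafParser`, `RenamedFieldParser`, and `DelegateFields` are hypothetical names for illustration, not the code in this PR):

```java
import java.lang.reflect.Field;

// Hypothetical stand-in for Spark's ParserInterface.
interface SqlParser {}

// A parser with no delegate at all.
class LeafParser implements SqlParser {}

// A wrapper whose delegate field is deliberately NOT named "delegate".
class RenamedFieldParser implements SqlParser {
  private final String name = "outer";
  private final SqlParser inner;

  RenamedFieldParser(SqlParser inner) {
    this.inner = inner;
  }
}

class DelegateFields {
  // Returns the first non-null declared field whose type is a SqlParser, or null if none.
  static SqlParser firstParserField(Object parser) {
    for (Field field : parser.getClass().getDeclaredFields()) {
      // Filter on the declared field type so unrelated fields are never opened up.
      if (!SqlParser.class.isAssignableFrom(field.getType())) {
        continue;
      }
      try {
        field.setAccessible(true);
        Object value = field.get(parser);
        if (value != null) {
          return (SqlParser) value;
        }
      } catch (IllegalAccessException e) {
        // Skip fields that cannot be read and keep scanning.
      }
    }
    return null;
  }
}
```

Note that `getDeclaredFields` only returns fields declared on the class itself, so this version does not see fields inherited from a superclass.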

@majian1998
Contributor Author

Paimon and Kyuubi integration with iceberg may encounter this issue:

Yes, I also encountered this issue when integrating with Paimon. This PR should fix the problem.

@majian1998
Contributor Author

It looks like this is a common issue with mixed formats. Could you help me continue reviewing this PR? @manuzhang

@manuzhang
Member

@huaxingao could you please help review this PR?

@huaxingao
Contributor

This approach looks reasonable to me. Also cc @RussellSpitzer @amogh-jahagirdar

@majian1998 majian1998 requested a review from manuzhang October 23, 2025 02:51
    targetField.set(sessionState, parser);
  }

  public static class WrapperParser implements ParserInterface {
Member

This should be private if not used elsewhere.

@majian1998 majian1998 requested a review from manuzhang October 23, 2025 06:03
@manuzhang manuzhang requested a review from huaxingao October 23, 2025 06:27
@majian1998 majian1998 requested a review from huaxingao October 27, 2025 07:37
@huaxingao
Contributor

@majian1998 can we also add a test where the delegate lives in a superclass?

@majian1998
Contributor Author

> @majian1998 can we also add a test where the delegate lives in a superclass?

Of course! I’ve added a test to cover superclass field lookup.
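
To make the superclass case concrete, here is a minimal standalone sketch (hypothetical names `SqlParser`, `LeafParser`, `BaseWrapper`, `ChildWrapper`, and `HierarchyLookup`; it is not the test or lookup code in this PR). It assumes the delegate is a private field declared on a parent class, which `getDeclaredFields` on the child alone would miss:

```java
import java.lang.reflect.Field;

// Hypothetical stand-in for Spark's ParserInterface.
interface SqlParser {}

class LeafParser implements SqlParser {}

// The delegate is declared here, on the superclass.
abstract class BaseWrapper implements SqlParser {
  private final SqlParser delegate;

  BaseWrapper(SqlParser delegate) {
    this.delegate = delegate;
  }
}

// The concrete wrapper declares only an unrelated field of its own.
class ChildWrapper extends BaseWrapper {
  private final String label = "child";

  ChildWrapper(SqlParser delegate) {
    super(delegate);
  }
}

class HierarchyLookup {
  // Scans declared fields of the class and each superclass (stopping before Object)
  // and returns the first non-null SqlParser-typed value, or null if none exists.
  static SqlParser findDelegate(Object parser) {
    if (parser == null) {
      return null;
    }
    for (Class<?> type = parser.getClass();
        type != null && type != Object.class;
        type = type.getSuperclass()) {
      for (Field field : type.getDeclaredFields()) {
        if (!SqlParser.class.isAssignableFrom(field.getType())) {
          continue;
        }
        try {
          field.setAccessible(true);
          Object value = field.get(parser);
          if (value != null) {
            return (SqlParser) value;
          }
        } catch (IllegalAccessException e) {
          // Skip unreadable fields and keep scanning.
        }
      }
    }
    return null;
  }
}
```

Walking `getSuperclass()` is one way to reach inherited private fields; `getFields()` would not help here because it only returns public members.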

@majian1998 majian1998 requested a review from huaxingao October 28, 2025 06:07
Contributor

@huaxingao huaxingao left a comment

LGTM

@majian1998
Contributor Author

cc @manuzhang

}

  private static class WrapperParser extends AbstractSqlParser {
    private final ParserInterface delegate;
Member

Can we have more than one level of superclasses, and also more than one field in each class?

@huaxingao huaxingao merged commit aa14aae into apache:main Nov 1, 2025
27 checks passed
@huaxingao
Contributor

Thanks @majian1998 for the PR! Thanks @manuzhang for the review!

@majian1998
Contributor Author

@manuzhang @huaxingao Thanks! Super excited to contribute to Iceberg for the first time. I’ll apply this fix to Spark 3.4 and 3.5 as well.
