-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-37173][SQL] SparkGetFunctionOperation return builtin function only once #34453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #144797 has finished for PR 34453 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #144798 has finished for PR 34453 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #145051 has finished for PR 34453 at commit
|
|
In #25252 @dongjoon-hyun suggests to make it under a legacy feature flag, since it's behaviour changing. |
|
Thank you, @juliuszsompolski |
Sure, will start work on this. |
|
Test build #146224 has finished for PR 34453 at commit
|
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
For legacy flag, should we must use |
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
...erver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetFunctionsOperation.scala
Outdated
Show resolved
Hide resolved
...erver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetFunctionsOperation.scala
Outdated
Show resolved
Hide resolved
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Outdated
Show resolved
Hide resolved
| Seq("shiftleft", "shiftright", "shiftrightunsigned")) | ||
| checkResult(metaData.getFunctions(null, "default", "upPer"), Seq("upper")) | ||
|
|
||
| statement.execute(s"SET ${SQLConf.THRIFTSERVER_SEPARATE_DISPLAY_SYSTEM_FUNCTION.key}=true") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a test with two schemas and run an unfiltered getFunctions call to show that previously we'd see duplicates, whereas now the functions are unique?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks!
My 2cents: working with various partners and vendors of BI tools, we found GetFunctions to rarely be used at all, and when we found it used, it was in the context of the current behaviour causing trouble (running to slow, causing UI freezes because of trying to render the humongous list of duplicated functions). I am not aware of any tool depending on the current behaviour. |
We use HUE for adhoc before, to support use hue in spark's thrift server. we also changed a lot.. |
|
Kubernetes integration test starting |
bogdanghit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @AngersZhuuuu
|
Kubernetes integration test starting |
|
Kubernetes integration test status failure |
|
Test build #146267 has finished for PR 34453 at commit
|
|
Kubernetes integration test status failure |
|
Test build #146281 has finished for PR 34453 at commit
|
|
Test build #146284 has finished for PR 34453 at commit
|
|
gentle ping @dongjoon-hyun WDYT of current code? |
|
Any more suggestion? also cc @wangyum |
|
Any more suggestion? |
It looks good to me. @dongjoon-hyun WDYT? |
|
Gentle nudge @AngersZhuuuu @dongjoon-hyun @wangyum. |
|
@dongjoon-hyun what should still be done to push this through? |
|
Would the existing behavior ever be desirable? sounds like more of a bug? |
|
@srowen that was my initial thought as well, but there are concerns it may be a breaking change because of the different result format |
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
According to https://github.com/apache/spark/pull/25252/files#r738489764, if we use wild pattern, it will return too much rows.
In this pr we return common builtin functions only once
Why are the changes needed?
Improve performance
Does this PR introduce any user-facing change?
No
How was this patch tested?
WIP