-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-25832][SQL] remove newly added map related functions from FunctionRegistry #22821
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
gatorsmile
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending Jenkins.
BTW, you also need to remove the corresponding test cases that rely on function registry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan and @gatorsmile .
I agree that this is the quickest way. However, if you don't mind, I want to ask to remove those expressions(MapFilter) and test suites completely in branch-2.4.
According to this PR, it's too easy to expose back. Since these are very valuable functions, developers will use this hack definitely. Also, since we will decide how to change the behavior in Spark 3.0, it would be great if we completely prevent this exposure.
scala> spark.sessionState.functionRegistry.createOrReplaceTempFunction("map_filter", x => org.apache.spark.sql.catalyst.expressions.MapFilter(x(0),x(1)))
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a PR to you. Could you review and merge cloud-fan#11 ?
|
Test build #98000 has finished for PR 22821 at commit
|
|
retest this please |
|
Test build #98005 has finished for PR 22821 at commit
|
|
@dongjoon-hyun if there are advanced users who know all the background, and still want to use these functions, why shall we stop them? If end users can't hit the bug with public APIs, I think we are fine. |
|
@cloud-fan . That's sounds like a Tech. Preview for the advance users, doesn't it? It looks like an excuse to ignore the whole context of the discussion and to try to ship in any way. I cann't agree with you with this approach. IMO, we need to fix this or we need to not ship any code like this to any users.
IIUC, the reason why we are not fixing this is that we want to keep the release cadence. If then, please simply remove this. I already gave you the code. |
|
I'm just confused here. Shall we finish the discussion on the email thread? @cloud-fan and @gatorsmile . If the decision is officially made like that (providing tech. preview to advance users) in the email thread, I'm okay with this. |
|
We seem to be splitting hairs here. Why are we providing tech preview to
advanced users? Are you saying they construct expressions directly using
internal APIs? I doubt that’s tech preview.
Users can construct a lot of invalid plans that lead to weird semantics or
behaviors if they try, and this doesn’t really make it worse.
In either case it’s not that difficult to remove them and add them back so
I could see it going either way.
…On Thu, Oct 25, 2018 at 7:48 AM Dongjoon Hyun ***@***.***> wrote:
I'm just confused here. Shall we finish the discussion on the email
thread? @cloud-fan <https://github.com/cloud-fan> and @gatorsmile
<https://github.com/gatorsmile> . If the decision is officially made like
that (providing tech. preview to advance users) in the email thread, I'm
okay with this.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#22821 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AATvPODR0wtirRLTLDDSb7agOF-U8fqLks5uoc8ogaJpZM4X5fHp>
.
|
|
Thank you, @rxin . In that case, +1 for complete removal. |
|
@cloud-fan . I'll update my PR to you once more. |
| expression[MapFromArrays]("map_from_arrays"), | ||
| expression[MapKeys]("map_keys"), | ||
| expression[MapValues]("map_values"), | ||
| expression[MapEntries]("map_entries"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To remove map_entries, we need to R together SPARK-24331. I'll include my PR to you, @cloud-fan .
cc @felixcheung
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we need to remove map_entires from functions.scala, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, cc @HyukjinKwon . We need to remove map_entries in Python.
|
I will submit a PR to revert all these changes. Thanks! |
|
Please ping me on the R removal.
…________________________________
|
|
Test build #98017 has finished for PR 22821 at commit
|
What changes were proposed in this pull request?
According to the discussion in RC4 voting thread and https://issues.apache.org/jira/browse/SPARK-25829, Spark current has a very weird behavior when we have duplicated keys in map. The newly added map related functions in 2.4 make it worse.
Before we entire fix the map behavior, we should not expose these functions to users
How was this patch tested?
N/A