@cloud-fan
Contributor

@cloud-fan cloud-fan commented Oct 25, 2018

What changes were proposed in this pull request?

According to the discussion in the RC4 voting thread and https://issues.apache.org/jira/browse/SPARK-25829, Spark currently behaves very strangely when a map contains duplicated keys. The map-related functions newly added in 2.4 make it worse.

Until we fix the map behavior entirely, we should not expose these functions to users.
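
To make the duplicated-key problem concrete, here is a minimal sketch in plain Scala. It assumes a simplified model in which a map is stored as parallel key and value arrays. `ArrayBackedMap`, `lookupFirst`, and `toScalaMap` are hypothetical names used only for illustration and are not Spark APIs. Under that assumption, two reasonable ways of reading the same data disagree about which value a duplicated key maps to.

```scala
// Hypothetical, simplified model of a map stored as parallel key/value arrays.
// This is NOT Spark's implementation; it only illustrates the ambiguity.
final case class ArrayBackedMap[K, V](keys: Vector[K], values: Vector[V]) {
  // Linear scan over the keys: the first matching entry wins.
  def lookupFirst(key: K): Option[V] = {
    val i = keys.indexOf(key)
    if (i >= 0) Some(values(i)) else None
  }

  // Conversion to a hash-based Map: later entries overwrite earlier ones,
  // so the last entry for a duplicated key wins.
  def toScalaMap: Map[K, V] = keys.zip(values).toMap
}

object DuplicateKeyDemo {
  def main(args: Array[String]): Unit = {
    // Key 1 appears twice, with values 2 and 3.
    val m = ArrayBackedMap(Vector(1, 1), Vector(2, 3))
    println(m.lookupFirst(1))    // Some(2)
    println(m.toScalaMap.get(1)) // Some(3)
  }
}
```

Any function that builds or transforms maps has to pick one of these interpretations, which is why adding more map functions before settling the semantics widens the inconsistency.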

How was this patch tested?

N/A

@cloud-fan cloud-fan changed the title from "[SPARK-25832][] remove newly added map related functions from FunctionRegistry" to "[SPARK-25832][SQL] remove newly added map related functions from FunctionRegistry" Oct 25, 2018
@cloud-fan
Contributor Author

cc @dongjoon-hyun @gatorsmile @rxin

Member

@gatorsmile gatorsmile left a comment

LGTM pending Jenkins.

BTW, you also need to remove the corresponding test cases that rely on the function registry.

Member

@dongjoon-hyun dongjoon-hyun left a comment

@cloud-fan and @gatorsmile .

I agree that this is the quickest way. However, if you don't mind, I'd like to ask that we remove those expressions (such as MapFilter) and their test suites completely from branch-2.4.

With this PR, it's too easy to expose them again. Since these are very valuable functions, developers will definitely use this hack. Also, since we will decide how to change the behavior in Spark 3.0, it would be great to prevent this exposure completely.

scala> spark.sessionState.functionRegistry.createOrReplaceTempFunction("map_filter", x => org.apache.spark.sql.catalyst.expressions.MapFilter(x(0),x(1)))
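
The concern can be sketched with a toy registry in plain Scala. Everything here is a hypothetical stand-in (`ToyFunctionRegistry`, `Builder`, and `mapFilterExpr` are not Spark classes or methods); the sketch only assumes that a registry maps names to builders and that the expression code still ships after the built-in entry is removed.

```scala
import scala.collection.mutable

// Toy stand-ins for illustration only; these are NOT Spark's classes.
object ToyRegistryDemo {
  // A builder turns argument descriptions into a description of an expression.
  type Builder = Seq[String] => String

  final class ToyFunctionRegistry {
    private val builders = mutable.Map.empty[String, Builder]
    def register(name: String, builder: Builder): Unit = builders.update(name, builder)
    def lookup(name: String): Option[Builder] = builders.get(name)
  }

  // Plays the role of an expression class that is still on the classpath.
  def mapFilterExpr(args: Seq[String]): String = s"MapFilter(${args.mkString(", ")})"

  def main(args: Array[String]): Unit = {
    // The built-in entry was removed, so the name does not resolve.
    val registry = new ToyFunctionRegistry
    println(registry.lookup("map_filter").isDefined) // false

    // A single registration call makes the function available again.
    registry.register("map_filter", mapFilterExpr)
    println(registry.lookup("map_filter").map(b => b(Seq("m", "f")))) // Some(MapFilter(m, f))
  }
}
```

In this model, removing only the registry entry hides the name but not the capability, which is the argument for deleting the expressions themselves.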

Member

@dongjoon-hyun dongjoon-hyun left a comment

I opened a PR against your branch. Could you review and merge cloud-fan#11?

@SparkQA

SparkQA commented Oct 25, 2018

Test build #98000 has finished for PR 22821 at commit 7e919e3.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Contributor

retest this please

@SparkQA

SparkQA commented Oct 25, 2018

Test build #98005 has finished for PR 22821 at commit 7e919e3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor Author

@dongjoon-hyun if there are advanced users who know all the background, and still want to use these functions, why shall we stop them? If end users can't hit the bug with public APIs, I think we are fine.

@dongjoon-hyun
Member

@cloud-fan . That sounds like a tech preview for advanced users, doesn't it?

It looks like an excuse to ignore the whole context of the discussion and to ship in any way possible. I can't agree with this approach. IMO, we need to either fix this or not ship any code like this to any users.

> @dongjoon-hyun if there are advanced users who know all the background, and still want to use these functions, why shall we stop them? If end users can't hit the bug with public APIs, I think we are fine.

IIUC, the reason we are not fixing this is that we want to keep the release cadence. If so, please simply remove this. I already gave you the code.

@dongjoon-hyun
Member

I'm just confused here. Shall we finish the discussion on the email thread, @cloud-fan and @gatorsmile ? If the decision is officially made that way (providing a tech preview to advanced users) in the email thread, I'm okay with this.

@rxin
Contributor

rxin commented Oct 25, 2018 via email

@dongjoon-hyun
Member

Thank you, @rxin . In that case, +1 for complete removal.
It's easier for us to add the expressions back later than to update existing expressions.

@dongjoon-hyun
Member

@cloud-fan . I'll update my PR to you once more.

expression[MapFromArrays]("map_from_arrays"),
expression[MapKeys]("map_keys"),
expression[MapValues]("map_values"),
expression[MapEntries]("map_entries"),
Member

@dongjoon-hyun dongjoon-hyun Oct 25, 2018

To remove map_entries, we also need to change the R side (SPARK-24331). I'll include that in my PR to you, @cloud-fan .
cc @felixcheung

Member

We also need to remove map_entries from functions.scala.

Member

Also, cc @HyukjinKwon . We need to remove map_entries in Python.

@gatorsmile
Member

I will submit a PR to revert all these changes. Thanks!

@felixcheung
Member

felixcheung commented Oct 25, 2018 via email

@SparkQA

SparkQA commented Oct 25, 2018

Test build #98017 has finished for PR 22821 at commit 726fc30.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan cloud-fan closed this Oct 25, 2018