Conversation

@mgaido91
Contributor

What changes were proposed in this pull request?

For performance reasons, we should resolve an In operation on an empty list to false in the optimization phase, as discussed in #19522.
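A minimal sketch of the idea, using a toy expression model rather than Catalyst itself. The names `Expr`, `Attr`, `Lit`, `In` and `EmptyInSketch` are hypothetical stand-ins and not Spark's classes. The rewrite is limited to a non-nullable value, as the final PR title states.

```scala
// Toy stand-ins for Catalyst expressions (hypothetical, not Spark's API).
sealed trait Expr { def nullable: Boolean }
final case class Attr(name: String, nullable: Boolean) extends Expr
final case class Lit(value: Any) extends Expr {
  val nullable: Boolean = value == null
}
final case class In(value: Expr, list: Seq[Expr]) extends Expr {
  val nullable: Boolean = value.nullable || list.exists(_.nullable)
}

object EmptyInSketch {
  val FalseLit: Lit = Lit(false)

  // Rewrite `v IN ()` to false, but only when v can never be null.
  // Any other expression is returned unchanged.
  def optimize(e: Expr): Expr = e match {
    case In(v, list) if list.isEmpty && !v.nullable => FalseLit
    case other => other
  }
}
```

With this sketch, `In(Attr("a", nullable = false), Nil)` is folded to the false literal, while the same expression over a nullable attribute is left alone.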

How was this patch tested?

Added UT

cc @gatorsmile


override def children: Seq[Expression] = value +: list
lazy val inSetConvertible = list.forall(_.isInstanceOf[Literal])
lazy val isListEmpty = list.isEmpty
Contributor

Do we really need this val?

Contributor Author

I am using it to be consistent with the current implementation (see the line above)

Member

Calling list.isEmpty is, in comparison, fast and constant time. Caching it doesn't save much of anything, and a lazy val carries its own overhead.
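To make the trade-off concrete, here is a small sketch with a hypothetical `InListSketch` class (not the real `In` expression). Both members return the same answer. The cached one adds an initialization flag and, in Scala 2, a synchronized first access, all to avoid a call that is already cheap for common `Seq` implementations such as `List` and `Vector`.

```scala
// Hypothetical class for illustration; not Spark's In expression.
final case class InListSketch(list: Seq[Int]) {
  // Cached variant: computed once on first access, then read through a flag check.
  lazy val isListEmptyCached: Boolean = list.isEmpty

  // Direct variant: simply delegates to the collection on every call.
  def isListEmptyDirect: Boolean = list.isEmpty
}
```

Since the two are interchangeable in behavior, dropping the lazy val only removes a field and its initialization logic.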


case In(_: AttributeReference, list: Seq[Expression]) if list.isEmpty => Literal.FalseLiteral
// We rely on the optimizations in org.apache.spark.sql.catalyst.optimizer.OptimizeIn
// to be sure that the list cannot be empty
Contributor

IMHO this comment is not accurate, since in the optimizer we only deal with the case where the attribute is not nullable.
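For context, a sketch of why nullability matters here. It models SQL three-valued logic with `Option`, where `None` stands for NULL. It assumes that a null value makes IN evaluate to null even when the list is empty. That is an assumption about the semantics at the time of this PR, and `InEvalSketch` and `evalIn` are hypothetical names.

```scala
object InEvalSketch {
  // None models SQL NULL; Some(b) models a definite boolean result.
  def evalIn(value: Option[Int], list: Seq[Option[Int]]): Option[Boolean] =
    value match {
      // Assumption: a null value yields null regardless of the list contents.
      case None => None
      case Some(v) =>
        if (list.contains(Some(v))) Some(true) // a definite match
        else if (list.contains(None)) None     // no match, but a null makes it unknown
        else Some(false)                       // no match and no nulls
    }
}
```

Under that assumption, an empty list gives false only for a non-null value, so a rule that folds to false for a nullable attribute would change the result whenever the attribute is null.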

Member

We can remove this line after we merge this PR #19522

@gatorsmile
Member

ok to test

@gatorsmile
Member

@mgaido91 Could you update the PR title?

@SparkQA

SparkQA commented Oct 20, 2017

Test build #82923 has finished for PR 19523 at commit 50c7af3.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

retest this please

@SparkQA

SparkQA commented Oct 20, 2017

Test build #82928 has finished for PR 19523 at commit 50c7af3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mgaido91 changed the title from "[SPARK-22301][SQL] Add rule to Optimizer for In with empty list of va…" to "[SPARK-22301][SQL] Add rule to Optimizer for In with not-nullable value and empty list" on Oct 24, 2017
@SparkQA

SparkQA commented Oct 24, 2017

Test build #83011 has finished for PR 19523 at commit 99df613.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

LGTM

@gatorsmile
Member

Thanks! Merged to master.

@asfgit closed this in 3f5ba96 on Oct 24, 2017
@mgaido91 deleted the SPARK-22301 branch on November 4, 2017 at 08:49
5 participants