Skip to content

Conversation

@yhuai
Copy link
Contributor

@yhuai yhuai commented Aug 11, 2015

I am changing the default behavior of First/Last to respect null values (the SQL standard default behavior).

https://issues.apache.org/jira/browse/SPARK-9740

@yhuai
Copy link
Contributor Author

yhuai commented Aug 11, 2015

@hvanhovell I am making the default in this PR.

@SparkQA
Copy link

SparkQA commented Aug 12, 2015

Test build #40531 has finished for PR 8113 at commit 6ee4a04.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Copy link
Contributor

One final question, shouldn't we introduce a skipNulls parameter? Or do you want to address this in a follow-up PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could for now simplify this to Literal.create(true, BooleanType).

@yhuai
Copy link
Contributor Author

yhuai commented Aug 12, 2015

yeah. I think that only requires a small change. Let me add it.

@hvanhovell
Copy link
Contributor

Besides the valueSet update/merge expressions LGTM.

@yhuai
Copy link
Contributor Author

yhuai commented Aug 12, 2015

@hvanhovell Alright. I added the ignoreNulls flag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ggupta81 This is the fix of SPARK-9592 for our master and 1.5 branch.

@yhuai yhuai changed the title [SPARK-9740] [SQL] Change the default behavior of First/Last to RESPECT NULLS. [SPARK-9740] [SPARK-9592] [SQL] Change the default behavior of First/Last to RESPECT NULLS. Aug 12, 2015

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yhuai Both if and else branches are executing the same code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops. Will fix it.

@ggupta81
Copy link

Should i close my original pull request now?
#7928

On Wed, Aug 12, 2015 at 10:38 AM, UCB AMPLab notifications@github.com
wrote:

Merged build started.


Reply to this email directly or view it on GitHub
#8113 (comment).

_Gaurav Gupta_Engineering Manager @ Adobe
9971666780

@yhuai
Copy link
Contributor Author

yhuai commented Aug 12, 2015

@ggupta81 Yeah. We can close #7928. Can you open a pr to fix branch 1.4, or reopen #7233 and update that? Thanks!

@SparkQA
Copy link

SparkQA commented Aug 12, 2015

Test build #40589 has finished for PR 8113 at commit 15c9e31.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class First(child: Expression, ignoreNulls: Boolean) extends AlgebraicAggregate
    • case class Last(child: Expression, ignoreNulls: Boolean) extends AlgebraicAggregate
    • case class First(
    • case class FirstFunction(
    • case class Last(
    • case class LastFunction(

@SparkQA
Copy link

SparkQA commented Aug 12, 2015

Test build #1488 has finished for PR 8113 at commit 2d93d89.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Or(valueSet, Not(IsNull(child))) is a bit shorter. It is a matter of preference though...

@hvanhovell
Copy link
Contributor

One more small thing. We should probably also add the ignoreNulls option to the first and last dataframe functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Or(valueSet.left. valueSet.right) is shorter.

@yhuai
Copy link
Contributor Author

yhuai commented Aug 13, 2015

@hvanhovell Since we already pass the feature freeze deadline, I will not add new interfaces to DataFrame API (this PR is mainly about fixing the default behavior). If users request adding DF functions that expose ignoreNulls flag, we can do it in 1.6.

@SparkQA
Copy link

SparkQA commented Aug 13, 2015

Test build #40705 has finished for PR 8113 at commit ad0ac67.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class First(child: Expression, ignoreNullsExpr: Expression) extends AlgebraicAggregate
    • case class Last(child: Expression, ignoreNullsExpr: Expression) extends AlgebraicAggregate
    • case class First(
    • case class FirstFunction(
    • case class Last(
    • case class LastFunction(

@yhuai yhuai changed the title [SPARK-9740] [SPARK-9592] [SQL] Change the default behavior of First/Last to RESPECT NULLS. [SPARK-9740] [SPARK-9592] [SPARK-9210] [SQL] Change the default behavior of First/Last to RESPECT NULLS. Aug 13, 2015
@SparkQA
Copy link

SparkQA commented Aug 13, 2015

Test build #40724 has finished for PR 8113 at commit f828bdf.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class First(child: Expression, ignoreNullsExpr: Expression) extends AlgebraicAggregate
    • case class Last(child: Expression, ignoreNullsExpr: Expression) extends AlgebraicAggregate
    • case class First(
    • case class FirstFunction(
    • case class Last(
    • case class LastFunction(

@SparkQA
Copy link

SparkQA commented Aug 13, 2015

Test build #1542 has finished for PR 8113 at commit f828bdf.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • * Set thresholds in multiclass (or binary) classification to adjust the probability of
    • case class First(child: Expression, ignoreNullsExpr: Expression) extends AlgebraicAggregate
    • case class Last(child: Expression, ignoreNullsExpr: Expression) extends AlgebraicAggregate
    • case class First(
    • case class FirstFunction(
    • case class Last(
    • case class LastFunction(

@SparkQA
Copy link

SparkQA commented Aug 13, 2015

Test build #1551 timed out for PR 8113 at commit f828bdf after a configured wait of 175m.

asfgit pushed a commit that referenced this pull request Aug 17, 2015
…pression1.

https://issues.apache.org/jira/browse/SPARK-9592

#8113 has the fundamental fix. But, if we want to minimize the number of changed lines, we can go with this one. Then, in 1.6, we merge #8113.

Author: Yin Huai <yhuai@databricks.com>

Closes #8172 from yhuai/lastFix and squashes the following commits:

b28c42a [Yin Huai] Regression test.
af87086 [Yin Huai] Fix last.

(cherry picked from commit 772e7c1)
Signed-off-by: Michael Armbrust <michael@databricks.com>
asfgit pushed a commit that referenced this pull request Aug 17, 2015
…pression1.

https://issues.apache.org/jira/browse/SPARK-9592

#8113 has the fundamental fix. But, if we want to minimize the number of changed lines, we can go with this one. Then, in 1.6, we merge #8113.

Author: Yin Huai <yhuai@databricks.com>

Closes #8172 from yhuai/lastFix and squashes the following commits:

b28c42a [Yin Huai] Regression test.
af87086 [Yin Huai] Fix last.
Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregates.scala
@yhuai
Copy link
Contributor Author

yhuai commented Oct 19, 2015

test this please

@SparkQA
Copy link

SparkQA commented Oct 20, 2015

Test build #43948 has finished for PR 8113 at commit 91c174f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Copy link
Contributor

Thanks! Merging to master.

@asfgit asfgit closed this in 3afe448 Oct 21, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants