[SPARK-17940][SQL] Fixed a typo in LAST function and improved its usage string #15487

lins05 · 2016-10-14T15:48:18Z

What changes were proposed in this pull request?

Fixed a a typo in the LAST function error message
Also improved its usage string to match the FIRST function

How was this patch tested?

Existing tests.

…ge string.

hvanhovell · 2016-10-14T17:55:04Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Last.scala

 @ExpressionDescription(
-  usage = "_FUNC_(expr,isIgnoreNull) - Returns the last value of `child` for a group of rows.")
+  usage = """_FUNC_(expr,isIgnoreNull) - Returns the last value of `child` for a group of rows.
+    _FUNC_(expr,isIgnoreNull=false) - Returns the last value of `child` for a group of rows.


Why repeat this?

While you are added could you also add a line about this being a non-deterministic function

I just happened to take a look at this PR. I manually built and tested. It seems printing the message as below:

spark-sql> DESCRIBE FUNCTION last; Function: last Class: org.apache.spark.sql.catalyst.expressions.aggregate.Last Usage: last(expr,isIgnoreNull) - Returns the last value of `child` for a group of rows. last(expr,isIgnoreNull=false) - Returns the last value of `child` for a group of rows. If isIgnoreNull is true, returns only non-null values.

Maybe, it'd be nicer if the indentation is pretty.

@hvanhovell @HyukjinKwon Thanks for the review. I'm simply following the usage string of other functions, e.g:

spark-sql> describe function first; Function: first Class: org.apache.spark.sql.catalyst.expressions.aggregate.First Usage: first(expr) - Returns the first value of `child` for a group of rows. first(expr,isIgnoreNull=false) - Returns the first value of `child` for a group of rows. If isIgnoreNull is true, returns only non-null values. spark-sql> describe function approx_count_distinct; Function: approx_count_distinct Class: org.apache.spark.sql.catalyst.expressions.aggregate.HyperLogLogPlusPlus Usage: approx_count_distinct(expr) - Returns the estimated cardinality by HyperLogLog++. approx_count_distinct(expr, relativeSD=0.05) - Returns the estimated cardinality by HyperLogLog++ with relativeSD, the maximum estimation error allowed.

So it seems the current convention is that: the first line is a short one-line description, followed by a detail description. Do we have any explicit "usage string style" to follow?

@hvanhovell I'll add the note about nondeterministic.

Ah, I thought it should be as below if it dose not affect anything but only this:

spark-sql> DESCRIBE FUNCTION last; Function: last Class: org.apache.spark.sql.catalyst.expressions.aggregate.Last Usage: last(expr,isIgnoreNull) - Returns the last value of `child` for a group of rows. last(expr,isIgnoreNull=false) - Returns the last value of `child` for a group of rows. If isIgnoreNull is true, returns only non-null values.

This was just my personal opinion.

SparkQA · 2016-10-14T18:05:02Z

Test build #66965 has finished for PR 15487 at commit 33988d3.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-10-16T16:05:06Z

Test build #67036 has finished for PR 15487 at commit fcf8bb5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2016-10-17T03:34:49Z

@hvanhovell @lins05 I just had a confirmation for the documentation format and will proceed to sweep it (SPARK-17963). So, maybe you could do like the one as below:

@ExpressionDescription(
  usage =
    """
      _FUNC_() - Returns a random column with i.i.d. uniformly distributed values in [0, 1].
        seed is given randomly.

      _FUNC_(seed) - Returns a random column with i.i.d. uniformly distributed values in [0, 1].
        seed should be an integer/long/NULL literal.
    """,
  extended = "...")

In the PR I am going to submit, I will exclude both functions, first and last. Also, I will take out rand and randn which would be handled in #15432

lins05 · 2016-10-17T10:49:28Z

@HyukjinKwon thanks, I'll update the PR accordingly.

lins05 · 2016-10-18T11:18:25Z

@HyukjinKwon I've updated the usage string. Now it looks like this:

spark-sql> describe function first;
Function: first
Class: org.apache.spark.sql.catalyst.expressions.aggregate.First
Usage: 
      first(expr) - Returns the first value of `child` for a group of rows.

      first(expr, isIgnoreNull=false) - Returns the first value of `child` for a group of rows.
      If isIgnoreNull is true, returns only non-null values.

      Note that in most cases the result is nondeterministic.

spark-sql> describe function last;
Function: last
Class: org.apache.spark.sql.catalyst.expressions.aggregate.Last
Usage: 
      last(expr) - Returns the last value of `child` for a group of rows.

      last(expr, isIgnoreNull) - Returns the last value of `child` for a group of rows.
      If isIgnoreNull is true, returns only non-null values.

      Note that in most cases the result is nondeterministic.

HyukjinKwon · 2016-10-18T11:43:27Z

@lins05 This is being discussed in #15513. One idea is, we might have to a bit wait until it is confirmed (although I guess it is a soft-yes). Alternatively, I can do the indentation stuff in my PR above together but let this PR fix only the typo and extra mention for nundeterministic. I can resolve the conflicts.

Let us please follow @hvanhovell's decision here.

SparkQA · 2016-10-18T13:42:46Z

Test build #67125 has finished for PR 15487 at commit 4c3f043.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

lins05 · 2016-10-28T01:40:29Z

@HyukjinKwon OK, please fix all these in your PR. I'll close this small one.

[SPARK-17940][SQL] Fixed a typo in LAST function and improved its usa…

33988d3

…ge string.

hvanhovell reviewed Oct 14, 2016

View reviewed changes

Added the note about nondeterministic.

fcf8bb5

Improve usage string format.

4c3f043

lins05 closed this Oct 28, 2016

lins05 deleted the spark-17940-typo-in-last-func branch October 28, 2016 01:40

[SPARK-17940][SQL] Fixed a typo in LAST function and improved its usage string #15487

[SPARK-17940][SQL] Fixed a typo in LAST function and improved its usage string #15487

Uh oh!

Conversation

lins05 commented Oct 14, 2016

What changes were proposed in this pull request?

How was this patch tested?

Uh oh!

hvanhovell Oct 14, 2016

Choose a reason for hiding this comment

Uh oh!

hvanhovell Oct 14, 2016

Choose a reason for hiding this comment

Uh oh!

HyukjinKwon Oct 16, 2016

Choose a reason for hiding this comment

Uh oh!

lins05 Oct 16, 2016

Choose a reason for hiding this comment

Uh oh!

HyukjinKwon Oct 17, 2016

Choose a reason for hiding this comment

Uh oh!

SparkQA commented Oct 14, 2016

Uh oh!

SparkQA commented Oct 16, 2016

Uh oh!

HyukjinKwon commented Oct 17, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lins05 commented Oct 17, 2016

Uh oh!

lins05 commented Oct 18, 2016

Uh oh!

HyukjinKwon commented Oct 18, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SparkQA commented Oct 18, 2016

Uh oh!

lins05 commented Oct 28, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

HyukjinKwon commented Oct 17, 2016 •

edited

Loading

HyukjinKwon commented Oct 18, 2016 •

edited

Loading