Skip to content

Conversation

@lins05
Copy link
Contributor

@lins05 lins05 commented Oct 14, 2016

What changes were proposed in this pull request?

  • Fixed a a typo in the LAST function error message
  • Also improved its usage string to match the FIRST function

How was this patch tested?

Existing tests.

@ExpressionDescription(
usage = "_FUNC_(expr,isIgnoreNull) - Returns the last value of `child` for a group of rows.")
usage = """_FUNC_(expr,isIgnoreNull) - Returns the last value of `child` for a group of rows.
_FUNC_(expr,isIgnoreNull=false) - Returns the last value of `child` for a group of rows.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why repeat this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you are added could you also add a line about this being a non-deterministic function

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just happened to take a look at this PR. I manually built and tested. It seems printing the message as below:

spark-sql> DESCRIBE FUNCTION last;
Function: last
Class: org.apache.spark.sql.catalyst.expressions.aggregate.Last
Usage: last(expr,isIgnoreNull) - Returns the last value of `child` for a group of rows.
    last(expr,isIgnoreNull=false) - Returns the last value of `child` for a group of rows.
      If isIgnoreNull is true, returns only non-null values.

Maybe, it'd be nicer if the indentation is pretty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hvanhovell @HyukjinKwon Thanks for the review. I'm simply following the usage string of other functions, e.g:

spark-sql> describe function first;
Function: first
Class: org.apache.spark.sql.catalyst.expressions.aggregate.First
Usage: first(expr) - Returns the first value of `child` for a group of rows.
    first(expr,isIgnoreNull=false) - Returns the first value of `child` for a group of rows.
      If isIgnoreNull is true, returns only non-null values.

spark-sql> describe function approx_count_distinct;
Function: approx_count_distinct
Class: org.apache.spark.sql.catalyst.expressions.aggregate.HyperLogLogPlusPlus
Usage: approx_count_distinct(expr) - Returns the estimated cardinality by HyperLogLog++.
    approx_count_distinct(expr, relativeSD=0.05) - Returns the estimated cardinality by HyperLogLog++
      with relativeSD, the maximum estimation error allowed.

So it seems the current convention is that: the first line is a short one-line description, followed by a detail description. Do we have any explicit "usage string style" to follow?

@hvanhovell I'll add the note about nondeterministic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I thought it should be as below if it dose not affect anything but only this:

spark-sql> DESCRIBE FUNCTION last;
Function: last
Class: org.apache.spark.sql.catalyst.expressions.aggregate.Last
Usage: last(expr,isIgnoreNull) - Returns the last value of `child` for a group of rows.
       last(expr,isIgnoreNull=false) - Returns the last value of `child` for a group of rows.
         If isIgnoreNull is true, returns only non-null values.

This was just my personal opinion.

@SparkQA
Copy link

SparkQA commented Oct 14, 2016

Test build #66965 has finished for PR 15487 at commit 33988d3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 16, 2016

Test build #67036 has finished for PR 15487 at commit fcf8bb5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Oct 17, 2016

@hvanhovell @lins05 I just had a confirmation for the documentation format and will proceed to sweep it (SPARK-17963). So, maybe you could do like the one as below:

@ExpressionDescription(
  usage =
    """
      _FUNC_() - Returns a random column with i.i.d. uniformly distributed values in [0, 1].
        seed is given randomly.

      _FUNC_(seed) - Returns a random column with i.i.d. uniformly distributed values in [0, 1].
        seed should be an integer/long/NULL literal.
    """,
  extended = "...")

In the PR I am going to submit, I will exclude both functions, first and last. Also, I will take out rand and randn which would be handled in #15432

@lins05
Copy link
Contributor Author

lins05 commented Oct 17, 2016

@HyukjinKwon thanks, I'll update the PR accordingly.

@lins05
Copy link
Contributor Author

lins05 commented Oct 18, 2016

@HyukjinKwon I've updated the usage string. Now it looks like this:

spark-sql> describe function first;
Function: first
Class: org.apache.spark.sql.catalyst.expressions.aggregate.First
Usage: 
      first(expr) - Returns the first value of `child` for a group of rows.

      first(expr, isIgnoreNull=false) - Returns the first value of `child` for a group of rows.
      If isIgnoreNull is true, returns only non-null values.

      Note that in most cases the result is nondeterministic.

spark-sql> describe function last;
Function: last
Class: org.apache.spark.sql.catalyst.expressions.aggregate.Last
Usage: 
      last(expr) - Returns the last value of `child` for a group of rows.

      last(expr, isIgnoreNull) - Returns the last value of `child` for a group of rows.
      If isIgnoreNull is true, returns only non-null values.

      Note that in most cases the result is nondeterministic.

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Oct 18, 2016

@lins05 This is being discussed in #15513. One idea is, we might have to a bit wait until it is confirmed (although I guess it is a soft-yes). Alternatively, I can do the indentation stuff in my PR above together but let this PR fix only the typo and extra mention for nundeterministic. I can resolve the conflicts.

Let us please follow @hvanhovell's decision here.

@SparkQA
Copy link

SparkQA commented Oct 18, 2016

Test build #67125 has finished for PR 15487 at commit 4c3f043.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@lins05
Copy link
Contributor Author

lins05 commented Oct 28, 2016

@HyukjinKwon OK, please fix all these in your PR. I'll close this small one.

@lins05 lins05 closed this Oct 28, 2016
@lins05 lins05 deleted the spark-17940-typo-in-last-func branch October 28, 2016 01:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants