[SPARK-17940][SQL] Fixed a typo in LAST function and improved its usage string #15487
 @ExpressionDescription(
-  usage = "_FUNC_(expr,isIgnoreNull) - Returns the last value of `child` for a group of rows.")
+  usage = """_FUNC_(expr,isIgnoreNull) - Returns the last value of `child` for a group of rows.
+    _FUNC_(expr,isIgnoreNull=false) - Returns the last value of `child` for a group of rows.
Why repeat this?
While you are at it, could you also add a line about this being a non-deterministic function?
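To illustrate why such a note is useful: the result of `last` depends on the order in which rows reach the aggregate, and that order is not guaranteed (for example, after a shuffle). The plain-Scala sketch below models this under stated assumptions: a group is a `Seq[Option[T]]` in arrival order, `None` stands for SQL NULL, and `lastValue` is a hypothetical helper, not Spark's `Last` implementation.

```scala
// Hypothetical model of last(expr, isIgnoreNull); NOT Spark's implementation.
// A group is a Seq of values in arrival order; None stands for SQL NULL.
object LastOrderSketch {
  def lastValue[T](rows: Seq[Option[T]], isIgnoreNull: Boolean = false): Option[T] =
    if (isIgnoreNull) rows.flatten.lastOption // last non-null value, if any
    else rows.lastOption.flatten              // value of the final row, may be None

  def main(args: Array[String]): Unit = {
    // The same rows, arriving in two different orders.
    val orderA: Seq[Option[Int]] = Seq(Some(1), Some(2), Some(3))
    val orderB: Seq[Option[Int]] = Seq(Some(3), Some(1), Some(2))
    println(orderA.toSet == orderB.toSet) // true: identical input rows
    println(lastValue(orderA))            // Some(3)
    println(lastValue(orderB))            // Some(2): a different answer
  }
}
```

Same input, different output: that order dependence is what a "non-deterministic" note in the usage string would warn about.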
I just happened to take a look at this PR. I manually built and tested it. It seems to print the message as below:
spark-sql> DESCRIBE FUNCTION last;
Function: last
Class: org.apache.spark.sql.catalyst.expressions.aggregate.Last
Usage: last(expr,isIgnoreNull) - Returns the last value of `child` for a group of rows.
last(expr,isIgnoreNull=false) - Returns the last value of `child` for a group of rows.
If isIgnoreNull is true, returns only non-null values.
Maybe it'd be nicer if the indentation were prettier.
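The last line of that output describes the `isIgnoreNull` flag. A rough sketch of its meaning, assuming a group modeled as an ordered `Seq[Option[T]]` with `None` as NULL and a hypothetical `lastValue` helper (not Spark's code): with the flag off, the value of the final row is returned even if it is NULL; with the flag on, the last non-null value is returned, or NULL if there is none.

```scala
// Hypothetical model of the isIgnoreNull flag described in the usage string;
// NOT Spark's implementation. None stands for SQL NULL.
object LastIgnoreNullSketch {
  def lastValue[T](rows: Seq[Option[T]], isIgnoreNull: Boolean = false): Option[T] =
    if (isIgnoreNull) rows.flatten.lastOption // skip NULLs, take the final remaining value
    else rows.lastOption.flatten              // take the final row, even if it is NULL

  def main(args: Array[String]): Unit = {
    val rows: Seq[Option[String]] = Seq(Some("a"), Some("b"), None)
    println(lastValue(rows))                      // None (the final row is NULL)
    println(lastValue(rows, isIgnoreNull = true)) // Some(b)
    println(lastValue(Seq[Option[String]](None, None), isIgnoreNull = true)) // None
  }
}
```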
@hvanhovell @HyukjinKwon Thanks for the review. I'm simply following the usage strings of other functions, e.g.:
spark-sql> describe function first;
Function: first
Class: org.apache.spark.sql.catalyst.expressions.aggregate.First
Usage: first(expr) - Returns the first value of `child` for a group of rows.
first(expr,isIgnoreNull=false) - Returns the first value of `child` for a group of rows.
If isIgnoreNull is true, returns only non-null values.
spark-sql> describe function approx_count_distinct;
Function: approx_count_distinct
Class: org.apache.spark.sql.catalyst.expressions.aggregate.HyperLogLogPlusPlus
Usage: approx_count_distinct(expr) - Returns the estimated cardinality by HyperLogLog++.
approx_count_distinct(expr, relativeSD=0.05) - Returns the estimated cardinality by HyperLogLog++
with relativeSD, the maximum estimation error allowed.
So it seems the current convention is: the first line is a short one-line description, followed by a detailed description. Do we have any explicit "usage string style" to follow?
@hvanhovell I'll add the note about the function being non-deterministic.
Ah, I thought it should look as below, if that does not affect anything other than this function:
spark-sql> DESCRIBE FUNCTION last;
Function: last
Class: org.apache.spark.sql.catalyst.expressions.aggregate.Last
Usage: last(expr,isIgnoreNull) - Returns the last value of `child` for a group of rows.
       last(expr,isIgnoreNull=false) - Returns the last value of `child` for a group of rows.
         If isIgnoreNull is true, returns only non-null values.
This was just my personal opinion.
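One way to get that alignment without hand-indenting every annotation would be to normalize the text when it is printed. The sketch below is purely illustrative: `UsageFormatSketch.render` is a hypothetical helper, not a Spark API. It assumes usage strings written with space indentation; it removes the indentation common to all non-blank lines (keeping any extra indentation on detail lines), drops blank lines, and pads every continuation line to start under the first character after the `Usage: ` prefix.

```scala
// Hypothetical formatter (not part of Spark) that aligns continuation lines
// of a multi-line usage string under the text following the "Usage: " prefix.
object UsageFormatSketch {
  def render(usage: String, prefix: String = "Usage: "): String = {
    // Blank lines (e.g. around a triple-quoted string) are dropped.
    val raw = usage.split("\n").filter(_.trim.nonEmpty)
    // Remove only the indentation shared by all lines, so relative indents survive.
    val indent = if (raw.isEmpty) 0 else raw.map(_.takeWhile(_ == ' ').length).min
    val lines = raw.map(_.drop(indent))
    val pad = " " * prefix.length
    lines.zipWithIndex
      .map { case (line, i) => if (i == 0) prefix + line else pad + line }
      .mkString("\n")
  }

  def main(args: Array[String]): Unit = {
    val usage =
      """
      last(expr,isIgnoreNull) - Returns the last value of `child` for a group of rows.
      last(expr,isIgnoreNull=false) - Returns the last value of `child` for a group of rows.
        If isIgnoreNull is true, returns only non-null values.
      """
    println(render(usage))
  }
}
```

Doing this at display time would let annotations keep whatever source indentation reads best, at the cost of one more formatting rule to agree on.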
Test build #66965 has finished for PR 15487 at commit
Test build #67036 has finished for PR 15487 at commit
@hvanhovell @lins05 I just got confirmation on the documentation format and will proceed with the sweep (SPARK-17963). So maybe you could do it like the one below:
@ExpressionDescription(
  usage =
    """
    _FUNC_() - Returns a random column with i.i.d. uniformly distributed values in [0, 1].
      seed is given randomly.
    _FUNC_(seed) - Returns a random column with i.i.d. uniformly distributed values in [0, 1].
      seed should be an integer/long/NULL literal.
    """,
  extended = "...")
In the PR I am going to submit, I will exclude both functions,
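A detail worth keeping in mind with this layout: a Scala triple-quoted string keeps its newlines and leading spaces exactly as written in the source, so the indentation inside the annotation becomes part of the usage text, including a leading newline after the opening quotes. The standalone sketch below uses only the standard library and an abbreviated copy of the usage text above; `stripMargin` is shown as the standard-library alternative when source indentation should not leak into the string.

```scala
// Standalone illustration using only the Scala standard library.
object TripleQuoteSketch {
  // Without stripMargin, the leading newline and source indentation are kept verbatim.
  val indented: String =
    """
    _FUNC_(seed) - Returns a random column with i.i.d. uniformly distributed values in [0, 1].
      seed should be an integer/long/NULL literal.
    """

  // With stripMargin, leading blanks up to and including each '|' are removed.
  val stripped: String =
    """_FUNC_(seed) - Returns a random column with i.i.d. uniformly distributed values in [0, 1].
      |  seed should be an integer/long/NULL literal.""".stripMargin

  def main(args: Array[String]): Unit = {
    // Brackets make the preserved whitespace visible.
    indented.split("\n").foreach(line => println(s"[$line]"))
    stripped.split("\n").foreach(line => println(s"[$line]"))
  }
}
```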
@HyukjinKwon Thanks, I'll update the PR accordingly.
@HyukjinKwon I've updated the usage string. Now it looks like this:
@lins05 This is being discussed in #15513. One option is to wait a bit until it is confirmed (although I guess it is a soft yes). Alternatively, I can handle the indentation in my PR above, and this PR can fix only the typo and the extra mention for … Let's follow @hvanhovell's decision here.
Test build #67125 has finished for PR 15487 at commit
@HyukjinKwon OK, please fix all of these in your PR. I'll close this small one.
What changes were proposed in this pull request?
How was this patch tested?
Existing tests.