[SPARK-17963][SQL][Documentation] Add examples (extend) in each expression and improve documentation #15677
Conversation
|
cc @srowen, @rxin, @jodersky, @gatorsmile. I closed the previous one and reopened it here. |
|
@gatorsmile I double-checked the type ones again and tried to describe the types more specifically. Could you please take another look? |
|
BTW, I hope we can fix up all the minor comments together in one final pass, as long as none of them blocks the others. |
|
Test build #67733 has finished for PR 15677 at commit
|
    extended = """
      Arguments:
        expr1 - numeric type expression.
        expr2 - numeric type expression that defines the decimal places to round.
numeric type expression -> a numeric type expression
|
Since this PR has 2000+ lines of changes, I am unable to quickly go over all of them. Normally, creating such a big PR is not welcomed, because it is hard for reviewers to read through all the changes. I hope my comments can prompt you to think more and fix all the related issues, instead of me pointing them out one by one. For example, my most recent comment is about complex type support. Have you tried nested types? BTW, when my teammates do anything like this PR, I always tell them to write test cases if they are not 100% sure. Do not submit a PR without verification. I hope we can do the documentation changes more carefully, because external users will not read the source code. They might easily lose patience. |
|
Hey @gatorsmile, I am not rushing you to review, and I also didn't mean that I would find every issue myself. I meant I can wait for all the reviews and then sweep through them if they are only minor and none of them blocks the others. As I said, for testing, I have tested them one by one when I was in doubt. Honestly, I have tested most of them multiple times (I admit it wasn't for every type), so I believe the argument types are mostly correct. The recent changes include only a few minor ones. In the case of this PR, basically, it introduces new sections in the documentation, so I think fixing the same kind of instances in a single PR is reasonable. At least, I asked @rxin and @srowen first before/after making this PR. I also said several times this would be big, and I believe I got approval. There is always room for improvement in any change; we can't make something perfect at once. Also, I think we still have a lot of time to improve it further. For example, we can add all the test cases in a follow-up. This would not even be released right after this PR. Others can still take a look at this and submit another PR. I mean, this would not really affect other code or functionality, and it is a documentation improvement that fixes existing typos/mistakes without any downside. Please let us just focus on this topic. |
    */
    @ExpressionDescription(
      usage = "_FUNC_(col1, col2, col3, ...) - Creates a struct with the given field values.")
      usage = "_FUNC_(expr1, expr2, expr2 ...) - Creates a struct with the given field values.",
expr2, expr2 -> expr2, expr3
|
I feel sorry, but I have to say I will not merge the PR if the quality is not good enough. Maybe my standard is too high. If the other committers have different opinions, they can merge it without considering my comments. I do not think the changes are minor. Writing the documents is the same as writing the source code. Sometimes, documents might be more important, because users might read them. If the documents do not match the behavior, users normally start doubting the quality. |
    extended = """
      Arguments:
        expr - a numeric or interval expression.
      Examples:
Nit: add an extra space
    */
    @ExpressionDescription(
      usage = "_FUNC_(key0, value0, key1, value1...) - Creates a map with the given key/value pairs.")
      usage = "_FUNC_(key0, value0, key1, value1...) - Creates a map with the given key/value pairs.",
Nit: please correct all of them to make the usage of ... consistent. For example, x, y, z, ...
|
@gatorsmile OK, so, your concern is about potentially inaccurate types and typos. If you are really worried, maybe I can split this PR into several ones. |
    usage = "_FUNC_(timestamp) - Returns the minute component of the string/timestamp.",
    extended = """
      Arguments:
        timestamp - a timestamp expression.
This is inconsistent with the function description. Please check which one is right.
Also go over all the date/time/timestamp function descriptions. At least they should be consistent.
|
Oh, yes. I knew this, but I was hesitant to fix string/timestamp because changing it to just timestamp sounds like a (strictly speaking) regression in the documentation (I mean, string/timestamp is not inaccurate); however, I also could not add string in the extended part, as it would be inconsistent with the others.
|
In a DBMS, this is a very common issue. If users want to use strings to represent timestamps or other datetime values, they need to follow the expected formats. If I were a user, my first question would be which formats to follow. For example, is the precision 6 or 2?
string/timestamp is not clear. We should definitely improve it. Below is a document you can refer to: http://www.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/sqlref/src/tpc/db2z_datetimestringrepresentation.html
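To make the point concrete, the following is a minimal sketch (assuming a local Spark build from around this era and an illustrative object name) of why the usage text says string/timestamp: a string literal in the expected `yyyy-MM-dd HH:mm:ss` format is implicitly cast to a timestamp, so both forms yield the same result.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative sketch only; object and app names are made up.
object MinuteStringOrTimestamp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("minute-doc-check").getOrCreate()

    // A string literal following the expected 'yyyy-MM-dd HH:mm:ss' format is accepted...
    spark.sql("SELECT minute('2009-07-30 12:58:59')").show()                        // expected: 58

    // ...and an explicit timestamp gives the same result, hence "string/timestamp" in the usage.
    spark.sql("SELECT minute(CAST('2009-07-30 12:58:59' AS TIMESTAMP))").show()     // expected: 58

    spark.stop()
  }
}
```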
        val - an expression of any type.
      Examples:
        > SELECT _FUNC_("a", 1, "b", 2, "c", 3);
Could you follow a consistent way to specify string literals in the examples? Most of them use single quotes.
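As a hedged illustration of the single-quote suggestion (not the exact expression under review; the session setup and object name are assumptions), a `map` example written with single-quoted string literals could look like this:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: string literals written with single quotes, as suggested above.
object SingleQuotedLiterals {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("single-quotes").getOrCreate()
    // Same shape as the example in the diff, but with single quotes around the keys.
    spark.sql("SELECT map('a', 1, 'b', 2, 'c', 3)").show(truncate = false)
    spark.stop()
  }
}
```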
|
Not sure whether the other reviewers feel the same way as I do. 2000+ LOC is too big, IMO. The focus of this PR should be on the newly added argument types, descriptions, and examples. |
|
@gatorsmile yes, this is why I suggested earlier that we stop and commit the changes we had in the last PR. We have to break this up to make it manageable in this case, because of the size. There are important doc fixes here that we do need to commit. You have indeed asked for a different additional set of logical changes, and now we're mostly down that path, which has led to one big bang PR. I don't think it's useful to question now whether it should be committed! We can commit a change that is a coherent improvement, which does not make the docs worse. We do not have to commit a change that improves or even fixes everything in the existing docs. Some of the problems you're highlighting were there already. Some are improvements to new docs, where there was simply no information before. Re-skimming this large change, it seems like it makes a coherent improvement to the docs (modulo a few more fixes you suggested). If the issue is down to type info, do you believe anything here is actually wrong-er than it was before? If so, let's zero in on those essential changes and make them, and merge this. If not let's merge this. If there is more to discuss, it will be much easier to address in a second pass, I think. We won't have a release in between the two PRs. |
|
@srowen If the newly added contents are not right, should we commit them and fix them later? If this PR does not add argument descriptions, I am fine to merge it (after one more pass) because this is an incremental improvement, as what you said. |
|
How about splitting this PR into two parts? One is the documentation improvement (fixing typos and adding examples). The other is adding argument descriptions (many parts are not clearly defined; more discussion is needed). |
    extended = "> SELECT _FUNC_('Spark SQL');\n 'LQS krapS'")
    extended = """
      Arguments:
        str - a string expression.
Please check this. This is wrong.
|
@gatorsmile, I believe the changes here are mostly correct and accurate and make sense. At least, they are mostly logical and consistent. This does not mean "not right" or "wrong". For example, compare the Before and After output when users run it. Do you think this is completely incorrect, something users might easily lose patience with? I see it as an improvement. Sure, we can improve them further and make them more consistent, but I don't think that is a reason to block this change. |
|
Could I please ask your opinion, @rxin? I would rather simply follow the majority. |
|
Test build #67779 has finished for PR 15677 at commit
|
```scala
val df2 = Seq((2.0f, 345)).toDF("a", "b")
checkAnswer(
  df2.select(reverse($"a"), reverse($"b")),
  Row("0.2", "543"))
```
The argument of `reverse` can also be a numeric type, as the test above shows, because it is implicitly cast to a string. However, we should block complex types, and we also need to fix that in the code. |
|
Based on my review of this PR, I think we need to resolve the following three major issues when we document the argument types:
|
|
@gatorsmile I suggest we handle all of that in a follow-on PR, because this is already a series of good, related changes that has nevertheless become a very big change. Each of those was an existing issue in the docs, so I don't think this makes it worse. |
|
I am still uncomfortable with the newly added content regarding the argument types if we do not resolve the above three general issues. If this were merged, we would basically need to rewrite almost all the argument types in the follow-up PR. |
|
That's fine; they can be changed soon after. Put it this way: do you feel the same argument applies separately to the existing documentation? If it wasn't a 'regression' in this PR, I think it's separable. |
|
If the existing way to document argument types is not right, why do we still want to merge it into the master branch? Why not remove them from this PR, if we are concerned that the PR has grown too big? |
|
The argument would be: because it's not less-right? (I'm not even judging whether your points are right but I presume they are.) I think we originally did not have nearly this much change regarding types and I wanted to merge the last PR, so yes I had the same question before. But we're here now, so what's the simplest path forward? The review has been dragging on, which is OK if it's converging, but now you're suggesting you want to abandon this change. I don't think that's the right outcome. I'd rather resolve the stall by breaking down the change instead. I don't hear that you're arguing that the docs are wrong, just as incomplete in some ways as they were before. More practically, it's hard for you to ask for a hundred changes in comments. It's hard for me to track whether the changes affect what I've previously reviewed. I think we have to draw this to a checkpoint, and so I'm asking the narrower question: whether this is actually making anything worse? It makes a number of different things better, according to its original purpose. |
|
IMO, when we add argument types to the function descriptions, we should deliver the corresponding test cases in the same PR. Otherwise, I do not see how we can know whether the argument types are right or wrong, especially when we have to consider implicit type conversion. Strictly speaking, the newly added argument types are wrong. I think we just need to remove the argument types from this PR; that is just one part of it. |
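One lightweight shape such verification could take (a sketch, not an existing Spark test; the object name and the choice of `pow` are illustrative, taken from this PR's description) is to execute a documented example and assert it matches the documented output:

```scala
import org.apache.spark.sql.SparkSession

object DocumentedExampleCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("doc-example-check").getOrCreate()

    // The `extended` doc for pow documents: > SELECT pow(2, 3);  8.0
    val actual = spark.sql("SELECT pow(2, 3)").head().getDouble(0)
    assert(actual == 8.0, s"documented example for pow disagrees with the actual result: $actual")

    spark.stop()
  }
}
```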
|
Test build #67962 has finished for PR 15677 at commit
|
|
Test build #67964 has finished for PR 15677 at commit
|
|
@gatorsmile @HyukjinKwon do you both feel like this is ready to merge? |
|
Thanks for asking. Yes, I think it is ready.
d25abca to cd0c810:
… arguments
Address typos
Let's use English for the examples at least to reduce the lines
style off for line length
Address both comments
cd0c810 to 28edbd7
|
Thanks @gatorsmile and @srowen. It seems ready. |
|
LGTM pending test. |
|
Test build #68031 has finished for PR 15677 at commit
|
|
Test build #68033 has finished for PR 15677 at commit
|
|
Test build #68027 has finished for PR 15677 at commit
|
|
(#15677 (comment) was on a previous commit. I resolved the conflicts twice.) |
…ssion and improve documentation

## What changes were proposed in this pull request?

This PR proposes to change the documentation for functions. Please refer the discussion from #15513

The changes include
- Re-indent the documentation
- Add examples/arguments in `extended` where the arguments are multiple or specific format (e.g. xml/ json).

For examples, the documentation was updated as below:

### Functions with single line usage

**Before**

- `pow`
``` sql
Usage: pow(x1, x2) - Raise x1 to the power of x2.
Extended Usage:
> SELECT pow(2, 3);
8.0
```
- `current_timestamp`
``` sql
Usage: current_timestamp() - Returns the current timestamp at the start of query evaluation.
Extended Usage:
No example for current_timestamp.
```

**After**

- `pow`
``` sql
Usage: pow(expr1, expr2) - Raises `expr1` to the power of `expr2`.
Extended Usage:
    Examples:
      > SELECT pow(2, 3);
       8.0
```
- `current_timestamp`
``` sql
Usage: current_timestamp() - Returns the current timestamp at the start of query evaluation.
Extended Usage:
    No example/argument for current_timestamp.
```

### Functions with (already) multiple line usage

**Before**

- `approx_count_distinct`
``` sql
Usage: approx_count_distinct(expr) - Returns the estimated cardinality by HyperLogLog++.
    approx_count_distinct(expr, relativeSD=0.05) - Returns the estimated cardinality by HyperLogLog++ with relativeSD, the maximum estimation error allowed.
Extended Usage:
No example for approx_count_distinct.
```
- `percentile_approx`
``` sql
Usage: percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric column `col` at the given percentage. The value of percentage must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is a positive integer literal which controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of the approximation.
    percentile_approx(col, array(percentage1 [, percentage2]...) [, accuracy]) - Returns the approximate percentile array of column `col` at the given percentage array. Each value of the percentage array must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is a positive integer literal which controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of the approximation.
Extended Usage:
No example for percentile_approx.
```

**After**

- `approx_count_distinct`
``` sql
Usage: approx_count_distinct(expr[, relativeSD]) - Returns the estimated cardinality by HyperLogLog++. `relativeSD` defines the maximum estimation error allowed.
Extended Usage:
    No example/argument for approx_count_distinct.
```
- `percentile_approx`
``` sql
Usage: percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric column `col` at the given percentage. The value of percentage must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is a positive numeric literal which controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of the approximation. When `percentage` is an array, each value of the percentage array must be between 0.0 and 1.0. In this case, returns the approximate percentile array of column `col` at the given percentage array.
Extended Usage:
    Examples:
      > SELECT percentile_approx(10.0, array(0.5, 0.4, 0.1), 100);
       [10.0,10.0,10.0]
      > SELECT percentile_approx(10.0, 0.5, 100);
       10.0
```

## How was this patch tested?

Manually tested

**When examples are multiple**
``` sql
spark-sql> describe function extended reflect;
Function: reflect
Class: org.apache.spark.sql.catalyst.expressions.CallMethodViaReflection
Usage: reflect(class, method[, arg1[, arg2 ..]]) - Calls a method with reflection.
Extended Usage:
    Examples:
      > SELECT reflect('java.util.UUID', 'randomUUID');
       c33fb387-8500-4bfa-81d2-6e0e3e930df2
      > SELECT reflect('java.util.UUID', 'fromString', 'a5cf6c42-0c85-418f-af6c-3e4e5b1328f2');
       a5cf6c42-0c85-418f-af6c-3e4e5b1328f2
```

**When `Usage` is in single line**
``` sql
spark-sql> describe function extended min;
Function: min
Class: org.apache.spark.sql.catalyst.expressions.aggregate.Min
Usage: min(expr) - Returns the minimum value of `expr`.
Extended Usage:
    No example/argument for min.
```

**When `Usage` is already in multiple lines**
``` sql
spark-sql> describe function extended percentile_approx;
Function: percentile_approx
Class: org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile
Usage: percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric column `col` at the given percentage. The value of percentage must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is a positive numeric literal which controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of the approximation. When `percentage` is an array, each value of the percentage array must be between 0.0 and 1.0. In this case, returns the approximate percentile array of column `col` at the given percentage array.
Extended Usage:
    Examples:
      > SELECT percentile_approx(10.0, array(0.5, 0.4, 0.1), 100);
       [10.0,10.0,10.0]
      > SELECT percentile_approx(10.0, 0.5, 100);
       10.0
```

**When example/argument is missing**
``` sql
spark-sql> describe function extended rank;
Function: rank
Class: org.apache.spark.sql.catalyst.expressions.Rank
Usage: rank() - Computes the rank of a value in a group of values. The result is one plus the number of rows preceding or equal to the current row in the ordering of the partition. The values will produce gaps in the sequence.
Extended Usage:
    No example/argument for rank.
```

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #15677 from HyukjinKwon/SPARK-17963-1.

(cherry picked from commit 7eb2ca8)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
|
Merging to master and 2.1! Thanks! |
…stently with expressions

## What changes were proposed in this pull request?

This PR proposes to improve documentation and fix some descriptions equivalent to several minor fixes identified in #15677

Also, this suggests to change `Note:` and `NOTE:` to `.. note::` consistently with the others which marks up pretty.

## How was this patch tested?

Jenkins tests and manually.

For PySpark, `Note:` and `NOTE:` to `.. note::` make the document as below:

**From**: [screenshots]
**To**: [screenshots]

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #15765 from HyukjinKwon/minor-function-doc.

(cherry picked from commit 15d3926)
Signed-off-by: Felix Cheung <felixcheung@apache.org>
|
@HyukjinKwon why did we add backticks to surround all parameters? It looks pretty weird actually. |
|
Some of them already had backticks in the description and others did not. Matching them up with backticks or double quotes was initially suggested in some comments, such as #15677 (comment), #15513 (comment) and #15677 (comment). I could not find a concrete reason not to follow that. I can remove all of them if you confirm. |
|
Normally, most RDBMS docs use italic fonts for the parameter/argument names; Linux man pages have their own convention. I do not know which way is better for our function descriptions. |
|
These function docs are printed onto the console without formatting. |
|
True. That is why I do not know which is the best way for us to show the argument/parameter names. |
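To see what actually reaches the console, a small hedged sketch (the object name and setup are assumptions) is to print the extended description of a function; whatever markup is chosen (backticks, italics) appears literally in this plain-text output:

```scala
import org.apache.spark.sql.SparkSession

object ShowFunctionDoc {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("show-function-doc").getOrCreate()
    // DESCRIBE FUNCTION EXTENDED prints the usage and extended text as plain strings,
    // so backticks around parameter names show up verbatim.
    spark.sql("DESCRIBE FUNCTION EXTENDED percentile_approx").show(numRows = 100, truncate = false)
    spark.stop()
  }
}
```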
What changes were proposed in this pull request?
This PR proposes to change the documentation for functions. Please refer to the discussion from #15513
The changes include
- Re-indent the documentation
- Add examples/arguments in `extended` where the arguments are multiple or of a specific format (e.g. xml/json).
For examples, the documentation was updated as below:

Functions with single line usage
Before
- `pow`
- `current_timestamp`
After
- `pow`
- `current_timestamp`

Functions with (already) multiple line usage
Before
- `approx_count_distinct`
- `percentile_approx`
  Usage: percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric column `col` at the given percentage. The value of percentage must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is a positive integer literal which controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of the approximation.
  percentile_approx(col, array(percentage1 [, percentage2]...) [, accuracy]) - Returns the approximate percentile array of column `col` at the given percentage array. Each value of the percentage array must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is a positive integer literal which controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of the approximation.
  Extended Usage: No example for percentile_approx.
After
- `approx_count_distinct`
  Usage: approx_count_distinct(expr[, relativeSD]) - Returns the estimated cardinality by HyperLogLog++. `relativeSD` defines the maximum estimation error allowed.
  Extended Usage: No example/argument for approx_count_distinct.
- `percentile_approx`
  Usage: percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric column `col` at the given percentage. The value of percentage must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is a positive numeric literal which controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of the approximation. When `percentage` is an array, each value of the percentage array must be between 0.0 and 1.0. In this case, returns the approximate percentile array of column `col` at the given percentage array.
  Extended Usage: Examples: > SELECT percentile_approx(10.0, array(0.5, 0.4, 0.1), 100); [10.0,10.0,10.0] > SELECT percentile_approx(10.0, 0.5, 100); 10.0

How was this patch tested?
Manually tested
- When examples are multiple
- When `Usage` is in single line
- When `Usage` is already in multiple lines
- When example/argument is missing