[SPARK-23619][DOCS] Add output description for some generator expressions / functions #23748

jashgala · 2019-02-08T14:13:11Z

What changes were proposed in this pull request?

This PR addresses SPARK-23619: https://issues.apache.org/jira/browse/SPARK-23619

It adds comments indicating the default output column names for the explode and posexplode
functions in Spark SQL.

Functions for which comments have been updated so far:

stack
inline
explode
posexplode
explode_outer
posexplode_outer

How was this patch tested?

This is just a change in the comments. The package builds and tests successfully after the change.

maropu · 2019-02-09T06:25:55Z

You need to update ExpressionDescription if you want to update the docs, e.g.,

spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala

Line 355 in d0443a7

    
           usage = "_FUNC_(expr) - Separates the elements of array `expr` into multiple rows, or the elements of map `expr` into multiple rows and columns.",

https://spark.apache.org/docs/2.4.0/api/sql/index.html#explode

maropu · 2019-02-09T06:27:55Z

Also, could you check the other functions for the same fix, e.g., explode_outer, ...?

jashgala · 2019-02-09T13:14:56Z

@maropu
Thanks for the guidance.

I've fixed the comments, added comments for explode_outer and posexplode_outer and also added the ExpressionDescriptions.

Please let me know if there is anything else I might've missed.

maropu · 2019-02-11T05:26:40Z

Thanks. Could you cover all the related document fixes in this pr? How about Inline and Stack? They have default column names, right?

HyukjinKwon · 2019-02-11T10:12:00Z

sql/core/src/main/scala/org/apache/spark/sql/functions.scala

  /**
   * Creates a new row for each element in the given array or map column.
+   * Uses the default column name `col` for elements in the array and
+   * `key` and `value` for elements in the map unless specified otherwise.
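
As a rough illustration of the defaults this doc comment describes, here is a minimal sketch in plain Scala with no Spark dependency. The object `ExplodeDefaultNames` and its methods are hypothetical names introduced only for this example; they are not part of Spark. The `pos` column for posexplode comes from Spark's documented behaviour rather than from the diff above.

```scala
// Hypothetical model of the default output column names; this is not Spark's own code.
object ExplodeDefaultNames {
  // explode: a single `col` column for arrays, `key` and `value` columns for maps.
  def explode(inputIsMap: Boolean): Seq[String] =
    if (inputIsMap) Seq("key", "value") else Seq("col")

  // posexplode: the same columns, preceded by a `pos` column holding the element position.
  def posexplode(inputIsMap: Boolean): Seq[String] =
    "pos" +: explode(inputIsMap)
}
```

For example, `ExplodeDefaultNames.posexplode(inputIsMap = true)` gives `pos`, `key`, `value`. In Spark itself these defaults apply only when the query does not supply its own aliases.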


Let's also update the Python and R docs while we're here. Take a look at functions.py and functions.R.

maropu · 2019-02-12

oh... I forgot the viewpoint...thanks.

HyukjinKwon · 2019-02-11T10:12:15Z

ok to test

SparkQA · 2019-02-11T13:21:06Z

Test build #102194 has finished for PR 23748 at commit 4ed035b.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-02-11T13:51:43Z

Test build #102195 has finished for PR 23748 at commit 4ed035b.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

jashgala · 2019-02-12T17:01:52Z

@maropu @HyukjinKwon

While looking at the default column names used by inline and stack, I found that inline uses col1, col2, etc. (i.e. 1-indexed columns), while stack uses col0, col1, col2, etc. (i.e. 0-indexed columns).

scala> spark.sql("SELECT stack(2, 1, 2, 3)").show
+----+----+
|col0|col1|
+----+----+
|   1|   2|
|   3|null|
+----+----+


scala>  spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 'b')))").show
+----+----+
|col1|col2|
+----+----+
|   1|   a|
|   2|   b|
+----+----+

This feels like an issue with consistency. Is there a reason why this column naming convention is kept different?
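
The difference can be sketched in plain Scala without a Spark dependency. The object `GeneratorColumnNames` and its methods are hypothetical names used only for illustration. The column count for stack, ceil(values / rows), is an assumption that matches the output above, where three values over two rows give two columns with a trailing null.

```scala
// Hypothetical model of the naming observed above; this is not Spark's own code.
object GeneratorColumnNames {
  // stack(n, v1, ..., vk) spreads k values over n rows, so it is assumed to need
  // ceil(k / n) columns. The names start at col0 (zero-based).
  def stack(numRows: Int, numValues: Int): Seq[String] = {
    require(numRows > 0, "numRows must be positive")
    val numCols = (numValues + numRows - 1) / numRows // integer ceiling
    (0 until numCols).map(i => s"col$i")
  }

  // inline over an array of structs yields one column per struct field.
  // The names start at col1 (one-based).
  def inline(numFields: Int): Seq[String] =
    (1 to numFields).map(i => s"col$i")
}
```

With these definitions, `GeneratorColumnNames.stack(2, 3)` gives `col0`, `col1`, while `GeneratorColumnNames.inline(2)` gives `col1`, `col2`, which is the mismatch shown in the two queries.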

maropu · 2019-02-13T00:55:18Z

Ah, yes. I think it'd be better to fix this, too, in a separate PR.
I think zero-based indexing is probably more natural. For example, the auto-generated column names in a CSV read are zero-based:

scala> spark.read.csv("test.csv").show
+---+---+---+
|_c0|_c1|_c2|
+---+---+---+
|  1|  2|  3|
+---+---+---+

Could you check if we have other expressions having one-based indexing?

jashgala · 2019-02-14T15:12:48Z

Have created a JIRA for tracking the consistency fix: https://issues.apache.org/jira/browse/SPARK-26879

Will update the other expressions having one-based indexing on the JIRA itself, so that it can continue as a separate investigation.

SparkQA · 2019-02-22T05:21:21Z

Test build #102622 has finished for PR 23748 at commit e5751ed.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-02-22T05:39:58Z

Test build #102623 has finished for PR 23748 at commit 80c0649.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

jashgala · 2019-02-22T07:11:21Z

@maropu I've added comments for Inline and Stack in the generators.scala file. I didn't see these in the functions.* files though.
@HyukjinKwon I've added comments in the Python and R functions files as well, as per the review comments above.

I see that there are a lot of functions (e.g. mathematical functions, etc.) in the functions.* files. Is there value in adding the column names in the comments for all of them, or should we make this change only for the functions that are used to generate columns/rows (e.g. explode, etc.)?

HyukjinKwon · 2019-02-22T07:25:32Z

I haven't checked super closely, but it looks fine.

SparkQA · 2019-02-22T08:05:01Z

Test build #102625 has finished for PR 23748 at commit d0dea9c.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-02-22T08:05:02Z

Test build #102624 has finished for PR 23748 at commit d029d3c.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

dilipbiswal · 2019-02-22T08:10:57Z

retest this please

maropu · 2019-02-22T09:59:33Z

@jashgala Yea, I think it would be nice to fix all the same issues in this PR where possible.

SparkQA · 2019-02-22T13:00:16Z

Test build #102631 has finished for PR 23748 at commit d0dea9c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jashgala · 2019-02-24T05:40:23Z

Cool... Will make the change and add more commits!

jashgala · 2019-03-04T15:54:45Z

@maropu
Just wanted to give an update... I'm going through each function, understanding it and then adding the comments... It's taking a while, especially due to real-life responsibilities, but I want to reassure you that I am working on it!

HyukjinKwon · 2019-03-05T02:24:41Z

I took a quick look and it looks fine. Let me leave it to @maropu since he's been actively reviewing this.

maropu · 2019-03-06T00:57:09Z

Could you update the PR description? Please list all the functions you addressed in this PR.

jashgala · 2019-03-07T15:03:59Z

So far I've addressed the following (also updated in PR description):

stack
inline
explode
posexplode
explode_outer
posexplode_outer

The others are still WIP

HyukjinKwon · 2019-04-26T07:39:48Z

retest this please

SparkQA · 2019-04-26T11:23:26Z

Test build #104934 has finished for PR 23748 at commit d0dea9c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2019-04-27T01:29:26Z

Merged to master.

jashgala · 2019-04-27T05:14:29Z

Thanks.... I will fix the other documentation when I have some free time!

[SPARK-23619][SQL] Added default column names for explode & posexplode functions

82e217b

jashgala added 2 commits February 9, 2019 18:32

Added more comments for explode_outer and posexplode_outer

e7a34eb

Added default column names in ExpressionDescription for explode and posexplode

4ed035b

HyukjinKwon reviewed Feb 11, 2019

Added default column names for inline and stack in comments

e5751ed

Added scala style line-size exception to prevent failure due to comment

80c0649

jashgala added 3 commits February 22, 2019 11:18

Fixed typo in scala style exceptions in comments

d029d3c

Added default column names in comments for explode, etc. functions in R

0313b59

Added default column names in comments for explode, etc. functions in Python

d0dea9c

jashgala changed the title ~~[SPARK-23619][SQL] Added default column names for explode & posexplode in comments~~ [WIP][SPARK-23619][SQL] Added default column names for explode & posexplode in comments Feb 25, 2019

HyukjinKwon approved these changes Mar 5, 2019

maropu changed the title ~~[WIP][SPARK-23619][SQL] Added default column names for explode & posexplode in comments~~ [SPARK-23619][SQL] Added default column names for explode & posexplode in comments Mar 6, 2019

chakravarthiT mentioned this pull request Mar 11, 2019

[SPARK-26879][SQL] Standardize one-based column indexing for stack and json_tuple function #24051

Closed

HyukjinKwon changed the title ~~[SPARK-23619][SQL] Added default column names for explode & posexplode in comments~~ [SPARK-23619][DOC] Add output description for some generator expressions / functions Apr 27, 2019

HyukjinKwon changed the title ~~[SPARK-23619][DOC] Add output description for some generator expressions / functions~~ [SPARK-23619][DOCS] Add output description for some generator expressions / functions Apr 27, 2019

HyukjinKwon closed this in 90085a1 Apr 27, 2019
