Skip to content

Conversation

@rxin
Copy link
Contributor

@rxin rxin commented Mar 23, 2016

What changes were proposed in this pull request?

  1. Deprecated unionAll. It is pretty confusing to have both "union" and "unionAll" when the two do the same thing in Spark but are different in SQL.
  2. Rename reduce in KeyValueGroupedDataset to reduceGroups so it is more consistent with rest of the functions in KeyValueGroupedDataset. Also makes it more obvious what "reduce" and "reduceGroups" mean. Previously it was confusing because it could be reducing a Dataset, or just reducing groups.
  3. Added a "name" function, which is more natural to name columns than "as" for non-SQL users.
  4. Remove "subtract" function since it is just an alias for "except".

How was this patch tested?

All changes should be covered by existing tests. Also added couple test cases to cover "name".

@rxin
Copy link
Contributor Author

rxin commented Mar 23, 2016

cc @davies for Python, and @cloud-fan / @liancheng for the rest.

* df.select($"colA".name("colB"))
* }}}
*
* If the current column has metadata associated with it, this metadata will be propagated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way to avoid duplicating documents at scala side?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no unfortunately...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, actually there is. You can use @define <name> <definition> and then reference it using $name later. See here. Scala standard library uses this trick extensively.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it work with the unidoc for java?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checked, unfortunately it doesn't work for Java :(

@cloud-fan
Copy link
Contributor

LGTM

@SparkQA
Copy link

SparkQA commented Mar 23, 2016

Test build #53866 has finished for PR 11908 at commit 87c4641.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies
Copy link
Contributor

davies commented Mar 23, 2016

LGTM for python part

@SparkQA
Copy link

SparkQA commented Mar 23, 2016

Test build #53886 has finished for PR 11908 at commit a7e98b8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor Author

rxin commented Mar 23, 2016

Thanks - merging in master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants