[Final] More examples to highlight new 0.5 features. #241
Codecov Report
@@ Coverage Diff @@
## master #241 +/- ##
=========================================
Coverage ? 96.56%
=========================================
Files ? 51
Lines ? 874
Branches ? 11
=========================================
Hits ? 844
Misses ? 30
Partials ? 0
Continue to review full report at Codecov.
9daa4d1 to 7668fbd
Hi @OlivierBlanvillain this is ready for review. Thanks!
7668fbd to d037624
| Only column types that can be sorted are allowed to be selected for sorting.
| ```tut:book
| aptTypedDs.orderBy(aptTypedDs('city).asc).show(2).run()
Default ascending ordering is also supported as of #236, so this should be the same as:
aptTypedDs.orderBy(aptTypedDs('city)).show(2).run()
I had it like that, but the implicit with poly didn't work in tut. Not sure why... I will give it another shot.
Hmm you're right. It seems that explicitly calling the apply with the column type works but otherwise it fails to infer the sort. E.g.:
aptTypedDs.orderBy(aptTypedDs[String]('city)).show(2).run()
That's fairly annoying.
What's interesting is that orderByMany works without that.
aptTypedDs.orderByMany(aptTypedDs('city)).show(2).run()
It's not immediately obvious to me why the implicit isn't getting picked up. I'll try to look into it shortly.
OlivierBlanvillain left a comment
Thanks @imarios, this is very nicely done!
docs/src/main/tut/FeatureOverview.md
Outdated
| The union of `aptTypedDs2` with `aptTypedDs` uses all the fields of the caller (`aptTypedDs2`)
| and expects the other (`aptTypedDs`) dataset to include all those fields.
| If field names/types do not much you get a compilation error.
do not match
docs/src/main/tut/FeatureOverview.md
Outdated
| ```
| The union of `aptTypedDs2` with `aptTypedDs` uses all the fields of the caller (`aptTypedDs2`)
| and expects the other (`aptTypedDs`) dataset to include all those fields.
the other dataset (aptTypedDs) to ...
docs/src/main/tut/FeatureOverview.md
Outdated
| Frameless supports many of Spark's functions and transformations.
| However, whenever a Spark function does not exist in Frameless,
| calling the `.dataset` will take you back to vanilla `Dataset` where
will expose the underlying Dataset (from org.apache.spark.sql, the original Spark APIs), where you can use anything that would be missing from the Frameless API.
docs/src/main/tut/FeatureOverview.md
Outdated
| ```
| A simple way to add a column without loosing important schema information is
| to project the entire source schema into a single column.
We should maybe note that this is something new that is not part of the vanilla APIs.
| ## Aggregate vs Projected columns
| Vanilla `Datasets` do not distinguish between columns created from aggregate operations,
I would call them "Spark's Datasets" in the docs.
a524664 to b8e343d
Thanks for all the help guys! @OlivierBlanvillain @frosforever ... don't forget the squash, I have a ton of useless commits here :)
LGTM 👍
@OlivierBlanvillain, this is ready for review, but I might be adding a few more examples here (hence the WIP).