
Conversation

@imarios
Contributor

@imarios imarios commented Jan 31, 2018

@OlivierBlanvillain, this is ready for review, but I might be adding a few more examples here (hence the WIP).

@imarios imarios added this to the 0.5-release milestone Jan 31, 2018
@codecov-io

codecov-io commented Jan 31, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@dfd224a).
The diff coverage is n/a.


@@            Coverage Diff            @@
##             master     #241   +/-   ##
=========================================
  Coverage          ?   96.56%           
=========================================
  Files             ?       51           
  Lines             ?      874           
  Branches          ?       11           
=========================================
  Hits              ?      844           
  Misses            ?       30           
  Partials          ?        0
| Impacted Files | Coverage Δ |
| --- | --- |
| ...ataset/src/main/scala/frameless/TypedEncoder.scala | 100% <ø> (ø) |


Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update dfd224a...b8e343d.

@imarios imarios force-pushed the update_docs_for_0.5 branch 2 times, most recently from 9daa4d1 to 7668fbd Compare February 7, 2018 16:56
@imarios imarios changed the title [WIP] More examples to highlight new 0.5 features. [Final] More examples to highlight new 0.5 features. Feb 7, 2018
@imarios
Contributor Author

imarios commented Feb 7, 2018

Hi @OlivierBlanvillain this is ready for review. Thanks!

@imarios imarios force-pushed the update_docs_for_0.5 branch from 7668fbd to d037624 Compare February 9, 2018 04:18
Only column types that can be sorted are allowed to be selected for sorting.

```tut:book
aptTypedDs.orderBy(aptTypedDs('city).asc).show(2).run()
```
Contributor


Default ascending ordering is also supported as of #236, so this should be the same as

aptTypedDs.orderBy(aptTypedDs('city)).show(2).run()

Contributor Author


I had it like that but implicit with poly didn’t work in tut. Not sure why ... I will give it another shot

Contributor

@frosforever frosforever Feb 12, 2018


Hmm you're right. It seems that explicitly calling the apply with the column type works but otherwise it fails to infer the sort. E.g.:

aptTypedDs.orderBy(aptTypedDs[String]('city)).show(2).run()

That's fairly annoying.
What's interesting is that orderByMany works without that.

aptTypedDs.orderByMany(aptTypedDs('city)).show(2).run()

It's not immediately obvious to me why the implicit isn't getting picked up. I'll try to look into it shortly.
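The inference failure discussed here can be reproduced outside of frameless and Spark with a minimal sketch. All names below (`Ds`, `Column`, `CanOrder`, `orderBy`) are hypothetical stand-ins, not frameless internals; the point is that a polymorphic `apply` whose type parameter is constrained only by a downstream implicit fails to infer, while pinning the type explicitly (as in `aptTypedDs[String]('city)`) compiles.

```scala
// Minimal sketch (hypothetical types, NOT frameless internals) of why the
// implicit is not picked up: the type parameter of `apply` is unconstrained
// by its arguments, so the compiler fixes it to Nothing before searching
// for the implicit evidence, and no CanOrder[Nothing] exists.
case class Column[T](name: String)

class Ds {
  // Polymorphic apply: T does not appear in the argument list.
  def apply[T](name: String): Column[T] = Column[T](name)
}

trait CanOrder[T]
object CanOrder {
  implicit val stringCanOrder: CanOrder[String] = new CanOrder[String] {}
}

def orderBy[T](c: Column[T])(implicit ev: CanOrder[T]): String =
  s"order by ${c.name}"

val ds = new Ds
// orderBy(ds("city"))              // does not compile: no implicit CanOrder[Nothing]
val q = orderBy(ds[String]("city")) // compiles once T is pinned explicitly
println(q)
```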

Contributor

@OlivierBlanvillain OlivierBlanvillain left a comment


Thanks @imarios, this is very nicely done!


The union of `aptTypedDs2` with `aptTypedDs` uses all the fields of the caller (`aptTypedDs2`)
and expects the other (`aptTypedDs`) dataset to include all those fields.
If field names/types do not much you get a compilation error.
Contributor


do not match


The union of `aptTypedDs2` with `aptTypedDs` uses all the fields of the caller (`aptTypedDs2`)
and expects the other (`aptTypedDs`) dataset to include all those fields.
Contributor


the other dataset (aptTypedDs) to ...


Frameless supports many of Spark's functions and transformations.
However, whenever a Spark function does not exist in Frameless,
calling the `.dataset` will take you back to vanilla `Dataset` where
Contributor


will expose the underlying Dataset (from org.apache.spark.sql, the original Spark APIs), where you can use anything that would be missing from the Frameless API.


A simple way to add a column without losing important schema information is
to project the entire source schema into a single column.
Contributor


We should maybe note that this is something new that is not part of the vanilla APIs
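The idea behind projecting the whole source schema into a single column can be sketched in plain Scala, without Spark. The case classes below (`Apartment`, `ApartmentWithRatio`) are hypothetical illustrations, not the frameless API: the entire source row becomes one nested field, so a derived column can be added without dropping any of the original schema.

```scala
// Plain-Scala sketch of the idea (hypothetical names, NOT the frameless API):
// keep the whole source record as one nested field while adding a derived column.
case class Apartment(city: String, surface: Int, price: Double)
case class ApartmentWithRatio(apt: Apartment, priceBySurface: Double)

val apt = Apartment("Paris", 50, 450000.0)
val enriched = ApartmentWithRatio(apt, apt.price / apt.surface)

println(enriched.priceBySurface) // the new derived column
println(enriched.apt.city)       // the original schema is fully preserved
```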


## Aggregate vs Projected columns

Vanilla `Datasets` do not distinguish between columns created from aggregate operations,
Contributor


I would call them "Spark's Datasets" in the docs

@imarios imarios force-pushed the update_docs_for_0.5 branch from a524664 to b8e343d Compare February 12, 2018 03:30
@imarios
Contributor Author

imarios commented Feb 12, 2018

Thanks for all the help guys! @OlivierBlanvillain @frosforever
Olivier, can you take one last look at the edits you suggested? Feel free to squash and merge if you feel everything is good. Thanks!

... don't forget the squash, I have a ton of useless commits here :)

@OlivierBlanvillain
Contributor

LGTM 👍

@imarios imarios merged commit 968492a into typelevel:master Feb 13, 2018