Here is an exhaustive status of the API implemented by frameless.TypedDataset compared to Spark's Dataset. We are getting pretty close to 100% API coverage 😄
Won't fix:
- Dataset alias(String alias) inherently unsafe
- Dataset withColumnRenamed(String existingName, String newName) inherently unsafe
- void createGlobalTempView(String viewName) inherently unsafe
- void createOrReplaceTempView(String viewName) inherently unsafe
- void createTempView(String viewName) inherently unsafe
- void registerTempTable(String tableName) inherently unsafe
- Dataset where(String conditionExpr) use select instead (see the sketch after this list)
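
To make the "inherently unsafe" point concrete, here is a minimal sketch (frameless 0.x-era syntax; the `Person` schema and all values are made up for illustration) contrasting the string-based `where` with the typed alternative:

```scala
import frameless.TypedDataset
import org.apache.spark.sql.SparkSession

// Hypothetical schema, used only for illustration.
case class Person(name: String, age: Int)

object UnsafeVsTyped extends App {
  val spark = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
  import spark.implicits._
  implicit val session: SparkSession = spark // TypedDataset.create takes it implicitly

  // Unsafe: the predicate is an opaque string, so the typo in the column
  // name only surfaces at runtime as an AnalysisException.
  val vanilla = Seq(Person("Ada", 36)).toDS()
  // vanilla.where("aeg > 18")  // compiles fine, fails when executed

  // Typed: the column is resolved against Person at compile time;
  // people('aeg) would not compile at all.
  val people = TypedDataset.create(Seq(Person("Ada", 36)))
  val adults = people.filter(people('age) > 18)
}
```

The same argument applies to the other string-addressed methods above (alias, withColumnRenamed, the temp-view family): there is no way to validate the string against the schema before the job runs.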
TODO:
- KeyValueGroupedDataset<K,T> groupByKey(scala.Function1<T,K> func, Encoder evidence3)
- DataFrameNaFunctions na()
- DataFrameStatFunctions stat()
- Dataset dropDuplicates(String col1, String... cols)
- Dataset describe(String... cols)
- DataStreamWriter writeStream() (see Type Spark’s Structured Streaming #232)
- Dataset withWatermark(String eventTime, String delayThreshold) (see Type Spark’s Structured Streaming #232)
- RelationalGroupedDataset cube(Column... cols) (WIP Add missing Dataset.cube and rollup methods #246)
- RelationalGroupedDataset rollup(String col1, String... cols) (WIP Add missing Dataset.cube and rollup methods #246)
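
Until these land, the untyped escape hatch still works. A sketch (same hypothetical `Person` schema as above) of reaching two of the missing methods through vanilla Spark, which also shows why typing them is non-trivial: both identify columns by runtime strings.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical schema, used only for illustration.
case class Person(name: String, age: Int)

object TodoFallback extends App {
  val spark = SparkSession.builder().master("local[*]").appName("todo").getOrCreate()
  import spark.implicits._

  val ds = Seq(Person("Ada", 36), Person("Ada", 36), Person("Grace", 45)).toDS()

  // Both methods take column names as plain strings; a typed frameless
  // wrapper would have to check these names against T at compile time.
  val deduped = ds.dropDuplicates("name") // keeps a single "Ada" row
  val summary = ds.describe("age")        // count/mean/stddev/min/max for "age"
  summary.show()
}
```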
Done:
- Dataset sort(String sortCol, String... sortCols) (Window dense rank #248)
- Dataset sortWithinPartitions(String sortCol, String... sortCols) (Window dense rank #248)
- Dataset repartition(int numPartitions, Column... partitionExprs)
- Dataset drop(String... colNames) (I#163 dataset drop #209)
- Dataset join(Dataset<?> right, Column joinExprs, String joinType)
- Dataset<scala.Tuple2<T,U>> joinWith(Dataset other, Column condition, String joinType)
- Dataset crossJoin(Dataset<?> right)
- Dataset agg(Column expr, Column... exprs)
- Column apply(String colName)
- Dataset as(Encoder evidence2)
- Dataset cache()
- Dataset coalesce(int numPartitions)
- Column col(String colName)
- Object collect()
- long count()
- Dataset distinct()
- Dataset except(Dataset other)
- void explain(boolean extended)
- <A,B> Dataset explode(String inputColumn, String outputColumn, scala.Function1<A,TraversableOnce<B>> f)
- Dataset filter(Column condition)
- Dataset filter(scala.Function1<T,Object> func)
- T first() (as firstOption)
- Dataset flatMap(scala.Function1<T,TraversableOnce> func, Encoder evidence8)
- void foreach(ForeachFunction func)
- void foreachPartition(scala.Function1<Iterator,scala.runtime.BoxedUnit> f)
- RelationalGroupedDataset groupBy(String col1, String... cols)
- Dataset intersect(Dataset other)
- Dataset limit(int n)
- Dataset map(scala.Function1<T,U> func, Encoder evidence6)
- Dataset mapPartitions(MapPartitionsFunction<T,U> f, Encoder encoder)
- Dataset persist(StorageLevel newLevel)
- void printSchema()
- RDD rdd()
- T reduce(scala.Function2<T,T,T> func) (as reduceOption)
- Dataset repartition(int numPartitions)
- Dataset sample(boolean withReplacement, double fraction, long seed)
- Dataset select(String col, String... cols)
- void show(int numRows, boolean truncate)
- Object take(int n)
- Dataset toDF()
- String toString()
- Dataset transform(scala.Function1<Dataset,Dataset> t)
- Dataset union(Dataset other)
- Dataset unpersist(boolean blocking)
- Dataset withColumn(String colName, Column col)
- Dataset orderBy(String sortCol, String... sortCols)
- String[] columns()
- org.apache.spark.sql.execution.QueryExecution queryExecution()
- StructType schema()
- SparkSession sparkSession()
- SQLContext sqlContext()
- Dataset checkpoint(boolean eager)
- String[] inputFiles()
- boolean isLocal()
- boolean isStreaming()
- Dataset[] randomSplit(double[] weights, long seed)
- StorageLevel storageLevel()
- Dataset toJSON()
- java.util.Iterator toLocalIterator()
- DataFrameWriter write()
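
For a feel of how the covered surface looks in practice, here is a short sketch (frameless 0.x-era syntax, same hypothetical `Person` schema as above) exercising a few of the methods listed: `select`, `filter`, `agg`, `count`, and `firstOption`.

```scala
import frameless.TypedDataset
import frameless.functions.aggregate._
import org.apache.spark.sql.SparkSession

// Hypothetical schema, used only for illustration.
case class Person(name: String, age: Int)

object CoveredMethods extends App {
  val spark = SparkSession.builder().master("local[*]").appName("covered").getOrCreate()
  implicit val session: SparkSession = spark

  val people = TypedDataset.create(Seq(Person("Ada", 36), Person("Grace", 45)))

  // select/filter with compile-time checked columns.
  val names  = people.select(people('name))     // TypedDataset[String]
  val adults = people.filter(people('age) > 18)

  // Typed aggregation: averaging an Int column yields a Double.
  val meanAge = people.agg(avg(people('age)))

  // Actions are suspended in frameless' Job and run explicitly, which is
  // also where first()/reduce() become firstOption/reduceOption.
  val total: Long = people.count().run()
  val first: Option[Person] = adults.firstOption().run()
}
```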