Here is an exhaustive status of the API implemented by frameless.TypedDataset compared to Spark's Dataset. We are getting pretty close to 100% API coverage 😄
Won't fix:
- Dataset alias(String alias) inherently unsafe
- Dataset withColumnRenamed(String existingName, String newName) inherently unsafe
- void createGlobalTempView(String viewName) inherently unsafe
- void createOrReplaceTempView(String viewName) inherently unsafe
- void createTempView(String viewName) inherently unsafe
- void registerTempTable(String tableName) inherently unsafe
- Dataset where(String conditionExpr) use select instead (see the sketch after this list)
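
To make the "inherently unsafe" point concrete, here is a minimal sketch (frameless 0.x-era syntax; the `Person` schema and all values are made up for illustration) contrasting the string-based `where` with the typed alternative:

```scala
import frameless.TypedDataset
import org.apache.spark.sql.SparkSession

// Hypothetical schema, used only for illustration.
case class Person(name: String, age: Int)

object UnsafeVsTyped extends App {
  val spark = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
  import spark.implicits._
  implicit val session: SparkSession = spark // TypedDataset.create takes it implicitly

  // Unsafe: the predicate is an opaque string, so the typo in the column
  // name only surfaces at runtime as an AnalysisException.
  val vanilla = Seq(Person("Ada", 36)).toDS()
  // vanilla.where("aeg > 18")  // compiles fine, fails when executed

  // Typed: the column is resolved against Person at compile time;
  // people('aeg) would not compile at all.
  val people = TypedDataset.create(Seq(Person("Ada", 36)))
  val adults = people.filter(people('age) > 18)
}
```

The same argument applies to the other string-addressed methods above (alias, withColumnRenamed, the temp-view family): there is no way to validate the string against the schema before the job runs.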
TODO:
- KeyValueGroupedDataset<K,T> groupByKey(scala.Function1<T,K> func, Encoder evidence3)
- DataFrameNaFunctions na()
- DataFrameStatFunctions stat()
- Dataset dropDuplicates(String col1, String... cols)
- Dataset describe(String... cols)
- DataStreamWriter writeStream() (see Type Spark’s Structured Streaming #232)
- Dataset withWatermark(String eventTime, String delayThreshold) (see Type Spark’s Structured Streaming #232)
- RelationalGroupedDataset cube(Column... cols) (WIP Add missing Dataset.cube and rollup methods #246)
- RelationalGroupedDataset rollup(String col1, String... cols) (WIP Add missing Dataset.cube and rollup methods #246)
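
Until these land, the untyped escape hatch still works. A sketch (same hypothetical `Person` schema as above) of reaching two of the missing methods through vanilla Spark, which also shows why typing them is non-trivial: both identify columns by runtime strings.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical schema, used only for illustration.
case class Person(name: String, age: Int)

object TodoFallback extends App {
  val spark = SparkSession.builder().master("local[*]").appName("todo").getOrCreate()
  import spark.implicits._

  val ds = Seq(Person("Ada", 36), Person("Ada", 36), Person("Grace", 45)).toDS()

  // Both methods take column names as plain strings; a typed frameless
  // wrapper would have to check these names against T at compile time.
  val deduped = ds.dropDuplicates("name") // keeps a single "Ada" row
  val summary = ds.describe("age")        // count/mean/stddev/min/max for "age"
  summary.show()
}
```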
Done:
- Dataset sort(String sortCol, String... sortCols) (Window dense rank #248)
- Dataset sortWithinPartitions(String sortCol, String... sortCols) (Window dense rank #248)
- Dataset repartition(int numPartitions, Column... partitionExprs)
- Dataset drop(String... colNames) (I#163 dataset drop #209)
- Dataset join(Dataset<?> right, Column joinExprs, String joinType)
- Dataset<scala.Tuple2<T,U>> joinWith(Dataset other, Column condition, String joinType)
- Dataset crossJoin(Dataset<?> right)
- Dataset agg(Column expr, Column... exprs)
- Column apply(String colName)
- Dataset as(Encoder evidence2)
- Dataset cache()
- Dataset coalesce(int numPartitions)
- Column col(String colName)
- Object collect()
- long count()
- Dataset distinct()
- Dataset except(Dataset other)
- void explain(boolean extended)
- <A,B> Dataset explode(String inputColumn, String outputColumn, scala.Function1<A,TraversableOnce<B>> f)
- Dataset filter(Column condition)
- Dataset filter(scala.Function1<T,Object> func)
- T first() (as firstOption)
- Dataset flatMap(scala.Function1<T,TraversableOnce> func, Encoder evidence8)
- void foreach(ForeachFunction func)
- void foreachPartition(scala.Function1<Iterator,scala.runtime.BoxedUnit> f)
- RelationalGroupedDataset groupBy(String col1, String... cols)
- Dataset intersect(Dataset other)
- Dataset limit(int n)
- Dataset map(scala.Function1<T,U> func, Encoder evidence6)
- Dataset mapPartitions(MapPartitionsFunction<T,U> f, Encoder encoder)
- Dataset persist(StorageLevel newLevel)
- void printSchema()
- RDD rdd()
- T reduce(scala.Function2<T,T,T> func) (as reduceOption)
- Dataset repartition(int numPartitions)
- Dataset sample(boolean withReplacement, double fraction, long seed)
- Dataset select(String col, String... cols)
- void show(int numRows, boolean truncate)
- Object take(int n)
- Dataset toDF()
- String toString()
- Dataset transform(scala.Function1<Dataset,Dataset> t)
- Dataset union(Dataset other)
- Dataset unpersist(boolean blocking)
- Dataset withColumn(String colName, Column col)
- Dataset orderBy(String sortCol, String... sortCols)
- String[] columns()
- org.apache.spark.sql.execution.QueryExecution queryExecution()
- StructType schema()
- SparkSession sparkSession()
- SQLContext sqlContext()
- Dataset checkpoint(boolean eager)
- String[] inputFiles()
- boolean isLocal()
- boolean isStreaming()
- Dataset[] randomSplit(double[] weights, long seed)
- StorageLevel storageLevel()
- Dataset toJSON()
- java.util.Iterator toLocalIterator()
- DataFrameWriter write()
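
For a feel of how the covered surface looks in practice, here is a short sketch (frameless 0.x-era syntax, same hypothetical `Person` schema as above) exercising a few of the methods listed: `select`, `filter`, `agg`, `count`, and `firstOption`.

```scala
import frameless.TypedDataset
import frameless.functions.aggregate._
import org.apache.spark.sql.SparkSession

// Hypothetical schema, used only for illustration.
case class Person(name: String, age: Int)

object CoveredMethods extends App {
  val spark = SparkSession.builder().master("local[*]").appName("covered").getOrCreate()
  implicit val session: SparkSession = spark

  val people = TypedDataset.create(Seq(Person("Ada", 36), Person("Grace", 45)))

  // select/filter with compile-time checked columns.
  val names  = people.select(people('name))     // TypedDataset[String]
  val adults = people.filter(people('age) > 18)

  // Typed aggregation: averaging an Int column yields a Double.
  val meanAge = people.agg(avg(people('age)))

  // Actions are suspended in frameless' Job and run explicitly, which is
  // also where first()/reduce() become firstOption/reduceOption.
  val total: Long = people.count().run()
  val first: Option[Person] = adults.firstOption().run()
}
```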