Missing Dataset methods #163

@OlivierBlanvillain

Here is an exhaustive status of the API implemented by frameless.TypedDataset compared to Spark's Dataset. We are getting pretty close to 100% API coverage 😄

Won't fix:

  • Dataset alias(String alias) (inherently unsafe)
  • Dataset withColumnRenamed(String existingName, String newName) (inherently unsafe)
  • void createGlobalTempView(String viewName) (inherently unsafe)
  • void createOrReplaceTempView(String viewName) (inherently unsafe)
  • void createTempView(String viewName) (inherently unsafe)
  • void registerTempTable(String tableName) (inherently unsafe)
  • Dataset where(String conditionExpr) (use select instead)
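
One reading of "inherently unsafe" for the methods above, offered as an assumption rather than as the stated reasoning: they all take names or expressions as plain strings, which the compiler cannot check. The sketch below illustrates that difference with ordinary Scala values only. `Person`, `NameSafety`, `lookup` and `ageOf` are hypothetical names, and no Spark or frameless API is used.

```scala
// Plain-Scala illustration; no Spark or frameless types are involved.
final case class Person(name: String, age: Int)

object NameSafety {
  // String-keyed access: the compiler cannot verify that the key exists,
  // so a misspelled column name only shows up when the code runs.
  def lookup(row: Map[String, Any], column: String): Option[Any] =
    row.get(column)

  // Field access: a misspelled field (e.g. p.agee) is rejected at compile time.
  def ageOf(p: Person): Int = p.age
}
```

With this sketch, `NameSafety.lookup(row, "agee")` compiles and silently returns `None`, while the equivalent typo on a `Person` field would not compile at all.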

TODO:

Done:

  • Dataset sort(String sortCol, String... sortCols) (Window dense rank #248)
  • Dataset sortWithinPartitions(String sortCol, String... sortCols) (Window dense rank #248)
  • Dataset repartition(int numPartitions, Column... partitionExprs)
  • Dataset drop(String... colNames) (I#163 dataset drop #209)
  • Dataset join(Dataset<?> right, Column joinExprs, String joinType)
  • Dataset<scala.Tuple2<T,U>> joinWith(Dataset other, Column condition, String joinType)
  • Dataset crossJoin(Dataset<?> right)
  • Dataset agg(Column expr, Column... exprs)
  • Column apply(String colName)
  • Dataset as(Encoder evidence2)
  • Dataset cache()
  • Dataset coalesce(int numPartitions)
  • Column col(String colName)
  • Object collect()
  • long count()
  • Dataset distinct()
  • Dataset except(Dataset other)
  • void explain(boolean extended)
  • <A,B> Dataset explode(String inputColumn, String outputColumn, scala.Function1<A,TraversableOnce<B>> f)
  • Dataset filter(Column condition)
  • Dataset filter(scala.Function1<T,Object> func)
  • T first() (as firstOption)
  • Dataset flatMap(scala.Function1<T,TraversableOnce> func, Encoder evidence8)
  • void foreach(ForeachFunction func)
  • void foreachPartition(scala.Function1<Iterator,scala.runtime.BoxedUnit> f)
  • RelationalGroupedDataset groupBy(String col1, String... cols)
  • Dataset intersect(Dataset other)
  • Dataset limit(int n)
  • Dataset map(scala.Function1<T,U> func, Encoder evidence6)
  • Dataset mapPartitions(MapPartitionsFunction<T,U> f, Encoder encoder)
  • Dataset persist(StorageLevel newLevel)
  • void printSchema()
  • RDD rdd()
  • T reduce(scala.Function2<T,T,T> func) (as reduceOption)
  • Dataset repartition(int numPartitions)
  • Dataset sample(boolean withReplacement, double fraction, long seed)
  • Dataset select(String col, String... cols)
  • void show(int numRows, boolean truncate)
  • Object take(int n)
  • Dataset toDF()
  • String toString()
  • Dataset transform(scala.Function1<Dataset,Dataset> t)
  • Dataset union(Dataset other)
  • Dataset unpersist(boolean blocking)
  • Dataset withColumn(String colName, Column col)
  • Dataset orderBy(String sortCol, String... sortCols)
  • String[] columns()
  • org.apache.spark.sql.execution.QueryExecution queryExecution()
  • StructType schema()
  • SparkSession sparkSession()
  • SQLContext sqlContext()
  • Dataset checkpoint(boolean eager)
  • String[] inputFiles()
  • boolean isLocal()
  • boolean isStreaming()
  • Dataset[] randomSplit(double[] weights, long seed)
  • StorageLevel storageLevel()
  • Dataset toJSON()
  • java.util.Iterator toLocalIterator()
  • DataFrameWriter write()
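
The list above notes that `first()` and `reduce(...)` are covered "as firstOption" and "as reduceOption". As a rough illustration of that naming, here is a minimal sketch on standard Scala collections, assuming the intent is to return an `Option` instead of failing on empty input. `OptionStyleOps` is a hypothetical helper, not part of frameless, and the actual frameless signatures may differ (for instance, they may wrap the result in an effect type).

```scala
// Analogy on standard Scala collections; this is not frameless code.
object OptionStyleOps {
  // Returns None for an empty sequence instead of throwing.
  def firstOption[T](xs: Seq[T]): Option[T] =
    xs.headOption

  // Combines the elements pairwise with f; returns None when there is nothing to reduce.
  def reduceOption[T](xs: Seq[T])(f: (T, T) => T): Option[T] =
    xs.reduceOption(f)
}
```

For example, `OptionStyleOps.firstOption(Seq.empty[Int])` yields `None`, and `OptionStyleOps.reduceOption(Seq(1, 2, 3))(_ + _)` yields `Some(6)`, so callers have to handle the empty case explicitly.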
