14 changes: 7 additions & 7 deletions sql/core/src/main/scala/org/apache/spark/sql/Column.scala
@@ -199,13 +199,13 @@ class Column(val expr: Expression) extends Logging {
/**
* Extracts a value or values from a complex type.
* The following types of extraction are supported:
*
* - Given an Array, an integer ordinal can be used to retrieve a single value.
* - Given a Map, a key of the correct type can be used to retrieve an individual value.
* - Given a Struct, a string fieldName can be used to extract that field.
* - Given an Array of Structs, a string fieldName can be used to extract filed
* of every struct in that array, and return an Array of fields
*
* <ul>
* <li>Given an Array, an integer ordinal can be used to retrieve a single value.</li>
* <li>Given a Map, a key of the correct type can be used to retrieve an individual value.</li>
* <li>Given a Struct, a string fieldName can be used to extract that field.</li>
* <li>Given an Array of Structs, a string fieldName can be used to extract that field
* from every struct in that array, and return an Array of fields.</li>
* </ul>
* @group expr_ops
* @since 1.4.0
*/
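For readers skimming the diff, the four extraction forms listed in this Scaladoc map onto `Column.getItem` and `Column.getField`. A minimal sketch, assuming a local session and a made-up DataFrame `df`:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Made-up data: an array column, a map column, and a struct column.
val df = Seq(
  (Seq(1, 2, 3), Map("a" -> 1), ("x", 42))
).toDF("arr", "m", "s")

df.select(
  col("arr").getItem(0),   // Array + integer ordinal -> single value
  col("m").getItem("a"),   // Map + key -> the value for that key
  col("s").getField("_1")  // Struct + field name -> that field
).show()
```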
27 changes: 16 additions & 11 deletions sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
@@ -47,10 +47,12 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {

/**
* Specifies the behavior when data or table already exists. Options include:
Member

This one looks wrongly formatted btw. Looks like the `<ul>` should be below this line.

Author

Crap yeah, will move it down

* - `SaveMode.Overwrite`: overwrite the existing data.
* - `SaveMode.Append`: append the data.
* - `SaveMode.Ignore`: ignore the operation (i.e. no-op).
* - `SaveMode.ErrorIfExists`: default option, throw an exception at runtime.
* <ul>
* <li>`SaveMode.Overwrite`: overwrite the existing data.</li>
* <li>`SaveMode.Append`: append the data.</li>
* <li>`SaveMode.Ignore`: ignore the operation (i.e. no-op).</li>
* <li>`SaveMode.ErrorIfExists`: default option, throw an exception at runtime.</li>
* </ul>
*
* @since 1.4.0
*/
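A hedged usage sketch of the `SaveMode`-typed overload, assuming a DataFrame `df` already exists and writing to a placeholder path:

```scala
import org.apache.spark.sql.SaveMode

// Replace whatever is already at the target path; Append, Ignore and
// ErrorIfExists behave as described in the list above.
df.write
  .mode(SaveMode.Overwrite)
  .parquet("/tmp/example/output")   // placeholder path
```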
@@ -61,10 +63,12 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {

/**
* Specifies the behavior when data or table already exists. Options include:
* - `overwrite`: overwrite the existing data.
* - `append`: append the data.
* - `ignore`: ignore the operation (i.e. no-op).
* - `error` or `errorifexists`: default option, throw an exception at runtime.
* <ul>
* <li>`overwrite`: overwrite the existing data.</li>
* <li>`append`: append the data.</li>
* <li>`ignore`: ignore the operation (i.e. no-op).</li>
* <li>`error` or `errorifexists`: default option, throw an exception at runtime.</li>
* </ul>
*
* @since 1.4.0
*/
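The string-valued overload takes the lowercase names listed above; a small sketch with the same hypothetical `df` and a placeholder path:

```scala
// "append" adds new files under the existing output instead of replacing it.
df.write
  .mode("append")
  .json("/tmp/example/json-output")   // placeholder path
```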
@@ -163,9 +167,10 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
* Partitions the output by the given columns on the file system. If specified, the output is
* laid out on the file system similar to Hive's partitioning scheme. As an example, when we
* partition a dataset by year and then month, the directory layout would look like:
*
* - year=2016/month=01/
* - year=2016/month=02/
* <ul>
* <li>year=2016/month=01/</li>
* <li>year=2016/month=02/</li>
* </ul>
*
* Partitioning is one of the most widely used techniques to optimize physical data layout.
* It provides a coarse-grained index for skipping unnecessary data reads when queries have
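The `year=.../month=...` layout shown in this hunk is produced by `partitionBy`; a hedged sketch, again with a hypothetical `df` that has `year` and `month` columns and a placeholder path:

```scala
// Files land under .../year=YYYY/month=MM/ subdirectories of the output path.
df.write
  .partitionBy("year", "month")
  .mode("overwrite")
  .parquet("/tmp/example/partitioned")   // placeholder path
```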
@@ -130,8 +130,11 @@ abstract class ForeachWriter[T] extends Serializable {
* Called when stopping to process one partition of new data on the executor side. This is
* guaranteed to be called whether `open` returns `true` or `false`. However,
* `close` won't be called in the following cases:
* - JVM crashes without throwing a `Throwable`
* - `open` throws a `Throwable`.
*
* <ul>
* <li>JVM crashes without throwing a `Throwable`</li>
* <li>`open` throws a `Throwable`.</li>
* </ul>
*
* @param errorOrNull the error thrown during processing data or null if there was no error.
*/
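To make the `open`/`close` contract above concrete, here is a minimal `ForeachWriter` sketch; the sink is just `println`, and the name of `open`'s second parameter (`version` vs. `epochId`) varies between Spark versions:

```scala
import org.apache.spark.sql.ForeachWriter

class ConsoleSinkWriter extends ForeachWriter[String] {

  // Returning false skips this partition; close is still called afterwards.
  override def open(partitionId: Long, version: Long): Boolean = true

  override def process(value: String): Unit = {
    println(value)  // stand-in for real sink I/O
  }

  // errorOrNull is null on success, otherwise the Throwable thrown while processing.
  override def close(errorOrNull: Throwable): Unit = {
    if (errorOrNull != null) {
      println(s"partition failed: ${errorOrNull.getMessage}")
    }
  }
}
```

It would be wired up with something like `streamingDs.writeStream.foreach(new ConsoleSinkWriter).start()`.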
@@ -30,12 +30,15 @@ import org.apache.spark.sql.catalyst.rules.Rule
* regarding binary compatibility and source compatibility of methods here.
*
* This currently provides the following extension points:
* - Analyzer Rules.
* - Check Analysis Rules
* - Optimizer Rules.
* - Planning Strategies.
* - Customized Parser.
* - (External) Catalog listeners.
*
* <ul>
* <li>Analyzer Rules.</li>
* <li>Check Analysis Rules.</li>
* <li>Optimizer Rules.</li>
* <li>Planning Strategies.</li>
* <li>Customized Parser.</li>
* <li>(External) Catalog listeners.</li>
* </ul>
*
* The extensions can be used by calling withExtension on the [[SparkSession.Builder]], for
* example:
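The Scaladoc's own usage example is cut off by the diff view; as a hedged stand-in, a do-nothing optimizer rule injected through `SparkSession.Builder.withExtensions` (the rule name is made up):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// A pass-through rule, just to show the wiring of an optimizer extension.
case class PassThroughRule(session: SparkSession) extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan
}

val spark = SparkSession.builder()
  .master("local[*]")
  .withExtensions { extensions =>
    extensions.injectOptimizerRule(session => PassThroughRule(session))
  }
  .getOrCreate()
```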
@@ -46,14 +46,16 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {

/**
* Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink.
* - `OutputMode.Append()`: only the new rows in the streaming DataFrame/Dataset will be
* written to the sink
* - `OutputMode.Complete()`: all the rows in the streaming DataFrame/Dataset will be written
* to the sink every time these is some updates
* - `OutputMode.Update()`: only the rows that were updated in the streaming DataFrame/Dataset
* will be written to the sink every time there are some updates. If
* the query doesn't contain aggregations, it will be equivalent to
* `OutputMode.Append()` mode.
* <ul>
* <li> `OutputMode.Append()`: only the new rows in the streaming DataFrame/Dataset will be
* written to the sink.</li>
* <li> `OutputMode.Complete()`: all the rows in the streaming DataFrame/Dataset will be written
* to the sink every time there are some updates.</li>
* <li> `OutputMode.Update()`: only the rows that were updated in the streaming
* DataFrame/Dataset will be written to the sink every time there are some updates.
* If the query doesn't contain aggregations, it will be equivalent to
* `OutputMode.Append()` mode.</li>
* </ul>
*
* @since 2.0.0
*/
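A hedged sketch of setting the output mode, assuming a hypothetical streaming DataFrame `streamingDf`; the commented line shows the equivalent string-valued overload documented in the next hunk:

```scala
import org.apache.spark.sql.streaming.OutputMode

// Append: only rows added since the last trigger reach the sink.
val query = streamingDf.writeStream
  .outputMode(OutputMode.Append())
  .format("console")
  .start()

// Equivalent string form: streamingDf.writeStream.outputMode("append")...
```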
@@ -64,13 +66,16 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {

/**
* Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink.
* - `append`: only the new rows in the streaming DataFrame/Dataset will be written to
* the sink
* - `complete`: all the rows in the streaming DataFrame/Dataset will be written to the sink
* every time these is some updates
* - `update`: only the rows that were updated in the streaming DataFrame/Dataset will
* be written to the sink every time there are some updates. If the query doesn't
* contain aggregations, it will be equivalent to `append` mode.
* <ul>
* <li> `append`: only the new rows in the streaming DataFrame/Dataset will be written to
* the sink.</li>
* <li> `complete`: all the rows in the streaming DataFrame/Dataset will be written to the sink
* every time there are some updates.</li>
* <li> `update`: only the rows that were updated in the streaming DataFrame/Dataset will
* be written to the sink every time there are some updates. If the query doesn't
* contain aggregations, it will be equivalent to `append` mode.</li>
* </ul>
*
* @since 2.0.0
*/
def outputMode(outputMode: String): DataStreamWriter[T] = {
@@ -131,8 +136,10 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {
* laid out on the file system similar to Hive's partitioning scheme. As an example, when we
* partition a dataset by year and then month, the directory layout would look like:
*
* - year=2016/month=01/
* - year=2016/month=02/
* <ul>
* <li> year=2016/month=01/</li>
* <li> year=2016/month=02/</li>
* </ul>
*
* Partitioning is one of the most widely used techniques to optimize physical data layout.
* It provides a coarse-grained index for skipping unnecessary data reads when queries have