docs/sql-programming-guide.md (57 changes: 28 additions & 29 deletions)
@@ -65,14 +65,14 @@ Throughout this document, we will often refer to Scala/Java Datasets of `Row`s a

The entry point into all functionality in Spark is the [`SparkSession`](api/scala/index.html#org.apache.spark.sql.SparkSession) class. To create a basic `SparkSession`, just use `SparkSession.builder()`:

-{% include_example init_session scala/org/apache/spark/examples/sql/SparkSqlExample.scala %}
+{% include_example init_session scala/org/apache/spark/examples/sql/SparkSQLExample.scala %}
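
In outline, the builder call looks like this (a sketch; the app name and the `spark.some.config.option` entry are placeholders):

{% highlight scala %}
import org.apache.spark.sql.SparkSession

// Build (or reuse) the session; the app name and config entry are illustrative
val spark = SparkSession
  .builder()
  .appName("Spark SQL basic example")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()

// For implicit conversions like converting RDDs to DataFrames
import spark.implicits._
{% endhighlight %}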
</div>

<div data-lang="java" markdown="1">

The entry point into all functionality in Spark is the [`SparkSession`](api/java/index.html#org.apache.spark.sql.SparkSession) class. To create a basic `SparkSession`, just use `SparkSession.builder()`:

-{% include_example init_session java/org/apache/spark/examples/sql/JavaSparkSqlExample.java %}
+{% include_example init_session java/org/apache/spark/examples/sql/JavaSparkSQLExample.java %}
</div>

<div data-lang="python" markdown="1">
@@ -105,7 +105,7 @@ from a Hive table, or from [Spark data sources](#data-sources).

As an example, the following creates a DataFrame based on the content of a JSON file:

-{% include_example create_df scala/org/apache/spark/examples/sql/SparkSqlExample.scala %}
+{% include_example create_df scala/org/apache/spark/examples/sql/SparkSQLExample.scala %}
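
A minimal sketch, assuming a `SparkSession` named `spark`; the path is illustrative and any line-delimited JSON file will do:

{% highlight scala %}
// Read a JSON file into a DataFrame
val df = spark.read.json("examples/src/main/resources/people.json")

// Displays the content of the DataFrame to stdout
df.show()
{% endhighlight %}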
</div>

<div data-lang="java" markdown="1">
@@ -114,7 +114,7 @@ from a Hive table, or from [Spark data sources](#data-sources).

As an example, the following creates a DataFrame based on the content of a JSON file:

-{% include_example create_df java/org/apache/spark/examples/sql/JavaSparkSqlExample.java %}
+{% include_example create_df java/org/apache/spark/examples/sql/JavaSparkSQLExample.java %}
</div>

<div data-lang="python" markdown="1">
@@ -155,7 +155,7 @@ Here we include some basic examples of structured data processing using Datasets

<div class="codetabs">
<div data-lang="scala" markdown="1">
-{% include_example untyped_ops scala/org/apache/spark/examples/sql/SparkSqlExample.scala %}
+{% include_example untyped_ops scala/org/apache/spark/examples/sql/SparkSQLExample.scala %}
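
A short sketch of common untyped operations, assuming `spark` is in scope and `df` is a DataFrame with `name` and `age` columns:

{% highlight scala %}
// Needed for the $"colName" column syntax
import spark.implicits._

df.printSchema()                      // Print the schema in a tree format
df.select("name").show()              // Select only the "name" column
df.select($"name", $"age" + 1).show() // Select everybody, incrementing the age by 1
df.filter($"age" > 21).show()         // Select people older than 21
df.groupBy("age").count().show()      // Count people by age
{% endhighlight %}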

For a complete list of the types of operations that can be performed on a Dataset refer to the [API Documentation](api/scala/index.html#org.apache.spark.sql.Dataset).

@@ -164,7 +164,7 @@ In addition to simple column references and expressions, Datasets also have a ri

<div data-lang="java" markdown="1">

-{% include_example untyped_ops java/org/apache/spark/examples/sql/JavaSparkSqlExample.java %}
+{% include_example untyped_ops java/org/apache/spark/examples/sql/JavaSparkSQLExample.java %}

For a complete list of the types of operations that can be performed on a Dataset refer to the [API Documentation](api/java/org/apache/spark/sql/Dataset.html).

@@ -249,13 +249,13 @@ In addition to simple column references and expressions, DataFrames also have a
<div data-lang="scala" markdown="1">
The `sql` function on a `SparkSession` enables applications to run SQL queries programmatically and returns the result as a `DataFrame`.

-{% include_example run_sql scala/org/apache/spark/examples/sql/SparkSqlExample.scala %}
+{% include_example run_sql scala/org/apache/spark/examples/sql/SparkSQLExample.scala %}
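
A sketch, assuming a DataFrame `df` such as the one read from JSON above; the view name `people` is arbitrary:

{% highlight scala %}
// Register the DataFrame as a SQL temporary view first
df.createOrReplaceTempView("people")

val sqlDF = spark.sql("SELECT * FROM people")
sqlDF.show()
{% endhighlight %}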
</div>

<div data-lang="java" markdown="1">
The `sql` function on a `SparkSession` enables applications to run SQL queries programmatically and returns the result as a `Dataset<Row>`.

-{% include_example run_sql java/org/apache/spark/examples/sql/JavaSparkSqlExample.java %}
+{% include_example run_sql java/org/apache/spark/examples/sql/JavaSparkSQLExample.java %}
</div>

<div data-lang="python" markdown="1">
@@ -287,11 +287,11 @@ the bytes back into an object.

<div class="codetabs">
<div data-lang="scala" markdown="1">
-{% include_example create_ds scala/org/apache/spark/examples/sql/SparkSqlExample.scala %}
+{% include_example create_ds scala/org/apache/spark/examples/sql/SparkSQLExample.scala %}
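
A sketch of creating Datasets, assuming a `SparkSession` named `spark`:

{% highlight scala %}
import spark.implicits._

case class Person(name: String, age: Long)

// Encoders are created for case classes
val caseClassDS = Seq(Person("Andy", 32)).toDS()
caseClassDS.show()

// Encoders for most common types are provided automatically
val primitiveDS = Seq(1, 2, 3).toDS()
primitiveDS.map(_ + 1).collect() // returns Array(2, 3, 4)
{% endhighlight %}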
</div>

<div data-lang="java" markdown="1">
-{% include_example create_ds java/org/apache/spark/examples/sql/JavaSparkSqlExample.java %}
+{% include_example create_ds java/org/apache/spark/examples/sql/JavaSparkSQLExample.java %}
</div>
</div>

@@ -318,7 +318,7 @@ reflection and become the names of the columns. Case classes can also be nested
types such as `Seq`s or `Array`s. This RDD can be implicitly converted to a DataFrame and then be
registered as a table. Tables can be used in subsequent SQL statements.

-{% include_example schema_inferring scala/org/apache/spark/examples/sql/SparkSqlExample.scala %}
+{% include_example schema_inferring scala/org/apache/spark/examples/sql/SparkSQLExample.scala %}
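
A sketch, assuming a text file whose lines look like `Michael, 29` (the path is illustrative):

{% highlight scala %}
// For implicit conversions from RDDs to DataFrames
import spark.implicits._

case class Person(name: String, age: Long)

// Create an RDD of Person objects from a text file, convert it to a DataFrame
val peopleDF = spark.sparkContext
  .textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(attributes => Person(attributes(0), attributes(1).trim.toLong))
  .toDF()

// Register the DataFrame as a temporary view
peopleDF.createOrReplaceTempView("people")
{% endhighlight %}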
</div>

<div data-lang="java" markdown="1">
@@ -330,7 +330,7 @@ does not support JavaBeans that contain `Map` field(s). Nested JavaBeans and `Li
fields are supported though. You can create a JavaBean by creating a class that implements
Serializable and has getters and setters for all of its fields.

-{% include_example schema_inferring java/org/apache/spark/examples/sql/JavaSparkSqlExample.java %}
+{% include_example schema_inferring java/org/apache/spark/examples/sql/JavaSparkSQLExample.java %}
</div>

<div data-lang="python" markdown="1">
@@ -385,7 +385,7 @@ by `SparkSession`.

For example:

-{% include_example programmatic_schema scala/org/apache/spark/examples/sql/SparkSqlExample.scala %}
+{% include_example programmatic_schema scala/org/apache/spark/examples/sql/SparkSQLExample.scala %}
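
A sketch of the three steps, reusing the hypothetical `people.txt` input from above:

{% highlight scala %}
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// 1. Create an RDD of Rows from the original RDD
val rowRDD = spark.sparkContext
  .textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(attributes => Row(attributes(0), attributes(1).trim))

// 2. Create the schema matching the structure of the Rows
val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("age", StringType, nullable = true)))

// 3. Apply the schema to the RDD of Rows
val peopleDF = spark.createDataFrame(rowRDD, schema)
{% endhighlight %}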
</div>

<div data-lang="java" markdown="1">
@@ -403,7 +403,7 @@ by `SparkSession`.

For example:

-{% include_example programmatic_schema java/org/apache/spark/examples/sql/JavaSparkSqlExample.java %}
+{% include_example programmatic_schema java/org/apache/spark/examples/sql/JavaSparkSQLExample.java %}
</div>

<div data-lang="python" markdown="1">
@@ -472,11 +472,11 @@ In the simplest form, the default data source (`parquet` unless otherwise config

<div class="codetabs">
<div data-lang="scala" markdown="1">
-{% include_example generic_load_save_functions scala/org/apache/spark/examples/sql/SqlDataSourceExample.scala %}
+{% include_example generic_load_save_functions scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
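
A sketch with illustrative paths; `load` and `save` fall back to the default (Parquet) source:

{% highlight scala %}
val usersDF = spark.read.load("examples/src/main/resources/users.parquet")
usersDF.select("name", "favorite_color").write.save("namesAndFavColors.parquet")
{% endhighlight %}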
</div>

<div data-lang="java" markdown="1">
-{% include_example generic_load_save_functions java/org/apache/spark/examples/sql/JavaSqlDataSourceExample.java %}
+{% include_example generic_load_save_functions java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
</div>

<div data-lang="python" markdown="1">
@@ -507,11 +507,11 @@ using this syntax.

<div class="codetabs">
<div data-lang="scala" markdown="1">
-{% include_example manual_load_options scala/org/apache/spark/examples/sql/SqlDataSourceExample.scala %}
+{% include_example manual_load_options scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
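
A sketch with illustrative paths, naming the source formats explicitly:

{% highlight scala %}
val peopleDF = spark.read.format("json").load("examples/src/main/resources/people.json")
peopleDF.select("name", "age").write.format("parquet").save("namesAndAges.parquet")
{% endhighlight %}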
</div>

<div data-lang="java" markdown="1">
-{% include_example manual_load_options java/org/apache/spark/examples/sql/JavaSqlDataSourceExample.java %}
+{% include_example manual_load_options java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
</div>

<div data-lang="python" markdown="1">
@@ -538,11 +538,11 @@ file directly with SQL.

<div class="codetabs">
<div data-lang="scala" markdown="1">
-{% include_example direct_sql scala/org/apache/spark/examples/sql/SqlDataSourceExample.scala %}
+{% include_example direct_sql scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
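
A sketch with an illustrative path; note the backticks around the file path inside the query:

{% highlight scala %}
val sqlDF = spark.sql("SELECT * FROM parquet.`examples/src/main/resources/users.parquet`")
{% endhighlight %}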
</div>

<div data-lang="java" markdown="1">
-{% include_example direct_sql java/org/apache/spark/examples/sql/JavaSqlDataSourceExample.java %}
+{% include_example direct_sql java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
</div>

<div data-lang="python" markdown="1">
@@ -633,11 +633,11 @@ Using the data from the above example:
<div class="codetabs">

<div data-lang="scala" markdown="1">
-{% include_example basic_parquet_example scala/org/apache/spark/examples/sql/SqlDataSourceExample.scala %}
+{% include_example basic_parquet_example scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
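
A sketch of a Parquet round trip, assuming the `peopleDF` DataFrame from the earlier examples:

{% highlight scala %}
// DataFrames can be saved as Parquet files, maintaining the schema information
peopleDF.write.parquet("people.parquet")

// Read in the Parquet file created above; the result is also a DataFrame
val parquetFileDF = spark.read.parquet("people.parquet")

// Parquet files can also back a temporary view for SQL statements
parquetFileDF.createOrReplaceTempView("parquetFile")
val namesDF = spark.sql("SELECT name FROM parquetFile WHERE age BETWEEN 13 AND 19")
namesDF.show()
{% endhighlight %}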
</div>

<div data-lang="java" markdown="1">
-{% include_example basic_parquet_example java/org/apache/spark/examples/sql/JavaSqlDataSourceExample.java %}
+{% include_example basic_parquet_example java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
</div>

<div data-lang="python" markdown="1">
@@ -766,11 +766,11 @@ turned it off by default starting from 1.5.0. You may enable it by
<div class="codetabs">

<div data-lang="scala" markdown="1">
-{% include_example schema_merging scala/org/apache/spark/examples/sql/SqlDataSourceExample.scala %}
+{% include_example schema_merging scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
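
A sketch that writes two partitions with different but compatible schemas, then reads them back with merging enabled (paths illustrative):

{% highlight scala %}
// This is used to implicitly convert an RDD to a DataFrame
import spark.implicits._

// Create a simple DataFrame, store it into a partition directory
val squaresDF = spark.sparkContext.makeRDD(1 to 5).map(i => (i, i * i)).toDF("value", "square")
squaresDF.write.parquet("data/test_table/key=1")

// Create another DataFrame in a new partition directory,
// adding a new column and dropping an existing column
val cubesDF = spark.sparkContext.makeRDD(6 to 10).map(i => (i, i * i * i)).toDF("value", "cube")
cubesDF.write.parquet("data/test_table/key=2")

// Read the partitioned table with schema merging enabled
val mergedDF = spark.read.option("mergeSchema", "true").parquet("data/test_table")
mergedDF.printSchema()
{% endhighlight %}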
</div>

<div data-lang="java" markdown="1">
-{% include_example schema_merging java/org/apache/spark/examples/sql/JavaSqlDataSourceExample.java %}
+{% include_example schema_merging java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
</div>

<div data-lang="python" markdown="1">
@@ -973,7 +973,7 @@ Note that the file that is offered as _a json file_ is not a typical JSON file.
line must contain a separate, self-contained valid JSON object. As a consequence,
a regular multi-line JSON file will most often fail.

-{% include_example json_dataset scala/org/apache/spark/examples/sql/SqlDataSourceExample.scala %}
+{% include_example json_dataset scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
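
A sketch, assuming a file in which each line is a self-contained JSON object (path illustrative):

{% highlight scala %}
val peopleDF = spark.read.json("examples/src/main/resources/people.json")

// The inferred schema can be visualized using the printSchema() method
peopleDF.printSchema()

// Create a temporary view and query it with SQL
peopleDF.createOrReplaceTempView("people")
val teenagerNamesDF = spark.sql("SELECT name FROM people WHERE age BETWEEN 13 AND 19")
teenagerNamesDF.show()
{% endhighlight %}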
</div>

<div data-lang="java" markdown="1">
@@ -985,7 +985,7 @@ Note that the file that is offered as _a json file_ is not a typical JSON file.
line must contain a separate, self-contained valid JSON object. As a consequence,
a regular multi-line JSON file will most often fail.

-{% include_example json_dataset java/org/apache/spark/examples/sql/JavaSqlDataSourceExample.java %}
+{% include_example json_dataset java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
</div>

<div data-lang="python" markdown="1">
@@ -1879,9 +1879,8 @@ Spark SQL and DataFrames support the following data types:

All data types of Spark SQL are located in the package `org.apache.spark.sql.types`.
You can access them by doing
-{% highlight scala %}
-import org.apache.spark.sql.types._
-{% endhighlight %}
+
+{% include_example data_types scala/org/apache/spark/examples/sql/SparkSQLExample.scala %}
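
For instance, a schema built by hand from those types might look like this (a sketch; the field names are arbitrary):

{% highlight scala %}
import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("age", IntegerType, nullable = true)))
{% endhighlight %}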

<table class="table">
<tr>
examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
@@ -35,7 +35,7 @@
// $example off:basic_parquet_example$
import org.apache.spark.sql.SparkSession;

-public class JavaSqlDataSourceExample {
+public class JavaSQLDataSourceExample {

// $example on:schema_merging$
public static class Square implements Serializable {
examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSQLExample.java
@@ -60,7 +60,7 @@
import static org.apache.spark.sql.functions.col;
// $example off:untyped_ops$

-public class JavaSparkSqlExample {
+public class JavaSparkSQLExample {
// $example on:create_ds$
public static class Person implements Serializable {
private String name;
examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala
@@ -18,7 +18,7 @@ package org.apache.spark.examples.sql

import org.apache.spark.sql.SparkSession

-object SqlDataSourceExample {
+object SQLDataSourceExample {

case class Person(name: String, age: Long)

examples/src/main/scala/org/apache/spark/examples/sql/SparkSQLExample.scala
@@ -25,12 +25,12 @@ import org.apache.spark.sql.Row
import org.apache.spark.sql.SparkSession
// $example off:init_session$
// $example on:programmatic_schema$
-import org.apache.spark.sql.types.StringType
-import org.apache.spark.sql.types.StructField
-import org.apache.spark.sql.types.StructType
+// $example on:data_types$
+import org.apache.spark.sql.types._
+// $example off:data_types$
// $example off:programmatic_schema$

-object SparkSqlExample {
+object SparkSQLExample {

// $example on:create_ds$
// Note: Case classes in Scala 2.10 can support only up to 22 fields. To work around this limit,