Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 5 additions & 5 deletions docs/sql-ref-syntax-dml-insert-overwrite-directory.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,17 +9,17 @@ license: |
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
---
### Description
The `INSERT OVERWRITE DIRECTORY` statement overwrites the existing data in the directory with the new values using Spark native format. The inserted rows can be specified by value expressions or result from a query.
The `INSERT OVERWRITE DIRECTORY` statement overwrites the existing data in the directory with the new values using a given Spark file format. The inserted rows can be specified by value expressions or result from a query.

### Syntax
{% highlight sql %}
Expand All @@ -39,13 +39,13 @@ INSERT OVERWRITE [ LOCAL ] DIRECTORY [ directory_path ]
<dl>
<dt><code><em>file_format</em></code></dt>
<dd>
Specifies the file format to use for the insert. Valid options are <code>TEXT</code>, <code>CSV</code>, <code>JSON</code>, <code>JDBC</code>, <code>PARQUET</code>, <code>ORC</code>, <code>HIVE</code>, <code>DELTA</code>, <code>LIBSVM</code>, or a fully qualified class name of a custom implementation of <code>org.apache.spark.sql.sources.DataSourceRegister</code>.
Specifies the file format to use for the insert. Valid options are <code>TEXT</code>, <code>CSV</code>, <code>JSON</code>, <code>JDBC</code>, <code>PARQUET</code>, <code>ORC</code>, <code>HIVE</code>, <code>LIBSVM</code>, or a fully qualified class name of a custom implementation of <code>org.apache.spark.sql.execution.datasources.FileFormat</code>.
</dd>
</dl>

<dl>
<dt><code><em>OPTIONS ( key = val [ , ... ] )</em></code></dt>
<dd>Specifies one or more table property key and value pairs.</dd>
<dd>Specifies one or more options for the writing of the file format.</dd>
</dl>

<dl>
Expand Down
6 changes: 3 additions & 3 deletions docs/sql-ref-syntax-dml-insert-overwrite-table.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,9 +9,9 @@ license: |
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
Expand Down Expand Up @@ -178,7 +178,7 @@ INSERT OVERWRITE [ TABLE ] table_identifier [ partition_spec [ IF NOT EXISTS ] ]
| Jason Wang | 908 Bird St, Saratoga | 121212 | true |
+ -------------- + ------------------------------ + -------------- + -------------- +

INSERT OVERWRITE students;
INSERT OVERWRITE students
FROM applicants SELECT name, address, id applicants WHERE qualified = true;

SELECT * FROM students;
Expand Down
10 changes: 5 additions & 5 deletions docs/sql-ref-syntax-dml-load.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,9 +9,9 @@ license: |
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
Expand All @@ -20,7 +20,7 @@ license: |
---

### Description
`LOAD DATA` statement loads the data into a table from the user specified directory or file. If a directory is specified then all the files from the directory are loaded. If a file is specified then only the single file is loaded. Additionally the `LOAD DATA` statement takes an optional partition specification. When a partition is specified, the data files (when input source is a directory) or the single file (when input source is a file) are loaded into the partition of the target table.
`LOAD DATA` statement loads the data into a Hive serde table from the user specified directory or file. If a directory is specified then all the files from the directory are loaded. If a file is specified then only the single file is loaded. Additionally the `LOAD DATA` statement takes an optional partition specification. When a partition is specified, the data files (when input source is a directory) or the single file (when input source is a file) are loaded into the partition of the target table.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

``LOAD DATA` only works for hive table

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch!


### Syntax
{% highlight sql %}
Expand Down Expand Up @@ -78,7 +78,7 @@ LOAD DATA [ LOCAL ] INPATH path [ OVERWRITE ] INTO TABLE table_identifier [ part
| Amy Smith | 123 Park Ave, San Jose | 111111 |
+ -------------- + ------------------------------ + -------------- +

CREATE TABLE test_load (name VARCHAR(64), address VARCHAR(64), student_id INT);
CREATE TABLE test_load (name VARCHAR(64), address VARCHAR(64), student_id INT) USING HIVE;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must be a hive table

Copy link
Member

@dongjoon-hyun dongjoon-hyun Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @cloud-fan . This reminds me the following.

SPARK-30098 Use default datasource as provider for CREATE TABLE syntax

@marmbrus and @gatorsmile . Do we need to revert SPARK-30098 due to its silent behavior change?

Also, cc @rxin since he is the release manager for 3.0.0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed SPARK-31136 to track the discussion and the final result.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cost of break is mostly users can't run LOAD TABLE?

Copy link
Member

@dongjoon-hyun dongjoon-hyun Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't get me wrong. You know that I loved that patch and tried to minimize the impact while embracing it. However, the new policy is designed to ban this kind of behavior change (SPARK-30098). Technically,

  1. The benefit is just saving a few word USING PARQUET or something.
  2. The downside is breaking the existing user pipelines.

Copy link
Member

@dongjoon-hyun dongjoon-hyun Mar 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the JIRA description (SPARK-31136) with @cloud-fan 's example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The benefit is just saving a few word USING PARQUET or something.

This is a completely underestimate. Please see my comment in #27894 (comment)


-- Assuming the students table is in '/user/hive/warehouse/'
LOAD DATA LOCAL INPATH '/user/hive/warehouse/students' OVERWRITE INTO TABLE test_load;
Expand All @@ -92,7 +92,7 @@ LOAD DATA [ LOCAL ] INPATH path [ OVERWRITE ] INTO TABLE table_identifier [ part
+ -------------- + ------------------------------ + -------------- +

-- Example with partition specification.
CREATE TABLE test_partition (c1 INT, c2 INT, c3 INT) USING HIVE PARTITIONED BY (c2, c3);
CREATE TABLE test_partition (c1 INT, c2 INT, c3 INT) PARTITIONED BY (c2, c3);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used as the source so doesn't need to be a hive table.


INSERT INTO test_partition PARTITION (c2 = 2, c3 = 3) VALUES (1);

Expand Down