apache · AngersZhuuuu · Jan 4, 2021 · Apr 1, 2021 · Apr 1, 2021 · Apr 1, 2021
diff --git a/docs/_data/menu-sql.yaml b/docs/_data/menu-sql.yaml
@@ -188,6 +188,8 @@
                   url: sql-ref-syntax-qry-select-lateral-view.html
                 - text: PIVOT Clause
                   url: sql-ref-syntax-qry-select-pivot.html
+                - text: TRANSFORM Clause
+                  url: sql-ref-syntax-qry-select-transform.html
             - text: EXPLAIN
               url: sql-ref-syntax-qry-explain.html
         - text: Auxiliary Statements

diff --git a/docs/sql-ref-syntax-qry-select-transform.md b/docs/sql-ref-syntax-qry-select-transform.md
@@ -0,0 +1,235 @@
+---
+layout: global
+title: TRANSFORM
+displayTitle: TRANSFORM
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+### Description
+
+The `TRANSFORM` clause is used to specify a Hive-style transform query specification 
+to transform the inputs by running a user-specified command or script.
+
+### Syntax
+
+```sql
+SELECT TRANSFORM ( expression [ , ... ] )
+    [ ROW FORMAT row_format ]
+    [ RECORDWRITER record_writer_class ]
+    USING command_or_script [ AS ( [ col_name [ col_type ] ] [ , ... ] ) ]
+    [ ROW FORMAT row_format ]
+    [ RECORDREADER record_reader_class ]
+
+row_format:    
+    SERDE serde_class [ WITH SERDEPROPERTIES (k1=v1, k2=v2, ... ) ]
+    | DELIMITED [ FIELDS TERMINATED BY fields_terminated_char [ ESCAPED BY escaped_char ] ] 
+        [ COLLECTION ITEMS TERMINATED BY collection_items_terminated_char ] 
+        [ MAP KEYS TERMINATED BY map_key_terminated_char ]
+        [ LINES TERMINATED BY row_terminated_char ]
+        [ NULL DEFINED AS null_char ]
+```
+
+### Parameters
+
+* **expression**
+
+    Specifies a combination of one or more values, operators and SQL functions that results in a value.
+
+* **row_format**    
+
+    Otherwise, uses the `DELIMITED` clause to specify the native SerDe and state the delimiter, escape character, null character and so on.
+
+* **SERDE**
+
+    Specifies a custom SerDe for one table.
+
+* **serde_class**
+
+    Specifies a fully-qualified class name of a custom SerDe.
+
+* **DELIMITED**
+
+    The `DELIMITED` clause can be used to specify the native SerDe and state the delimiter, escape character, null character and so on.
+
+* **FIELDS TERMINATED BY**
+
+    Used to define a column separator.
+
+* **COLLECTION ITEMS TERMINATED BY**
+
+    Used to define a collection item separator.
+
+* **MAP KEYS TERMINATED BY**
+
+    Used to define a map key separator.
+
+* **LINES TERMINATED BY**
+
+    Used to define a row separator.
+
+* **NULL DEFINED AS**
+
+    Used to define the specific value for NULL.
+
+* **ESCAPED BY**
+
+    Used for escape mechanism.
+
+* **RECORDWRITER**
+
+    Specifies a fully-qualified class name of a custom RecordWriter. The default value is `org.apache.hadoop.hive.ql.exec.TextRecordWriter`.
+
+* **RECORDREADER**
+
+    Specifies a fully-qualified class name of a custom RecordReader. The default value is `org.apache.hadoop.hive.ql.exec.TextRecordReader`.
+
+* **command_or_script**
+
+    Specifies a command or a path to script to process data.
+
+### SerDe behavior
+
+Spark uses the Hive SerDe `org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe` by default, so columns will be casted
+to `STRING` and combined by tabs before feeding to the user script. All `NULL` values will be converted
+to the literal string `"\N"` in order to differentiate `NULL` values from empty strings. The standard output of the
+user script will be treated as tab-separated `STRING` columns, any cell containing only `"\N"` will be re-interpreted
+as a `NULL` value, and then the resulting STRING column will be cast to the data type specified in `col_type`. If the actual
+number of output columns is less than the number of specified output columns, insufficient output columns will be
+supplemented with `NULL`. If the actual number of output columns is more than the number of specified output columns,
+the output columns will only select the corresponding columns and the remaining part will be discarded.
+If there is no `AS` clause after `USING my_script`, an output schema will be `key: STRING, value: STRING`.
+The `key` column contains all the characters before the first tab and the `value` column contains the remaining characters after the first tab.
+If there is no enough tab, Spark will return `NULL` value. These defaults can be overridden with `ROW FORMAT SERDE` or `ROW FORMAT DELIMITED`. 
+
+### Examples
+
+```sql
+CREATE TABLE person (zip_code INT, name STRING, age INT);
+INSERT INTO person VALUES
+    (94588, 'Zen Hui', 50),
+    (94588, 'Dan Li', 18),
+    (94588, 'Anil K', 27),
+    (94588, 'John V', NULL),
+    (94511, 'David K', 42),
+    (94511, 'Aryan B.', 18),
+    (94511, 'Lalit B.', NULL);
+
+-- With specified output without data type
+SELECT TRANSFORM(zip_code, name, age)
+   USING 'cat' AS (a, b, c)
+FROM person
+WHERE zip_code > 94511;
++-------+---------+-----+
+|    a  |        b|    c|
++-------+---------+-----+
+|  94588|   Anil K|   27|
+|  94588|   John V| NULL|
+|  94588|  Zen Hui|   50|
+|  94588|   Dan Li|   18|
++-------+---------+-----+
+
+-- With specified output with data type
+SELECT TRANSFORM(zip_code, name, age)
+   USING 'cat' AS (a STRING, b STRING, c STRING)
+FROM person
+WHERE zip_code > 94511;
++-------+---------+-----+
+|    a  |        b|    c|
++-------+---------+-----+
+|  94588|   Anil K|   27|
+|  94588|   John V| NULL|
+|  94588|  Zen Hui|   50|
+|  94588|   Dan Li|   18|
++-------+---------+-----+
+
+-- Using ROW FORMAT DELIMITED
+SELECT TRANSFORM(name, age)
+    ROW FORMAT DELIMITED
+    FIELDS TERMINATED BY ','
+    LINES TERMINATED BY '\n'
+    NULL DEFINED AS 'NULL'
+    USING 'cat' AS (name_age string)
+    ROW FORMAT DELIMITED
+    FIELDS TERMINATED BY '@'
+    LINES TERMINATED BY '\n'
+    NULL DEFINED AS 'NULL'
+FROM person;
++---------------+
+|       name_age|
++---------------+
+|      Anil K,27|
+|    John V,null|
+|     ryan B.,18|
+|     David K,42|
+|     Zen Hui,50|
+|      Dan Li,18|
+|  Lalit B.,null|
++---------------+
+
+-- Using Hive Serde
+SELECT TRANSFORM(zip_code, name, age)
+    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
+    WITH SERDEPROPERTIES (
+      'field.delim' = '\t'
+    )
+    USING 'cat' AS (a STRING, b STRING, c STRING)
+    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
+    WITH SERDEPROPERTIES (
+      'field.delim' = '\t'
+    )
+FROM person
+WHERE zip_code > 94511;
++-------+---------+-----+
+|    a  |        b|    c|
++-------+---------+-----+
+|  94588|   Anil K|   27|
+|  94588|   John V| NULL|
+|  94588|  Zen Hui|   50|
+|  94588|   Dan Li|   18|
++-------+---------+-----+
+
+-- Schema-less mode
+SELECT TRANSFORM(zip_code, name, age)
+    USING 'cat'
+FROM person
+WHERE zip_code > 94500;
++-------+---------------------+
+|    key|                value|
++-------+---------------------+
+|  94588|	  Anil K    27|
+|  94588|	  John V    \N|
+|  94511|	Aryan B.    18|
+|  94511|	 David K    42|
+|  94588|	 Zen Hui    50|
+|  94588|	  Dan Li    18|
+|  94511|	Lalit B.    \N|
++-------+---------------------+
+```
+
+### Related Statements
+
+* [SELECT Main](sql-ref-syntax-qry-select.html)
+* [WHERE Clause](sql-ref-syntax-qry-select-where.html)
+* [GROUP BY Clause](sql-ref-syntax-qry-select-groupby.html)
+* [HAVING Clause](sql-ref-syntax-qry-select-having.html)
+* [ORDER BY Clause](sql-ref-syntax-qry-select-orderby.html)
+* [SORT BY Clause](sql-ref-syntax-qry-select-sortby.html)
+* [DISTRIBUTE BY Clause](sql-ref-syntax-qry-select-distribute-by.html)
+* [LIMIT Clause](sql-ref-syntax-qry-select-limit.html)
+* [CASE Clause](sql-ref-syntax-qry-select-case.html)
+* [PIVOT Clause](sql-ref-syntax-qry-select-pivot.html)
+* [LATERAL VIEW Clause](sql-ref-syntax-qry-select-lateral-view.html)
diff --git a/docs/sql-ref-syntax-qry-select.md b/docs/sql-ref-syntax-qry-select.md
@@ -41,7 +41,7 @@ select_statement [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select_stat
 
 While `select_statement` is defined as
 ```sql
-SELECT [ hints , ... ] [ ALL | DISTINCT ] { [ named_expression | regex_column_names ] [ , ... ] }
+SELECT [ hints , ... ] [ ALL | DISTINCT ] { [[ named_expression | regex_column_names ] [ , ... ] | TRANSFORM (...)) ] }
     FROM { from_item [ , ... ] }
     [ PIVOT clause ]
     [ LATERAL VIEW clause ] [ ... ] 
@@ -164,6 +164,10 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { [ named_expression | regex_column_na
      )
      ```
 
+* **TRANSFORM**
+
+     Specifies a hive-style transform query specification to transform the input by forking and running user-specified command or script.
+
 ### Related Statements
 
 * [WHERE Clause](sql-ref-syntax-qry-select-where.html)
@@ -187,3 +191,4 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { [ named_expression | regex_column_na
 * [CASE Clause](sql-ref-syntax-qry-select-case.html)
 * [PIVOT Clause](sql-ref-syntax-qry-select-pivot.html)
 * [LATERAL VIEW Clause](sql-ref-syntax-qry-select-lateral-view.html)
+* [TRANSFORM Clause](sql-ref-syntax-qry-select-transform.html)
diff --git a/docs/sql-ref-syntax-qry.md b/docs/sql-ref-syntax-qry.md
@@ -49,4 +49,5 @@ ability to generate logical and physical plan for a given query using
   * [CASE Clause](sql-ref-syntax-qry-select-case.html)
   * [PIVOT Clause](sql-ref-syntax-qry-select-pivot.html)
   * [LATERAL VIEW Clause](sql-ref-syntax-qry-select-lateral-view.html)
+  * [TRANSFORM Clause](sql-ref-syntax-qry-select-transform.html)
 * [EXPLAIN Statement](sql-ref-syntax-qry-explain.html)
diff --git a/docs/sql-ref-syntax.md b/docs/sql-ref-syntax.md
@@ -70,6 +70,7 @@ Spark SQL is Apache Spark's module for working with structured data. The SQL Syn
    * [CASE Clause](sql-ref-syntax-qry-select-case.html)
    * [PIVOT Clause](sql-ref-syntax-qry-select-pivot.html)
    * [LATERAL VIEW Clause](sql-ref-syntax-qry-select-lateral-view.html)
+   * [TRANSFORM Clause](sql-ref-syntax-qry-select-transform.html)
  * [EXPLAIN](sql-ref-syntax-qry-explain.html)
 
 ### Auxiliary Statements