diff --git a/docs/sql-data-sources-text.md b/docs/sql-data-sources-text.md
index c32395f8ebb1c..d72b543f54797 100644
--- a/docs/sql-data-sources-text.md
+++ b/docs/sql-data-sources-text.md
@@ -21,8 +21,6 @@ license: |
Spark SQL provides `spark.read().text("file_name")` to read a file or directory of text files into a Spark DataFrame, and `dataframe.write().text("path")` to write to a text file. When reading a text file, each line becomes a row with a string "value" column by default. The line separator can be changed as shown in the example below. The `option()` function can be used to customize the behavior of reading or writing, such as controlling the line separator, compression, and so on.
-
-
@@ -38,3 +36,36 @@ Spark SQL provides `spark.read().text("file_name")` to read a file or directory
+
+## Data Source Option
+
+Data source options of text can be set via:
+* the `.option`/`.options` methods of
+  * `DataFrameReader`
+  * `DataFrameWriter`
+  * `DataStreamReader`
+  * `DataStreamWriter`
+* `OPTIONS` clause at [CREATE TABLE USING DATA_SOURCE](sql-ref-syntax-ddl-create-table-datasource.html)
+
+| Property Name | Default | Meaning | Scope |
+| --- | --- | --- | --- |
+| `wholetext` | `false` | If true, read each file from input path(s) as a single row. | read |
+| `lineSep` | `\r`, `\r\n`, `\n` (for reading), `\n` (for writing) | Defines the line separator that should be used for reading or writing. | read/write |
+| `compression` | (none) | Compression codec to use when saving to file. This can be one of the known case-insensitive shortened names (`none`, `bzip2`, `gzip`, `lz4`, `snappy` and `deflate`). | write |
+
+Other generic options can be found in [Generic File Source Options](sql-data-sources-generic-options.html).
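
To make the table above concrete, here is a minimal PySpark sketch of the three documented options (`wholetext`, `lineSep`, `compression`). The input and output paths are hypothetical; the snippet is illustrative only and not part of this patch.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read each file as a single row instead of one row per line.
whole = spark.read.option("wholetext", "true").text("/tmp/input-texts")

# Split rows on a custom separator instead of the defaults \r, \r\n, \n.
rows = spark.read.option("lineSep", ";").text("/tmp/input-texts")

# Write back out with gzip compression; the default "\n" separator is kept.
rows.write.option("compression", "gzip").text("/tmp/output-texts")
```
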
diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py
index b9a975ffdcc51..7719d48f6ef7c 100644
--- a/python/pyspark/sql/readwriter.py
+++ b/python/pyspark/sql/readwriter.py
@@ -313,28 +313,13 @@ def text(self, paths, wholetext=False, lineSep=None, pathGlobFilter=None,
----------
paths : str or list
string, or list of strings, for input path(s).
- wholetext : str or bool, optional
- if true, read each file from input path(s) as a single row.
- lineSep : str, optional
- defines the line separator that should be used for parsing. If None is
- set, it covers all ``\\r``, ``\\r\\n`` and ``\\n``.
- pathGlobFilter : str or bool, optional
- an optional glob pattern to only include files with paths matching
- the pattern. The syntax follows `org.apache.hadoop.fs.GlobFilter`.
- It does not change the behavior of
- `partition discovery `_. # noqa
- recursiveFileLookup : str or bool, optional
- recursively scan a directory for files. Using this option disables
- `partition discovery `_. # noqa
- modifiedBefore (batch only) : an optional timestamp to only include files with
- modification times occurring before the specified time. The provided timestamp
- must be in the following format: YYYY-MM-DDTHH:mm:ss (e.g. 2020-06-01T13:00:00)
- modifiedAfter (batch only) : an optional timestamp to only include files with
- modification times occurring after the specified time. The provided timestamp
- must be in the following format: YYYY-MM-DDTHH:mm:ss (e.g. 2020-06-01T13:00:00)
+        Other Parameters
+        ----------------
+        Extra options
+            For the extra options, refer to
+            `Data Source Option <https://spark.apache.org/docs/latest/sql-data-sources-text.html#data-source-option>`_  # noqa
+            in the version you use.
Examples
--------
@@ -1038,13 +1023,13 @@ def text(self, path, compression=None, lineSep=None):
----------
path : str
the path in any Hadoop supported file system
- compression : str, optional
- compression codec to use when saving to file. This can be one of the
- known case-insensitive shorten names (none, bzip2, gzip, lz4,
- snappy and deflate).
- lineSep : str, optional
- defines the line separator that should be used for writing. If None is
- set, it uses the default value, ``\\n``.
+
+        Other Parameters
+        ----------------
+        Extra options
+            For the extra options, refer to
+            `Data Source Option <https://spark.apache.org/docs/latest/sql-data-sources-text.html#data-source-option>`_  # noqa
+            in the version you use.
The DataFrame must have only one column that is of string type.
Each row becomes a new line in the output file.
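
The keyword arguments on `DataFrameReader.text()` and `DataFrameWriter.text()` stay in the signatures; only their descriptions move to the linked docs page. A small sketch of the two equivalent spellings, with hypothetical paths:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Keyword arguments on text() are shorthand for the same data source options.
df = spark.read.text("/tmp/logs", wholetext=True, lineSep="\n")

# The equivalent .option() form:
df_same = spark.read.option("wholetext", "true").option("lineSep", "\n").text("/tmp/logs")

# Writer keyword arguments map to the write-side options (compression, lineSep).
df.write.text("/tmp/logs-copy", compression="bzip2")
```
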
diff --git a/python/pyspark/sql/streaming.py b/python/pyspark/sql/streaming.py
index ad71c5041b82d..f1fbf73dce764 100644
--- a/python/pyspark/sql/streaming.py
+++ b/python/pyspark/sql/streaming.py
@@ -593,19 +593,13 @@ def text(self, path, wholetext=False, lineSep=None, pathGlobFilter=None,
----------
paths : str or list
string, or list of strings, for input path(s).
- wholetext : str or bool, optional
- if true, read each file from input path(s) as a single row.
- lineSep : str, optional
- defines the line separator that should be used for parsing. If None is
- set, it covers all ``\\r``, ``\\r\\n`` and ``\\n``.
- pathGlobFilter : str or bool, optional
- an optional glob pattern to only include files with paths matching
- the pattern. The syntax follows `org.apache.hadoop.fs.GlobFilter`.
- It does not change the behavior of `partition discovery`_.
- recursiveFileLookup : str or bool, optional
- recursively scan a directory for files. Using this option
- disables
- `partition discovery `_. # noqa
+
+        Other Parameters
+        ----------------
+        Extra options
+            For the extra options, refer to
+            `Data Source Option <https://spark.apache.org/docs/latest/sql-data-sources-text.html#data-source-option>`_  # noqa
+            in the version you use.
Notes
-----
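
For the streaming reader, the same text options apply to each micro-batch. A minimal sketch, assuming a hypothetical input directory and using a console sink purely for demonstration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Stream a directory of text files; the reader options mirror the batch reader.
lines = (spark.readStream
         .option("pathGlobFilter", "*.txt")   # only pick up .txt files
         .option("lineSep", "\n")
         .text("/tmp/incoming"))

# Print each micro-batch to the console for a short while, then stop.
query = lines.writeStream.format("console").start()
query.awaitTermination(10)  # seconds
query.stop()
```
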
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
index e2c9e3126c6fb..ea84785f27af8 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
@@ -773,24 +773,9 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
* spark.read().text("/path/to/spark/README.md")
* }}}
*
- * You can set the following text-specific option(s) for reading text files:
- *
- * - `wholetext` (default `false`): If true, read a file as a single row and not split by "\n".
- *
- * - `lineSep` (default covers all `\r`, `\r\n` and `\n`): defines the line separator
- * that should be used for parsing.
- * - `pathGlobFilter`: an optional glob pattern to only include files with paths matching
- *   the pattern. The syntax follows org.apache.hadoop.fs.GlobFilter.
- * It does not change the behavior of partition discovery.
- * - `modifiedBefore` (batch only): an optional timestamp to only include files with
- * modification times occurring before the specified Time. The provided timestamp
- * must be in the following form: YYYY-MM-DDTHH:mm:ss (e.g. 2020-06-01T13:00:00)
- * - `modifiedAfter` (batch only): an optional timestamp to only include files with
- * modification times occurring after the specified Time. The provided timestamp
- * must be in the following form: YYYY-MM-DDTHH:mm:ss (e.g. 2020-06-01T13:00:00)
- * - `recursiveFileLookup`: recursively scan a directory for files. Using this option
- * disables partition discovery
- *
+ * You can find the text-specific options for reading text files in
+ * <a href="https://spark.apache.org/docs/latest/sql-data-sources-text.html#data-source-option">
+ *   Data Source Option</a> in the version you use.
*
* @param paths input paths
* @since 1.6.0
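
The options removed from this scaladoc (`pathGlobFilter`, `recursiveFileLookup`, `modifiedBefore`/`modifiedAfter`) are still honored; only their reference documentation moves. An illustrative PySpark sketch with a hypothetical log directory:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# File-filtering options listed in the old scaladoc still work unchanged.
df = (spark.read
      .option("pathGlobFilter", "*.log")               # glob on file paths
      .option("recursiveFileLookup", "true")           # descend into subdirectories
      .option("modifiedAfter", "2020-06-01T13:00:00")  # batch-only timestamp filter
      .text("/tmp/app-logs"))
df.show(truncate=False)
```
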
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
index 8c8def396a4d4..cb1029579aa5e 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
@@ -833,13 +833,9 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
* }}}
* The text files will be encoded as UTF-8.
*
- * You can set the following option(s) for writing text files:
- *
- * - `compression` (default `null`): compression codec to use when saving to file. This can be
- * one of the known case-insensitive shorten names (`none`, `bzip2`, `gzip`, `lz4`,
- * `snappy` and `deflate`).
- * - `lineSep` (default `\n`): defines the line separator that should be used for writing.
- *
+ * You can find the text-specific options for writing text files in
+ * <a href="https://spark.apache.org/docs/latest/sql-data-sources-text.html#data-source-option">
+ *   Data Source Option</a> in the version you use.
*
* @since 1.6.0
*/
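
As a write-side illustration of the relocated options (`compression`, `lineSep`) and of the requirement that the DataFrame contain a single string column, here is a small PySpark sketch with hypothetical data and output path:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("alice",), ("bob",)], ["name"])

# The text writer accepts exactly one string column, so project it explicitly
# before choosing a codec and, optionally, a custom line separator.
(df.select("name")
   .write
   .option("compression", "deflate")
   .option("lineSep", "\r\n")
   .text("/tmp/names-txt"))
```
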
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala b/sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala
index b369a0a59af3e..6c3fbaf00e2f7 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala
@@ -413,21 +413,16 @@ final class DataStreamReader private[sql](sparkSession: SparkSession) extends Lo
* spark.readStream().text("/path/to/directory/")
* }}}
*
- * You can set the following text-specific options to deal with text files:
+ * You can set the following option(s):
*
* - `maxFilesPerTrigger` (default: no max limit): sets the maximum number of new files to be
* considered in every trigger.
- * - `wholetext` (default `false`): If true, read a file as a single row and not split by "\n".
- *
- * - `lineSep` (default covers all `\r`, `\r\n` and `\n`): defines the line separator
- * that should be used for parsing.
- * - `pathGlobFilter`: an optional glob pattern to only include files with paths matching
- *   the pattern. The syntax follows org.apache.hadoop.fs.GlobFilter.
- * It does not change the behavior of partition discovery.
- * - `recursiveFileLookup`: recursively scan a directory for files. Using this option
- * disables partition discovery
*
*
+ * You can find the text-specific options for reading text files in
+ * <a href="https://spark.apache.org/docs/latest/sql-data-sources-text.html#data-source-option">
+ *   Data Source Option</a> in the version you use.
+ *
* @since 2.0.0
*/
def text(path: String): DataFrame = format("text").load(path)
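
`maxFilesPerTrigger` stays documented in this scaladoc because it is stream-specific; combined with a text option such as `wholetext` it looks like the following sketch (hypothetical directory, console sink for demonstration only):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Throttle the stream to at most one new file per micro-batch, and read each
# file as a single row via wholetext.
docs = (spark.readStream
        .option("maxFilesPerTrigger", 1)
        .option("wholetext", "true")
        .text("/tmp/docs-drop"))

query = docs.writeStream.format("console").start()
query.awaitTermination(10)  # seconds
query.stop()
```
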