diff --git a/docs/building-spark.md b/docs/building-spark.md
index 90a520a62a989..23d6f49a4fe81 100644
--- a/docs/building-spark.md
+++ b/docs/building-spark.md
@@ -284,7 +284,7 @@ If you use an individual repository or a repository on GitHub Enterprise, export bel
### Related environment variables
-
+
| Variable Name | Default | Meaning |
SPARK_PROJECT_URL |
diff --git a/docs/cluster-overview.md b/docs/cluster-overview.md
index 119412f96094d..c2145e35f7f24 100644
--- a/docs/cluster-overview.md
+++ b/docs/cluster-overview.md
@@ -89,7 +89,7 @@ The [job scheduling overview](job-scheduling.html) describes this in more detail
The following table summarizes terms you'll see used to refer to cluster concepts:
-
+
| Term | Meaning |
diff --git a/docs/configuration.md b/docs/configuration.md
index 75f597fdb4c6c..b13250a7786e6 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -135,7 +135,7 @@ of the most common options to set are:
### Application Properties
-
+
| Property Name | Default | Meaning | Since Version |
spark.app.name |
@@ -528,7 +528,7 @@ Apart from these, the following properties are also available, and may be useful
### Runtime Environment
-
+
| Property Name | Default | Meaning | Since Version |
spark.driver.extraClassPath |
@@ -915,7 +915,7 @@ Apart from these, the following properties are also available, and may be useful
### Shuffle Behavior
-
+
| Property Name | Default | Meaning | Since Version |
spark.reducer.maxSizeInFlight |
@@ -1290,7 +1290,7 @@ Apart from these, the following properties are also available, and may be useful
### Spark UI
-
+
| Property Name | Default | Meaning | Since Version |
spark.eventLog.logBlockUpdates.enabled |
@@ -1682,7 +1682,7 @@ Apart from these, the following properties are also available, and may be useful
### Compression and Serialization
-
+
| Property Name | Default | Meaning | Since Version |
spark.broadcast.compress |
@@ -1880,7 +1880,7 @@ Apart from these, the following properties are also available, and may be useful
### Memory Management
-
+
| Property Name | Default | Meaning | Since Version |
spark.memory.fraction |
@@ -2005,7 +2005,7 @@ Apart from these, the following properties are also available, and may be useful
### Execution Behavior
-
+
| Property Name | Default | Meaning | Since Version |
spark.broadcast.blockSize |
@@ -2250,7 +2250,7 @@ Apart from these, the following properties are also available, and may be useful
### Executor Metrics
-
+
| Property Name | Default | Meaning | Since Version |
spark.eventLog.logStageExecutorMetrics |
@@ -2318,7 +2318,7 @@ Apart from these, the following properties are also available, and may be useful
### Networking
-
+
| Property Name | Default | Meaning | Since Version |
spark.rpc.message.maxSize |
@@ -2481,7 +2481,7 @@ Apart from these, the following properties are also available, and may be useful
### Scheduling
-
+
| Property Name | Default | Meaning | Since Version |
spark.cores.max |
@@ -2962,7 +2962,7 @@ Apart from these, the following properties are also available, and may be useful
### Barrier Execution Mode
-
+
| Property Name | Default | Meaning | Since Version |
spark.barrier.sync.timeout |
@@ -3009,7 +3009,7 @@ Apart from these, the following properties are also available, and may be useful
### Dynamic Allocation
-
+
| Property Name | Default | Meaning | Since Version |
spark.dynamicAllocation.enabled |
@@ -3151,7 +3151,7 @@ finer granularity starting from driver and executor. Take RPC module as example
like shuffle, just replace "rpc" with "shuffle" in the property names except
spark.{driver|executor}.rpc.netty.dispatcher.numThreads, which is only for RPC module.
-
+
| Property Name | Default | Meaning | Since Version |
spark.{driver|executor}.rpc.io.serverThreads |
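
Reading the substitution rule above literally, the placeholders expand to concrete property names such as these (names only; they are derived mechanically from the rule, not copied from the elided table rows):

```
spark.driver.rpc.io.serverThreads              # RPC module, driver side
spark.executor.shuffle.io.serverThreads        # the same knob with "rpc" replaced by "shuffle", executor side
spark.driver.rpc.netty.dispatcher.numThreads   # RPC only; has no "shuffle" counterpart
```
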
@@ -3294,7 +3294,7 @@ External users can query the static sql config values via `SparkSession.conf` or
### Spark Streaming
-
+
| Property Name | Default | Meaning | Since Version |
spark.streaming.backpressure.enabled |
@@ -3426,7 +3426,7 @@ External users can query the static sql config values via `SparkSession.conf` or
### SparkR
-
+
| Property Name | Default | Meaning | Since Version |
spark.r.numRBackendThreads |
@@ -3482,7 +3482,7 @@ External users can query the static sql config values via `SparkSession.conf` or
### GraphX
-
+
| Property Name | Default | Meaning | Since Version |
spark.graphx.pregel.checkpointInterval |
@@ -3519,7 +3519,7 @@ copy `conf/spark-env.sh.template` to create it. Make sure you make the copy exec
The following variables can be set in `spark-env.sh`:
-
+
| Environment Variable | Meaning |
JAVA_HOME |
@@ -3656,7 +3656,7 @@ Push-based shuffle helps improve the reliability and performance of spark shuffl
### External Shuffle service (server) side configuration options
-
+
| Property Name | Default | Meaning | Since Version |
spark.shuffle.push.server.mergedShuffleFileManagerImpl |
@@ -3690,7 +3690,7 @@ Push-based shuffle helps improve the reliability and performance of spark shuffl
### Client side configuration options
-
+
| Property Name | Default | Meaning | Since Version |
spark.shuffle.push.enabled |
diff --git a/docs/css/custom.css b/docs/css/custom.css
index c4388c9650bf4..71de2b8c7803f 100644
--- a/docs/css/custom.css
+++ b/docs/css/custom.css
@@ -1111,5 +1111,18 @@ img {
table {
width: 100%;
overflow-wrap: normal;
+ border-collapse: collapse; /* Ensures that the borders collapse into a single border */
}
+table th, table td {
+ border: 1px solid #cccccc; /* Adds a border to each table header and data cell */
+ padding: 6px 13px; /* Optional: Adds padding inside each cell for better readability */
+}
+
+table tr {
+ background-color: white; /* Sets a default background color for all rows */
+}
+
+table tr:nth-child(2n) {
+ background-color: #F1F4F5; /* Sets a different background color for even rows */
+}
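
These rules give the converted Markdown pipe tables a collapsed border, padded cells, and zebra-striped rows, roughly matching the look of the tables they replace. A minimal table of the kind the selectors now target might be (the row content is illustrative, not taken from this patch):

```
| Property Name | Default | Meaning | Since Version |
|---|---|---|---|
| spark.app.name | (none) | The name of your application. | 0.9.0 |
```
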
diff --git a/docs/ml-classification-regression.md b/docs/ml-classification-regression.md
index d184f4fe0257c..604b3245272fc 100644
--- a/docs/ml-classification-regression.md
+++ b/docs/ml-classification-regression.md
@@ -703,7 +703,7 @@ others.
### Available families
-
+
| Family |
@@ -1224,7 +1224,7 @@ All output columns are optional; to exclude an output column, set its correspond
### Input Columns
-
+
| Param name |
@@ -1251,7 +1251,7 @@ All output columns are optional; to exclude an output column, set its correspond
### Output Columns
-
+
| Param name |
@@ -1326,7 +1326,7 @@ All output columns are optional; to exclude an output column, set its correspond
#### Input Columns
-
+
| Param name |
@@ -1353,7 +1353,7 @@ All output columns are optional; to exclude an output column, set its correspond
#### Output Columns (Predictions)
-
+
| Param name |
@@ -1407,7 +1407,7 @@ All output columns are optional; to exclude an output column, set its correspond
#### Input Columns
-
+
| Param name |
@@ -1436,7 +1436,7 @@ Note that `GBTClassifier` currently only supports binary labels.
#### Output Columns (Predictions)
-
+
| Param name |
diff --git a/docs/ml-clustering.md b/docs/ml-clustering.md
index 00a156b6645ce..fdb8173ce3bbe 100644
--- a/docs/ml-clustering.md
+++ b/docs/ml-clustering.md
@@ -40,7 +40,7 @@ called [kmeans||](http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf).
### Input Columns
-
+
| Param name |
@@ -61,7 +61,7 @@ called [kmeans||](http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf).
### Output Columns
-
+
| Param name |
@@ -204,7 +204,7 @@ model.
### Input Columns
-
+
| Param name |
@@ -225,7 +225,7 @@ model.
### Output Columns
-
+
| Param name |
diff --git a/docs/mllib-classification-regression.md b/docs/mllib-classification-regression.md
index 10cb85e392029..b3305314abc56 100644
--- a/docs/mllib-classification-regression.md
+++ b/docs/mllib-classification-regression.md
@@ -26,7 +26,7 @@ classification](http://en.wikipedia.org/wiki/Multiclass_classification), and
[regression analysis](http://en.wikipedia.org/wiki/Regression_analysis). The table below outlines
the supported algorithms for each type of problem.
-
+
| Problem Type | Supported Methods |
diff --git a/docs/mllib-decision-tree.md b/docs/mllib-decision-tree.md
index 174255c48b699..0d9886315e288 100644
--- a/docs/mllib-decision-tree.md
+++ b/docs/mllib-decision-tree.md
@@ -51,7 +51,7 @@ The *node impurity* is a measure of the homogeneity of the labels at the node. T
implementation provides two impurity measures for classification (Gini impurity and entropy) and one
impurity measure for regression (variance).
-
+
| Impurity | Task | Formula | Description |
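
For reference, the standard definitions behind the measures named above, written from the usual forms rather than the table itself ($f_i$ = frequency of label $i$ at a node, $C$ = number of classes, $\mu$ = mean of the labels $y_i$):

$$\text{Gini: } \sum_{i=1}^{C} f_i (1 - f_i) \qquad \text{Entropy: } \sum_{i=1}^{C} -f_i \log f_i \qquad \text{Variance: } \frac{1}{N} \sum_{i=1}^{N} (y_i - \mu)^2$$
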
diff --git a/docs/mllib-ensembles.md b/docs/mllib-ensembles.md
index b1006f2730db5..fdad7ae68dd49 100644
--- a/docs/mllib-ensembles.md
+++ b/docs/mllib-ensembles.md
@@ -191,7 +191,7 @@ Note that each loss is applicable to one of classification or regression, not bo
Notation: $N$ = number of instances. $y_i$ = label of instance $i$. $x_i$ = features of instance $i$. $F(x_i)$ = model's predicted label for instance $i$.
-
+
| Loss | Task | Formula | Description |
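
Assuming the table lists the usual gradient-boosting losses (log loss for classification, squared and absolute error for regression), their standard forms in the notation above are:

$$\text{Log loss: } 2 \sum_{i=1}^{N} \log\left(1 + \exp(-2 y_i F(x_i))\right) \qquad \text{Squared error: } \sum_{i=1}^{N} (y_i - F(x_i))^2 \qquad \text{Absolute error: } \sum_{i=1}^{N} |y_i - F(x_i)|$$
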
diff --git a/docs/mllib-evaluation-metrics.md b/docs/mllib-evaluation-metrics.md
index f82f6a01136b9..30acc3dc634be 100644
--- a/docs/mllib-evaluation-metrics.md
+++ b/docs/mllib-evaluation-metrics.md
@@ -76,7 +76,7 @@ plots (recall, false positive rate) points.
**Available metrics**
-
+
| Metric | Definition |
@@ -179,7 +179,7 @@ For this section, a modified delta function $\hat{\delta}(x)$ will prove useful
$$\hat{\delta}(x) = \begin{cases}1 & \text{if $x = 0$}, \\ 0 & \text{otherwise}.\end{cases}$$
-
+
| Metric | Definition |
@@ -296,7 +296,7 @@ The following definition of indicator function $I_A(x)$ on a set $A$ will be nec
$$I_A(x) = \begin{cases}1 & \text{if $x \in A$}, \\ 0 & \text{otherwise}.\end{cases}$$
-
+
| Metric | Definition |
@@ -447,7 +447,7 @@ documents, returns a relevance score for the recommended document.
$$rel_D(r) = \begin{cases}1 & \text{if $r \in D$}, \\ 0 & \text{otherwise}.\end{cases}$$
-
+
| Metric | Definition | Notes |
@@ -553,7 +553,7 @@ variable from a number of independent variables.
**Available metrics**
-
+
| Metric | Definition |
diff --git a/docs/mllib-linear-methods.md b/docs/mllib-linear-methods.md
index b535d2de307a9..448d881f794a5 100644
--- a/docs/mllib-linear-methods.md
+++ b/docs/mllib-linear-methods.md
@@ -72,7 +72,7 @@ training error) and minimizing model complexity (i.e., to avoid overfitting).
The following table summarizes the loss functions and their gradients or sub-gradients for the
methods `spark.mllib` supports:
-
+
| loss function $L(\wv; \x, y)$ | gradient or sub-gradient |
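
In the notation of the table header ($\wv$ = weights, $\x$ = features, $y$ = label), the losses usually associated with these methods take the standard forms below (the gradients in the second column are omitted here):

$$\text{hinge: } \max\{0,\, 1 - y\, \wv^T \x\} \qquad \text{logistic: } \log\left(1 + \exp(-y\, \wv^T \x)\right) \qquad \text{squared: } \frac{1}{2} (\wv^T \x - y)^2$$
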
@@ -105,7 +105,7 @@ The purpose of the
encourage simple models and avoid overfitting. We support the following
regularizers in `spark.mllib`:
-
+
| regularizer $R(\wv)$ | gradient or sub-gradient |
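
Assuming the usual set (none, L2, L1, and their elastic-net combination), the standard forms are:

$$R(\wv) = 0 \qquad \frac{1}{2} \|\wv\|_2^2 \qquad \|\wv\|_1 \qquad \alpha \|\wv\|_1 + (1 - \alpha) \frac{1}{2} \|\wv\|_2^2$$
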
diff --git a/docs/mllib-pmml-model-export.md b/docs/mllib-pmml-model-export.md
index e20d7c2fe4e17..02b5fda7a36df 100644
--- a/docs/mllib-pmml-model-export.md
+++ b/docs/mllib-pmml-model-export.md
@@ -28,7 +28,7 @@ license: |
The table below outlines the `spark.mllib` models that can be exported to PMML and their equivalent PMML model.
-
+
| spark.mllib model | PMML model |
diff --git a/docs/monitoring.md b/docs/monitoring.md
index 7336be9bb67e0..056543deb0946 100644
--- a/docs/monitoring.md
+++ b/docs/monitoring.md
@@ -69,7 +69,7 @@ The history server can be configured as follows:
### Environment Variables
-
+
| Environment Variable | Meaning |
SPARK_DAEMON_MEMORY |
@@ -145,7 +145,7 @@ Use it with caution.
Security options for the Spark History Server are covered in more detail in the
[Security](security.html#web-ui) page.
-
+
| Property Name |
@@ -470,7 +470,7 @@ only for applications in cluster mode, not applications in client mode. Applicat
can be identified by their `[attempt-id]`. In the API listed below, when running in YARN cluster mode,
`[app-id]` will actually be `[base-app-id]/[attempt-id]`, where `[base-app-id]` is the YARN application ID.
-
+
| Endpoint | Meaning |
/applications |
@@ -669,7 +669,7 @@ The REST API exposes the values of the Task Metrics collected by Spark executors
of task execution. The metrics can be used for performance troubleshooting and workload characterization.
A list of the available metrics, with a short description:
-
+
| Spark Executor Task Metric name |
@@ -827,7 +827,7 @@ In addition, aggregated per-stage peak values of the executor memory metrics are
Executor memory metrics are also exposed via the Spark metrics system based on the [Dropwizard metrics library](https://metrics.dropwizard.io/4.2.0).
A list of the available metrics, with a short description:
-
+
| Executor Level Metric name |
Short description |
diff --git a/docs/rdd-programming-guide.md b/docs/rdd-programming-guide.md
index 7764f0bbb5f8f..b92b3da09c5c5 100644
--- a/docs/rdd-programming-guide.md
+++ b/docs/rdd-programming-guide.md
@@ -378,7 +378,7 @@ resulting Java objects using [pickle](https://github.com/irmen/pickle/). When sa
PySpark does the reverse. It unpickles Python objects into Java objects and then converts them to Writables. The following
Writables are automatically converted:
-
+
| Writable Type | Python Type |
| Text | str |
| IntWritable | int |
@@ -954,7 +954,7 @@ and pair RDD functions doc
[Java](api/java/index.html?org/apache/spark/api/java/JavaPairRDD.html))
for details.
-
+
| Transformation | Meaning |
| map(func) |
@@ -1069,7 +1069,7 @@ and pair RDD functions doc
[Java](api/java/index.html?org/apache/spark/api/java/JavaPairRDD.html))
for details.
-
+
| Action | Meaning |
| reduce(func) |
@@ -1214,7 +1214,7 @@ to `persist()`. The `cache()` method is a shorthand for using the default storag
which is `StorageLevel.MEMORY_ONLY` (store deserialized objects in memory). The full set of
storage levels is:
-
+
| Storage Level | Meaning |
| MEMORY_ONLY |
diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index 24e0575d83e4d..cc70c025792f7 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -592,7 +592,7 @@ See the [configuration page](configuration.html) for information on Spark config
#### Spark Properties
-
+
| Property Name | Default | Meaning | Since Version |
spark.kubernetes.context |
@@ -1658,7 +1658,7 @@ See the below table for the full list of pod specifications that will be overwri
### Pod Metadata
-
+
| Pod metadata key | Modified value | Description |
| name |
@@ -1694,7 +1694,7 @@ See the below table for the full list of pod specifications that will be overwri
### Pod Spec
-
+
| Pod spec key | Modified value | Description |
| imagePullSecrets |
@@ -1747,7 +1747,7 @@ See the below table for the full list of pod specifications that will be overwri
The following affect the driver and executor containers. All other containers in the pod spec will be unaffected.
-
+
| Container spec key | Modified value | Description |
| env |
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index 11ed7e9e87737..52afb178a5156 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -143,7 +143,7 @@ To use a custom metrics.properties for the application master and executors, upd
#### Spark Properties
-
+
| Property Name | Default | Meaning | Since Version |
spark.yarn.am.memory |
@@ -696,7 +696,7 @@ To use a custom metrics.properties for the application master and executors, upd
#### Available patterns for SHS custom executor log URL
-
+
| Pattern | Meaning |
| {{HTTP_SCHEME}} |
@@ -783,7 +783,7 @@ staging directory of the Spark application.
## YARN-specific Kerberos Configuration
-
+
| Property Name | Default | Meaning | Since Version |
spark.kerberos.keytab |
@@ -882,7 +882,7 @@ to avoid garbage collection issues during shuffle.
The following extra configuration options are available when the shuffle service is running on YARN:
-
+
| Property Name | Default | Meaning |
spark.yarn.shuffle.stopOnFailure |
diff --git a/docs/security.md b/docs/security.md
index 2a1105fea33fe..755c7ce8b430d 100644
--- a/docs/security.md
+++ b/docs/security.md
@@ -60,7 +60,7 @@ distributing the shared secret. Each application will use a unique shared secret
the case of YARN, this feature relies on YARN RPC encryption being enabled for the distribution of
secrets to be secure.
-
+
| Property Name | Default | Meaning | Since Version |
spark.yarn.shuffle.server.recovery.disabled |
@@ -82,7 +82,7 @@ that any user that can list pods in the namespace where the Spark application is
also see their authentication secret. Access control rules should be properly set up by the
Kubernetes admin to ensure that Spark authentication is secure.
-
+
| Property Name | Default | Meaning | Since Version |
spark.authenticate |
@@ -103,7 +103,7 @@ Kubernetes admin to ensure that Spark authentication is secure.
Alternatively, one can mount authentication secrets using files and Kubernetes secrets that
the user mounts into their pods.
-
+
| Property Name | Default | Meaning | Since Version |
spark.authenticate.secret.file |
@@ -178,7 +178,7 @@ is still required when talking to shuffle services from Spark versions older tha
The following table describes the different options available for configuring this feature.
-
+
| Property Name | Default | Meaning | Since Version |
spark.network.crypto.enabled |
@@ -249,7 +249,7 @@ encrypting output data generated by applications with APIs such as `saveAsHadoop
The following settings cover enabling encryption for data written to disk:
-
+
| Property Name | Default | Meaning | Since Version |
spark.io.encryption.enabled |
@@ -317,7 +317,7 @@ below.
The following options control the authentication of Web UIs:
-
+
| Property Name | Default | Meaning | Since Version |
spark.ui.allowFramingFrom |
@@ -421,7 +421,7 @@ servlet filters.
To enable authorization in the SHS, a few extra options are used:
-
+
| Property Name | Default | Meaning | Since Version |
spark.history.ui.acls.enable |
@@ -472,7 +472,7 @@ are inherited this way, *except* for `spark.ssl.rpc.enabled` which must be expli
The following table describes the SSL configuration namespaces:
-
+
| Config Namespace |
@@ -507,7 +507,7 @@ The following table describes the SSL configuration namespaces:
The full breakdown of available SSL options can be found below. The `${ns}` placeholder should be
replaced with one of the above namespaces.
-
+
| Property Name | Default | Meaning | Supported Namespaces |
${ns}.enabled |
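
For example, substituting the `spark.ssl.rpc` namespace mentioned earlier into `${ns}.enabled` yields concrete keys like these (the values are placeholders; only the substitution pattern comes from the text above):

```
spark.ssl.enabled=true        # base namespace; settings here are inherited by the others
spark.ssl.rpc.enabled=true    # ${ns}.enabled with ${ns} = spark.ssl.rpc, which must be set explicitly
```
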
@@ -726,7 +726,7 @@ Apache Spark can be configured to include HTTP headers to aid in preventing Cros
(XSS), Cross-Frame Scripting (XFS), MIME-Sniffing, and also to enforce HTTP Strict Transport
Security.
-
+
| Property Name | Default | Meaning | Since Version |
spark.ui.xXssProtection |
@@ -782,7 +782,7 @@ configure those ports.
## Standalone mode only
-
+
| From | To | Default Port | Purpose | Configuration
@@ -833,7 +833,7 @@ configure those ports.
## All cluster managers
-
+
| From | To | Default Port | Purpose | Configuration
@@ -909,7 +909,7 @@ deployment-specific page for more information.
The following options provide finer-grained control for this feature:
-
+
| Property Name | Default | Meaning | Since Version |
spark.security.credentials.${service}.enabled |
diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md
index 2ab68d2a8049f..7a89c8124bdfe 100644
--- a/docs/spark-standalone.md
+++ b/docs/spark-standalone.md
@@ -53,7 +53,7 @@ You should see the new node listed there, along with its number of CPUs and memo
Finally, the following configuration options can be passed to the master and worker:
-
+
| Argument | Meaning |
-h HOST, --host HOST |
@@ -116,7 +116,7 @@ Note that these scripts must be executed on the machine you want to run the Spar
You can optionally configure the cluster further by setting environment variables in `conf/spark-env.sh`. Create this file by starting with the `conf/spark-env.sh.template`, and _copy it to all your worker machines_ for the settings to take effect. The following settings are available:
-
+
| Environment Variable | Meaning |
SPARK_MASTER_HOST |
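
A minimal `conf/spark-env.sh` along these lines might look as follows; the host name and sizes are placeholders, `SPARK_MASTER_HOST` is the variable shown in the table above, and the two worker settings are assumed companions rather than rows quoted from this patch:

```
# conf/spark-env.sh -- copy to every worker machine
SPARK_MASTER_HOST=master.example.com   # bind the master to this host name
SPARK_WORKER_CORES=4                   # cores a worker offers to applications (assumption)
SPARK_WORKER_MEMORY=8g                 # memory a worker offers to applications (assumption)
```
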
@@ -188,7 +188,7 @@ You can optionally configure the cluster further by setting environment variable
SPARK_MASTER_OPTS supports the following system properties:
-
+
| Property Name | Default | Meaning | Since Version |
spark.master.ui.port |
@@ -386,7 +386,7 @@ SPARK_MASTER_OPTS supports the following system properties:
SPARK_WORKER_OPTS supports the following system properties:
-
+
| Property Name | Default | Meaning | Since Version |
spark.worker.cleanup.enabled |
@@ -501,7 +501,7 @@ You can also pass an option `--total-executor-cores ` to control the n
Spark applications support the following configuration properties specific to standalone mode:
-
+
| Property Name | Default Value | Meaning | Since Version |
spark.standalone.submit.waitAppCompletion |
@@ -551,7 +551,7 @@ via http://[host:port]/[version]/submissions/[action] where
version is a protocol version, v1 as of today, and
action is one of the following supported actions.
-
+
| Command | Description | HTTP METHOD | Since Version |
create |
@@ -730,7 +730,7 @@ ZooKeeper is the best way to go for production-level high availability, but if y
In order to enable this recovery mode, you can set SPARK_DAEMON_JAVA_OPTS in spark-env using this configuration:
-
+
| System property | Default Value | Meaning | Since Version |
spark.deploy.recoveryMode |
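
Concretely, that amounts to a line like the following in `conf/spark-env.sh` on the master; the `FILESYSTEM` mode and the directory path are illustrative values, not quoted from the table:

```
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=/var/lib/spark/recovery"
```
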
diff --git a/docs/sparkr.md b/docs/sparkr.md
index 8e6a98e40b680..a34a1200c4c00 100644
--- a/docs/sparkr.md
+++ b/docs/sparkr.md
@@ -77,7 +77,7 @@ sparkR.session(master = "local[*]", sparkConfig = list(spark.driver.memory = "2g
The following Spark driver properties can be set in `sparkConfig` with `sparkR.session` from RStudio:
-
+
| Property Name | Property group | spark-submit equivalent |
spark.master |
@@ -588,7 +588,7 @@ The following example shows how to save/load a MLlib model by SparkR.
{% include_example read_write r/ml/ml.R %}
# Data type mapping between R and Spark
-
+
| |
|---|