@dongjoon-hyun dongjoon-hyun commented Jul 26, 2024

What changes were proposed in this pull request?

This PR aims to support server-side environment variable replacement in the REST Submission API.

  • For example, ephemeral Spark clusters can use server-side environment variables to provide backend resources and information without touching client-side applications and configurations.

  • The placeholder pattern follows the {{SERVER_ENVIRONMENT_VARIABLE_NAME}} style, similar to the following existing usages.

<code>-verbose:gc -Xloggc:/tmp/{{APP_ID}}-{{EXECUTOR_ID}}.gc</code>

"org.apache.spark.deploy.worker.DriverWrapper",
Seq("{{WORKER_URL}}", "{{USER_JAR}}", mainClass) ++ appArgs, // args to the DriverWrapper
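To make the placeholder idea concrete, here is a minimal sketch of how a whole-value `{{NAME}}` placeholder could be resolved on the server. It is not the code in this PR: the object name `PlaceholderSketch`, the function `replacePlaceholder`, the injected `lookup` function, the accepted character set, and the choice to leave unresolved placeholders untouched are all assumptions made for illustration.

```scala
object PlaceholderSketch {
  // Assumption: a placeholder is a whole value of the form {{NAME}}, where NAME
  // consists of upper-case letters, digits, and underscores.
  private val Placeholder = "\\{\\{([A-Z0-9_]+)\\}\\}".r

  /** Returns the looked-up value for a {{NAME}} placeholder, or `value` unchanged. */
  def replacePlaceholder(value: String, lookup: String => Option[String]): String =
    value match {
      // Regex extractors are anchored, so only a value that is entirely a placeholder matches.
      case Placeholder(name) => lookup(name).getOrElse(value)
      case _ => value
    }

  def main(args: Array[String]): Unit = {
    // Stand-in for the master's environment; a server could pass sys.env.get instead.
    val serverEnv = Map("AWS_ENDPOINT_URL" -> "ENDPOINT_FOR_THIS_CLUSTER")
    val submitted = Map(
      "AWS_ACCESS_KEY_ID" -> "A",
      "AWS_ENDPOINT_URL" -> "{{AWS_ENDPOINT_URL}}")
    val resolved = submitted.map { case (k, v) => k -> replacePlaceholder(v, serverEnv.get) }
    resolved.foreach { case (k, v) => println(s"$k=$v") }
  }
}
```

Passing the lookup as a function keeps the sketch testable without touching the real process environment.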

Why are the changes needed?

A user can submit an environment variable placeholder such as {{AWS_CA_BUNDLE}} or {{AWS_ENDPOINT_URL}} in order to use the server-wide environment variables of the Spark Master.

$ SPARK_MASTER_OPTS="-Dspark.master.rest.enabled=true" \
  AWS_ENDPOINT_URL=ENDPOINT_FOR_THIS_CLUSTER \
  sbin/start-master.sh

$ sbin/start-worker.sh spark://$(hostname):7077

curl -s -k -XPOST http://localhost:6066/v1/submissions/create \
  --header "Content-Type:application/json;charset=UTF-8" \
  --data '{
          "appResource": "",
          "sparkProperties": {
            "spark.master": "spark://localhost:7077",
            "spark.app.name": "",
            "spark.submit.deployMode": "cluster",
            "spark.jars": "/Users/dongjoon/APACHE/spark-merge/examples/target/scala-2.13/jars/spark-examples_2.13-4.0.0-SNAPSHOT.jar"
          },
          "clientSparkVersion": "",
          "mainClass": "org.apache.spark.examples.SparkPi",
          "environmentVariables": {
            "AWS_ACCESS_KEY_ID": "A",
            "AWS_SECRET_ACCESS_KEY": "B",
            "AWS_ENDPOINT_URL": "{{AWS_ENDPOINT_URL}}"
          },
          "action": "CreateSubmissionRequest",
          "appArgs": [ "10000" ]
  }'
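The same submission can be sketched from Scala with the JDK's `java.net.http` API. This is only an illustration of the request shape used by the curl command above: the object `SubmitRequestSketch`, its functions, and the jar path `/path/to/spark-examples.jar` are hypothetical, and the JSON is assembled by hand without escaping.

```scala
import java.net.URI
import java.net.http.HttpRequest

object SubmitRequestSketch {
  // Hypothetical jar location; replace with a path visible to the cluster.
  val exampleJar = "/path/to/spark-examples.jar"

  // Same shape as the curl payload; assumes `jar` needs no JSON escaping.
  def payload(jar: String): String =
    s"""{
       |  "appResource": "",
       |  "sparkProperties": {
       |    "spark.master": "spark://localhost:7077",
       |    "spark.app.name": "",
       |    "spark.submit.deployMode": "cluster",
       |    "spark.jars": "$jar"
       |  },
       |  "clientSparkVersion": "",
       |  "mainClass": "org.apache.spark.examples.SparkPi",
       |  "environmentVariables": {
       |    "AWS_ACCESS_KEY_ID": "A",
       |    "AWS_SECRET_ACCESS_KEY": "B",
       |    "AWS_ENDPOINT_URL": "{{AWS_ENDPOINT_URL}}"
       |  },
       |  "action": "CreateSubmissionRequest",
       |  "appArgs": [ "10000" ]
       |}""".stripMargin

  // Builds (but does not send) the POST request to the create endpoint.
  def createSubmission(restUrl: String, body: String): HttpRequest =
    HttpRequest.newBuilder(URI.create(s"$restUrl/v1/submissions/create"))
      .header("Content-Type", "application/json;charset=UTF-8")
      .POST(HttpRequest.BodyPublishers.ofString(body))
      .build()

  def main(args: Array[String]): Unit = {
    val request = createSubmission("http://localhost:6066", payload(exampleJar))
    println(s"${request.method()} ${request.uri()}")
    println(payload(exampleJar))
  }
}
```

To actually submit, the request would be passed to `java.net.http.HttpClient.newHttpClient().send(...)` with a string body handler, against a master started with `spark.master.rest.enabled=true` as shown earlier.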

[Image: Screenshot 2024-07-26 at 16 58 26]

Does this PR introduce any user-facing change?

No. This is a new feature, and it is disabled by default via spark.master.rest.enabled (default: false).

How was this patch tested?

Pass the CIs with the newly added test case.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the CORE label Jul 26, 2024
@dongjoon-hyun dongjoon-hyun marked this pull request as draft July 27, 2024 00:03
@dongjoon-hyun dongjoon-hyun marked this pull request as ready for review July 27, 2024 00:12
@dongjoon-hyun
Member Author

Could you review this when you have some time, @viirya ?

@dongjoon-hyun dongjoon-hyun changed the title from "[SPARK-49033][CORE] Support server-side environment variable replacement in REST Submission API" to "[SPARK-49033][CORE] Support server-side environmentVariables replacement in REST Submission API" Jul 27, 2024
@dongjoon-hyun
Member Author

Could you review this PR when you have some time, @yaooqinn ?

conf: SparkConf)
extends SubmitRequestServlet {

val envVariablePattern = "\\{\\{[A-Z_]+\\}\\}".r
Member

Hmm, where is this envVariablePattern used?

Member Author

Oh, right. It's a no-op now. Let me clean it up.
I used it for the Scala 2.12 patch.
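For readers wondering what the pattern quoted in this thread would match, the following sketch exercises it in isolation. The regex string is copied from the review snippet; the object `PatternSketch` and the helper `placeholdersIn` are hypothetical names added for illustration. Note that the character class covers only upper-case letters and underscores, so names containing digits or lower-case letters do not match.

```scala
object PatternSketch {
  // The pattern quoted in the review comment.
  val envVariablePattern = "\\{\\{[A-Z_]+\\}\\}".r

  // Lists every {{NAME}} occurrence found anywhere in the input string.
  def placeholdersIn(s: String): List[String] = envVariablePattern.findAllIn(s).toList

  def main(args: Array[String]): Unit = {
    Seq(
      "{{AWS_ENDPOINT_URL}}",
      "/tmp/{{APP_ID}}-{{EXECUTOR_ID}}.gc",
      "{{S3_BUCKET}}", // contains a digit, so it is not matched
      "{{lower}}"      // lower-case, so it is not matched
    ).foreach(s => println(s"$s -> ${placeholdersIn(s)}"))
  }
}
```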

@dongjoon-hyun
Member Author

Thank you, @viirya . It's removed.

val updatedMasters = masters.map(
_.replace(s":$masterRestPort", s":$masterPort")).getOrElse(masterUrl)
val appArgs = request.appArgs
// Filter SPARK_LOCAL_(IP|HOSTNAME) environment variables from being set on the remote system.
Member

Let's also update this comment?

Member Author

Sure.
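The code comment under review describes dropping `SPARK_LOCAL_IP` and `SPARK_LOCAL_HOSTNAME` from the submitted environment. A minimal sketch of such a filter is shown below, assuming the variables arrive as a `Map[String, String]`; the object `EnvFilterSketch` and the function `dropLocalAddressVars` are hypothetical names and are not taken from the PR.

```scala
object EnvFilterSketch {
  // Removes SPARK_LOCAL_IP and SPARK_LOCAL_HOSTNAME; every other entry is kept as-is.
  def dropLocalAddressVars(env: Map[String, String]): Map[String, String] =
    env.filterNot { case (name, _) => name.matches("SPARK_LOCAL_(IP|HOSTNAME)") }

  def main(args: Array[String]): Unit = {
    val submitted = Map(
      "SPARK_LOCAL_IP" -> "10.0.0.1",
      "SPARK_LOCAL_HOSTNAME" -> "client-host",
      "AWS_ENDPOINT_URL" -> "{{AWS_ENDPOINT_URL}}")
    dropLocalAddressVars(submitted).foreach { case (k, v) => println(s"$k=$v") }
  }
}
```

`String.matches` requires the whole name to match, so a variable such as `SPARK_LOCAL_DIRS` is not affected by this filter.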

@dongjoon-hyun
Member Author

Thank you. Let me merge this because all CIs passed and the last two commits only update a comment and remove unused lines. I also checked the unit test results manually.

[info] StandaloneRestSubmitSuite:
[info] - construct submit request (16 milliseconds)
[info] - create submission (479 milliseconds)
[info] - create submission with multiple masters (18 milliseconds)
[info] - create submission from main method (15 milliseconds)
[info] - kill submission (18 milliseconds)
[info] - request submission status (13 milliseconds)
[info] - create then kill (22 milliseconds)
[info] - create then request status (21 milliseconds)
[info] - create then kill then request status (27 milliseconds)
[info] - kill or request status before create (13 milliseconds)
[info] - SPARK-45819: clear (16 milliseconds)
[info] - SPARK-45843: killAll (16 milliseconds)
[info] - SPARK-46368: readyz with SC_OK (16 milliseconds)
[info] - SPARK-46368: readyz with SC_SERVICE_UNAVAILABLE (12 milliseconds)
[info] - good request paths (19 milliseconds)
[info] - good request paths, bad requests (22 milliseconds)
[info] - bad request paths (16 milliseconds)
[info] - server returns unknown fields (24 milliseconds)
[info] - client handles faulty server (23 milliseconds)
[info] - client does not send 'SPARK_ENV_LOADED' env var by default (0 milliseconds)
[info] - client does not send 'SPARK_HOME' env var by default (0 milliseconds)
[info] - client does not send 'SPARK_CONF_DIR' env var by default (1 millisecond)
[info] - SPARK-49033: Support server-side environment variable replacement in REST Submission API (2 milliseconds)
[info] - SPARK-45197: Make StandaloneRestServer add JavaModuleOptions to drivers (0 milliseconds)
[info] Run completed in 1 second, 584 milliseconds.
[info] Total number of tests run: 24
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 24, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 9 s, completed Jul 26, 2024, 9:49:45 PM

@dongjoon-hyun dongjoon-hyun deleted the SPARK-49033 branch July 27, 2024 04:51
@yaooqinn
Member

Late LGTM. Do we need a doc for this feature?

@dongjoon-hyun
Member Author

Thank you, @yaooqinn . Yes, I'll update the documentation together. I have another PR in this area.

@yaooqinn
Member

@dongjoon-hyun, Thank you!

ilicmarkodb pushed a commit to ilicmarkodb/spark that referenced this pull request Jul 29, 2024
…ement in REST Submission API

### What changes were proposed in this pull request?

This PR aims to support server-side environment variable replacement in the REST Submission API.

- For example, ephemeral Spark clusters can use server-side environment variables to provide backend resources and information without touching client-side applications and configurations.

- The placeholder pattern follows the `{{SERVER_ENVIRONMENT_VARIABLE_NAME}}` style, similar to the following existing usages.

https://github.com/apache/spark/blob/163e512c53208301a8511310023d930d8b77db96/docs/configuration.md?plain=1#L694

https://github.com/apache/spark/blob/163e512c53208301a8511310023d930d8b77db96/core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala#L233-L234

### Why are the changes needed?

A user can submit an environment variable placeholder such as `{{AWS_CA_BUNDLE}}` or `{{AWS_ENDPOINT_URL}}` in order to use the server-wide environment variables of the Spark Master.

```
$ SPARK_MASTER_OPTS="-Dspark.master.rest.enabled=true" \
  AWS_ENDPOINT_URL=ENDPOINT_FOR_THIS_CLUSTER \
  sbin/start-master.sh

$ sbin/start-worker.sh spark://$(hostname):7077
```

```
curl -s -k -XPOST http://localhost:6066/v1/submissions/create \
  --header "Content-Type:application/json;charset=UTF-8" \
  --data '{
          "appResource": "",
          "sparkProperties": {
            "spark.master": "spark://localhost:7077",
            "spark.app.name": "",
            "spark.submit.deployMode": "cluster",
            "spark.jars": "/Users/dongjoon/APACHE/spark-merge/examples/target/scala-2.13/jars/spark-examples_2.13-4.0.0-SNAPSHOT.jar"
          },
          "clientSparkVersion": "",
          "mainClass": "org.apache.spark.examples.SparkPi",
          "environmentVariables": {
            "AWS_ACCESS_KEY_ID": "A",
            "AWS_SECRET_ACCESS_KEY": "B",
            "AWS_ENDPOINT_URL": "{{AWS_ENDPOINT_URL}}"
          },
          "action": "CreateSubmissionRequest",
          "appArgs": [ "10000" ]
  }'
```

- http://localhost:4040/environment/

![Screenshot 2024-07-26 at 16 58 26](https://github.com/user-attachments/assets/c52daf4e-02ce-4015-bda6-895fb39a39a9)

### Does this PR introduce _any_ user-facing change?

No. This is a new feature, and it is disabled by default via `spark.master.rest.enabled` (default: false).

### How was this patch tested?

Pass the CIs with the newly added test case.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47509 from dongjoon-hyun/SPARK-49033.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun added a commit that referenced this pull request Jul 30, 2024
…I server-side env variable replacements

### What changes were proposed in this pull request?

This PR aims to document the following three recent improvements.
- #47491
- #47509
- #47511

### Why are the changes needed?

To provide updated documentation.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs and check the HTML manually.

<img width="926" alt="Screenshot 2024-07-29 at 14 10 40" src="https://github.com/user-attachments/assets/6c904ec0-0ece-432a-8e41-aeb88f7baab8">

<img width="932" alt="Screenshot 2024-07-29 at 13 52 20" src="https://github.com/user-attachments/assets/ca3afe9a-dcfe-4258-b455-9ff4781cb4e5">

<img width="940" alt="Screenshot 2024-07-29 at 13 52 29" src="https://github.com/user-attachments/assets/ad9635d4-c66f-4320-8b93-005443d4df2e">

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47523 from dongjoon-hyun/SPARK-49049.

Lead-authored-by: Dongjoon Hyun <dhyun@apple.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>