[SPARK-29474] [CORE] [WIP] CLI support for Spark-on-Docker-on-Yarn #48018

Open
wants to merge 1 commit into
base: master


@retpolanne retpolanne commented Sep 6, 2024

What changes were proposed in this pull request?

Introduce four new command line arguments to spark-submit:

--yarn-docker-image
--yarn-docker-mounts
--executor-docker-image
--executor-docker-mounts

These arguments replace the boilerplate configuration currently needed to run Spark in Docker on YARN, such as:

--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker
--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=repo/image:tag
--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS="/etc/passwd:/etc/passwd:ro,/etc/hadoop:/etc/hadoop:ro"
--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker
--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=repo/image:tag
--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS="/etc/passwd:/etc/passwd:ro,/etc/hadoop:/etc/hadoop:ro"
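To make the intended mapping concrete, here is a minimal sketch of how one image/mounts pair could expand into the `--conf` entries listed above. The helper name `docker_confs` is hypothetical and is not part of this PR. The sketch also assumes that the runtime type is set to `docker` whenever an image is given and that the mounts entry is omitted when no mounts are passed; the actual patch may behave differently.

```sh
#!/bin/sh
# Hypothetical helper, not part of the patch: prints the --conf flags that
# one image/mounts pair is assumed to expand to.
#   $1: Spark env prefix (spark.yarn.appMasterEnv or spark.executorEnv)
#   $2: Docker image
#   $3: comma-separated mount list (may be empty)
docker_confs() {
  prefix=$1
  image=$2
  mounts=$3
  printf '%s\n' "--conf ${prefix}.YARN_CONTAINER_RUNTIME_TYPE=docker"
  printf '%s\n' "--conf ${prefix}.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=${image}"
  # Assumption: no mounts entry is emitted when the mount list is empty.
  if [ -n "$mounts" ]; then
    printf '%s\n' "--conf ${prefix}.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=${mounts}"
  fi
}

# Assumed mapping: --yarn-docker-* covers the application master,
# --executor-docker-* covers the executors.
docker_confs spark.yarn.appMasterEnv repo/image:tag "/etc/passwd:/etc/passwd:ro,/etc/hadoop:/etc/hadoop:ro"
docker_confs spark.executorEnv repo/image:tag "/etc/passwd:/etc/passwd:ro,/etc/hadoop:/etc/hadoop:ro"
```

Under these assumptions, the two calls print the same six settings shown above, one per line.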

Why are the changes needed?

The reasoning behind this change is this issue. Using Docker should have a more user-friendly interface than the boilerplate above, one where the Spark/YARN configs for Docker stay hidden behind a façade.

Does this PR introduce any user-facing change?

It adds the command line arguments listed above, which will be available in spark-submit and, subsequently, in spark-shell once this PR is applied.

How was this patch tested?

Tests were added to check that the configs from the boilerplate are properly applied. The changes were also tested using GitHub Actions.

To do: test these changes on a real YARN cluster.

Was this patch authored or co-authored using generative AI tooling?

No.

Signed-off-by: Anne Macedo <annie@retpolanne.com>
@github-actions github-actions bot added the CORE label Sep 6, 2024

retpolanne commented Sep 6, 2024

Can you review, @gengliangwang, @LuciferYang @cloud-fan ? Thanks :)
