-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Docker] (Kubeflow integration) Add chmod --recursive 777 /home/ray to Ray Dockerfile. #31563
[Docker] (Kubeflow integration) Add chmod --recursive 777 /home/ray to Ray Dockerfile. #31563
Conversation
Hi @ijrsvt, Would you mind giving me some pointers? Q1: I hoped to build a docker image to test this PR, and I ran the following command. ➜ ray git:(kubeflow-openshift-docker) python ./ci/build/build-docker-images.py --py-versions py37 --device-types cpu --build-type LOCAL --build-base
Building the following python versions: ['3.7']
Building images for the following devices: ['cpu']
Building base images: True
Cleaned up 0.0MB
INVALID SHA FOUND
BUILT: rayproject/base-deps:nightly-py37-cpu
Traceback (most recent call last):
File "./ci/build/build-docker-images.py", line 721, in <module>
py_versions, image_types, args.base
File "./ci/build/build-docker-images.py", line 345, in build_or_pull_base_images
build_base_images(py_versions, image_types)
File "./ci/build/build-docker-images.py", line 319, in build_base_images
build_for_all_versions("ray-deps", py_versions, image_types, no_cache=False)
File "./ci/build/build-docker-images.py", line 313, in build_for_all_versions
image_name, py_version=py_version, image_type=image_type, **kwargs
File "./ci/build/build-docker-images.py", line 211, in _build_docker_image
wheel = _get_wheel_name(build_args["PYTHON_MINOR_VERSION"])
File "./ci/build/build-docker-images.py", line 134, in _get_wheel_name
f"Found ({len(matches)}) matches for 'ray-*{PYTHON_WHL_VERSION}"
AssertionError: Found (0) matches for 'ray-*cp37*-manylinux*' instead of 1.
wheel matches: [] However, there is no Q2: The script Thank you! |
cc @DmitriGekhtman @juliusvonkohout This PR is not ready to review. I will request reviews when it is ready. Thanks! |
Try I'm not sure if there is a way to only build one version :/ |
Thank @ijrsvt! I built Docker images on my devbox successfully by the following instructions: # Download wheels
export RAY_HASH=f1b8bfd219a98da69a612558cbae32441d98ca60
export RAY_VERSION=2.2.0
./release/util/download_wheels.sh # Modify this file, and comment out unused wheels.
# Move wheels to .whl/
mv *.whl .whl
# I do not know what is this, but "build-docker-images.py" asks me to run this command.
pip download --python-version 3.8 ray==2.2.0 --only-binary=:all:
# Build Docker images
python ./ci/build/build-docker-images.py --py-versions py37 --device-types cpu --build-type HUMAN --build-base
# STDOUT
Building the following python versions: ['3.7']
Building images for the following devices: ['cpu']
Building base images: True
Provide a 'branch name'. For releases, it should be `releases/x.x.x`
Provide a SHA (used for tag value)
Please download images using:
`pip download --python-version <py_version> ray==<ray_version>
Cleaned up 0.0MB
BUILT: rayproject/base-deps:nightly-py37-cpu
Cleaned up 0.0MB
BUILT: rayproject/ray-deps:nightly-py37-cpu
Cleaned up 985.0938501358032MB
BUILT: rayproject/ray:nightly-py37-cpu
python/requirements.txt
python/requirements_test.txt
python/requirements_linters.txt
python/requirements/compat/requirements_legacy_compat.txt
python/requirements/compat/requirements_py36_compat.txt
python/requirements/data_processing/requirements.txt
python/requirements/data_processing/requirements_dataset.txt
python/requirements/ml/requirements_ml_docker.txt
python/requirements/ml/requirements_tune.txt
python/requirements/ml/requirements_train.txt
python/requirements/ml/requirements_rllib.txt
python/requirements/ml/requirements_upstream.txt
python/requirements/ml/requirements_dl.txt
python/ray/tune/requirements-dev.txt
python/ray/tests/project_files/requirements_project/requirements.txt
python/ray/tests/project_files/session-tests/invalid-config-fail/ray-project/requirements.txt
python/ray/tests/project_files/session-tests/git-repo-pass/ray-project/requirements.txt
python/ray/tests/project_files/session-tests/project-pass/ray-project/requirements.txt
python/ray/tests/project_files/project1/requirements.txt
python/ray/util/collective/requirements.txt
Cleaned up 130.6116771697998MB
Still building rayproject/ray-ml:nightly-py37-cpu after 300 seconds
Still building rayproject/ray-ml:nightly-py37-cpu after 603 seconds
BUILT: rayproject/ray-ml:nightly-py37-cpu |
@juliusvonkohout would you mind reviewing this PR? I cannot send a review request to you via GitHub. |
I think |
@juliusvonkohout Good catch! I have updated the PR.
|
579cd33
to
8a2ea31
Compare
Yes, that should be fine since you do almost everything in this single RUN command. Just for the future you could easily squash the layers of the OCI image instead of writing everything in a single command. But that is another topic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick comment update, but otherwise LGTM
Is this PR ready to be merged? cc @ijrsvt @DmitriGekhtman |
Looks good to me, but check the failed py39 docker-build step. |
I cannot connect to buildkite now. I will check the error message when it becomes available. |
maybe a rebase to master is needed. |
Signed-off-by: kaihsun <kaihsun@anyscale.com>
Signed-off-by: kaihsun <kaihsun@anyscale.com>
Co-authored-by: Ian Rodney <ian.rodney@gmail.com> Signed-off-by: Kai-Hsun Chen <kaihsun@apache.org>
6afae4b
to
24ac070
Compare
The failed py39 docker-build step is fixed after I rebased to master! |
I think PR titles should indicate the change that was made rather than the problem that was solved -- I've fixed the name for this one. |
Signed-off-by: kaihsun kaihsun@anyscale.com
Why are these changes needed?
This issue is a part of the integration between KubeRay and Kubeflow. See ray-project/kuberay#750 (comment) for some context.
In KinD (a Kubernetes distribution) cluster, when we use the command
kubectl exec ..
to log in to a ray Pod, the UID will be the same as $RAY_UID (i.e. 1000) in base-deps/Dockerfile.ray/docker/base-deps/Dockerfile
Lines 16 to 27 in b15d8f3
For OpenShift (a Kubernetes distribution), a random non-root UID will be used when we log in to a ray Pod. However, only
UID=1000
has the write access of/home/ray
. Therefore, the error message ofPermission denied
will be reported. As follows, onlyray
(UID = 1000) hasrwx
(read, write, execute) access to/home/ray
. Others only haver-x
(read & execute) access to/home/ray
.> ls -l /home/ drwxr-xr-x 1 ray users 4096 Dec 7 17:18 ray
To reproduce it, we can follow instructions in pod-security.md, and add
runAsUser
andrunAsGroup
to thesecurityContext
ofray-head
in ray-cluster.pod-security.yaml.After the RayCluster is ready, use
kubectl exec ...
to log in to the head Pod. Next, execute the following commands.Related issue number
Closes #30959
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.Test