
Conversation

@andrewsykim
Member

Description

Replace existing KubeRay authentication guide based on kube-rbac-proxy with native Ray token authentication being introduced in Ray 2.52.0

Related issues

Link related issues: "Fixes #1234", "Closes #1234", or "Related to #1234".

Additional information

Optional: Add implementation details, API changes, usage examples, screenshots, etc.

@andrewsykim andrewsykim requested review from a team as code owners November 18, 2025 03:52
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request does a great job of updating the KubeRay authentication documentation to reflect the new token-based authentication mechanism. The instructions are much simpler and clearer now. I've made a few suggestions to improve formatting and syntax highlighting in code blocks. I also recommend re-adding the section on accessing the Ray Dashboard, as its removal leaves a gap in the user guide.

@ray-gardener ray-gardener bot added docs An issue or change related to documentation core Issues that should be addressed in Ray Core community-contribution Contributed by the community labels Nov 18, 2025
@Future-Outlier Future-Outlier self-assigned this Nov 18, 2025

```bash
export RAY_AUTH_MODE=token
export RAY_AUTH_TOKEN=$(kubectl get secrets ray-cluster-with-auth --template={{.data.auth_token}} | base64 -d)
```
Collaborator


is ray-cluster-with-auth automatically generated by the kuberay operator, and if so, how does the name get generated? consider demonstrating this step by showing the output of kubectl get secrets after creating the cluster above

Member Author


This was the name of the RayCluster in the example. In KubeRay v1.5.1 it will be autogenerated based on the name of the RayCluster as well.

## View the Ray dashboard (optional)
To view the Ray dashboard from your browser, first configure port-forwarding:
Then open `localhost:8265` in your browser. You will be prompted to provide the auth token for the cluster, which can be retrieved with:
Collaborator


@andrewsykim I added back this short section on how to view the dashboard

Collaborator

@edoakes edoakes left a comment


Looks good. Some inline comments & I pushed a few minor formatting changes and added back a section about how to view the dashboard. Feel free to adjust or remove if you think it's unnecessary.

Please double check that all commands work as-is when copy-pasted as an end-to-end flow. Ping me when ready to merge.

@edoakes edoakes added the go add ONLY when ready to merge, run all tests label Nov 19, 2025
@andrewsykim
Member Author

Here's a full run-through of the guide using Kind and nightly versions of Ray and KubeRay:

Starting with an empty cluster

$ kubectl get po
NAME                               READY   STATUS    RESTARTS   AGE
kuberay-operator-8c9c6466f-wrdgl   1/1     Running   0          58s

Create a cluster using new authOptions API and autoscaler enabled:

$ kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/refs/heads/master/ray-operator/config/samples/ray-cluster.auth.yaml
raycluster.ray.io/ray-cluster-with-auth created

Verify secret with token:

$ kubectl get secret ray-cluster-with-auth -o yaml
apiVersion: v1
data:
  auth_token: bXVzcW9qTUVJZm5SRHByN2lrbGplS3p2NFk5bk1rUkxPNlVBcmtHV0pqdz0=
kind: Secret
metadata:
  creationTimestamp: "2025-11-19T16:22:28Z"
  labels:
    ray.io/cluster: ray-cluster-with-auth
  name: ray-cluster-with-auth
  namespace: default
  ownerReferences:
  - apiVersion: ray.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: RayCluster
    name: ray-cluster-with-auth
    uid: 2c2e4bba-a4e8-49fa-83e1-eacd432734f0
  resourceVersion: "644"
  uid: 090df544-2422-4531-a114-d641656d1d22
type: Opaque
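The `auth_token` value above is base64-encoded, as with any Kubernetes Secret. A minimal Python sketch of what the `--template={{.data.auth_token}} | base64 -d` pipeline used later does, using the example value from this run:

```python
import base64

# Kubernetes Secrets store values base64-encoded; decoding the example
# value from the Secret above yields the raw token that clients present.
encoded = "bXVzcW9qTUVJZm5SRHByN2lrbGplS3p2NFk5bk1rUkxPNlVBcmtHV0pqdz0="
token = base64.b64decode(encoded).decode()
print(token)  # musqojMEIfnRDpr7ikljeKzv4Y9nMkRLO6UArkGWJjw=
```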

Verify cluster is healthy:

$ kubectl get po
NAME                                             READY   STATUS    RESTARTS   AGE
kuberay-operator-8c9c6466f-wrdgl                 1/1     Running   0          2m47s
ray-cluster-with-auth-head-q97jm                 2/2     Running   0          81s
ray-cluster-with-auth-workergroup-worker-b9h24   1/1     Running   0          81s

Check unauthenticated client fails:

$ kubectl port-forward svc/ray-cluster-with-auth-head-svc 8265:8265 &
[1] 356596
Forwarding from 127.0.0.1:8265 -> 8265
Forwarding from [::1]:8265 -> 8265

$ ray job submit --address http://localhost:8265  -- python -c "import ray; ray.init(); print(ray.cluster_resources())"
Handling connection for 8265
Traceback (most recent call last):
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/myenv/bin/ray", line 33, in <module>
    sys.exit(load_entry_point('ray', 'console_scripts', 'ray')())
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/python/ray/scripts/scripts.py", line 2835, in main
    return cli()
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/myenv/lib/python3.10/site-packages/click/core.py", line 1442, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/myenv/lib/python3.10/site-packages/click/core.py", line 1363, in main
    rv = self.invoke(ctx)
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/myenv/lib/python3.10/site-packages/click/core.py", line 1830, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/myenv/lib/python3.10/site-packages/click/core.py", line 1830, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/myenv/lib/python3.10/site-packages/click/core.py", line 1226, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/myenv/lib/python3.10/site-packages/click/core.py", line 794, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/python/ray/dashboard/modules/job/cli_utils.py", line 54, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/python/ray/autoscaler/_private/cli_logger.py", line 823, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/python/ray/dashboard/modules/job/cli.py", line 269, in submit
    client = _get_sdk_client(
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/python/ray/dashboard/modules/job/cli.py", line 34, in _get_sdk_client
    client = JobSubmissionClient(
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/python/ray/dashboard/modules/job/sdk.py", line 106, in __init__
    self._check_connection_and_version(
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/python/ray/dashboard/modules/dashboard_sdk.py", line 254, in _check_connection_and_version
    self._check_connection_and_version_with_url(min_version, version_error_message)
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/python/ray/dashboard/modules/dashboard_sdk.py", line 268, in _check_connection_and_version_with_url
    r = self._do_request("GET", url)
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/python/ray/dashboard/modules/dashboard_sdk.py", line 326, in _do_request
    raise RuntimeError(formatted_error)
RuntimeError: Authentication failed: Forbidden: Invalid authentication token

The authentication token you provided is invalid or incorrect.

Please provide an authentication token using one of these methods:
  1. Set the RAY_AUTH_TOKEN environment variable
  2. Set the RAY_AUTH_TOKEN_PATH environment variable (pointing to a token file)
  3. Create a token file at the default location: ~/.ray/auth_token
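The three lookup methods listed in the error message amount to a resolution order, which can be sketched as follows. This is an illustrative sketch only, not Ray's actual implementation; `resolve_auth_token` is a hypothetical helper:

```python
import os
from pathlib import Path

def resolve_auth_token(env=None):
    """Hypothetical helper illustrating the lookup order from the error
    message: RAY_AUTH_TOKEN first, then a file named by
    RAY_AUTH_TOKEN_PATH, then the default ~/.ray/auth_token file."""
    env = os.environ if env is None else env
    if "RAY_AUTH_TOKEN" in env:
        return env["RAY_AUTH_TOKEN"]
    if "RAY_AUTH_TOKEN_PATH" in env:
        return Path(env["RAY_AUTH_TOKEN_PATH"]).read_text().strip()
    default = Path.home() / ".ray" / "auth_token"
    if default.exists():
        return default.read_text().strip()
    return None
```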

Set the token and submit the job:

$ export RAY_AUTH_MODE=token
$ export RAY_AUTH_TOKEN=$(kubectl get secrets ray-cluster-with-auth --template={{.data.auth_token}} | base64 -d)
$ ray job submit --address http://localhost:8265  -- python -c "import ray; ray.init(); print(ray.cluster_resources())"
Handling connection for 8265
Handling connection for 8265
Job submission server address: http://localhost:8265
Handling connection for 8265

-------------------------------------------------------
Job 'raysubmit_xchshk6GxSKd1K7N' submitted successfully
-------------------------------------------------------

Next steps
  Query the logs of the job:
    ray job logs raysubmit_xchshk6GxSKd1K7N
  Query the status of the job:
    ray job status raysubmit_xchshk6GxSKd1K7N
  Request the job to be stopped:
    ray job stop raysubmit_xchshk6GxSKd1K7N

Handling connection for 8265
Tailing logs until the job exits (disable with --no-wait):
Handling connection for 8265
2025-11-19 08:25:19,993	INFO job_manager.py:568 -- Runtime env is setting up.
Running entrypoint for job raysubmit_xchshk6GxSKd1K7N: python -c "import ray; ray.init(); print(ray.cluster_resources())"
2025-11-19 08:25:21,595	INFO worker.py:1696 -- Using address 10.244.0.6:6379 set in the environment variable RAY_ADDRESS
2025-11-19 08:25:21,602	INFO worker.py:1837 -- Connecting to existing Ray cluster at address: 10.244.0.6:6379...
2025-11-19 08:25:21,620	INFO worker.py:2014 -- Connected to Ray cluster. View the dashboard at http://10.244.0.6:8265
/home/ray/anaconda3/lib/python3.11/site-packages/ray/_private/worker.py:2062: FutureWarning: Tip: In future versions of Ray, Ray will no longer override accelerator visible devices env var if num_gpus=0 or num_gpus=None (default). To enable this behavior and turn off this error message, set RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=0
  warnings.warn(
{'CPU': 8.0, 'node:__internal_head__': 1.0, 'object_store_memory': 3439332556.0, 'memory': 16000000000.0, 'node:10.244.0.6': 1.0, 'node:10.244.0.7': 1.0}
Handling connection for 8265

------------------------------------------
Job 'raysubmit_xchshk6GxSKd1K7N' succeeded
------------------------------------------

Dashboard auth also worked:

[Screenshot: authenticated Ray dashboard, 2025-11-19 11:27 AM]

@andrewsykim
Member Author

Here's the full run-through using the manual method (with KubeRay older than v1.5.1):

$ kind create cluster
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.27.3) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Not sure what to do next? 😅  Check out https://kind.sigs.k8s.io/docs/user/quick-start/

$ helm install kuberay-operator kuberay/kuberay-operator
NAME: kuberay-operator
LAST DEPLOYED: Wed Nov 19 16:29:54 2025
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None

Create secret manually:

$ kubectl create secret generic ray-cluster-with-auth --from-literal=auth_token=$(openssl rand -base64 32)
secret/ray-cluster-with-auth created
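For reference, a Python equivalent of the `openssl rand -base64 32` step above (a sketch; any sufficiently random string works as the token value, but the Secret key must be `auth_token` to match this guide's example):

```python
import base64
import secrets

# 32 cryptographically random bytes, base64-encoded: the same shape of
# token that `openssl rand -base64 32` produces (44 characters).
token = base64.b64encode(secrets.token_bytes(32)).decode()
print(token)
```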

Create cluster:

$ kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/refs/heads/master/ray-operator/config/samples/ray-cluster.auth-manual.yaml
raycluster.ray.io/ray-cluster-with-auth created

Verify auth works:

$ kubectl port-forward svc/ray-cluster-with-auth-head-svc 8265:8265 &
[1] 371349
Forwarding from 127.0.0.1:8265 -> 8265
Forwarding from [::1]:8265 -> 8265

$ ray job submit --address http://localhost:8265  -- python -c "import ray; ray.init(); print(ray.cluster_resources())"
Handling connection for 8265
Traceback (most recent call last):
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/myenv/bin/ray", line 33, in <module>
    sys.exit(load_entry_point('ray', 'console_scripts', 'ray')())
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/python/ray/scripts/scripts.py", line 2835, in main
    return cli()
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/myenv/lib/python3.10/site-packages/click/core.py", line 1442, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/myenv/lib/python3.10/site-packages/click/core.py", line 1363, in main
    rv = self.invoke(ctx)
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/myenv/lib/python3.10/site-packages/click/core.py", line 1830, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/myenv/lib/python3.10/site-packages/click/core.py", line 1830, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/myenv/lib/python3.10/site-packages/click/core.py", line 1226, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/myenv/lib/python3.10/site-packages/click/core.py", line 794, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/python/ray/dashboard/modules/job/cli_utils.py", line 54, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/python/ray/autoscaler/_private/cli_logger.py", line 823, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/python/ray/dashboard/modules/job/cli.py", line 269, in submit
    client = _get_sdk_client(
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/python/ray/dashboard/modules/job/cli.py", line 34, in _get_sdk_client
    client = JobSubmissionClient(
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/python/ray/dashboard/modules/job/sdk.py", line 106, in __init__
    self._check_connection_and_version(
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/python/ray/dashboard/modules/dashboard_sdk.py", line 254, in _check_connection_and_version
    self._check_connection_and_version_with_url(min_version, version_error_message)
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/python/ray/dashboard/modules/dashboard_sdk.py", line 268, in _check_connection_and_version_with_url
    r = self._do_request("GET", url)
  File "/usr/local/google/home/andrewsy/go/src/github.com/ray-project/ray/python/ray/dashboard/modules/dashboard_sdk.py", line 326, in _do_request
    raise RuntimeError(formatted_error)
RuntimeError: Authentication failed: Forbidden: Invalid authentication token

The authentication token you provided is invalid or incorrect.

Please provide an authentication token using one of these methods:
  1. Set the RAY_AUTH_TOKEN environment variable
  2. Set the RAY_AUTH_TOKEN_PATH environment variable (pointing to a token file)
  3. Create a token file at the default location: ~/.ray/auth_token
Handling connection for 8265

Set token and retry:

$ export RAY_AUTH_MODE=token
$ export RAY_AUTH_TOKEN=$(kubectl get secrets ray-cluster-with-auth --template={{.data.auth_token}} | base64 -d)
$ ray job submit --address http://localhost:8265  -- python -c "import ray; ray.init(); print(ray.cluster_resources())"
Handling connection for 8265
Handling connection for 8265
Job submission server address: http://localhost:8265
Handling connection for 8265

-------------------------------------------------------
Job 'raysubmit_eYxXqkEzWyPsxHpB' submitted successfully
-------------------------------------------------------

Next steps
  Query the logs of the job:
    ray job logs raysubmit_eYxXqkEzWyPsxHpB
  Query the status of the job:
    ray job status raysubmit_eYxXqkEzWyPsxHpB
  Request the job to be stopped:
    ray job stop raysubmit_eYxXqkEzWyPsxHpB

Handling connection for 8265
Tailing logs until the job exits (disable with --no-wait):
Handling connection for 8265
2025-11-19 08:32:53,502	INFO job_manager.py:568 -- Runtime env is setting up.
Running entrypoint for job raysubmit_eYxXqkEzWyPsxHpB: python -c "import ray; ray.init(); print(ray.cluster_resources())"
2025-11-19 08:32:55,123	INFO worker.py:1696 -- Using address 10.244.0.6:6379 set in the environment variable RAY_ADDRESS
2025-11-19 08:32:55,130	INFO worker.py:1837 -- Connecting to existing Ray cluster at address: 10.244.0.6:6379...
2025-11-19 08:32:55,148	INFO worker.py:2014 -- Connected to Ray cluster. View the dashboard at http://10.244.0.6:8265
/home/ray/anaconda3/lib/python3.11/site-packages/ray/_private/worker.py:2062: FutureWarning: Tip: In future versions of Ray, Ray will no longer override accelerator visible devices env var if num_gpus=0 or num_gpus=None (default). To enable this behavior and turn off this error message, set RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=0
  warnings.warn(
{'CPU': 8.0, 'node:__internal_head__': 1.0, 'memory': 16000000000.0, 'object_store_memory': 3380444774.0, 'node:10.244.0.6': 1.0, 'node:10.244.0.7': 1.0}
Handling connection for 8265

------------------------------------------
Job 'raysubmit_eYxXqkEzWyPsxHpB' succeeded
------------------------------------------

andrewsykim and others added 6 commits November 19, 2025 16:34
…ion in Ray 2.52.0

Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
@edoakes edoakes merged commit b70d990 into ray-project:master Nov 19, 2025
6 checks passed
edoakes added a commit to edoakes/ray that referenced this pull request Nov 19, 2025
…ion (ray-project#58729)

Replace existing KubeRay authentication guide based on kube-rbac-proxy
with native Ray token authentication being introduced in Ray 2.52.0

> Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to

> Optional: Add implementation details, API changes, usage examples,
screenshots, etc.

---------

Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
aslonnie pushed a commit that referenced this pull request Nov 19, 2025
Cherry pick: #58729

Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: Andrew Sy Kim <andrewsy@google.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
400Ping pushed a commit to 400Ping/ray that referenced this pull request Nov 21, 2025
ykdojo pushed a commit to ykdojo/ray that referenced this pull request Nov 27, 2025
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025
3 participants