[Core] handling auth_mode=token in python ray.init() calls #57835

sampan-s-nayak · 2025-10-17T06:28:52Z

Description

Builds on #58047. This PR ensures the following when `auth_mode` is `token`:
- Calling `ray.init()` (without passing an existing cluster address): check whether a token is present; if not, generate one and store it in the default path.
- Calling `ray.init(address="xyz")` (connecting to an existing cluster): check whether a token is present; if not, raise an exception.
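The two cases can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ensure_token`, its parameters, the exception type, and the use of a UUID hex string as the token format are all assumptions made for the example.

```python
import uuid
from pathlib import Path
from typing import Optional


def ensure_token(address: Optional[str], token_path: Path) -> str:
    """Hypothetical helper: return the stored token, generating or failing as needed."""
    # Reuse an existing, non-empty token file in both cases.
    if token_path.is_file():
        token = token_path.read_text().strip()
        if token:
            return token

    # Connecting to an existing cluster: a missing token is an error.
    if address is not None:
        raise RuntimeError(
            f"auth_mode is 'token' but no token was found at {token_path}"
        )

    # Starting a new local cluster: generate a token and store it.
    token = uuid.uuid4().hex
    token_path.parent.mkdir(parents=True, exist_ok=True)
    token_path.write_text(token)
    return token
```

Passing `address=None` corresponds to a bare `ray.init()`, and any other value corresponds to `ray.init(address="xyz")`.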

Signed-off-by: sampan <sampan@anyscale.com>

- Created RayAuthTokenLoader singleton class with thread-safe token caching
- Loads tokens from RAY_AUTH_TOKEN env, RAY_AUTH_TOKEN_PATH, or ~/.ray/auth_token
- Support for token generation with UUID (cross-platform)
- Modified GrpcServer to store and pass auth token to ServerCallImpl
- Updated RPC_SERVICE_HANDLER macros to pass auth token
- GCS server now loads token using RayAuthTokenLoader
- Removed auth_token from RayConfig (now loaded via loader)
- Token precedence: env var -> path env var -> default file path

Signed-off-by: sampan <sampan@anyscale.com>

- Created Python auth_token_loader module with thread-safe token caching
- Loads tokens from same precedence as C++: RAY_AUTH_TOKEN, RAY_AUTH_TOKEN_PATH, ~/.ray/auth_token
- Added enable_token_auth parameter to ray.init() with auto-generation support
- Added --enable-token-auth flag to ray start CLI (fails if no token found)
- Only pass enable_token_auth flag via system_config, not the token
- Each side (C++/Python) loads tokens independently using their own loaders
- ray.init() auto-generates token if not found, ray start fails with helpful error

Signed-off-by: sampan <sampan@anyscale.com>
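As an illustration of the precedence and caching described in this commit message, here is a minimal sketch. It is not the actual `auth_token_loader` module: the function names, the module-level cache, and the choice to fall through to the default path when the file named by `RAY_AUTH_TOKEN_PATH` is missing are assumptions.

```python
import os
import threading
from pathlib import Path
from typing import Optional

_lock = threading.Lock()
_cached_token: Optional[str] = None


def _read_token_file(path: Path) -> str:
    """Return the stripped file contents, or "" if the file does not exist."""
    try:
        return path.read_text().strip()
    except FileNotFoundError:
        return ""


def load_auth_token(default_path: Optional[Path] = None) -> Optional[str]:
    """Hypothetical loader: env var, then path env var, then default file."""
    global _cached_token
    with _lock:  # serialize concurrent callers so the cache is filled once
        if _cached_token is not None:
            return _cached_token
        token = os.environ.get("RAY_AUTH_TOKEN", "").strip()
        path_from_env = os.environ.get("RAY_AUTH_TOKEN_PATH", "").strip()
        if not token and path_from_env:
            token = _read_token_file(Path(path_from_env))
        if not token:
            token = _read_token_file(
                default_path or Path.home() / ".ray" / "auth_token"
            )
        _cached_token = token or None
        return _cached_token


def reset_cache_for_testing() -> None:
    """Hypothetical hook so tests can force a reload."""
    global _cached_token
    with _lock:
        _cached_token = None
```

Because the result is cached, later changes to the environment are not observed until the cache is reset, which is why a reset hook is included in the sketch.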

- Test token loading from RAY_AUTH_TOKEN environment variable
- Test token loading from RAY_AUTH_TOKEN_PATH file
- Test token loading from default ~/.ray/auth_token path
- Test precedence order (env var > path env var > default file)
- Test token generation with GetToken(true)
- Test token caching behavior
- Test thread safety with concurrent GetToken calls
- Test whitespace trimming from token files
- Test behavior when no token is found

Signed-off-by: sampan <sampan@anyscale.com>

- Test token loading from RAY_AUTH_TOKEN environment variable
- Test token loading from RAY_AUTH_TOKEN_PATH file
- Test token loading from default ~/.ray/auth_token path
- Test precedence order (env var > path env var > default file)
- Test token generation with generate_if_not_found=True
- Test token caching behavior across multiple calls
- Test has_auth_token() function
- Test thread safety with concurrent loads and generation
- Test whitespace handling and empty values
- Test file permissions on Unix systems (0600)
- Test error handling for permission errors
- Test integration with fixtures and cleanup

Signed-off-by: sampan <sampan@anyscale.com>
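The 0600 file-permission check mentioned in the list above implies that a generated token file is readable only by its owner. One way that could be done is sketched below; `write_token_file` is a hypothetical helper, not a function from this PR.

```python
import os
from pathlib import Path


def write_token_file(path: Path, token: str) -> None:
    """Hypothetical helper: write the token so only the owner can read it."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with mode 0600 from the start, so there is no window
    # in which a newly created file is readable by other users.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(token)
    if os.name == "posix":
        # The mode passed to os.open is ignored for a pre-existing file,
        # so tighten the permissions explicitly as well.
        os.chmod(path, 0o600)
```

On non-POSIX platforms the mode bits have little effect, which is consistent with the test above being limited to Unix systems.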

Signed-off-by: sampan <sampan@anyscale.com>

sampan-s-nayak · 2025-10-27T08:00:23Z

@edoakes addressed comments

python/ray/_private/authentication_token_setup.py

edoakes · 2025-10-27T14:59:22Z

python/ray/_private/authentication/authentication_token_setup.py

+        raise
+
+
+def _get_default_token_path() -> Path:


that works for me

edoakes · 2025-10-27T15:00:28Z

python/ray/_private/authentication_token_setup.py

+        is_new_cluster: Set to True if you're starting a new local cluster, or False if you're connecting
+            to an existing cluster.


This is an abstraction leak and also not quite correct, because you might be calling `ray start --head`, which starts a new cluster but should still error. Let's just call it what it is: `generate_token_if_not_found`.

In that case, let me name this `validate_token_exists(bool generate_if_not_exists=False)`. Depending on the case, the caller can then set `generate_if_not_exists` to True or False.

ChatGPT suggested `ensure_token_if_auth_enabled`, which I feel is even clearer.

The main purpose is to verify that a token is present (generating a new one in certain cases) and to fail early if it is not. We also validate that the auth mode Ray config is enabled through an env var and not through system config.
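A rough sketch of the shape discussed in this thread, with the flag named for what it does rather than for the caller's situation. The signature, the `"auth_mode"` system-config key, the exception types, and passing the already-loaded token in as an argument are assumptions made for illustration; the `RAY_auth_mode` variable name is taken from the test snippet in this PR.

```python
import os
import uuid
from typing import Mapping, Optional


def ensure_token_if_auth_enabled(
    system_config: Optional[Mapping[str, object]],
    existing_token: Optional[str],
    generate_if_not_exists: bool = False,
) -> Optional[str]:
    """Hypothetical sketch: fail early if token auth is on but no token exists."""
    # Auth mode must be enabled via the environment, not via system config.
    if system_config and "auth_mode" in system_config:
        raise ValueError("Set auth_mode through the environment, not system config.")

    # Nothing to check when token auth is not enabled.
    if os.environ.get("RAY_auth_mode", "").lower() != "token":
        return None

    if existing_token:
        return existing_token
    if generate_if_not_exists:
        return uuid.uuid4().hex
    raise RuntimeError("Token auth is enabled but no token was found.")
```

With this shape, a caller such as `ray start --head` can pass `generate_if_not_exists=False` even though it starts a new cluster, which is the case the `is_new_cluster` name could not express.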


edoakes · 2025-10-27T15:02:00Z

python/ray/tests/test_token_auth_integration.py

+    env_vars_to_clean = [
+        "RAY_AUTH_TOKEN",
+        "RAY_AUTH_TOKEN_PATH",
+        "RAY_auth_mode",
+    ]
+    original_values = {}
+    for var in env_vars_to_clean:
+        original_values[var] = os.environ.get(var)
+        if var in os.environ:
+            del os.environ[var]


Use monkeypatch env vars here instead of rewriting it.

This causes issues, as the env vars are also used by the Ray cluster.
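For reference, the save-and-restore pattern from the diff can be expressed without hand-written bookkeeping. The sketch below uses the standard library's `unittest.mock.patch.dict` rather than pytest's `monkeypatch`, purely to stay dependency-free; with pytest, the per-variable equivalent would be `monkeypatch.delenv(var, raising=False)`. The `clean_auth_env` name is hypothetical, and the sketch does not address the concern that the cluster processes also read these variables.

```python
import os
from contextlib import contextmanager
from unittest import mock

AUTH_ENV_VARS = ("RAY_AUTH_TOKEN", "RAY_AUTH_TOKEN_PATH", "RAY_auth_mode")


@contextmanager
def clean_auth_env():
    """Hypothetical helper: clear auth env vars, restoring them on exit."""
    # patch.dict snapshots os.environ on entry and restores it on exit,
    # including undoing any variables set inside the block.
    with mock.patch.dict(os.environ):
        for var in AUTH_ENV_VARS:
            os.environ.pop(var, None)
        yield
```

A test body would run inside `with clean_auth_env():`, and any auth variables it sets are discarded when the block exits.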

Signed-off-by: sampan <sampan@anyscale.com>

edoakes · 2025-10-28T15:33:27Z

test failures

Signed-off-by: sampan <sampan@anyscale.com>

…ct#57835) ## Description builds atop of ray-project#58047, this pr ensures the following when `auth_mode` is `token`: calling `ray.init() `(without passing an existing cluster address) -> check if token is present, generate and store in default path if not present calling `ray.init(address="xyz")` (connecting to an existing cluster) -> check if token is present, raise exception if one is not present --------- Signed-off-by: sampan <sampan@anyscale.com> Signed-off-by: Sampan S Nayak <sampansnayak2@gmail.com> Co-authored-by: sampan <sampan@anyscale.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>

…ct#57835) ## Description builds atop of ray-project#58047, this pr ensures the following when `auth_mode` is `token`: calling `ray.init() `(without passing an existing cluster address) -> check if token is present, generate and store in default path if not present calling `ray.init(address="xyz")` (connecting to an existing cluster) -> check if token is present, raise exception if one is not present --------- Signed-off-by: sampan <sampan@anyscale.com> Signed-off-by: Sampan S Nayak <sampansnayak2@gmail.com> Co-authored-by: sampan <sampan@anyscale.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Signed-off-by: Aydin Abiar <aydin@anyscale.com>

sampan added 13 commits October 16, 2025 08:35

[Core] Authentication for ray core rpc calls - part 1

021df65

Signed-off-by: sampan <sampan@anyscale.com>

fix lint errors

fcd1d10

Signed-off-by: sampan <sampan@anyscale.com>

missed change

411f6f4

Signed-off-by: sampan <sampan@anyscale.com>

more lint issues

40fcdb5

Signed-off-by: sampan <sampan@anyscale.com>

fix library

223dbf5

Signed-off-by: sampan <sampan@anyscale.com>

more lint

cc89a63

Signed-off-by: sampan <sampan@anyscale.com>

move python side changes to new pr

c079298

Signed-off-by: sampan <sampan@anyscale.com>

remove unused import

34aa7a3

Signed-off-by: sampan <sampan@anyscale.com>

Merge remote-tracking branch 'upstream/grpc_auth_1' into grpc_auth_2

7bde811

Signed-off-by: sampan <sampan@anyscale.com>

sampan-s-nayak force-pushed the grpc_auth_2 branch from c804ed5 to 7bde811 Compare October 17, 2025 06:30

sampan-s-nayak changed the title ~~[Core] Authentication for ray core rpc calls - part 1~~ [Core] Authentication for ray core rpc calls - part 2 Oct 17, 2025

sampan and others added 15 commits October 17, 2025 07:51

remove generate token method from c++ code

733efca

Signed-off-by: sampan <sampan@anyscale.com>

fix lint

16fd74e

Signed-off-by: sampan <sampan@anyscale.com>

refactor code files

7094efd

Signed-off-by: sampan <sampan@anyscale.com>

fix lint

f56d5ee

Signed-off-by: sampan <sampan@anyscale.com>

fix lint

356a38e

Signed-off-by: sampan <sampan@anyscale.com>

add missing imports

899973e

Signed-off-by: sampan <sampan@anyscale.com>

refactor token loader and tests

47f2e5a

Signed-off-by: sampan <sampan@anyscale.com>

refactor token loader + fix build

d6a87e2

Signed-off-by: sampan <sampan@anyscale.com>

fix lint

e579741

Signed-off-by: sampan <sampan@anyscale.com>

Merge branch 'grpc_auth_1' into grpc_auth_2

99b7c22

fix issues + update tests

8678815

Signed-off-by: sampan <sampan@anyscale.com>

missed change

4274544

Signed-off-by: sampan <sampan@anyscale.com>

fix lint

4a5dda9

Signed-off-by: sampan <sampan@anyscale.com>

address comments - version 1

b20e1ef

Signed-off-by: sampan <sampan@anyscale.com>

fix lint

d1fe7b9

Signed-off-by: sampan <sampan@anyscale.com>

sampan and others added 5 commits October 27, 2025 02:59

fix build.bazel and imports

5910ecf

Signed-off-by: sampan <sampan@anyscale.com>

fix lint

d36e22f

Signed-off-by: sampan <sampan@anyscale.com>

fix lint issues

4063d74

Signed-off-by: sampan <sampan@anyscale.com>

Merge branch 'token_auth_2' into grpc_auth_2

358582a

address comments

9537a00

Signed-off-by: sampan <sampan@anyscale.com>

sampan-s-nayak marked this pull request as ready for review October 27, 2025 06:29

sampan-s-nayak requested a review from a team as a code owner October 27, 2025 06:29


ray-gardener bot added the core Issues that should be addressed in Ray Core label Oct 27, 2025

sampan-s-nayak requested a review from edoakes October 27, 2025 08:00

edoakes force-pushed the token_auth_2 branch from 4063d74 to 22f7082 Compare October 27, 2025 14:55

edoakes reviewed Oct 27, 2025


Base automatically changed from token_auth_2 to master October 28, 2025 00:33

sampan-s-nayak added 2 commits October 28, 2025 08:05

Merge branch 'master' into grpc_auth_2

4f41e50

Merge branch 'master' into grpc_auth_2

ce6e6e2


sampan and others added 2 commits October 28, 2025 08:50

fix tests

94c5cc6

Signed-off-by: sampan <sampan@anyscale.com>

Merge branch 'master' into grpc_auth_2

92b3f2e

sampan added 3 commits October 28, 2025 15:54

attempt to fix tests

bf4866a

Signed-off-by: sampan <sampan@anyscale.com>

attempt to fix test in CI

61646af

Signed-off-by: sampan <sampan@anyscale.com>

fix lint

5b3cc5b

Signed-off-by: sampan <sampan@anyscale.com>

sampan-s-nayak requested a review from edoakes October 29, 2025 14:17

edoakes approved these changes Oct 30, 2025


edoakes merged commit c0f3ee6 into master Oct 30, 2025
6 checks passed

edoakes deleted the grpc_auth_2 branch October 30, 2025 13:54
