🔄 daily merge: master → main 2025-11-04 #669

antfin-oss · 2025-11-04T02:55:03Z

This pull request was created automatically to merge the latest changes from master into the main branch.

📅 Created: 2025-11-04
🔀 Merge direction: master → main
🤖 Triggered by: Scheduled

Please review and merge if everything looks good.

## Description This removes an orphaned code file that was previously used by the Preprocessor User Guide. ## Related issues Corresponding User Guide was removed in ray-project#44006. Closes ray-project#57867. ## Additional details This test started failing because of the new `XGBoostTrainer` API enabled by default with Ray Train V2. Rather than update the snippet, removing this code instead. Signed-off-by: Matthew Deng <matthew.j.deng@gmail.com>

adding eslint and prettier script to precommit before getting rid of format.sh 1 step closer to replacing scripts/format.sh with pre-commit (pre-commit is currently missing eslint) tested locally: <img width="898" height="929" alt="image" src="https://github.com/user-attachments/assets/58c77fb7-bdde-47ae-ac2b-b864334b3f30" /> --------- Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>

First test running on AKS cloud! --------- Signed-off-by: kevin <kevin@anyscale.com> Signed-off-by: Kevin H. Luu <kevin@anyscale.com> Co-authored-by: Lonnie Liu <95255098+aslonnie@users.noreply.github.com>

## Description Updating so that the module shows as `ray.train` rather than `ray.train.v2.api.exceptions` ## Testing https://anyscale-ray--57865.com.readthedocs.build/en/57865/train/api/doc/ray.train.v2.api.data_parallel_trainer.DataParallelTrainer.fit.html#ray.train.v2.api.data_parallel_trainer.DataParallelTrainer.fit <img width="960" height="302" alt="image" src="https://github.com/user-attachments/assets/02206542-54fe-4674-b2b4-1868fa7e8580" /> Signed-off-by: Matthew Deng <matthew.j.deng@gmail.com>

- Add 2 hello world tests with regular base image & custom image running on GCE --------- Signed-off-by: kevin <kevin@anyscale.com> Signed-off-by: Kevin H. Luu <kevin@anyscale.com>

## Description Bump from small to medium due to timeouts happening specifically in py3.12 tests. --------- Signed-off-by: Matthew Deng <matthew.j.deng@gmail.com>

## Why are these changes needed? The `num_module_steps_trained_(lifetime)_throughput` metrics are biased because of how throughput times are recorded in a loop over module batches. This PR fixes that bias. ## Related issue number  ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run pre-commit jobs to lint the changes in this PR. ([pre-commit setup](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting)) - [x] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: simonsays1980 <simon.zehnder@gmail.com> Co-authored-by: Kamil Kaczmarek <kaczmarek.poczta@gmail.com>

…orker` (ray-project#57859) ## Description The type annotation for `actor_location_tracker` is currently `ActorLocationTracker`, but it should be `ray.actor.ActorHandle[ActorLocationTracker]`. This PR fixes that issue. Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>

ray-project#57834) Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>

…r'. (ray-project#57673)   ## Why are these changes needed? The type hint for `learner_connector` in `AlgorithmConfig.training` was outdated and still used `RLModule` as a parameter. This PR adjusts the type hint to the actual expected form of the callable. ## Related issue number  ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run pre-commit jobs to lint the changes in this PR. ([pre-commit setup](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting)) - [x] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>

`result_of_t` is deprecated Signed-off-by: Lonnie Liu <lonnie@anyscale.com>

…ectural Design (ray-project#57889) Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

- disables java tests; ray java not supported on apple silicon yet. - skipping cpp tests that are not passing yet we already stopped releasing macos wheels for Intel silicon, the tests that are disabled or skipped were never passing on apple silicon, so nothing is regressed. Signed-off-by: Lonnie Liu <lonnie@anyscale.com>

…ay-project#57876) ## Description ## Related issues Closes ray-project#57847 ## Additional information Signed-off-by: daiping8 <dai.ping88@zte.com.cn>

…ystem cgroup (ray-project#57864) For more details about the resource isolation project see ray-project#54703. When starting the head node, move the dashboard api server's subprocesses into the system cgroup. I updated the integration test and added a helpful error message because the test will break in the future when a new dashboard module is added. I ran the integration tests 25 times locally.

> (ray2) ubuntu@devbox:~/code/ray2$ python -m pytest -s python/ray/tests/resource_isolation/test_resource_isolation_integration.py --count 25 -x
> ...
> Results (366.41s): 100 passed

--------- Signed-off-by: irabbani <israbbani@gmail.com>

…roject#57037) During the execution of `tail_job_logs()` after job submission, if the connection to the Ray head breaks, `tail_job_logs()` does not raise any error. An error should be raised. This PR queries the Ray job status when a message is received and raises an error if the connection is closed while the job is not in a terminal state. ## Related issue number Closes: ray-project#57002 --------- Signed-off-by: machichima <nary12321@gmail.com>
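The behavior described above can be illustrated with a minimal, Ray-free sketch. Everything here is an assumption made for illustration: the `JobLogStreamError` type, the `tail_logs` helper, the `(kind, payload)` message shape, and the set of terminal states are hypothetical and are not the actual `tail_job_logs()` implementation.

```python
class JobLogStreamError(RuntimeError):
    """Hypothetical error type for a log stream that ended too early."""


# Assumed set of terminal job states for this sketch.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED"}


def tail_logs(messages, get_job_status):
    """Yield log lines; raise if the stream closes before the job finishes.

    `messages` is any iterable of (kind, payload) tuples, where kind is
    "text" or "closed". `get_job_status` is a callable returning the
    current job status string. Both are placeholders for this sketch.
    """
    for kind, payload in messages:
        if kind == "text":
            yield payload
        elif kind == "closed":
            status = get_job_status()
            if status in TERMINAL_STATES:
                return  # normal end of stream
            raise JobLogStreamError(
                f"connection closed while job status is {status}"
            )


# Normal case: the job finished before the stream closed.
ok = list(tail_logs([("text", "step 1"), ("closed", None)], lambda: "SUCCEEDED"))
print(ok)

# Failure case: the stream closed while the job was still running.
try:
    list(tail_logs([("text", "step 1"), ("closed", None)], lambda: "RUNNING"))
    raised = False
except JobLogStreamError:
    raised = True
print(raised)
```

The point of the sketch is that a closed connection alone is ambiguous; only the job status tells the caller whether the close was expected.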

…ect#57897)

…ay-project#57802) ## Description 1. This PR added the `jax.distributed.shutdown()` for JaxBackend in order to free up any leaked resources on TPU RayTrainWorkers. 2. if `jax.distributed` is not on, it is a noop: https://docs.jax.dev/en/latest/_autosummary/jax.distributed.shutdown.html 3. Tested on Anyscale workspace. <img width="1264" height="62" alt="image" src="https://github.com/user-attachments/assets/f28102ff-f6d1-4da0-b41a-6cc785603e72" />

…ay Serve LLM (ray-project#57830) Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

we are not releasing `x86_64` wheels anymore Signed-off-by: Lonnie Liu <lonnie@anyscale.com>

…57817) Signed-off-by: dayshah <dhyey2019@gmail.com>

…igurable (ray-project#57705) Recently, when we ran performance tests with task event generation turned on, we saw a performance regression when the workloads ran on machines with very few CPUs. On further investigation, the overhead mainly comes from the field name conversion applied when converting the proto message to a JSON payload in the aggregator agent. This PR adds an env var for the config that controls the name conversion behavior and updates the corresponding tests. Note that we eventually plan to remove this config and turn off the field name conversion by default once all current event usage has been migrated. --------- Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
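As a rough illustration of the idea, the sketch below gates a field name conversion behind an environment variable. It is not the aggregator agent's code: the variable name `EXAMPLE_PRESERVE_PROTO_FIELD_NAME`, the helper functions, and the assumption that the conversion is snake_case to lowerCamelCase are all hypothetical.

```python
import os

# Hypothetical environment variable name, used only for this sketch.
ENV_VAR = "EXAMPLE_PRESERVE_PROTO_FIELD_NAME"


def to_camel_case(name):
    """Convert snake_case to lowerCamelCase, e.g. task_id -> taskId."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def to_json_payload(fields, environ=os.environ):
    """Build a JSON-ready dict, optionally skipping the name conversion."""
    preserve = environ.get(ENV_VAR, "0") == "1"
    if preserve:
        # Skip the per-field string work entirely.
        return dict(fields)
    return {to_camel_case(key): value for key, value in fields.items()}


event = {"task_id": "abc", "node_ip_address": "10.0.0.1"}
converted = to_json_payload(event, environ={})
preserved = to_json_payload(event, environ={ENV_VAR: "1"})
print(converted)
print(preserved)
```

Skipping the conversion avoids one string transformation per field per event, which is the kind of overhead that becomes visible on small machines.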

…57861) Signed-off-by: joshlee <joshlee@anyscale.com>

It used to be in 3 different groups, now unionized in 1. Signed-off-by: kevin <kevin@anyscale.com>

…nter (ray-project#56848) * Updated preprocessors to use a callback-based approach for stat computation. This improves code organization and reduces duplication. * Added ValueCounter aggregator and value_counts method to BlockColumnAccessor. Includes implementations for both Arrow and Pandas backends.   ## Why are these changes needed?  ## Related issue number  ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: cem <cem@anyscale.com> Signed-off-by: cem-anyscale <cem@anyscale.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

… only once." (ray-project#57917) This PR fixes the Ray check failure "RayEventRecorder::StartExportingEvents() should be called only once." The failure can occur in the following scenario:

- The metric_agent_client successfully establishes a connection with the dashboard agent. In this case, RayEventRecorder::StartExportingEvents is correctly invoked to start sending events.
- At the same time, the metric_agent_client exceeds its maximum number of connection retries. In this case, RayEventRecorder::StartExportingEvents is invoked again incorrectly, causing duplicate attempts to start exporting events.

This PR introduces two fixes:

- In metric_agent_client, the connection success and retry logic are now synchronized (previously they ran asynchronously, allowing both paths to trigger).
- Do not call StartExportingEvents if the connection cannot be established.

Test:

- CI

--------- Signed-off-by: Cuong Nguyen <can@anyscale.com>
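The two fixes can be sketched in a few lines. The real change is in Ray's C++ code; the Python below is only an illustration of the pattern, and the `EventRecorder` and `AgentConnection` classes and their method names are hypothetical stand-ins.

```python
import threading


class EventRecorder:
    """Hypothetical stand-in for a recorder whose export loop must start at most once."""

    def __init__(self):
        self.start_calls = 0

    def start_exporting_events(self):
        # Mirrors the check described above: a second call is a programming error.
        if self.start_calls > 0:
            raise RuntimeError("start_exporting_events() should be called only once")
        self.start_calls += 1


class AgentConnection:
    """Hypothetical client that resolves a connection attempt exactly once."""

    def __init__(self, recorder):
        self._recorder = recorder
        self._lock = threading.Lock()
        self._resolved = False

    def _resolve(self, connected):
        # Both the success path and the retries-exhausted path funnel through
        # this method, so only the first outcome takes effect.
        with self._lock:
            if self._resolved:
                return False
            self._resolved = True
        if connected:
            # Start exporting only when the connection was established.
            self._recorder.start_exporting_events()
        return True

    def on_connected(self):
        return self._resolve(connected=True)

    def on_retries_exhausted(self):
        return self._resolve(connected=False)


recorder = EventRecorder()
conn = AgentConnection(recorder)
threads = [
    threading.Thread(target=conn.on_connected),
    threading.Thread(target=conn.on_retries_exhausted),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(recorder.start_calls)  # 0 or 1 depending on which path wins, never 2
```

Whichever path wins the race, the recorder is started at most once, and never on the failure path.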

## Description

Ray Data can't serialize zero (byte) length numpy arrays:

```python3
import numpy as np
import ray.data

array = np.empty((2, 0), dtype=np.int8)
ds = ray.data.from_items([{"array": array}])
for batch in ds.iter_batches(batch_size=1):
    print(batch)
```

What I expect to see:

```
{'array': array([], shape=(1, 2, 0), dtype=int8)}
```

What I see:

```
/Users/chris.ohara/Downloads/.venv/lib/python3.12/site-packages/ray/air/util/tensor_extensions/arrow.py:736: RuntimeWarning: invalid value encountered in scalar divide
  offsets = np.arange(
2025-10-17 17:18:09,499 WARNING arrow.py:189 -- Failed to convert column 'array' into pyarrow array due to: Error converting data to Arrow: column: 'array', shape: (1, 2, 0), dtype: int8, data: []; falling back to serialize as pickled python objects
Traceback (most recent call last):
  File "/Users/chris.ohara/Downloads/.venv/lib/python3.12/site-packages/ray/air/util/tensor_extensions/arrow.py", line 672, in from_numpy
    return cls._from_numpy(arr)
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/chris.ohara/Downloads/.venv/lib/python3.12/site-packages/ray/air/util/tensor_extensions/arrow.py", line 736, in _from_numpy
    offsets = np.arange(
              ^^^^^^^^^^
ValueError: arange: cannot compute length

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/chris.ohara/Downloads/.venv/lib/python3.12/site-packages/ray/air/util/tensor_extensions/arrow.py", line 141, in convert_to_pyarrow_array
    return ArrowTensorArray.from_numpy(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chris.ohara/Downloads/.venv/lib/python3.12/site-packages/ray/air/util/tensor_extensions/arrow.py", line 678, in from_numpy
    raise ArrowConversionError(data_str) from e
ray.air.util.tensor_extensions.arrow.ArrowConversionError: Error converting data to Arrow: column: 'array', shape: (1, 2, 0), dtype: int8, data: []
2025-10-17 17:18:09,789 INFO logging.py:293 -- Registered dataset logger for dataset dataset_0_0
2025-10-17 17:18:09,815 WARNING resource_manager.py:134 -- ⚠️ Ray's object store is configured to use only 33.5% of available memory (2.0GiB out of 6.0GiB total). For optimal Ray Data performance, we recommend setting the object store to at least 50% of available memory. You can do this by setting the 'object_store_memory' parameter when calling ray.init() or by setting the RAY_DEFAULT_OBJECT_STORE_MEMORY_PROPORTION environment variable.
{'array': array([array([], shape=(2, 0), dtype=int8)], dtype=object)}
```

This PR fixes the issue so that zero-length arrays are serialized correctly, and the shape and dtype are preserved.

## Additional information

This is `ray==2.50.0`.

--------- Signed-off-by: Chris O'Hara <cohara87@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

use awscli directly; stop installing extra dependencies Signed-off-by: Lonnie Liu <lonnie@anyscale.com>

Signed-off-by: joshlee <joshlee@anyscale.com>

## Description Found this while reading the docs. Not sure what this "Note that" is referring to or why it is there. Signed-off-by: Max van Dijck <50382570+MaxVanDijck@users.noreply.github.com>

observability release tests on py3.10 Successful release test run: https://buildkite.com/ray-project/release/builds/65851 failing tests are set to manual (disabled): aws_cluster_launcher_release_image k8s_serve_ha_test enabling agent_stress_test.gce which is now passing --------- Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>

## Description Replace `map_batches` and numpy invocations with `with_column` and arrow kernels Release test: https://buildkite.com/ray-project/release/builds/66243#019a37da-4d9d-4f19-9180-e3f3dc3f8043 ## Related issues > Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to ray-project#1234". ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. --------- Signed-off-by: Goutam <goutam@anyscale.com>

…collate_fn (ray-project#58327) Signed-off-by: Gang Zhao <gang@gang-JQ62HD2C37.local> Co-authored-by: Gang Zhao <gang@gang-JQ62HD2C37.local>

## Description This fixes the symmetric-run cli workflow. Right now if you use `ray symmetric-run` on 2.51 like ``` ray symmetric-run --address 127.0.0.1:6379 -- python my_script.py ``` it will throw since the `symmetric-run` arg is not caught. This was only caught once it became part of the CLI. ## Related issues ## Additional information > Optional: Add implementation details, API changes, usage examples, screenshots, etc. --------- Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

…t#58247) Updating hello world release & cluster release tests to run on py3.10 Passing release tests: https://buildkite.com/ray-project/release/builds/65844 --------- Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>

Fix typos Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>

The current examples show label bundles written as `[{"ray.io/accelerator-type": "H100"}* 2]`, i.e. a dict multiplied by an integer. This is wrong; it is the list that has to be multiplied. This PR fixes the examples. Signed-off-by: Daraan <github.blurry@9ox.net>
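The difference can be checked in plain Python, without Ray. This is a minimal sketch; the variable names are illustrative only.

```python
# One label-selector dict, as in the example from the PR description.
bundle = {"ray.io/accelerator-type": "H100"}

# Correct: multiply the list, producing two bundle entries.
bundles = [bundle] * 2
print(bundles)

# Incorrect: multiplying the dict itself is not defined in Python.
try:
    wrong = [bundle * 2]
    error_message = ""
except TypeError as exc:
    error_message = str(exc)
    print(f"TypeError: {error_message}")
```

Note that `[bundle] * 2` repeats a reference to the same dict. If the entries need to be modified independently, a list comprehension such as `[dict(bundle) for _ in range(2)]` creates separate copies.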

## Description In this function, `Result::from_path` is implemented in ray train v2, which reconstructs a `Result` object from the checkpoints. This implementation leverages `CheckpointManager` and refers to https://github.com/ray-project/ray/blob/master/python/ray/train/v2/_internal/execution/controller/controller.py#L512-L540 --------- Signed-off-by: xgui <xgui@anyscale.com> Signed-off-by: Justin Yu <justinvyu@anyscale.com> Co-authored-by: Justin Yu <justinvyu@anyscale.com>

Add "WORKDIR /home/ray" in build-docker.sh. If "WORKDIR" is not set, it defaults to /root, causing permission issues with conda init.

```
31.00 # >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<
31.00
31.00     Traceback (most recent call last):
31.00       File "/home/ray/anaconda3/lib/python3.12/site-packages/conda/exception_handler.py", line 18, in __call__
31.00         return func(*args, **kwargs)
31.00                ^^^^^^^^^^^^^^^^^^^^^
31.00       File "/home/ray/anaconda3/lib/python3.12/site-packages/conda/cli/main.py", line 44, in main_subshell
31.00         context.__init__(argparse_args=pre_args)
31.00       File "/home/ray/anaconda3/lib/python3.12/site-packages/conda/base/context.py", line 517, in __init__
31.00         self._set_search_path(
31.00       File "/home/ray/anaconda3/lib/python3.12/site-packages/conda/common/configuration.py", line 1430, in _set_search_path
31.00         self._search_path = IndexedSet(self._expand_search_path(search_path, **kwargs))
31.00                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
31.00       File "/home/ray/anaconda3/lib/python3.12/site-packages/boltons/setutils.py", line 118, in __init__
31.00         self.update(other)
31.00       File "/home/ray/anaconda3/lib/python3.12/site-packages/boltons/setutils.py", line 351, in update
31.00         for o in other:
31.00                  ^^^^^
31.00       File "/home/ray/anaconda3/lib/python3.12/site-packages/conda/common/configuration.py", line 1403, in _expand_search_path
31.00         if path.is_file() and (
31.00            ^^^^^^^^^^^^^^
31.00       File "/home/ray/anaconda3/lib/python3.12/pathlib.py", line 892, in is_file
31.00         return S_ISREG(self.stat().st_mode)
31.00                        ^^^^^^^^^^^
31.00       File "/home/ray/anaconda3/lib/python3.12/pathlib.py", line 840, in stat
31.00         return os.stat(self, follow_symlinks=follow_symlinks)
31.00                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
31.00     PermissionError: [Errno 13] Permission denied: '$XDG_CONFIG_HOME/conda/.condarc'
31.00
31.00 `$ /home/ray/anaconda3/bin/conda init`
31.00
31.00   environment variables:
31.00     CIO_TEST=<not set>
31.00     CONDA_ROOT=/home/ray/anaconda3
31.00     CURL_CA_BUNDLE=<not set>
31.00     HTTPS_PROXY=<set>
31.00     HTTP_PROXY=<set>
31.00     LD_LIBRARY_PATH=:/usr/local/nvidia/lib64
31.00     LD_PRELOAD=<not set>
31.00     NO_PROXY=<set>
31.00     PATH=/home/ray/anaconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/
31.00     bin:/sbin:/bin:/usr/local/nvidia/bin
31.00     PYTHON_VERSION=3.9
31.00     REQUESTS_CA_BUNDLE=<not set>
31.00     SSL_CERT_FILE=<not set>
31.00     http_proxy=<set>
31.00     https_proxy=<set>
31.00     no_proxy=<set>
```

Signed-off-by: my-vegetable-has-exploded <wy1109468038@gmail.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>

…ct#58320) Signed-off-by: win5923 <ken89@kimo.com>

…oject#58329) Created by release automation bot. Update with commit a69004e Signed-off-by: kevin <kevin@anyscale.com>

… and GRPO. (ray-project#57961) ## Description Example for first blog in the RDT series using NIXL for GPU-GPU tensor transfers. --------- Signed-off-by: Ricardo Decal <public@ricardodecal.com> Signed-off-by: Stephanie Wang <smwang@cs.washington.edu> Co-authored-by: Ricardo Decal <public@ricardodecal.com> Co-authored-by: Stephanie Wang <smwang@cs.washington.edu> Co-authored-by: Qiaolin Yu <liin1211@outlook.com>

python 3.9 is now out of the support window all using python 3.12 wheel names for unit testing Signed-off-by: Lonnie Liu <lonnie@anyscale.com>

we will stop releasing them Signed-off-by: Lonnie Liu <lonnie@anyscale.com>

and move them into bazel dir. getting ready for python version upgrade Signed-off-by: Lonnie Liu <lonnie@anyscale.com>

python 3.9 is out of support window Signed-off-by: Lonnie Liu <lonnie@anyscale.com>

…ect#58375) Starting with KubeRay 1.5.0, KubeRay supports gang scheduling for RayJob custom resources. Just add a mention for Yunikorn scheduler. Related to ray-project/kuberay#3948. Signed-off-by: win5923 <ken89@kimo.com>

This PR adds support for token-based authentication in the Ray bi-directional syncer, for both client and server sides. It also includes tests to verify the functionality. --------- Signed-off-by: sampan <sampan@anyscale.com> Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com> Co-authored-by: sampan <sampan@anyscale.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>

Support token-based authentication in runtime env (client and server). Refactor existing dashboard head code so that the utils and middleware can be reused by the runtime env agent as well. --------- Signed-off-by: sampan <sampan@anyscale.com> Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com> Co-authored-by: sampan <sampan@anyscale.com> Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>

…un. (ray-project#58335) ## Description > Add spark master model validation to let Ray run on Spark-On-YARN mode. ## Why need this? > If we run Ray directly on a YARN cluster, we need more testing and integration work, and we also need to set up related tools and environments. If Ray-on-Spark-on-YARN is supported and the Spark environment is already set up, nothing else is needed: we can use Spark and let the user run pyspark. Signed-off-by: Cai Zhanqi <zhanqi.cai@shopee.com> Co-authored-by: Cai Zhanqi <zhanqi.cai@shopee.com>

sourcery-ai

The pull request #669 has too many files changed.

The GitHub API will only let us fetch up to 300 changed files, and this pull request has 5198.

gemini-code-assist · 2025-11-04T03:02:54Z

Summary of Changes

Hello @antfin-oss, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request performs a comprehensive synchronization of the main branch with the latest changes from master, focusing on significant enhancements to the project's build, continuous integration, and dependency management infrastructure. It introduces modern tools like uv and miniforge for Python environments, modularizes Buildkite pipelines for improved maintainability and expanded Python/CUDA support, and refines C++ API implementations. Additionally, the documentation system has been upgraded to provide a more consistent and efficient experience.

Highlights

Bazel and CI Configuration Overhaul: Significant updates to Bazel build flags, including making incompatible_strict_action_env a default, adding workspace_status_command for Linux, and new C++ compiler options for Windows. Buildkite CI pipelines have been extensively refactored, with image build steps moved to a new _images.rayci.yml file, and new dedicated build steps for ray-core, ray-dashboard, and ray-java components. Python 3.10 and CUDA 12.8 support have been expanded across various CI jobs.
Dependency Management Modernization: The dependency management system has been upgraded, transitioning from miniconda to miniforge in Dockerfiles and build scripts. A new raydepsets system has been introduced for more robust and configurable Python dependency compilation, with dedicated configuration files and pre-hook scripts.
Enhanced Linting and Code Ownership: Code quality checks have been bolstered with updates to .github/CODEOWNERS for clearer module ownership and substantial additions to .pre-commit-config.yaml. New linters such as semgrep, vale, cython-lint, and eslint have been integrated to enforce stricter code style and identify potential issues.
C++ API and Runtime Refinements: Numerous changes in the C++ codebase include fixes in actor/task callers, metric recording, and runtime configuration. The GetNodeIpAddress utility has been replaced with network_util.h for more reliable IP address resolution, and protobuf includes have been updated to use public definitions.
Documentation Infrastructure Improvements: The documentation system has received updates, including a new style guide, integration with Google Tag Manager for analytics, and the addition of a dismissable banner. The doc build process now leverages cached artifacts for faster execution.

Ignored Files

Ignored by pattern: .gemini/** (1)
- .gemini/config.yaml
Ignored by pattern: .github/workflows/** (1)
- .github/workflows/stale_pull_request.yaml

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request is an automated daily merge from master to main. It incorporates a very large number of changes, primarily focused on a significant refactoring of the CI/CD pipelines and the Bazel build system. Key themes include:

CI/CD Refactoring: The Buildkite pipelines have been extensively reorganized for better modularity. Build steps are broken down into smaller, more manageable pieces, and image building has been separated into its own group. A new dependency management system, raydepsets, has been introduced to handle Python dependencies more robustly.
Bazel Build System Overhaul: The root BUILD.bazel file has been significantly cleaned up, with many targets moved into sub-packages. The build now uses standard rules_pkg for artifact packaging instead of custom genrules, which is a great improvement for standardization and maintainability.
Dependency and Tooling Updates: Numerous dependencies have been updated, and the pre-commit configuration has been enhanced with more powerful linting and formatting tools like semgrep, vale, and eslint.
Platform Changes: CI for macOS is now focused on arm64, with x86_64 support being phased out.

The code changes across the repository are largely adaptations to these foundational improvements in the build and CI systems. The overall changes are very positive for the project's health, improving build reliability, developer experience, and code quality enforcement. I did not find any issues of medium severity or higher.

github-actions · 2025-11-19T01:45:42Z

This pull request has been automatically marked as stale because it has not had any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

matthewdeng and others added 30 commits October 18, 2025 04:25

[release] Hello world test for Azure (ray-project#57597)

85a7acb

First test running on AKS cloud! --------- Signed-off-by: kevin <kevin@anyscale.com> Signed-off-by: Kevin H. Luu <kevin@anyscale.com> Co-authored-by: Lonnie Liu <95255098+aslonnie@users.noreply.github.com>

[release] Hello world release test on GCE (ray-project#57695)

943b9ae

- Add 2 hello world tests with regular base image & custom image running on GCE --------- Signed-off-by: kevin <kevin@anyscale.com> Signed-off-by: Kevin H. Luu <kevin@anyscale.com>

[train] bump test_torch_trainer timeout (ray-project#57873)

2fc7193

## Description Bump from small to medium due to timeouts happening specifically in py3.12 tests. --------- Signed-off-by: Matthew Deng <matthew.j.deng@gmail.com>

[Core] Reschedule leases in local lease manager when draining the node (

993139e

ray-project#57834) Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>

[core] use invoke_result_t in cpp worker example (ray-project#57885)

697c7bc

`result_of_t` is deprecated Signed-off-by: Lonnie Liu <lonnie@anyscale.com>

[serve][llm][refactor] Align Ray Serve LLM Code Structure with Archit…

de50b23

…ectural Design (ray-project#57889) Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

[Doc][Serve] Import AutoscalingContext in autoscaling policy example (r…

b988ce4

…ay-project#57876) ## Description ## Related issues Closes ray-project#57847 ## Additional information Signed-off-by: daiping8 <dai.ping88@zte.com.cn>

removed adding the TaskPoolStrategy as it's not needed here (ray-proj…

b4f7a70

…ect#57897)

[docs][serve][llm] Add comprehensive architecture documentation for R…

3287523

…ay Serve LLM (ray-project#57830) Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>

[release auto] remove x86_64 wheel verification (ray-project#57913)

6d51184

we are not releasing `x86_64` wheels anymore Signed-off-by: Lonnie Liu <lonnie@anyscale.com>

[core] Kill raylet file and just keep node manager file (ray-project#…

532ac12

…57817) Signed-off-by: dayshah <dhyey2019@gmail.com>

[core] Make DrainRaylet + ShutdownRaylet Fault Tolerant (ray-project#…

2bbd13a

…57861) Signed-off-by: joshlee <joshlee@anyscale.com>

[release] Group all hello world tests together (ray-project#57920)

670151e

It used to be in 3 different groups, now unionized in 1. Signed-off-by: kevin <kevin@anyscale.com>

[ci] fix postmerge tests that require credentials (ray-project#57915)

034c54f

use awscli directly; stop installing extra dependencies Signed-off-by: Lonnie Liu <lonnie@anyscale.com>

[core] Make ReleaseUnusedBundles Fault Tolerant (ray-project#57786)

a9065a3

Signed-off-by: joshlee <joshlee@anyscale.com>

[doc] remove "Note that" in dataset.py documentation (ray-project#57884)

f2aa5a8

## Description Found this while reading the docs. Not sure what this "Note that" is referring to or why it is there. Signed-off-by: Max van Dijck <50382570+MaxVanDijck@users.noreply.github.com>

elliot-barn and others added 20 commits October 31, 2025 13:26

[Template] Update image-search-and-classification to pass device for …

b6e6210

…collate_fn (ray-project#58327) Signed-off-by: Gang Zhao <gang@gang-JQ62HD2C37.local> Co-authored-by: Gang Zhao <gang@gang-JQ62HD2C37.local>

Fix typos (ray-project#58349)

4b64508

Fix typos Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>

[Docs][KubeRay] Add Volcano RayJob gang scheduling example (ray-proje…

91ac4c7

…ct#58320) Signed-off-by: win5923 <ken89@kimo.com>

[docker] Update latest Docker dependencies for 2.51.0 release (ray-pr…

c90aacc

…oject#58329) Created by release automation bot. Update with commit a69004e Signed-off-by: kevin <kevin@anyscale.com>

[wheel] stop uploading python 3.9 wheels on release (ray-project#58363)

a64b756

python 3.9 is now out of the support window all using python 3.12 wheel names for unit testing Signed-off-by: Lonnie Liu <lonnie@anyscale.com>

[ci] stop verifying python 3.9 wheels (ray-project#58365)

8f466d7

we will stop releasing them Signed-off-by: Lonnie Liu <lonnie@anyscale.com>

[bazel] rename python runtime to py39 runtime (ray-project#58362)

44e8b1d

and move them into bazel dir. getting ready for python version upgrade Signed-off-by: Lonnie Liu <lonnie@anyscale.com>

[image] stop building python 3.9 release images (ray-project#58374)

d3d6b6b

python 3.9 is out of support window Signed-off-by: Lonnie Liu <lonnie@anyscale.com>

antfin-oss requested review from SongGuyang and kfstorm as code owners November 4, 2025 02:55

antfin-oss added auto-generated daily-merge labels Nov 4, 2025

antfin-oss assigned ffbin Nov 4, 2025

sourcery-ai bot reviewed Nov 4, 2025

gemini-code-assist bot reviewed Nov 4, 2025

github-actions bot added the stale label Nov 19, 2025

82 participants
