
@antfin-oss

This Pull Request was created automatically to merge the latest changes from master into main branch.

📅 Created: 2025-11-04
🔀 Merge direction: master → main
🤖 Triggered by: Scheduled

Please review and merge if everything looks good.

matthewdeng and others added 30 commits October 18, 2025 04:25
## Description
This removes an orphaned code file that was previously used by the
Preprocessor User Guide.

## Related issues
Corresponding User Guide was removed in ray-project#44006.
Closes ray-project#57867.

## Additional details
This test started failing because the new `XGBoostTrainer` API is
enabled by default with Ray Train V2. Rather than updating the snippet,
this PR removes the code.

Signed-off-by: Matthew Deng <matthew.j.deng@gmail.com>
Add eslint and prettier scripts to pre-commit before getting rid of
format.sh.

This is one step closer to replacing scripts/format.sh with pre-commit
(pre-commit is currently missing eslint).

tested locally: 
<img width="898" height="929" alt="image"
src="https://github.com/user-attachments/assets/58c77fb7-bdde-47ae-ac2b-b864334b3f30"
/>

---------

Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
First test running on AKS cloud!

---------

Signed-off-by: kevin <kevin@anyscale.com>
Signed-off-by: Kevin H. Luu <kevin@anyscale.com>
Co-authored-by: Lonnie Liu <95255098+aslonnie@users.noreply.github.com>
- Add 2 hello world tests running on GCE: one with the regular base
image and one with a custom image.

---------

Signed-off-by: kevin <kevin@anyscale.com>
Signed-off-by: Kevin H. Luu <kevin@anyscale.com>
## Description

Bump the test size from small to medium because of timeouts that occur
specifically in the py3.12 tests.

---------

Signed-off-by: Matthew Deng <matthew.j.deng@gmail.com>
## Why are these changes needed?

The `num_module_steps_trained_(lifetime)_throughput` metrics are biased
because of how we record throughput times in a loop over module
batches. This PR fixes this bias.
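To illustrate one way such a bias can arise, here is a minimal, hypothetical Python sketch (not RLlib's actual code). If a single timer is started before the loop over module batches, each later module is charged for the time spent on earlier modules, so its throughput is underestimated; restarting the timer per module avoids that. The function names, the `train_step` callable, and the injectable `clock` are assumptions for illustration.

```python
import time
from typing import Callable, Dict

# Hypothetical signature: train_step(module_id, num_steps) performs the work.
TrainStep = Callable[[str, int], None]


def biased_throughput(
    module_batches: Dict[str, int],
    train_step: TrainStep,
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Timer started once: later modules' elapsed time includes earlier ones."""
    throughputs = {}
    start = clock()
    for module_id, num_steps in module_batches.items():
        train_step(module_id, num_steps)
        elapsed = clock() - start  # cumulative, so biased for later modules
        throughputs[module_id] = num_steps / elapsed if elapsed > 0 else float("inf")
    return throughputs


def per_module_throughput(
    module_batches: Dict[str, int],
    train_step: TrainStep,
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Timer restarted per module: each rate reflects only that module's work."""
    throughputs = {}
    for module_id, num_steps in module_batches.items():
        start = clock()
        train_step(module_id, num_steps)
        elapsed = clock() - start
        throughputs[module_id] = num_steps / elapsed if elapsed > 0 else float("inf")
    return throughputs
```

With a fake clock where every step costs the same time, the per-module version reports equal rates for all modules, while the biased version reports a lower rate for every module after the first.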

## Related issue number

## Checks

- [x] I've signed off every commit (by using the -s flag, i.e., `git
commit -s`) in this PR.
- [x] I've run pre-commit jobs to lint the changes in this PR.
([pre-commit
setup](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting))
- [x] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a method in Tune, I've added it in `doc/source/tune/api/` under
the corresponding `.rst` file.
- [x] I've made sure the tests are passing. Note that there might be a
few flaky tests; see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [x] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Co-authored-by: Kamil Kaczmarek <kaczmarek.poczta@gmail.com>
…orker` (ray-project#57859)

## Description

The type annotation for `actor_location_tracker` is currently
`ActorLocationTracker`, but it should be
`ray.actor.ActorHandle[ActorLocationTracker]`. This PR fixes that issue.

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
…r'. (ray-project#57673)

## Why are these changes needed?

The type hints for `learner_connector` in `AlgorithmConfig.training` were
outdated and still used `RLModule` as a parameter. This PR adjusts the
type hints to the actual expected form of the callable.

## Related issue number

## Checks

- [x] I've signed off every commit (by using the -s flag, i.e., `git
commit -s`) in this PR.
- [x] I've run pre-commit jobs to lint the changes in this PR.
([pre-commit
setup](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting))
- [x] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a method in Tune, I've added it in `doc/source/tune/api/` under
the corresponding `.rst` file.
- [x] I've made sure the tests are passing. Note that there might be a
few flaky tests; see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
`result_of_t` is deprecated

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
…ectural Design (ray-project#57889)

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
- Disable Java tests; Ray Java is not supported on Apple silicon yet.
- Skip C++ tests that are not passing yet.

We already stopped releasing macOS wheels for Intel silicon, and the
tests that are disabled or skipped were never passing on Apple silicon,
so nothing regresses.

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
…ay-project#57876)

## Description

## Related issues
Closes ray-project#57847

## Additional information

Signed-off-by: daiping8 <dai.ping88@zte.com.cn>
…ystem cgroup (ray-project#57864)

For more details about the resource isolation project see
ray-project#54703.

When starting the head node, move the dashboard API server's
subprocesses into the system cgroup. I updated the integration test and
added a helpful error message, because the test will break in the future
whenever a new dashboard module is added.
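For context, under cgroup v2 a process is moved into a cgroup by writing its PID to that cgroup's `cgroup.procs` file. The minimal Python sketch below shows only that mechanism; it is not Ray's implementation, and the function name, the cgroup path in the usage comment, and how the dashboard subprocess PIDs are collected are all hypothetical.

```python
from pathlib import Path
from typing import Iterable


def move_pids_to_cgroup(cgroup_dir: str, pids: Iterable[int]) -> None:
    """Move each PID into the cgroup v2 directory `cgroup_dir` (hypothetical helper).

    Writing a PID to <cgroup_dir>/cgroup.procs migrates that whole process
    (all of its threads) into the cgroup. Each PID is written in its own
    write call, which is the conventional usage of this interface file.
    """
    procs_file = Path(cgroup_dir) / "cgroup.procs"
    for pid in pids:
        with open(procs_file, "a") as f:
            f.write(f"{pid}\n")


# Hypothetical usage on a real host (needs privileges and a mounted cgroup v2
# hierarchy; the path is illustrative only):
# move_pids_to_cgroup("/sys/fs/cgroup/example/system", dashboard_child_pids)
```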

I ran the integration tests 25 times locally. 

> (ray2) ubuntu@devbox:~/code/ray2$ python -m pytest -s
python/ray/tests/resource_isolation/test_resource_isolation_integration.py
--count 25 -x
...
collecting ... 

python/ray/tests/resource_isolation/test_resource_isolation_integration.py
✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓ 25% ██▌ 2025-10-17 23:13:51,897 INFO
worker.py:1833 -- Connecting to existing Ray cluster at address:
172.31.12.251:6379...
2025-10-17 23:13:51,905 INFO worker.py:2004 -- Connected to Ray cluster.
View the dashboard at http://127.0.0.1:8265

python/ray/tests/resource_isolation/test_resource_isolation_integration.py
✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓ 26% ██▋ 2025-10-17 23:13:57,592 INFO
worker.py:1833 -- Connecting to existing Ray cluster at address:
172.31.12.251:6379...
2025-10-17 23:13:57,598 INFO worker.py:2004 -- Connected to Ray cluster.
View the dashboard at http://127.0.0.1:8265

python/ray/tests/resource_isolation/test_resource_isolation_integration.py
✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓ 98% █████████▊2025-10-17 23:19:45,417 INFO
worker.py:2004 -- Started a local Ray instance. View the dashboard at
http://127.0.0.1:8265

python/ray/tests/resource_isolation/test_resource_isolation_integration.py
✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓
99% █████████▉2025-10-17 23:19:50,194 INFO worker.py:2004 -- Started a
local Ray instance. View the dashboard at http://127.0.0.1:8265

python/ray/tests/resource_isolation/test_resource_isolation_integration.py
✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓✓
100% ██████████
Results (366.41s):
     100 passed

---------

Signed-off-by: irabbani <israbbani@gmail.com>
…roject#57037)

If the connection to the Ray head breaks while `tail_job_logs()` is
running after job submission, `tail_job_logs()` does not raise any
error, even though it should.

This PR queries the RayJob status when a message is received, and raises
an error if the connection closed while the RayJob is not in a terminal
state.
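The check described above can be sketched as follows: once the log stream ends, ask for the job status and raise unless the job has reached a terminal state. This is a minimal, hypothetical asyncio sketch, not the Ray job SDK code; `JobStatus` here is a local stand-in enum, and `tail_logs`, `get_status`, and `LogStreamClosedError` are illustrative names.

```python
import asyncio
from enum import Enum
from typing import AsyncIterator, Awaitable, Callable


class JobStatus(str, Enum):
    # Local stand-in for a job status enum (hypothetical for this sketch).
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    STOPPED = "STOPPED"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


TERMINAL_STATUSES = {JobStatus.STOPPED, JobStatus.SUCCEEDED, JobStatus.FAILED}


class LogStreamClosedError(RuntimeError):
    """Raised when the log stream closes before the job has finished."""


async def tail_logs(
    messages: AsyncIterator[str],
    get_status: Callable[[], Awaitable[JobStatus]],
) -> AsyncIterator[str]:
    async for message in messages:
        yield message
    # The stream ended. That is only expected if the job actually finished;
    # otherwise the connection to the head was most likely lost.
    status = await get_status()
    if status not in TERMINAL_STATUSES:
        raise LogStreamClosedError(f"Log stream closed while job is {status.value}")
```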

## Related issue number

Closes: ray-project#57002

---------

Signed-off-by: machichima <nary12321@gmail.com>
…ay-project#57802)

## Description

1. This PR adds a `jax.distributed.shutdown()` call to `JaxBackend` to
free up any leaked resources on TPU RayTrainWorkers.
2. If `jax.distributed` is not running, the call is a no-op:
https://docs.jax.dev/en/latest/_autosummary/jax.distributed.shutdown.html
3. Tested on an Anyscale workspace.
<img width="1264" height="62" alt="image"
src="https://github.com/user-attachments/assets/f28102ff-f6d1-4da0-b41a-6cc785603e72"
/>
…ay Serve LLM (ray-project#57830)

Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
We are no longer releasing `x86_64` wheels.

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
…igurable (ray-project#57705)

Recently, when we ran performance tests with task event generation
turned on, we saw a performance regression for workloads running on
machines with very few CPUs. Further investigation showed that the
overhead mainly comes from the field-name format conversion when the
aggregator agent converts proto messages to JSON payloads.

This PR adds an env var that controls the field-name conversion
behavior, and updates the corresponding tests.

Note that we eventually plan to remove this config and turn off the
field-name conversion by default, after migrating all current event
usage.
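A minimal sketch of an env-var switch around field-name conversion. The env var name and helper functions are hypothetical (the PR does not name them here), and the snake_case to lowerCamelCase rule only approximates protobuf's default JSON naming. With real protobuf messages, `google.protobuf.json_format.MessageToJson` exposes the same choice through its `preserving_proto_field_name` parameter.

```python
import os
from typing import Any

# Hypothetical env var name, for illustration only.
CONVERT_ENV_VAR = "EXAMPLE_EVENT_CONVERT_FIELD_NAMES"


def conversion_enabled() -> bool:
    # Default to converting, i.e. keep the existing behavior unless opted out.
    return os.environ.get(CONVERT_ENV_VAR, "1").strip().lower() in ("1", "true", "yes")


def to_lower_camel(name: str) -> str:
    # "task_attempt_id" -> "taskAttemptId"
    head, *rest = name.split("_")
    return head + "".join(part[:1].upper() + part[1:] for part in rest)


def convert_field_names(payload: Any, to_camel: bool) -> Any:
    """Recursively rename dict keys; returns the payload untouched when disabled."""
    if not to_camel:
        return payload  # skip the per-field work entirely
    if isinstance(payload, dict):
        return {to_lower_camel(k): convert_field_names(v, True) for k, v in payload.items()}
    if isinstance(payload, list):
        return [convert_field_names(v, True) for v in payload]
    return payload
```

The point of the switch is that the disabled path does no per-field work at all, which is where the overhead on small machines came from.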

---------

Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
It used to be split across 3 different groups; now it is unified into 1.

Signed-off-by: kevin <kevin@anyscale.com>
…nter (ray-project#56848)

* Updated preprocessors to use a callback-based approach for stat
computation. This improves code organization and reduces duplication.
* Added a `ValueCounter` aggregator and a `value_counts` method to
`BlockColumnAccessor`, with implementations for both the Arrow and
Pandas backends.
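A value-count aggregator typically follows an accumulate, combine, finalize pattern: count values within each block, merge the partial counts, then emit the result. The minimal Python sketch below uses `collections.Counter` to show that pattern; `ValueCounterSketch` and `value_counts` are hypothetical names and this is not Ray Data's `ValueCounter` API.

```python
from collections import Counter
from functools import reduce
from typing import Dict, Hashable, Iterable, List


class ValueCounterSketch:
    """Hypothetical aggregator: per-block value counts, merged across blocks."""

    def accumulate_block(self, column_values: Iterable[Hashable]) -> Counter:
        # Stand-in for a per-block value_counts() on an Arrow or Pandas column.
        return Counter(column_values)

    def combine(self, left: Counter, right: Counter) -> Counter:
        # Merge partial counts from two blocks without mutating either input.
        merged = Counter(left)
        merged.update(right)
        return merged

    def finalize(self, counts: Counter) -> Dict[Hashable, int]:
        return dict(counts)


def value_counts(blocks: List[List[Hashable]]) -> Dict[Hashable, int]:
    agg = ValueCounterSketch()
    partials = [agg.accumulate_block(block) for block in blocks]
    return agg.finalize(reduce(agg.combine, partials, Counter()))
```

Because `combine` is associative, partial counts can be merged in any grouping, which is what makes this shape suitable for distributed aggregation.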

## Why are these changes needed?

## Related issue number

## Checks

- [ ] I've signed off every commit (by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a method in Tune, I've added it in `doc/source/tune/api/` under
the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests; see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: cem <cem@anyscale.com>
Signed-off-by: cem-anyscale <cem@anyscale.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
… only once." (ray-project#57917)

This PR fixes the Ray check failure "RayEventRecorder::StartExportingEvents()
should be called only once."
The failure can occur in the following scenario:
- The metric_agent_client successfully establishes a connection with the
dashboard agent. In this case, RayEventRecorder::StartExportingEvents is
correctly invoked to start sending events.
- At the same time, the metric_agent_client exceeds its maximum number
of connection retries. In this case,
RayEventRecorder::StartExportingEvents is invoked again incorrectly,
causing duplicate attempts to start exporting events.

This PR introduces two fixes:
- In metric_agent_client, the connection success and retry logic are now
synchronized (previously they ran asynchronously, allowing both paths to
trigger).
- Do not call StartExportingEvents if the connection cannot be
established.
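The two fixes can be sketched together: resolve the connection outcome exactly once under a lock, and start exporting only on the success path. The original code is C++; this is a minimal Python sketch of the same idea, and `ExportStarter`, `on_connected`, and `on_retries_exhausted` are hypothetical names.

```python
import threading
from typing import Callable


class ExportStarter:
    """Runs `start_exporting` at most once, and only after a successful connection."""

    def __init__(self, start_exporting: Callable[[], None]):
        self._start_exporting = start_exporting
        self._lock = threading.Lock()
        self._resolved = False  # set by whichever outcome (success or give-up) wins

    def _resolve(self) -> bool:
        # Returns True only for the first caller; later callers are ignored.
        with self._lock:
            if self._resolved:
                return False
            self._resolved = True
            return True

    def on_connected(self) -> bool:
        if not self._resolve():
            return False
        self._start_exporting()
        return True

    def on_retries_exhausted(self) -> bool:
        # The connection was never established: do NOT start exporting.
        return self._resolve()
```

Whichever callback arrives first decides the outcome, so the success path and the retry-exhaustion path can no longer both trigger.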

Test:
- CI

---------

Signed-off-by: Cuong Nguyen <can@anyscale.com>
## Description

Ray Data can't serialize zero-length (zero-byte) NumPy arrays:

```python3
import numpy as np
import ray.data

array = np.empty((2, 0), dtype=np.int8)

ds = ray.data.from_items([{"array": array}])

for batch in ds.iter_batches(batch_size=1):
    print(batch)
```

What I expect to see:

```
{'array': array([], shape=(1, 2, 0), dtype=int8)}
```

What I see:

```
/Users/chris.ohara/Downloads/.venv/lib/python3.12/site-packages/ray/air/util/tensor_extensions/arrow.py:736: RuntimeWarning: invalid value encountered in scalar divide
  offsets = np.arange(
2025-10-17 17:18:09,499 WARNING arrow.py:189 -- Failed to convert column 'array' into pyarrow array due to: Error converting data to Arrow: column: 'array', shape: (1, 2, 0), dtype: int8, data: []; falling back to serialize as pickled python objects
Traceback (most recent call last):
  File "/Users/chris.ohara/Downloads/.venv/lib/python3.12/site-packages/ray/air/util/tensor_extensions/arrow.py", line 672, in from_numpy
    return cls._from_numpy(arr)
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/chris.ohara/Downloads/.venv/lib/python3.12/site-packages/ray/air/util/tensor_extensions/arrow.py", line 736, in _from_numpy
    offsets = np.arange(
              ^^^^^^^^^^
ValueError: arange: cannot compute length

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/chris.ohara/Downloads/.venv/lib/python3.12/site-packages/ray/air/util/tensor_extensions/arrow.py", line 141, in convert_to_pyarrow_array
    return ArrowTensorArray.from_numpy(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chris.ohara/Downloads/.venv/lib/python3.12/site-packages/ray/air/util/tensor_extensions/arrow.py", line 678, in from_numpy
    raise ArrowConversionError(data_str) from e
ray.air.util.tensor_extensions.arrow.ArrowConversionError: Error converting data to Arrow: column: 'array', shape: (1, 2, 0), dtype: int8, data: []
2025-10-17 17:18:09,789 INFO logging.py:293 -- Registered dataset logger for dataset dataset_0_0
2025-10-17 17:18:09,815 WARNING resource_manager.py:134 -- ⚠️  Ray's object store is configured to use only 33.5% of available memory (2.0GiB out of 6.0GiB total). For optimal Ray Data performance, we recommend setting the object store to at least 50% of available memory. You can do this by setting the 'object_store_memory' parameter when calling ray.init() or by setting the RAY_DEFAULT_OBJECT_STORE_MEMORY_PROPORTION environment variable.
{'array': array([array([], shape=(2, 0), dtype=int8)], dtype=object)}
```

This PR fixes the issue so that zero-length arrays are serialized
correctly, and their shape and dtype are preserved.
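The traceback shows `np.arange` failing right after an "invalid value encountered in scalar divide" warning, which is consistent with an offset computation that divides by a per-row element count of zero. The pure-Python sketch below shows an offset computation that avoids such a division by taking the row count from the shape. It is a hypothetical illustration, not the actual code in `arrow.py`, and `row_offsets` is an invented name.

```python
import math
from typing import List, Sequence


def row_offsets(shape: Sequence[int]) -> List[int]:
    """Offsets into a flat element buffer for each row of a fixed-shape tensor column.

    shape[0] is the number of rows; the remaining dimensions describe one row.
    """
    num_rows = shape[0]
    elems_per_row = math.prod(shape[1:])  # 0 for a shape such as (1, 2, 0)
    # Take the row count from the shape rather than dividing the total size
    # by elems_per_row, which would be 0 / 0 when every row is empty.
    return [i * elems_per_row for i in range(num_rows + 1)]
```

For the shape from the report, `(1, 2, 0)`, this yields offsets `[0, 0]`: one row that spans zero elements, with the per-row shape `(2, 0)` still available to restore on read.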

## Additional information

This was observed with `ray==2.50.0`.

---------

Signed-off-by: Chris O'Hara <cohara87@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Use awscli directly and stop installing extra dependencies.

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
Signed-off-by: joshlee <joshlee@anyscale.com>
## Description
Found this while reading the docs. Not sure what this "Note that" is
referring to or why it is there.

Signed-off-by: Max van Dijck <50382570+MaxVanDijck@users.noreply.github.com>
elliot-barn and others added 20 commits October 31, 2025 13:26
Run the observability release tests on py3.10.

Successful release test run:
https://buildkite.com/ray-project/release/builds/65851

Failing tests are set to manual (disabled):
- aws_cluster_launcher_release_image
- k8s_serve_ha_test

Enable agent_stress_test.gce, which is now passing.

---------

Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
## Description
Replace `map_batches` and NumPy invocations with `with_column` and Arrow
kernels.

Release test:
https://buildkite.com/ray-project/release/builds/66243#019a37da-4d9d-4f19-9180-e3f3dc3f8043

## Related issues

## Additional information

---------

Signed-off-by: Goutam <goutam@anyscale.com>
…collate_fn (ray-project#58327)

Signed-off-by: Gang Zhao <gang@gang-JQ62HD2C37.local>
Co-authored-by: Gang Zhao <gang@gang-JQ62HD2C37.local>
## Description

This fixes the `symmetric-run` CLI workflow.

Currently, if you use `ray symmetric-run` on 2.51 like this:
```
ray symmetric-run --address 127.0.0.1:6379 -- python my_script.py
```

it throws because the `symmetric-run` argument is not handled. This was
only caught once the command became part of the CLI.
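The failure mode is an argument-splitting problem: the subcommand name itself must not be treated as one of the command's own arguments, and everything after `--` belongs to the user's command. The minimal Python sketch below shows that split on the example invocation; `split_symmetric_run_argv` is a hypothetical helper, not Ray's CLI code.

```python
from typing import List, Sequence, Tuple


def split_symmetric_run_argv(argv: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Split argv into (symmetric-run options, user command). Hypothetical helper."""
    args = list(argv)
    # Everything after the first "--" is the user's command, passed through as-is.
    sep = args.index("--") if "--" in args else len(args)
    own_args, command = args[:sep], args[sep + 1 :]
    # Drop the program name and the subcommand itself, so "symmetric-run" is
    # not parsed as one of its own options or positionals.
    if "symmetric-run" in own_args:
        own_args = own_args[own_args.index("symmetric-run") + 1 :]
    return own_args, command
```

Searching for the subcommand only before `--` keeps a user command that happens to contain the same word intact.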

## Related issues

## Additional information

---------

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
…t#58247)

Update the hello world release tests and cluster release tests to run on
py3.10.

Passing release tests:
https://buildkite.com/ray-project/release/builds/65844

---------

Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Fix typos

Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
The current examples show label bundles written as:

`[{"ray.io/accelerator-type": "H100"}* 2]`, i.e. a dict multiplied by an
integer. This is wrong: it is the list that has to be multiplied, as in
`[{"ray.io/accelerator-type": "H100"}] * 2`. This PR fixes this.
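A quick Python check of the two forms. Only the label key and value come from the examples above; which Ray API consumes the resulting list is not shown here.

```python
bundle = {"ray.io/accelerator-type": "H100"}

# What the old examples wrote: a dict multiplied by an int.
try:
    wrong = [bundle * 2]
    dict_times_int_failed = False
except TypeError:
    dict_times_int_failed = True  # dicts do not support `*`

# The intended form: multiply the list, giving a list of two bundles.
right = [bundle] * 2

# Note: list repetition repeats the *same* dict object. If the bundles may be
# mutated independently later, build separate copies instead.
independent = [dict(bundle) for _ in range(2)]
```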

Signed-off-by: Daraan <github.blurry@9ox.net>
## Description

This PR implements `Result::from_path` in Ray Train V2, which
reconstructs a `Result` object from the checkpoints. The implementation
leverages `CheckpointManager` and is based on
https://github.com/ray-project/ray/blob/master/python/ray/train/v2/_internal/execution/controller/controller.py#L512-L540

---------

Signed-off-by: xgui <xgui@anyscale.com>
Signed-off-by: Justin Yu <justinvyu@anyscale.com>
Co-authored-by: Justin Yu <justinvyu@anyscale.com>
Add "WORKDIR /home/ray" in build-docker.sh.

If "WORKDIR" is not set, it defaults to /root, causing permission issues
with conda init.

```
31.00 # >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<
31.00 
31.00     Traceback (most recent call last):
31.00       File "/home/ray/anaconda3/lib/python3.12/site-packages/conda/exception_handler.py", line 18, in __call__
31.00         return func(*args, **kwargs)
31.00                ^^^^^^^^^^^^^^^^^^^^^
31.00       File "/home/ray/anaconda3/lib/python3.12/site-packages/conda/cli/main.py", line 44, in main_subshell
31.00         context.__init__(argparse_args=pre_args)
31.00       File "/home/ray/anaconda3/lib/python3.12/site-packages/conda/base/context.py", line 517, in __init__
31.00         self._set_search_path(
31.00       File "/home/ray/anaconda3/lib/python3.12/site-packages/conda/common/configuration.py", line 1430, in _set_search_path
31.00         self._search_path = IndexedSet(self._expand_search_path(search_path, **kwargs))
31.00                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
31.00       File "/home/ray/anaconda3/lib/python3.12/site-packages/boltons/setutils.py", line 118, in __init__
31.00         self.update(other)
31.00       File "/home/ray/anaconda3/lib/python3.12/site-packages/boltons/setutils.py", line 351, in update
31.00         for o in other:
31.00                  ^^^^^
31.00       File "/home/ray/anaconda3/lib/python3.12/site-packages/conda/common/configuration.py", line 1403, in _expand_search_path
31.00         if path.is_file() and (
31.00            ^^^^^^^^^^^^^^
31.00       File "/home/ray/anaconda3/lib/python3.12/pathlib.py", line 892, in is_file
31.00         return S_ISREG(self.stat().st_mode)
31.00                        ^^^^^^^^^^^
31.00       File "/home/ray/anaconda3/lib/python3.12/pathlib.py", line 840, in stat
31.00         return os.stat(self, follow_symlinks=follow_symlinks)
31.00                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
31.00     PermissionError: [Errno 13] Permission denied: '$XDG_CONFIG_HOME/conda/.condarc'
31.00 
31.00 `$ /home/ray/anaconda3/bin/conda init`
31.00 
31.00   environment variables:
31.00                  CIO_TEST=<not set>
31.00                CONDA_ROOT=/home/ray/anaconda3
31.00            CURL_CA_BUNDLE=<not set>
31.00               HTTPS_PROXY=<set>
31.00                HTTP_PROXY=<set>
31.00           LD_LIBRARY_PATH=:/usr/local/nvidia/lib64
31.00                LD_PRELOAD=<not set>
31.00                  NO_PROXY=<set>
31.00                      PATH=/home/ray/anaconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/
31.00                           bin:/sbin:/bin:/usr/local/nvidia/bin
31.00            PYTHON_VERSION=3.9
31.00        REQUESTS_CA_BUNDLE=<not set>
31.00             SSL_CERT_FILE=<not set>
31.00                http_proxy=<set>
31.00               https_proxy=<set>
31.00                  no_proxy=<set>
```

Signed-off-by: my-vegetable-has-exploded <wy1109468038@gmail.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
…oject#58329)

Created by release automation bot.

Update with commit a69004e

Signed-off-by: kevin <kevin@anyscale.com>
… and GRPO. (ray-project#57961)

## Description
Example for the first blog post in the RDT series, using NIXL for
GPU-to-GPU tensor transfers.

---------

Signed-off-by: Ricardo Decal <public@ricardodecal.com>
Signed-off-by: Stephanie Wang <smwang@cs.washington.edu>
Co-authored-by: Ricardo Decal <public@ricardodecal.com>
Co-authored-by: Stephanie Wang <smwang@cs.washington.edu>
Co-authored-by: Qiaolin Yu <liin1211@outlook.com>
Python 3.9 is now out of the support window.

All unit tests now use Python 3.12 wheel names.

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
we will stop releasing them

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
and move them into the bazel dir, in preparation for the Python version
upgrade.

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
Python 3.9 is out of the support window.

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
…ect#58375)

Starting with KubeRay 1.5.0, KubeRay supports gang scheduling for RayJob
custom resources. This PR adds a mention of this for the YuniKorn
scheduler.

Related to ray-project/kuberay#3948.

Signed-off-by: win5923 <ken89@kimo.com>
This PR adds support for token-based authentication in the Ray
bi-directional syncer, for both client and server sides. It also
includes tests to verify the functionality.
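The syncer itself is C++ and gRPC, but the shape of token-based authentication is language-agnostic: the client attaches the token to request metadata, and the server compares it with the expected token in constant time. The Python sketch below shows that shape only; the `authorization` metadata key and the `Bearer` scheme are assumptions, not necessarily what Ray uses.

```python
import hmac
from typing import Dict, Mapping, Optional

AUTH_METADATA_KEY = "authorization"  # assumed metadata key
BEARER_PREFIX = "Bearer "            # assumed auth scheme


def make_auth_metadata(token: str) -> Dict[str, str]:
    """Client side: metadata to attach to each outgoing request."""
    return {AUTH_METADATA_KEY: f"{BEARER_PREFIX}{token}"}


def is_authorized(metadata: Mapping[str, str], expected_token: str) -> bool:
    """Server side: accept the request only if it carries the expected token."""
    value: Optional[str] = metadata.get(AUTH_METADATA_KEY)
    if value is None or not value.startswith(BEARER_PREFIX):
        return False
    presented = value[len(BEARER_PREFIX):]
    # Constant-time comparison so response timing does not leak token contents.
    return hmac.compare_digest(presented.encode(), expected_token.encode())
```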

---------

Signed-off-by: sampan <sampan@anyscale.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: sampan <sampan@anyscale.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Support token-based authentication in runtime env (client and server).
Refactor the existing dashboard head code so that its utils and
middleware can be reused by the runtime env agent as well.

---------

Signed-off-by: sampan <sampan@anyscale.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: sampan <sampan@anyscale.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
…un. (ray-project#58335)

## Description
> Add Spark master mode validation so that Ray can run in Spark-on-YARN
mode.

## Why is this needed?
> Running Ray directly on a YARN cluster requires more testing and
integration work, plus setting up the related tools and environments. If
Ray on Spark supports Spark-on-YARN and a Spark environment is already
set up, nothing else is needed: users can simply use Spark and run
PySpark.
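Spark exposes its master URL, for example through `SparkContext.master` (`spark.sparkContext.master`), so a validation step can classify it before starting Ray. The minimal sketch below is hypothetical: the function name is invented, and which modes besides YARN are accepted (here, standalone `spark://` and `local-cluster[...]`) is an assumption rather than a description of Ray on Spark's actual checks.

```python
def classify_spark_master(master: str) -> str:
    """Hypothetical validation: map a Spark master URL to a mode, or raise."""
    if master == "yarn":
        return "yarn"
    if master.startswith("spark://"):
        return "standalone"
    if master.startswith("local-cluster["):
        return "local-cluster"
    raise ValueError(f"Unsupported Spark master {master!r} in this sketch")
```

Failing fast with a clear error on an unsupported master (for example plain `local[4]`) is cheaper than letting cluster startup fail later.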

Signed-off-by: Cai Zhanqi <zhanqi.cai@shopee.com>
Co-authored-by: Cai Zhanqi <zhanqi.cai@shopee.com>

@sourcery-ai sourcery-ai bot left a comment

The pull request #669 has too many files changed.

The GitHub API will only let us fetch up to 300 changed files, and this pull request has 5198.

@gemini-code-assist

Summary of Changes

Hello @antfin-oss, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request performs a comprehensive synchronization of the main branch with the latest changes from master, focusing on significant enhancements to the project's build, continuous integration, and dependency management infrastructure. It introduces modern tools like uv and miniforge for Python environments, modularizes Buildkite pipelines for improved maintainability and expanded Python/CUDA support, and refines C++ API implementations. Additionally, the documentation system has been upgraded to provide a more consistent and efficient experience.

Highlights

  • Bazel and CI Configuration Overhaul: Significant updates to Bazel build flags, including making incompatible_strict_action_env a default, adding workspace_status_command for Linux, and new C++ compiler options for Windows. Buildkite CI pipelines have been extensively refactored, with image build steps moved to a new _images.rayci.yml file, and new dedicated build steps for ray-core, ray-dashboard, and ray-java components. Python 3.10 and CUDA 12.8 support have been expanded across various CI jobs.
  • Dependency Management Modernization: The dependency management system has been upgraded, transitioning from miniconda to miniforge in Dockerfiles and build scripts. A new raydepsets system has been introduced for more robust and configurable Python dependency compilation, with dedicated configuration files and pre-hook scripts.
  • Enhanced Linting and Code Ownership: Code quality checks have been bolstered with updates to .github/CODEOWNERS for clearer module ownership and substantial additions to .pre-commit-config.yaml. New linters such as semgrep, vale, cython-lint, and eslint have been integrated to enforce stricter code style and identify potential issues.
  • C++ API and Runtime Refinements: Numerous changes in the C++ codebase include fixes in actor/task callers, metric recording, and runtime configuration. The GetNodeIpAddress utility has been replaced with network_util.h for more reliable IP address resolution, and protobuf includes have been updated to use public definitions.
  • Documentation Infrastructure Improvements: The documentation system has received updates, including a new style guide, integration with Google Tag Manager for analytics, and the addition of a dismissable banner. The doc build process now leverages cached artifacts for faster execution.
Ignored Files
  • Ignored by pattern: .gemini/** (1)
    • .gemini/config.yaml
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/stale_pull_request.yaml

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request is an automated daily merge from master to main. It incorporates a very large number of changes, primarily focused on a significant refactoring of the CI/CD pipelines and the Bazel build system. Key themes include:

  • CI/CD Refactoring: The Buildkite pipelines have been extensively reorganized for better modularity. Build steps are broken down into smaller, more manageable pieces, and image building has been separated into its own group. A new dependency management system, raydepsets, has been introduced to handle Python dependencies more robustly.
  • Bazel Build System Overhaul: The root BUILD.bazel file has been significantly cleaned up, with many targets moved into sub-packages. The build now uses standard rules_pkg for artifact packaging instead of custom genrules, which is a great improvement for standardization and maintainability.
  • Dependency and Tooling Updates: Numerous dependencies have been updated, and the pre-commit configuration has been enhanced with more powerful linting and formatting tools like semgrep, vale, and eslint.
  • Platform Changes: CI for macOS is now focused on arm64, with x86_64 support being phased out.

The code changes across the repository are largely adaptations to these foundational improvements in the build and CI systems. The overall changes are very positive for the project's health, improving build reliability, developer experience, and code quality enforcement. I did not find any issues of medium severity or higher.

@github-actions

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale label Nov 19, 2025