@antfin-oss

This Pull Request was created automatically to merge the latest changes from master into main branch.

📅 Created: 2025-11-28
🔀 Merge direction: master → main
🤖 Triggered by: Scheduled

Please review and merge if everything looks good.

alexeykudinkin and others added 30 commits November 12, 2025 08:50
…_FACTOR` to 2 (ray-project#58262)

## Description

This was setting the value to align with the previous default of 4.

However, after some consideration I've realized that 4 is too high, so
this PR lowers it to 2.

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
…y-project#58523)

## Description

This PR improves documentation consistency in the `python/ray/data`
module by converting all remaining rST-style docstrings (`:param:`,
`:return:`, etc.) to Google-style format (`Args:`, `Returns:`, etc.).

## Additional information

**Files modified:**
- `python/ray/data/preprocessors/utils.py` - Converted
`StatComputationPlan.add_callable_stat()`
- `python/ray/data/preprocessors/encoder.py` - Converted
`unique_post_fn()`
- `python/ray/data/block.py` - Converted `BlockColumnAccessor.hash()`
and `BlockColumnAccessor.is_composed_of_lists()`
- `python/ray/data/_internal/datasource/delta_sharing_datasource.py` -
Converted `DeltaSharingDatasource.setup_delta_sharing_connections()`

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
…oject#58549)

## Description

The original `test_concurrency` function combined multiple test
scenarios into a single test with complex control flow and expensive Ray
cluster initialization. This refactoring extracts the parameter
validation tests into focused, independent tests that are faster,
clearer, and easier to maintain.

Additionally, the original test included "validation" cases that tested
valid concurrency parameters but didn't actually verify that concurrency
was being limited correctly; they only checked that the output was
correct, which isn't useful for validating the concurrency feature
itself.

**Key improvements:**
- Split validation tests into `test_invalid_func_concurrency_raises` and
`test_invalid_class_concurrency_raises`
- Use parametrized tests for different invalid concurrency values
- Switch from `shutdown_only` with explicit `ray.init()` to
`ray_start_regular_shared` to eliminate cluster initialization overhead
- Minimize test data from 10 blocks to 1 element since we're only
validating parameter errors
- Remove non-validation tests that didn't verify concurrency behavior

## Related issues

N/A

## Additional information

The validation tests now execute significantly faster and provide
clearer failure messages. Each test has a single, well-defined purpose
making maintenance and debugging easier.

---------

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
previously it was actually using 0.4.0, which is set up by the grpc
repo; the declaration in the workspace file was being shadowed.

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
## Description
Creates a ranker interface that ranks the best operator to run next
in `select_operator_to_run`. This PR only refactors the existing
code. The ranking value must be comparable.
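
A minimal sketch of what such a ranker could look like. This is not the actual Ray Data code: `Op`, `OperatorRanker`, `SmallestQueueRanker`, and the `queued_bytes` field are hypothetical names used only to show why the rank must be comparable.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Op:
    # Hypothetical stand-in for a physical operator; not a Ray Data class.
    name: str
    queued_bytes: int


class OperatorRanker(ABC):
    """Assigns each candidate operator a comparable rank (lower runs first)."""

    @abstractmethod
    def rank(self, op: Op) -> Any:
        """Return a value that supports `<`, such as an int or a tuple."""


class SmallestQueueRanker(OperatorRanker):
    # Example policy (an assumption): prefer the operator with the least
    # queued data, breaking ties by name so the ordering is deterministic.
    def rank(self, op: Op) -> Any:
        return (op.queued_bytes, op.name)


def select_operator_to_run(ops: List[Op], ranker: OperatorRanker) -> Optional[Op]:
    """Pick the best-ranked operator, or None when there are no candidates."""
    if not ops:
        return None
    return min(ops, key=ranker.rank)
```

Returning a tuple keeps the rank comparable while still allowing several criteria to be combined in priority order.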

## Related issues
None

## Additional information
None

---------

Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
…#57783)

1. JaxTrainer relies on the runtime env var `JAX_PLATFORMS` being set
to initialize jax.distributed:
https://github.com/ray-project/ray/blob/master/python/ray/train/v2/jax/config.py#L38
2. Before this change, users had to both set `use_tpu=True`
in `ray.train.ScalingConfig` and pass `JAX_PLATFORMS=tpu` to be able
to start jax.distributed. `JAX_PLATFORMS` can be a comma-separated string.
3. If the user uses other jax.distributed libraries like Orbax, this can
sometimes lead to a misleading error about distributed initialization.
4. After this change, if the user sets `use_tpu=True`, we automatically add
this to the env var.
5. A TPU unit test is not available at this time; we will explore how to
cover it later.
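
The behavior in step 4 can be sketched roughly as follows. The helper name `with_tpu_platform` and the choice to put `tpu` first in the list are assumptions for illustration; the actual change may merge the value differently.

```python
from typing import Dict, Optional


def with_tpu_platform(env_vars: Optional[Dict[str, str]], use_tpu: bool) -> Dict[str, str]:
    """Return a copy of env_vars with "tpu" present in JAX_PLATFORMS when use_tpu is set."""
    merged = dict(env_vars or {})
    if not use_tpu:
        return merged
    # JAX_PLATFORMS may be a comma-separated string such as "tpu,cpu".
    raw = merged.get("JAX_PLATFORMS", "")
    platforms = [p.strip() for p in raw.split(",") if p.strip()]
    if "tpu" not in [p.lower() for p in platforms]:
        # Assumption: prepend so that TPU takes priority over other platforms.
        platforms.insert(0, "tpu")
    merged["JAX_PLATFORMS"] = ",".join(platforms)
    return merged
```

Existing entries are preserved, so a user who already passes `JAX_PLATFORMS=tpu,cpu` sees no change.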


---------

Signed-off-by: Lehui Liu <lehui@anyscale.com>
and ask people to use that lock file for building docs.

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
…regression (ray-project#58390)

## Description
This PR addresses the performance regression introduced in the [PR to make
ray.get thread safe](ray-project#57911).
Specifically, the previous PR requires the worker to block and wait for
AsyncGet to return with a reply of the request id needed for correctly
cleaning up get requests. This additional synchronous step causes the
plasma store Get to regress in performance.

This PR moves the request id generation step to the plasma store,
removing the blocking step to fix the perf regression.

## Related issues
- [PR which introduced perf
regression](ray-project#57911)
- [PR which observed the
regression](ray-project#58175)

## Additional information
New performance of the change measured by `ray microbenchmark`.
<img width="485" height="17" alt="image"
src="https://github.com/user-attachments/assets/b96b9676-3735-4e94-9ade-aaeb7514f4d0"
/>

Original performance prior to the change. Here we focus on the
regressing `single client get calls (Plasma Store)` metric, where the
new performance returns to the original range of about 10k per second,
compared to the current sub-5k per second.
<img width="811" height="355" alt="image"
src="https://github.com/user-attachments/assets/d1fecf82-708e-48c4-9879-34c59a5e056c"
/>

---------

Signed-off-by: davik <davik@anyscale.com>
Co-authored-by: davik <davik@anyscale.com>
## Description
support token auth in ray client server by using the existing grpc
interceptors. This pr refactors the code to:
- add/rename sync and async client and server interceptors
- create grpc utils to house grpc channel and server creation logic,
python codebase is updated to use these methods
- separate tests for sync and async interceptors
- make existing authentication integration tests run in RAY_CLIENT
mode

---------

Signed-off-by: sampan <sampan@anyscale.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Signed-off-by: Sampan S Nayak <sampansnayak2@gmail.com>
Co-authored-by: sampan <sampan@anyscale.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
…oject#58371)

## Description
Currently Ray Data has a preprocessor called `RobustScaler`. This scales
the data based on given quantiles. Calculating the quantiles involves
sorting the entire dataset by column for each column (C sorts for C
number of columns), which, for a large dataset, will require a lot of
calculations.

**MAJOR EDIT**: had to replace the original `tdigest` with `ddsketch`
as I couldn't actually find well-maintained tdigest libraries for
python. ddsketch is better maintained.

**MAJOR EDIT 2**: discussed offline to use `ApproximateQuantile`
aggregator
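
For context, here is a small pure-Python sketch of exact robust scaling, assuming the usual formula `(x - median) / (high quantile - low quantile)` with linear interpolation between ranks. The function names are illustrative and this is not the Ray Data implementation; it only shows where the full sort comes from, which is the step an approximate quantile aggregator avoids.

```python
from typing import List, Sequence


def quantile(sorted_values: Sequence[float], q: float) -> float:
    """Exact quantile of pre-sorted data using linear interpolation."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def robust_scale(values: Sequence[float], low: float = 0.25, high: float = 0.75) -> List[float]:
    """Scale each value as (x - median) / (high quantile - low quantile)."""
    ordered = sorted(values)  # the full sort that dominates the cost on large columns
    median = quantile(ordered, 0.5)
    spread = quantile(ordered, high) - quantile(ordered, low)
    if spread == 0:
        # Assumption for this sketch: a constant column maps to zeros.
        return [0.0 for _ in values]
    return [(x - median) / spread for x in values]
```

With one sort per column, the exact version costs C sorts for C columns, which matches the cost described above.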

## Related issues
N/A

## Additional information
N/A

---------

Signed-off-by: kyuds <kyuseung1016@gmail.com>
Signed-off-by: Daniel Shin <kyuseung1016@gmail.com>
Co-authored-by: You-Cheng Lin <106612301+owenowenisme@users.noreply.github.com>
generating depsets for base extra python requirements
Installing requirements in base extra image

---------

Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
…#58499)

which is more accurate

also moves python constraint definitions into `bazel/` directory and
registering python 3.10 platform with hermetic toolchain

this allows performing migration from python 3.9 to python 3.10
incrementally

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
we stop supporting building with python 3.9 now

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
…ct#58576)

Using GetNodeAddressAndLiveness in raylet client pool instead of the
bulkier Get, same for AsyncGetAll. Seems like it was already done in
core worker client pool, so just making the same change for raylet
client pool.

Signed-off-by: joshlee <joshlee@anyscale.com>
…bles (ray-project#58270)

## Description
- Support upserting iceberg tables for IcebergDatasink
- Update schema on APPEND and UPSERT
- Enable overwriting the entire table

Upgrades to pyiceberg 0.10.0 because it now supports upsert and overwrite
functionality. Also, for append, the library now handles the transaction
logic implicitly, so that burden can be lifted from Ray Data.

---------

Signed-off-by: Goutam <goutam@anyscale.com>
as the pydantic version is pinned in `requirements-doc.txt` now.

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
nothing is using it anymore

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
…58580)

Adds an optional `include_setuptools` flag for depset configuration.

If the flag is set on a depset config, `--unsafe-package setuptools` will
not be included for depset compilation.

If the flag does not exist on a depset config (default false),
`--unsafe-package setuptools` will be appended to the default arguments.
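
The flag logic can be sketched as below. `build_compile_args` and `DEFAULT_ARGS` are hypothetical names, and the default argument list shown is a placeholder rather than the real one.

```python
from typing import List, Mapping

# Placeholder defaults for illustration only; the real default list is not shown here.
DEFAULT_ARGS = ["--generate-hashes"]


def build_compile_args(depset_config: Mapping[str, object]) -> List[str]:
    """Build compile arguments for one depset config."""
    args = list(DEFAULT_ARGS)
    # A missing flag is treated as False, so setuptools stays marked unsafe by default.
    if not depset_config.get("include_setuptools", False):
        args += ["--unsafe-package", "setuptools"]
    return args
```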

---------

Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Co-authored-by: Lonnie Liu <95255098+aslonnie@users.noreply.github.com>
otherwise, the newer docker client will refuse to communicate with the
docker daemon that is on an older version.

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
…ay-project#58542)

## What does this PR do?
   
Fixes HTTP streaming file downloads in Ray Data's download operation.
Some URIs (especially HTTP streams) require `open_input_stream` instead
of `open_input_file`.
   
## Changes

- Modified `download_bytes_threaded` in `plan_download_op.py` to try
both `open_input_file` and `open_input_stream` for each URI
- Improved error handling to distinguish between different error types
- Failed downloads now return `None` gracefully instead of crashing

## Testing
```python
import pyarrow as pa
from ray.data.context import DataContext
from ray.data._internal.planner.plan_download_op import download_bytes_threaded

# Test URL: an HTTP streaming endpoint that previously failed to download
urls = [
    "https://static-assets.tesla.com/configurator/compositor?context=design_studio_2?&bkba_opt=1&view=STUD_3QTR&size=600&model=my&options=$APBS,$IPB7,$PPSW,$SC04,$MDLY,$WY19P,$MTY46,$STY5S,$CPF0,$DRRH&crop=1150,647,390,180&",
]

# Create PyArrow table and call download function
table = pa.table({"url": urls})
ctx = DataContext.get_current()
results = list(download_bytes_threaded(table, ["url"], ["bytes"], ctx))

# Check results
result_table = results[0]
for i in range(result_table.num_rows):
    url = result_table['url'][i].as_py()
    bytes_data = result_table['bytes'][i].as_py()

    if bytes_data is None:
        print(f"Row {i}: FAILED (None) - try-catch worked ✓")
    else:
        print(f"Row {i}: SUCCESS ({len(bytes_data)} bytes)")
    print(f"  URL: {url[:60]}...")

print("\n✅ Test passed: Failed downloads return None instead of crashing.")
```

Before the fix:
```
TypeError: cannot set 'open_input_file' attribute of immutable type 'pyarrow._fs.FileSystem'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ray/default/test_streaming_fallback.py", line 110, in <module>
    test_download_expression_with_streaming_fallback()
  File "/home/ray/default/test_streaming_fallback.py", line 67, in test_download_expression_with_streaming_fallback
    with patch.object(pafs.FileSystem, "open_input_file", mock_open_input_file):
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/anaconda3/lib/python3.12/unittest/mock.py", line 1594, in __enter__
    if not self.__exit__(*sys.exc_info()):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/anaconda3/lib/python3.12/unittest/mock.py", line 1603, in __exit__
    setattr(self.target, self.attribute, self.temp_original)
TypeError: cannot set 'open_input_file' attribute of immutable type 'pyarrow._fs.FileSystem'
(base) ray@ip-10-0-39-21:~/default$ python test.py
2025-11-11 18:32:23,510 WARNING util.py:1059 -- Caught exception in transforming worker!
Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/data/_internal/util.py", line 1048, in _run_transforming_worker
    for result in fn(input_queue_iter):
                  ^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/data/_internal/planner/plan_download_op.py", line 197, in load_uri_bytes
    yield f.read()
          ^^^^^^^^
  File "pyarrow/io.pxi", line 411, in pyarrow.lib.NativeFile.read
  File "pyarrow/io.pxi", line 263, in pyarrow.lib.NativeFile.size
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status
  File "/home/ray/anaconda3/lib/python3.12/site-packages/fsspec/implementations/http.py", line 743, in seek
    raise ValueError("Cannot seek streaming HTTP file")
ValueError: Cannot seek streaming HTTP file
Traceback (most recent call last):
  File "/home/ray/default/test.py", line 16, in <module>
    results = list(download_bytes_threaded(table, ["url"], ["bytes"], ctx))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/data/_internal/planner/plan_download_op.py", line 207, in download_bytes_threaded
    uri_bytes = list(
                ^^^^^
  File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/data/_internal/util.py", line 1113, in make_async_gen
    raise item
  File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/data/_internal/util.py", line 1048, in _run_transforming_worker
    for result in fn(input_queue_iter):
                  ^^^^^^^^^^^^^^^^^^^^
  File "/home/ray/anaconda3/lib/python3.12/site-packages/ray/data/_internal/planner/plan_download_op.py", line 197, in load_uri_bytes
    yield f.read()
          ^^^^^^^^
  File "pyarrow/io.pxi", line 411, in pyarrow.lib.NativeFile.read
  File "pyarrow/io.pxi", line 263, in pyarrow.lib.NativeFile.size
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status
  File "/home/ray/anaconda3/lib/python3.12/site-packages/fsspec/implementations/http.py", line 743, in seek
    raise ValueError("Cannot seek streaming HTTP file")
ValueError: Cannot seek streaming HTTP file
```
After the fix:
```
Row 0: SUCCESS (189370 bytes)
  URL: https://static-assets.tesla.com/configurator/compositor?cont...
```
   
Tested with HTTP streaming URLs (e.g., Tesla configurator images) that
previously failed:
- ✅ Successfully downloads HTTP stream files
- ✅ Gracefully handles failed downloads (returns None)
- ✅ Maintains backward compatibility with existing file downloads
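
The fallback described under Changes can be sketched roughly as follows. This is a simplified illustration, not the code in `plan_download_op.py`: `read_uri_bytes` and `FakeStreamingFS` are hypothetical, and the filesystem is duck-typed so the example runs without PyArrow. Only the method names `open_input_file` and `open_input_stream` come from the PyArrow filesystem API.

```python
import io
from typing import Optional


def read_uri_bytes(fs, uri: str) -> Optional[bytes]:
    """Read a URI fully, falling back to a non-seekable stream; None on failure."""
    for opener_name in ("open_input_file", "open_input_stream"):
        try:
            with getattr(fs, opener_name)(uri) as f:
                return f.read()
        except Exception:
            # Broad catch kept for brevity in this sketch; real code would
            # distinguish error types before deciding to retry.
            continue
    return None


class FakeStreamingFS:
    """Test double for a filesystem whose files cannot be opened for random access."""

    def open_input_file(self, uri: str):
        raise ValueError("Cannot seek streaming HTTP file")

    def open_input_stream(self, uri: str):
        if uri.endswith("/missing"):
            raise FileNotFoundError(uri)
        return io.BytesIO(b"payload")
```

Because `read()` sits inside the `try`, the fallback also covers the case in the traceback above, where opening succeeds but reading fails on `seek`.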

---------

Signed-off-by: xyuzh <xinyzng@gmail.com>
Signed-off-by: Robert Nishihara <robertnishihara@gmail.com>
Co-authored-by: Robert Nishihara <robertnishihara@gmail.com>
## Description

We today have very little observability into pubsub. On a raylet one of
the most important states that need to be propagated through the cluster
via pubsub is cluster membership. All raylets should in an eventual BUT
timely fashion agree on the list of available nodes. This metric just
emits a simple counter to keep track of the node count.

More pubsub observability to come.

---------

Signed-off-by: zac <zac@anyscale.com>
Signed-off-by: Zac Policzer <zacattackftw@gmail.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
all tests are passing

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
…#58587)

also stops building python 3.9 aarch64 images

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
so that importing test.py does not always import github

github repo imports jwt, which then imports cryptography and can lead to
issues on windows.

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
this makes it possible to run on a different python version than the CI
wrapper code.

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
Signed-off-by: Lonnie Liu <95255098+aslonnie@users.noreply.github.com>
…ecurity (ray-project#58591)

Migrates Ray dashboard authentication from JavaScript-managed cookies to
server-side HttpOnly cookies to enhance security against XSS attacks.
This addresses code review feedback to improve the authentication
implementation (ray-project#58368)

main changes:
- authentication middleware first looks for `Authorization` header, if
not found it then looks at cookies to look for the auth token
- new `api/authenticate` endpoint for verifying token and setting the
auth token cookie (with `HttpOnly=true`, `SameSite=Strict` and
`secure=true` (when using https))
- removed javascript based cookie manipulation utils and axios
interceptors (were previously responsible for setting cookies)
- cookies are deleted when connecting to a cluster with
`AUTH_MODE=disabled`. connecting to a different ray cluster (with
different auth token) using the same endpoint (eg due to port-forwarding
or local testing) will reshow the popup and ask users to input the right
token.
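
A minimal sketch of the two pieces described above, using only the Python standard library. The cookie name `ray_auth_token`, the `Bearer` scheme, and both function names are assumptions for illustration; the dashboard's actual middleware and endpoint may differ.

```python
from http.cookies import SimpleCookie
from typing import Mapping, Optional

AUTH_COOKIE_NAME = "ray_auth_token"  # hypothetical cookie name


def extract_token(headers: Mapping[str, str], cookies: Mapping[str, str]) -> Optional[str]:
    """Prefer the Authorization header; fall back to the auth cookie."""
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):  # assumed scheme
        return auth[len("Bearer "):]
    return cookies.get(AUTH_COOKIE_NAME)


def build_set_cookie(token: str, is_https: bool) -> str:
    """Build a Set-Cookie header value that page JavaScript cannot read."""
    cookie = SimpleCookie()
    cookie[AUTH_COOKIE_NAME] = token
    cookie[AUTH_COOKIE_NAME]["httponly"] = True
    cookie[AUTH_COOKIE_NAME]["samesite"] = "Strict"
    cookie[AUTH_COOKIE_NAME]["path"] = "/"
    if is_https:
        # Secure is only set for https, matching the description above.
        cookie[AUTH_COOKIE_NAME]["secure"] = True
    return cookie[AUTH_COOKIE_NAME].OutputString()
```

`HttpOnly` is what keeps the token out of reach of injected scripts, which is why the JavaScript cookie utilities could be removed.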

---------

Signed-off-by: sampan <sampan@anyscale.com>
Co-authored-by: sampan <sampan@anyscale.com>
add support for `ray get-auth-token` cli command + test

---------

Signed-off-by: sampan <sampan@anyscale.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Signed-off-by: Sampan S Nayak <sampansnayak2@gmail.com>
Co-authored-by: sampan <sampan@anyscale.com>
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
liulehui and others added 22 commits November 26, 2025 01:51
1. The Jax dependency is introduced in
ray-project#58322
2. The current test environment is for CUDA 12.1, which limits the jax
version to below 0.4.14.
3. jax <= 0.4.14 does not support py 3.12.
4. Skip the jax test if it runs against py3.12+.
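
One way to express the skip in step 4, shown here with the standard library's `unittest` so the example is self-contained. The test class name is hypothetical, and the actual change may use a pytest marker instead.

```python
import sys
import unittest

# Mirrors the constraint above: the pinned jax does not support Python 3.12+.
JAX_UNSUPPORTED = sys.version_info >= (3, 12)


class JaxTrainerSmokeTest(unittest.TestCase):  # hypothetical test name
    @unittest.skipIf(JAX_UNSUPPORTED, "jax <= 0.4.14 does not support Python 3.12+")
    def test_import_jax(self):
        # Imported lazily so the module still loads where jax is not installed.
        import jax

        self.assertTrue(hasattr(jax, "numpy"))
```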

Signed-off-by: Lehui Liu <lehui@anyscale.com>
…ics (ray-project#58870)

When reporting a checkpoint to Ray Train, every worker needs to form a
barrier with a `ray.train.report` call. If every worker reports an empty
checkpoint, we should notify the condition to unblock
`ray.train.get_all_reported_checkpoints` calls.

Before this fix, reporting an empty checkpoint and calling
`get_all_reported_checkpoints` would result in a hang.
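
The hang and its fix can be illustrated with a plain `threading.Condition`. `CheckpointTracker` and its methods are hypothetical and much simpler than the Ray Train internals; the point is only that the waiter must be notified for every report, including empty ones.

```python
import threading
from typing import List, Optional


class CheckpointTracker:  # hypothetical; not the Ray Train class
    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._reports_seen = 0
        self._checkpoints: List[str] = []

    def register_report(self, checkpoint: Optional[str]) -> None:
        with self._cond:
            self._reports_seen += 1
            if checkpoint is not None:
                self._checkpoints.append(checkpoint)
            # Notify even when the checkpoint is empty, so waiters are not left hanging.
            self._cond.notify_all()

    def get_all_reported_checkpoints(self, expected_reports: int, timeout: float = 5.0) -> List[str]:
        with self._cond:
            done = self._cond.wait_for(
                lambda: self._reports_seen >= expected_reports, timeout
            )
            if not done:
                raise TimeoutError("report was never acknowledged")
            return list(self._checkpoints)
```

If `notify_all()` were called only inside the `if checkpoint is not None` branch, a waiter expecting an empty report would block until the timeout.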

---------

Signed-off-by: Timothy Seah <tseah@anyscale.com>
optimization for the case where list has only one timeseries. O(nlogn)
-> O(1)
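
A rough sketch of this kind of fast path. The function name, the `(timestamp, value)` point shape, and the sort-based general path are assumptions; the commit does not show the actual merge routine.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (timestamp, value)


def merge_timeseries(series: List[List[Point]]) -> List[Point]:
    """Merge several timestamp-sorted series into one sorted series."""
    if not series:
        return []
    if len(series) == 1:
        # Fast path: nothing to merge, so return the input without sorting or copying.
        return series[0]
    # General path: O(n log n) in the total number of points.
    return sorted((p for s in series for p in s), key=lambda p: p[0])
```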

Signed-off-by: abrar <abrar@anyscale.com>
…GNAL_HANDLER option (ray-project#58984)

Ray's Abseil failure signal handler conflicts with JVM's internal signal
handling. When PyArrow's HadoopFileSystem starts a JVM, and then Ray
installs its signal handler, JVM's normal internal operations (for
safepoint synchronization) trigger Ray's handler, causing a crash.

It seems we already have some code that disables this for Java bindings
(there's a TODO about it in io_ray_runtime_RayNativeRuntime.cc). This is
all well and good when we realize we're running in a Java runtime, but
for the bug report I'm not sure how obvious it is (maybe we should
consider doing something like this in Data itself automatically?).
Unsure. For our part, if we know the datasource is from HDFS, that seems
as likely a signal as any that this might be happening. But it's only
relevant if you're running the JVM in the same process (I was only able
to reproduce this by running everything in the same process). One of the
comments in the bug ticket points out they're able to skirt the error by
doing process isolation (which ended up being the hint I needed).

**What's the catch?**
When this config is set we essentially won't call WriteFailureMessage()
on a crash, which means we won't get a full trace on termination or
flush full logs. So just setting this all the time isn't a great
solution (which is essentially why I've put it behind a config). We might
want to revisit this for HDFS users. That said, the JVM core dump info
is somewhat decent, so we're not totally in the dark.

## Related issues

Fixes ray-project#36415

---------

Signed-off-by: zac <zac@anyscale.com>
…ject#57639)

Enable zero-copy serialization for all PyTorch tensors by setting
`RAY_ENABLE_ZERO_COPY_TORCH_TENSORS=1` to accelerate serialization.

Example test script:
```python
import os

# Must be set before `import ray` to ensure that the zero-copy tensor pickle reducer
# is properly registered in driver.
os.environ["RAY_ENABLE_ZERO_COPY_TORCH_TENSORS"] = "1"

import ray
import torch
from datetime import datetime

ray.init(runtime_env={"env_vars": {"RAY_ENABLE_ZERO_COPY_TORCH_TENSORS": "1"}})

@ray.remote
def process(tensor):
    return tensor.sum()

x = torch.ones(1024, 1024, 256)

start_time = datetime.now()
x_ref = process.remote(x)
result = ray.get(x_ref)
time_diff = datetime.now() - start_time

print(f"result      : {result}")
print(f"between time: {time_diff.total_seconds()}s")
print(f"result type : {type(result)}")
```
Below are the performance gains and validation results:
<img width="1977" height="965" alt="zuizhongxiaoguo"
src="https://github.com/user-attachments/assets/e3d5210c-142d-4ec3-908c-fe590514cfc8"
/>

Closes ray-project#56740 ray-project#26229

---------

Signed-off-by: Haichuan Hu <kaisennhu@gmail.com>
Co-authored-by: Ibrahim Rabbani <irabbani@anyscale.com>
and performing some styling cleanup

Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
…oughput stats (ray-project#58693)

This PR makes three improvements to Ray Data's throughput statistics:

1. **Makes `test_dataset_throughput` deterministic**: The original test
was flaky because it relied on actual task execution timing. This PR
rewrites it as unit tests (`test_dataset_throughput_calculation` and
`test_operator_throughput_calculation`) using mocked `BlockStats`
objects, making the tests fast and reliable.

2. **Removes "Estimated single node throughput" from Dataset-level
stats**: This metric was misleading at the dataset level since it summed
wall times across all operators, which doesn't accurately represent
single-node performance. The "Ray Data throughput" metric (total rows /
total wall time) remains and provides the meaningful dataset-level
throughput.

3. **Renames "Estimated single node throughput" to "Estimated single
task throughput"**: At the operator level, this metric divides total
rows by the sum of task wall times. The new name more accurately
reflects what it measures: the throughput if all work were done by a
single task serially.
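
The two remaining metrics reduce to simple ratios. The sketch below uses a hypothetical `TaskStats` dataclass in place of the real per-task statistics and is only meant to show why the two numbers differ when tasks run in parallel.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskStats:  # hypothetical stand-in for per-task block statistics
    num_rows: int
    wall_time_s: float


def dataset_throughput(total_rows: int, total_wall_time_s: float) -> float:
    """Rows per second over the dataset's end-to-end wall time."""
    return total_rows / total_wall_time_s


def single_task_throughput(tasks: List[TaskStats]) -> float:
    """Rows per second if all the work were done serially by a single task."""
    total_rows = sum(t.num_rows for t in tasks)
    summed_wall_time_s = sum(t.wall_time_s for t in tasks)
    return total_rows / summed_wall_time_s
```

For example, two tasks that each process 100 rows in 2 seconds and run fully in parallel give a single-task throughput of 50 rows/s, while the end-to-end throughput over the 2 seconds of wall time is 100 rows/s.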

---------

Signed-off-by: dancingactor <s990346@gmail.com>
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu>
## Description
This PR removes the deprecated `read_parquet_bulk` API from Ray Data,
along with its implementation and documentation. This function was
deprecated in favor of `read_parquet`, which now covers all equivalent use
cases. The deprecation warning stated removal after May 2025, and that
deadline has passed, so this cleanup reduces maintenance burden and
prevents user confusion.

Summary of changes

- Removed read_parquet_bulk from read_api.py and __init__.py
- Deleted ParquetBulkDatasource + its file
- Removed related tests and documentation
- Updated references and docstrings mentioning the deprecated API

## Related issues
> Fixes ray-project#58969

---------

Signed-off-by: rushikesh.adhav <adhavrushikesh6@gmail.com>
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu>
…8694)

## Description
The test fails intermittently with an assertion error indicating that
the internal input queue for a MapBatches operator is not empty when
it's expected to be. This suggests a race condition or timing issue in
the streaming executor's queue management.

## Related issues
Closes ray-project#58546 

## Additional information

---------

Signed-off-by: 400Ping <fourhundredping@gmail.com>
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Co-authored-by: Balaji Veeramani <bveeramani@berkeley.edu>
…ect#58890)

## Description
RLlib is split in its testing structure: we have the top-level `/tests`
folder and, for components, `<component>/tests` folders, mixing the two
project structures.
This PR commits to the component style of project structure, moving
`tests/` files to their component folders.

## Related issues
General project structure improvements

---------

Signed-off-by: Mark Towers <mark@anyscale.com>
Co-authored-by: Mark Towers <mark@anyscale.com>
Co-authored-by: Kamil Kaczmarek <kaczmarek.poczta@gmail.com>
## Description
- convert debug logs in authentication_token_loader to info so that
users are aware of where the token being used is being loaded from
- When we raise the `AuthenticationError`, if RAY_AUTH_MODE is not set
to token we should explicitly print that in the error message
- in error messages suggest storing tokens in filesystem instead of env
- add state api tests in test_token_auth_integration.py

---------

Signed-off-by: sampan <sampan@anyscale.com>
Co-authored-by: sampan <sampan@anyscale.com>
…round (ray-project#59004)

## Description
Remove the dual local/CI hook configuration for pydoclint and simplify
back to a single hook. The workaround was needed due to a bug, but this was fixed in
pydoclint `0.8.3` 🎉

- Upgrade pydoclint from `0.8.1` to `0.8.3`
- Remove separate `pydoclint-local` and `pydoclint-ci` hooks
- Simplify CI lint script to use standard pre-commit run


Signed-off-by: Thomas Desrosiers <thomas@anyscale.com>
The Jax dependency is introduced in
ray-project#58322
The current test environment is for CUDA 12.1, which limits the jax version
to below 0.4.14.
jax <= 0.4.14 does not support py 3.12.
Skip the jax test if it runs against py3.12+.

Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Patch release fixes a post-merge bug that was causing GPU batch tests to
fail.

- Add explicit transformers>=4.57.3 dependency to ray[llm]
- Update hashes


## Related issues
anyscale#547

---------

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
The autoscaling context requires expensive function evaluation, and not
all autoscaling policies need the data. Evaluate it lazily to save
controller CPU.
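
A minimal sketch of the lazy-evaluation idea using `functools.cached_property`. The class shape, the `aggregated_metrics` attribute, and the two example policies are hypothetical, not the Serve controller's actual types.

```python
from functools import cached_property
from typing import Callable, Dict


class AutoscalingContext:  # hypothetical shape, for illustration only
    def __init__(self, load_metrics: Callable[[], Dict[str, float]]) -> None:
        self._load_metrics = load_metrics

    @cached_property
    def aggregated_metrics(self) -> Dict[str, float]:
        # Runs the expensive function on first access only, then caches the result.
        return self._load_metrics()


def fixed_policy(ctx: AutoscalingContext) -> int:
    # A policy that never reads the metrics, so they are never computed.
    return 2


def load_based_policy(ctx: AutoscalingContext) -> int:
    # A policy that does read the metrics and therefore pays for them once.
    return max(1, round(ctx.aggregated_metrics["ongoing_requests"] / 10))
```

Policies that ignore the metrics pay nothing, and policies that read them several times pay only once per context.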

---------

Signed-off-by: abrar <abrar@anyscale.com>
## Description

`test_backpressure_e2e` occasionally fails with an error like this:
```
[2025-11-26T17:33:36Z] PASSED[2025-11-26 17:27:35,172 E 550 12058] core_worker_process.cc:986: The core worker has already been shutdown. This happens when the language frontend accesses the Ray's worker after it is shutdown. The process will exit
```

This PR attempts to deflake it by removing an unnecessary `del`.

(Long-term, we should rewrite or remove this test. This PR is a
mitigation.)

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
## Description
RLlib has both a tuned-examples folder and an examples folder. We believe
this is a confusing structure, so this PR merges them.

This PR moves all tuned-example scripts into
`examples/algorithms` for the new API stack and
`examples/_old_api_stack/algorithms` for the old API stack.

---------

Signed-off-by: Mark Towers <mark@anyscale.com>
Co-authored-by: Mark Towers <mark@anyscale.com>
Co-authored-by: Kamil Kaczmarek <kaczmarek.poczta@gmail.com>
Including openlineage-python dependency
upgrading requests from 2.32.3 -> 2.32.5

LLM serve and batch release tests:
https://buildkite.com/ray-project/release/builds/69428
Only failing test has already been disabled:
llm_serve_llama_3dot1_8B_quantized_tp1_2p6d_lmcache

Core multi test: https://buildkite.com/ray-project/release/builds/69489#

---------

Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
## Description
While checking RLlib, I found a couple of cases where we don't fully
handle exceptions.
I've checked every try/except statement in RLlib, and this PR updates all
of those that didn't log or print the error when it wasn't handled.

## Related issues
Fixes ray-project#58854

---------

Signed-off-by: Mark Towers <mark@anyscale.com>
Co-authored-by: Mark Towers <mark@anyscale.com>
…jection logic (ray-project#59042)

## Description
Adds more headers to the denylist for recognising browser requests and
denying them

## Related issues
Supersedes ray-project#59040

Signed-off-by: Richo Healey <richo@anyscale.com>

@sourcery-ai sourcery-ai bot left a comment

The pull request #691 has too many files changed.

The GitHub API will only let us fetch up to 300 changed files, and this pull request has 6188.

@gemini-code-assist

Summary of Changes

Hello @antfin-oss, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request is an automated daily merge from the master branch into main, focusing on a comprehensive modernization of the project's development infrastructure. It encompasses significant updates to the CI/CD pipelines, a major refactoring of the Bazel build system, and enhancements to code quality enforcement through new linting tools. These changes aim to improve build efficiency, standardize Python environment management, and ensure a higher level of code quality and maintainability across the project.

Highlights

  • CI/CD Modernization: The Buildkite pipelines and Dockerfiles have undergone a significant overhaul, including the adoption of Miniforge/uv for Python dependency management, streamlining the build process, and updating Python version support across various environments.
  • Bazel Build System Refinement: Extensive updates to Bazel configurations introduce stricter action environments, new packaging mechanisms using pkg_files and pkg_zip, and clearer Python toolchain definitions, enhancing build reliability and consistency.
  • Code Quality & Linting Enhancements: New pre-commit hooks (semgrep, vale, cython-lint) have been integrated, and existing ones updated, to enforce higher code quality and style standards across the codebase.
  • Python 3.9 Deprecation: Python 3.9 support has been removed from several build and test configurations, indicating a strategic shift towards newer Python versions.
  • C++ API/Runtime Adjustments: Minor but impactful changes have been made to C++ API definitions, metric handling, and runtime environment serialization, improving the robustness and clarity of the C++ components.
Ignored Files
  • Ignored by pattern: .gemini/** (1)
    • .gemini/config.yaml
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/stale_pull_request.yaml
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature | Command | Description
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request is a large-scale refactoring and cleanup of the CI/CD and build systems. The changes are extensive, touching many configuration files, build scripts, and some parts of the C++ and Python code. The main themes of this refactoring are:

  • CI/CD Pipeline Refactoring: Many .buildkite YAML files have been modified, added, or removed to restructure the CI pipeline. This includes separating image builds, documentation builds, and dependency management into their own dedicated pipeline files, which improves modularity and clarity.
  • Bazel Build System Refactoring: The root BUILD.bazel and WORKSPACE files have been significantly refactored. Many build targets have been moved from the root BUILD.bazel to more specific locations, and the project now uses rules_pkg for packaging, which is a great improvement. The workspace name has been changed from com_github_ray_project_ray to io_ray.
  • Dependency Management Update: The project is moving from miniconda to miniforge and has introduced uv and a new custom tool raydepsets for more robust dependency management. This is a significant step towards more reproducible builds.
  • Tooling and Process Updates: The PR includes updates to the CODEOWNERS file, the pull request template, and a major overhaul of the linting and formatting setup by moving to pre-commit.
  • Code Cleanups: There are several minor cleanups and modernizations in the C++ code, such as replacing deprecated features (std::result_of_t) and improving code style.
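
To illustrate the kind of cleanup the last bullet refers to, here is a minimal sketch of migrating away from `std::result_of_t`, which was deprecated in C++17 and removed in C++20, to `std::invoke_result_t`. The names `Doubler`, `ResultOf`, and `Call` are hypothetical and used only for illustration; they are not taken from the Ray codebase, and the actual changes in this pull request may look different.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical callable, for illustration only (not from the Ray codebase).
struct Doubler {
  constexpr int operator()(int x) const { return 2 * x; }
};

// Old spelling (deprecated in C++17, removed in C++20):
//   template <typename F, typename Arg>
//   using ResultOf = std::result_of_t<F(Arg)>;

// C++17 replacement: the callable type and the argument types are passed as
// separate template arguments instead of a function-type expression F(Args...).
template <typename F, typename... Args>
using ResultOf = std::invoke_result_t<F, Args...>;

// Forwards a call to `f` and names its return type through the alias above.
template <typename F, typename... Args>
constexpr ResultOf<F, Args...> Call(F &&f, Args &&...args) {
  return std::forward<F>(f)(std::forward<Args>(args)...);
}
```

With these definitions, `ResultOf<Doubler, int>` names `int`, and `Call(Doubler{}, 21)` evaluates to 42. The sketch assumes a compiler running in C++17 mode or later.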

Overall, these changes represent a significant improvement in the project's build and CI infrastructure, making it more modern, modular, and maintainable. I did not find any issues that meet the requested severity threshold of medium or higher. The changes are well-executed and consistent across the large number of modified files.
