[pull] master from ray-project:master #2625
Merged
Move `test_deployment_scheduler.py` from small to medium. Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>
If `args` is not a list, an exception will be thrown in the if statement above.
…ate_failed_env_runners=False` to `True`). (#48286)
Implement full recursive cancellation at the Serve layer. Currently, at the time a parent request is cancelled, if the child request has not been assigned to a replica (meaning the replica scheduler in the router is still processing that request) AND the parent request is not directly blocking and waiting on the result of the child request, then the child request will not be correctly cancelled. Note that if the second condition were flipped, i.e. the parent request _was_ waiting on the result at the time of cancellation, an `asyncio.CancelledError` would be sent directly to the replica scheduler and the child request would be cancelled successfully; obviously this cannot be guaranteed. This PR adds a cache of the asyncio tasks that the router launches to assign a request to a replica. When a parent request is cancelled, any "pending assignment requests" in the cache are cancelled as well. Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>
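The PR text doesn't include the implementation, but the pending-assignment cache can be sketched in plain asyncio. Everything below (`_pending_assignments`, `_schedule`, the function names) is hypothetical illustration, not Serve's actual router code:

```python
import asyncio
from collections import defaultdict

# Hypothetical cache of in-flight "assign this request to a replica" tasks,
# keyed by the parent request's ID.
_pending_assignments: dict[str, set[asyncio.Task]] = defaultdict(set)

async def _schedule(request):
    # Stand-in for the replica scheduler; in reality this picks a replica.
    await asyncio.sleep(3600)

async def assign_to_replica(parent_request_id: str, request):
    task = asyncio.create_task(_schedule(request))
    _pending_assignments[parent_request_id].add(task)
    task.add_done_callback(
        lambda t: _pending_assignments[parent_request_id].discard(t)
    )
    return await task

def cancel_pending_assignments(parent_request_id: str) -> None:
    # Called when the parent request is cancelled: cancel every child request
    # that has not yet been assigned to a replica.
    for task in _pending_assignments.pop(parent_request_id, set()):
        task.cancel()
```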
Stacked on: #48223 Reimplements the `.bind()` codepath to avoid using the Ray DAG codepath which adds a lot of complexity for little benefit. The DAG traversal code is much simpler, around 100 lines of self-contained code in `build_app.py`. I've also added unit tests for it. There should be no behavior change here. --------- Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
…joco 3.2.4, pettingzoo 1.24.3 supersuit 3.9.3)."" (#48308)
Previously the error message was injected directly as the HTTP status message Signed-off-by: Richo Healey <richo@anyscale.com>
Fix `router.py`. Signed-off-by: Cindy Zhang <cindyzyx9@gmail.com>
…g compilation (#47757) Leaf nodes are nodes that are not output nodes and have no downstream nodes. If a leaf node raises an exception, it will not be propagated to the driver, so this PR raises an exception when a leaf node is found during compilation. An alternative solution would be to implicitly add leaf nodes to the MultiOutputNode: `execute` can return multiple CompiledDAGRefs, and the desired UX is to implicitly add leaf nodes to the MultiOutputNode without returning their references. For example, if a MultiOutputNode contains 3 DAG nodes (2 normal DAG nodes + 1 leaf node), then `x, y = compiled_dag.execute(input_vals)` would not return the ref for the leaf node. However, the leaf node's ref would then be GC'ed inside `execute`, and `CompiledDAGRef.__del__` calls `get` if it was never called, which would turn `execute` into a synchronous rather than an asynchronous operation, which is not acceptable.
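As a rough illustration of the chosen behavior (not the actual compiler code), the compile-time check boils down to flagging any node that is neither an output node nor upstream of anything; the toy DAG below deliberately contains one leaf, so the sketch raises:

```python
# Toy DAG: edges map each node to its downstream nodes.
downstream = {"a": ["b", "c"], "b": ["out"], "c": [], "out": []}
output_nodes = {"out"}

def find_leaf_nodes(downstream, output_nodes):
    # Leaf nodes: not an output node and no downstream nodes, so an
    # exception they raise would never propagate to the driver.
    return [
        node for node, downs in downstream.items()
        if node not in output_nodes and not downs
    ]

leaves = find_leaf_nodes(downstream, output_nodes)
if leaves:
    raise ValueError(
        f"Compiled DAG contains leaf nodes {leaves}; their exceptions would "
        "never reach the driver. Add them to the MultiOutputNode instead."
    )
```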
…#48358) Some tests are failing when we duplicate the original release tests. Marking them as unstable to be non-blocking. Will mark as stable once they are truly stable. Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
See #48228 (comment). Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
GcsServer manages several IO Contexts and their threads, and hard-codes which class uses which IO Context. This is not flexible: we can't easily declare another IO Context and adapt some classes to it without a pervasive code change. Introduces an IoContextProvider that manages IO Contexts based on a static, compile-time Policy whose author determines the class-to-IO Context mapping. This way we can change the mappings in a central place without a big chunk of changes everywhere. Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
…ls.cache decorators on methods can lead to memory leaks (#48186) When running the flake8 pre-commit hook on Python 3.12, the following error occurs because the pinned version of flake8 is too old. ![image](https://github.com/user-attachments/assets/bf56370f-04d7-4b9a-aea9-e1c578547ed4) Versions: python 3.12.7, flake8 7.1.1, flake8-bugbear 24.8.19. Signed-off-by: win5923 <ken89@kimo.com>
…3FileSystem` (#48216) When PyArrow does not have support for S3, accessing `pyarrow.fs.S3FileSystem` raises an exception. Fix by checking whether `S3FileSystem` is importable before doing this typecheck. Signed-off-by: Charles Coulombe <ccoulombe@users.noreply.github.com> Signed-off-by: Charles Coulombe <charles.coulombe@calculquebec.ca>
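A minimal sketch of the guarded type check (the helper name is hypothetical; the real fix lives in Ray Data's filesystem utilities):

```python
import pyarrow.fs

def _is_s3_filesystem(fs: "pyarrow.fs.FileSystem") -> bool:
    # PyArrow builds without S3 support raise ImportError as soon as
    # S3FileSystem is touched, so guard the isinstance check.
    try:
        from pyarrow.fs import S3FileSystem
    except ImportError:
        return False
    return isinstance(fs, S3FileSystem)
```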
…oing_requests` (#47681) (#48274) This PR modifies the `actor_options` used when deploying replicas. A deployment will use the configured `max_ongoing_requests` attribute of the deployment config as the replica's `max_concurrency` if the concurrency is not explicitly set. This prevents the replica's `max_concurrency` from capping `max_ongoing_requests`. Closes #47681. Signed-off-by: akyang-anyscale <alexyang@anyscale.com>
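Conceptually the change is a defaulting rule on the replica actor's options; a hedged sketch (function and argument names are illustrative, not Serve's internals):

```python
def resolve_replica_actor_options(actor_options: dict, max_ongoing_requests: int) -> dict:
    # If the user didn't set max_concurrency explicitly, raise the actor's
    # asyncio concurrency cap to match max_ongoing_requests so it can't
    # silently throttle the deployment below its configured limit.
    options = dict(actor_options)
    options.setdefault("max_concurrency", max_ongoing_requests)
    return options

print(resolve_replica_actor_options({}, max_ongoing_requests=100))
# {'max_concurrency': 100}
```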
This PR moves `ProxyStatus` out of the `_private` directory, allowing it to be included in the API docs. This is the final attribute of `ServeStatus` that needs to be included in the documentation. Closes #43394. Signed-off-by: akyang-anyscale <alexyang@anyscale.com>
Pyarrow 18 causes errors on the MacOS build. Since Pyarrow 18 is not officially supported on Ray Data yet, we can just pin it for now. Signed-off-by: Scott Lee <sjl@anyscale.com>
#48415) I was initially confused that I couldn't join another paused task while a debugger was in "continue" mode. Signed-off-by: Philipp Moritz <pcmoritz@gmail.com> Co-authored-by: bhuang <bhuang@anyscale.com> Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Adds a metric to monitor event loop lag for `instrumented_io_context`. By default this is applied only to IO Contexts in GCS. Every 250ms we post an async task and use std::chrono::steady_clock to record the lag. The metric is a GAUGE with the thread name recorded. I tested this on my laptop and it works: it's ~0 on an idle cluster, and ~1000ms if I use `RAY_testing_asio_delay_us=event_loop_lag_probe=1000000:1000000`. Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
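The actual implementation is C++/asio inside the GCS, but the probing idea translates directly to an asyncio sketch (the callback and names below are illustrative):

```python
import asyncio
import time

async def probe_event_loop_lag(report, interval_s: float = 0.25) -> None:
    # Every interval_s, ask the loop to wake us up and measure how much
    # later than requested it actually did; the delta is the loop lag.
    while True:
        start = time.monotonic()
        await asyncio.sleep(interval_s)
        lag_ms = max(time.monotonic() - start - interval_s, 0.0) * 1000.0
        report(lag_ms)

# Example: feed the samples into a gauge-like callback.
# asyncio.run(probe_event_loop_lag(lambda ms: print(f"event loop lag: {ms:.1f} ms")))
```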
Signed-off-by: zhilong <zhilong.chen@mail.mcgill.ca>
CoreWorkerMemoryStore has a GetAsync method that invokes callbacks when the object is ready, either immediately or on a Put() call. However, despite the name, the call is not async: it just runs the callbacks on the GetAsync() or Put() call stack. If a mutex is held by any frame in that call stack, it remains held in the callback, ripe for deadlocks. This adds another ctor argument, io_context, and posts callbacks to that context to avoid those deadlocks. Note that many C++ tests create CoreWorkerMemoryStore without an io_context, so I kept the option of creating an owned io_context with a thread. Signed-off-by: Ruiyang Wang <rywang014@gmail.com> Signed-off-by: Ruiyang Wang <56065503+rynewang@users.noreply.github.com> Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
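A toy Python analogue of the change (the real code is C++; the point is that callbacks are posted to an event loop instead of being invoked on the caller's stack while the store's lock is held):

```python
import asyncio
import threading

class ToyMemoryStore:
    def __init__(self, loop: asyncio.AbstractEventLoop):
        self._loop = loop
        self._lock = threading.Lock()
        self._objects = {}
        self._waiters = {}

    def get_async(self, key, callback):
        with self._lock:
            if key not in self._objects:
                # Not ready yet: remember the callback for a later put().
                self._waiters.setdefault(key, []).append(callback)
                return
            value = self._objects[key]
        # Ready: post the callback instead of calling it under the lock.
        self._loop.call_soon_threadsafe(callback, value)

    def put(self, key, value):
        with self._lock:
            self._objects[key] = value
            waiters = self._waiters.pop(key, [])
        for cb in waiters:
            # Same rule: never run user callbacks on the put() call stack.
            self._loop.call_soon_threadsafe(cb, value)
```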
Gives clangd the ability to fully index Ray's C++ code after you run `bazel run :refresh_compile_commands`. This gives you LSP support plus clang-tidy linting in the IDE to catch bad C++ practices, based on the rules already set up in .clang-tidy (https://github.com/ray-project/ray/blob/master/.clang-tidy). A lot of these don't seem to be followed and seem contrary to some of the existing code's style, but some are critical for not giving up performance on small things. Trimming the list down to the more important lints could be helpful. Signed-off-by: dayshah <dhyey2019@gmail.com> Co-authored-by: Ruiyang Wang <56065503+rynewang@users.noreply.github.com>
…8447) Adds dependency injection to create a custom `DeploymentHandle` type in `build_app`. This will be used to create a special type of handle for local testing mode. This also allowed me to remove the mocking/patching happening in the unit tests. --------- Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
https://github.com/ray-project/ray/pull/48693/files#diff-81d7373cd5567e997c002b244c491e6b3498d206b12c093f4dc4d30e9b5848af added a test that uses tensorflow. We currently need to skip all tensorflow-related tests for python 3.12 since we don't support tensorflow for python 3.12. Also this is a test for the deprecation of tensorflow ;) Test: - CI Signed-off-by: can <can@anyscale.com>
…tensors > 2Gb) (#48629) Enabling V2 Arrow Tensor extension type by default (allowing tensors > 2Gb) --------- Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
…Error (#48636) In a recent investigation, we found that when `ray._private.internal_api.free()` is called from a task at the same time as a Raylet is gracefully shutting down, the task might fail with an application-level broken pipe IOError. This resulted in job failure without any task retries. However, since the broken pipe happens because of the unhealthiness of the local Raylet, the error should be a system-level error and should be retried automatically. Updated changes in commit [01f5f11](01f5f11): when an IOError is received in `CoreWorker::Delete`, throw a system error exception so that the task can be retried. Why not add the exception check in the `free_objects` function? It is better to put the logic in `CoreWorker::Delete` because it also covers other languages; `CoreWorker::Delete` is intended to be callable from all languages and is not called in other Ray-internal code paths. Why not crash the worker when an IOError is encountered in `WriteMessage`? `QuickExit()` exits the process without executing any of the worker's shutdown logic, and calling it directly during task execution could leak resources. Additionally, `WriteMessage` is also called during graceful shutdown, when the local Raylet may legitimately be unreachable; in that scenario we shouldn't exit early but let the shutdown logic finish. The behavior of graceful vs. force shutdown is also not clear in the code and may need some effort to clarify; a TODO is added in this PR. Updated changes in commit [2029d36](2029d36): when an IOError is received in `free_objects()`, throw a system error exception so that the task can be retried. Changes in commit [9d57b29](9d57b29): today, the internal `free` API deletes objects from the local Raylet object store by writing a message through a socket. When the write fails because the local Raylet has terminated, there is already logic to quick-exit the task; however, the existing termination check didn't cover the case where the local Raylet is a zombie process and an IOError happens during the write. This fix updates the check criteria and fails the task when the Raylet process is terminated or the write returns an IOError. Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
When calling `xxx.map_groups(..., batch_format="...")`, we may invoke the sort function and create empty blocks, which still use pyarrow by default. Then, when we invoke another sort call on top of it, we hit `AttributeError: 'DataFrame' object has no attribute 'num_rows'` since we use the first block's type even though the blocks may have different types. See more details in #46748. Closes #46748. Signed-off-by: Xingyu Long <xingyulong97@gmail.com> Co-authored-by: Scott Lee <scottjlee@users.noreply.github.com>
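A hedged illustration of the kind of pipeline described (not necessarily the exact reproducer from #46748):

```python
import ray

ds = ray.data.from_items([{"g": i % 2, "v": i} for i in range(10)])

# map_groups() sorts by the group key internally and may create empty
# blocks; mixing a pandas batch_format with pyarrow-backed empty blocks
# previously broke a later sort with
# "'DataFrame' object has no attribute 'num_rows'".
out = (
    ds.groupby("g")
      .map_groups(lambda df: df, batch_format="pandas")
      .sort("v")
)
print(out.take(3))
```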
Created by release automation bot. Update with commit e393a71 Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com> Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
…es (#48478) When writing blocks to Parquet, there might be blocks with fields that differ ONLY in nullability; by default this is rejected, since some blocks have a different schema than the ParquetWriter. However, we can allow it by tweaking the schema. This PR goes through all blocks before writing them to Parquet and merges schemas that differ only in the nullability of their fields. It also casts each table to the newly merged schema so that the write can happen. Closes #48102. Signed-off-by: rickyx <rickyx@anyscale.com>
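The mechanism can be sketched with plain PyArrow, assuming (as the PR describes) that schemas differing only in nullability are unified and each block is cast before writing:

```python
import pyarrow as pa

nullable = pa.schema([pa.field("x", pa.int64(), nullable=True)])
required = pa.schema([pa.field("x", pa.int64(), nullable=False)])

block_a = pa.Table.from_arrays([pa.array([1, 2], type=pa.int64())], schema=nullable)
block_b = pa.Table.from_arrays([pa.array([3, 4], type=pa.int64())], schema=required)

# Merge schemas that differ only in nullability, then cast every block to
# the merged schema so a single ParquetWriter can accept them all.
merged = pa.unify_schemas([block_a.schema, block_b.schema])
blocks = [t.cast(merged) for t in (block_a, block_b)]
print(merged)
```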
Signed-off-by: dentiny <dentinyhao@gmail.com>
ray.util.state.get_actor currently has a type annotation specifying that the function returns an Optional[Dict], but it actually returns [ActorState](https://github.com/ray-project/ray/blob/3141dfe4031cc715515b365278cd1d6b8955154e/python/ray/util/state/common.py#L416) (as the docstring specifies). This pull request simply fixes that type annotation. Signed-off-by: Miguel Teixeira <miguel.teixeira@poli.ufrj.br>
…48746) Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
We currently report `iter_total_blocked_seconds` and `iter_user_seconds` as **Gauge** metrics while we track them as counters, i.e.: for each iteration we have a timer that sums locally into an aggregated value (the total blocked seconds), and when the iteration ends or the iterator is GC'ed, the gauge value is reset to 0. This confuses users, since a counter value (total time blocked on a dataset) should never drop back to 0. Signed-off-by: rickyx <rickyx@anyscale.com>
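For reference, a cumulative metric exposed through Ray's public metrics API looks roughly like this; the dashboard change itself touches Ray Data's internal metrics plumbing, so treat this only as an analogy:

```python
from ray.util.metrics import Counter

# A counter only ever increases, so a rate over it stays meaningful and the
# value never "drops back to 0" when an iterator goes away.
iter_blocked_seconds = Counter(
    "iter_total_blocked_seconds",
    description="Cumulative seconds the iterator spent blocked waiting on data.",
)
iter_blocked_seconds.inc(0.8)
```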
Signed-off-by: dayshah <dhyey2019@gmail.com>
Some C++ improvements to our codebase. --------- Signed-off-by: dentiny <dentinyhao@gmail.com>
…coloring (#48473) Signed-off-by: dayshah <dhyey2019@gmail.com>
Adds a sentinel value to make sorting possible. Fixes #42142. Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
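The PR text doesn't show the sentinel itself, but the general trick is a value that compares as smaller than everything else so rows with missing fields stay sortable; a minimal sketch:

```python
import functools

@functools.total_ordering
class _NullSentinel:
    """Stands in for missing values so heterogeneous rows stay sortable."""

    def __eq__(self, other):
        return isinstance(other, _NullSentinel)

    def __lt__(self, other):
        # Sorts before every non-sentinel value.
        return not isinstance(other, _NullSentinel)

NULL_SENTINEL = _NullSentinel()

rows = [{"v": 3}, {"v": None}, {"v": 1}]
rows.sort(key=lambda r: NULL_SENTINEL if r["v"] is None else r["v"])
print(rows)  # the None row sorts first
```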
… batches first, instead of newest, BUT drop oldest batches if queue full). (#48702)
Signed-off-by: Aziz Belaweid <40893766+azayz@users.noreply.github.com>
Signed-off-by: ltbringer <amresh.venugopal@wise.com>
…ard (#48745) Currently, there are cases where the `Rows Outputted` value on the Ray Job page's `Ray Data Overview` section says "0" even after the dataset execution completes. The root cause is that we clear iteration/execution metrics after the dataset completes. This was previously used to "reset" the metrics to 0 after dataset completion so that the last emitted value would not persist on the dashboard after the job finishes. Now that we display rates on the dashboard, this hack is no longer needed, and we can skip the metrics clearing. Fixed result: <img width="1860" alt="Screenshot at Nov 14 12-11-24" src="https://github.com/user-attachments/assets/35061b3f-9359-412b-8ab2-f4bcce412994"> Closes #44635. Signed-off-by: Scott Lee <sjl@anyscale.com>
Fixed typo. Signed-off-by: mohitjain2504 <87856435+mohitjain2504@users.noreply.github.com> Signed-off-by: Richard Liaw <rliaw@berkeley.edu> Co-authored-by: Richard Liaw <rliaw@berkeley.edu> Co-authored-by: Gene Der Su <gdsu@ucdavis.edu>
Signed-off-by: dayshah <dhyey2019@gmail.com>
Too ancient, and there are no tests. This should remove the outdated paramiko package from the Ray image. Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
I randomly found a lancedb issue, lancedb/lancedb#1480, which discloses a high-severity CVE. Since Ray, like lancedb, has only one use case for the `retry` package, I took the same approach as lancedb/lancedb#1749, which also names all variables better, with units and default values. Signed-off-by: dentiny <dentinyhao@gmail.com>
Nightly is inherently unstable, and latest is just the last stable release; neither of them should block releases. Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
Fix `NCCL_BFLOAT16` typo in `TORCH_NCCL_DTYPE_MAP`.
Also uses 1.3 syntax and heredocs for multiline commands. Signed-off-by: Lonnie Liu <lonnie@anyscale.com>
Support reading from a Hudi table into a Ray dataset. Signed-off-by: Shiyan Xu <2701446+xushiyan@users.noreply.github.com>
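Presumably exposed through a `ray.data.read_hudi`-style reader; a hedged usage sketch (the URI and any extra parameters are placeholders, not confirmed by the PR text):

```python
import ray

# Point the reader at a Hudi table's base path; storage options or column
# selection, if supported, would be passed as keyword arguments.
ds = ray.data.read_hudi("s3://my-bucket/warehouse/my_hudi_table")
print(ds.schema())
print(ds.take(5))
```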
…s download (#48756) Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
See Commits and Changes for more details.
Created by pull[bot]