Ray fails to serialize self-reference objects #1234

Closed
suquark opened this issue Nov 20, 2017 · 4 comments

suquark commented Nov 20, 2017

System information

  • Ray installed from (source or binary): pip
  • Ray version: 0.2.2
  • Python version: 3.6.2

Describe the problem

Ray fails to serialize self-reference objects (for example, Graph objects in networkx).

I think it is because ray always tries to use pyarrow first and does not catch pyarrow.lib.ArrowNotImplementedError, see

ray/python/ray/worker.py

Lines 285 to 289 in e0360eb

try:
    self.plasma_client.put(value, pyarrow.plasma.ObjectID(
        object_id.id()), self.serialization_context)
    break
except pyarrow.SerializationCallbackError as e:

After catching pyarrow.lib.ArrowNotImplementedError, we should not use use_dict=True as a workaround, because it will cause an endless loop. A correct approach might be:

            except (pyarrow.SerializationCallbackError, pyarrow.lib.ArrowNotImplementedError) as e:
                try:
                    if isinstance(e, pyarrow.lib.ArrowNotImplementedError):
                        e.example_object = value
                        raise e  # redirect to use cloudpickle
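
The fallback idea above can be sketched without Ray or pyarrow. This is only an illustration under stated assumptions: `DepthExceededError`, `naive_serialize` and `put` are hypothetical names that stand in for `pyarrow.lib.ArrowNotImplementedError`, the recursive Arrow path and the worker's put loop, and a dict that contains itself stands in for an object with `self.g = self`.

```python
import pickle


class DepthExceededError(Exception):
    """Hypothetical stand-in for pyarrow.lib.ArrowNotImplementedError."""


def naive_serialize(obj, depth=0, max_depth=100):
    """Hypothetical recursive serializer that gives up on deep or cyclic input."""
    if depth > max_depth:
        raise DepthExceededError("object exceeds the maximum recursion depth")
    if isinstance(obj, (int, float, str, bool, type(None))):
        return obj
    if isinstance(obj, (list, tuple)):
        return [naive_serialize(item, depth + 1, max_depth) for item in obj]
    if isinstance(obj, dict):
        return {key: naive_serialize(val, depth + 1, max_depth)
                for key, val in obj.items()}
    raise TypeError("unsupported type: %r" % type(obj))


def put(obj):
    """Try the recursive serializer first; fall back to pickle on a depth error."""
    try:
        return "primary", naive_serialize(obj)
    except DepthExceededError:
        # Retrying with the same recursive strategy would fail the same way,
        # so the whole object is handed to pickle, which tracks objects it
        # has already seen and therefore copes with self-references.
        return "pickle", pickle.dumps(obj)


# A dict that contains itself, standing in for `self.g = self`.
node = {}
node["g"] = node
kind, payload = put(node)
restored = pickle.loads(payload)
```

Here `kind` is `"pickle"` and `restored["g"] is restored`, so the cycle survives the round trip, while acyclic input still takes the primary path.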

Source code / logs

import ray

ray.init()

class Graph:
    def __init__(self):
        self.g = self

G = Graph()
ray.put(G)  # --> pyarrow.lib.ArrowNotImplementedError: This object exceeds the maximum recursion depth. It may contain itself recursively.

# another example

import networkx as nx
G = nx.Graph()
    
G.add_edges_from([(1, 2), (1, 3)])
G.add_node(1)
G.add_edge(1, 2)
G.add_node("spam")  # adds node "spam"
G.add_nodes_from("spam")  # adds 4 nodes: 's', 'p', 'a', 'm'
G.add_edge(3, 'm')
ray.put(G)  # --> pyarrow.lib.ArrowNotImplementedError: This object exceeds the maximum recursion depth. It may contain itself recursively.
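
To check whether an object refers back to itself before calling ray.put, a small helper along these lines might be useful. It is a sketch, not part of Ray or pyarrow: `contains_reference_cycle` and `_children` are hypothetical names, and the walk only follows dicts, lists, tuples, sets and instance `__dict__` attributes.

```python
def _children(obj):
    """Objects directly reachable from obj (containers and instance attributes only)."""
    if isinstance(obj, dict):
        return list(obj.keys()) + list(obj.values())
    if isinstance(obj, (list, tuple, set, frozenset)):
        return list(obj)
    if hasattr(obj, "__dict__"):
        return list(vars(obj).values())
    return []


def contains_reference_cycle(root):
    """Return True if some object reachable from root can reach itself again."""
    on_path = set()  # ids of objects on the current depth-first path
    done = set()     # ids of objects already fully explored without a cycle

    def visit(obj):
        if isinstance(obj, (int, float, str, bytes, bool, type(None))):
            return False
        key = id(obj)
        if key in on_path:
            return True   # we came back to an object we are still exploring
        if key in done:
            return False  # shared but acyclic sub-structure
        on_path.add(key)
        found = any(visit(child) for child in _children(obj))
        on_path.discard(key)
        done.add(key)
        return found

    return visit(root)


class Graph:
    def __init__(self):
        self.g = self


has_cycle = contains_reference_cycle(Graph())
```

An object that is merely shared (referenced twice without a loop) is not reported, only a true back-reference such as `self.g = self`.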

robertnishihara commented

Can you try ray.register_custom_serializer? The following works for me.

import ray
ray.init()

class Graph:
    def __init__(self):
        self.g = self

ray.register_custom_serializer(Graph, use_pickle=True)

G = Graph()
ray.put(G)

This is closely related to #319 and https://issues.apache.org/jira/browse/ARROW-1382.

A side note: the original code worked for me in Python 2, I think because in Python 2 Graph is an old-style class, so we automatically fall back to pickle anyway.


mitar commented Nov 20, 2017

Hm, so ideally we would like to serialize networkx graphs. Because they can be quite large, I am not sure if pickling is a good approach.

robertnishihara commented

Custom serializers and deserializers can be registered with the same approach. I'm not sure what the right one would be in this case, but as a simple example, you could do something like:

import numpy as np
import ray

ray.init()

class Graph:
    def __init__(self, big_array):
        self.g = self
        self.big_array = big_array

def custom_graph_serializer(obj):
    return obj.big_array

def custom_graph_deserializer(serialized_obj):
    return Graph(serialized_obj)

ray.register_custom_serializer(Graph,
                               serializer=custom_graph_serializer,
                               deserializer=custom_graph_deserializer)

G = Graph(np.ones(100))
ray.put(G)
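
For a graph-shaped object, one possible serializer/deserializer pair flattens the structure into node and edge lists, which contain no back-references. This is a sketch under assumptions: `SimpleGraph`, `graph_serializer` and `graph_deserializer` are hypothetical names, and `SimpleGraph` is a stand-in, not the networkx API.

```python
class SimpleGraph:
    """Hypothetical undirected graph with a self-reference (not networkx)."""

    def __init__(self):
        self.adj = {}      # node -> set of neighbouring nodes
        self.owner = self  # self-reference, like `self.g = self` above

    def add_edge(self, u, v):
        self.adj.setdefault(u, set()).add(v)
        self.adj.setdefault(v, set()).add(u)


def graph_serializer(graph):
    """Flatten the graph into plain lists without back-references."""
    seen = set()
    edges = []
    for u, neighbours in graph.adj.items():
        for v in neighbours:
            key = frozenset((u, v))  # undirected: (u, v) and (v, u) are one edge
            if key not in seen:
                seen.add(key)
                edges.append((u, v))
    return {"nodes": list(graph.adj), "edges": edges}


def graph_deserializer(payload):
    """Rebuild the graph; the self-reference is recreated by the constructor."""
    graph = SimpleGraph()
    for node in payload["nodes"]:
        graph.adj.setdefault(node, set())  # keeps isolated nodes
    for u, v in payload["edges"]:
        graph.add_edge(u, v)
    return graph


g = SimpleGraph()
g.add_edge(1, 2)
g.add_edge(1, 3)
g.add_edge(3, "m")
g.adj.setdefault("spam", set())  # an isolated node

payload = graph_serializer(g)
restored = graph_deserializer(payload)
```

A pair like this could be passed as the `serializer` and `deserializer` arguments in the registration call shown above. Whether that beats pickling for large graphs would need to be measured.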


edoakes commented Mar 5, 2020

Stale - please open new issue if still relevant

edoakes closed this as completed Mar 5, 2020
fishbone added a commit that referenced this issue Nov 3, 2021
## Why are these changes needed?
This is part of redis removal project. This PR is going to enable grpc based broadcasting by default.

## Related issue number

<!-- For example: "Closes #1234" -->
#19438 
## Checks
rkooo567 added a commit that referenced this issue Nov 16, 2021
<!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. -->

## Why are these changes needed?

There's one user who has an issue that one of raylets cannot schedule tasks anymore because `num_worker_not_started_by_job_config_not_exist ` > 0.

This PR adds better log messages to figure out if the root cause is the job information is not properly propagated from GCS to raylet through Redis pubsub. 

## Related issue number

<!-- For example: "Closes #1234" -->

## Checks

- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(
wuisawesome pushed a commit that referenced this issue Nov 16, 2021
<!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. -->

## Why are these changes needed?

This pin is needed to fix `test_output` on master, which broke when 4.0.0 was released. 

It may also fix the windows build (unsure). 

## Related issue number

<!-- For example: "Closes #1234" -->

## Checks

- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(
rkooo567 pushed a commit that referenced this issue Nov 17, 2021
<!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. -->

## Why are these changes needed?

The change in #20374 was interpreted as a file redirect, not a "greater than" by docker (strangely enough, differently than bash interprets it locally). 

<!-- Please give a short summary of the change and the problem this solves. -->

## Related issue number

<!-- For example: "Closes #1234" -->

## Checks

- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(


Co-authored-by: Alex <alex@anyscale.com>
wuisawesome pushed a commit that referenced this issue Nov 17, 2021
<!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. -->

## Why are these changes needed?

This PR adds the hiredis dependency for non M1 machines. 

This removes the `redis < 4.0` pin.

Since hiredis doesn't have M1 mac wheels yet, so users there will have extra warning messages in their outputs if they use redis 4.0.
<!-- Please give a short summary of the change and the problem this solves. -->

## Related issue number

<!-- For example: "Closes #1234" -->

## Checks

- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(


Co-authored-by: Alex Wu <alex@anyscale.com>
fishbone pushed a commit that referenced this issue Nov 18, 2021
<!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. -->

## Why are these changes needed?

The change in #20374 was interpreted as a file redirect, not a "greater than" by docker (strangely enough, differently than bash interprets it locally). 

<!-- Please give a short summary of the change and the problem this solves. -->

## Related issue number

<!-- For example: "Closes #1234" -->

## Checks

- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(


Co-authored-by: Alex <alex@anyscale.com>
wuisawesome pushed a commit that referenced this issue Nov 20, 2021
<!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. -->

## Why are these changes needed?

There's one user who has an issue that one of raylets cannot schedule tasks anymore because `num_worker_not_started_by_job_config_not_exist ` > 0.

This PR adds better log messages to figure out if the root cause is the job information is not properly propagated from GCS to raylet through Redis pubsub. 

## Related issue number

<!-- For example: "Closes #1234" -->

## Checks

- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(
wuisawesome pushed a commit that referenced this issue Nov 20, 2021
<!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. -->

## Why are these changes needed?

This pin is needed to fix `test_output` on master, which broke when 4.0.0 was released. 

It may also fix the windows build (unsure). 

## Related issue number

<!-- For example: "Closes #1234" -->

## Checks

- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(
wuisawesome pushed a commit that referenced this issue Nov 20, 2021
<!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. -->

## Why are these changes needed?

The change in #20374 was interpreted as a file redirect, not a "greater than" by docker (strangely enough, differently than bash interprets it locally). 

<!-- Please give a short summary of the change and the problem this solves. -->

## Related issue number

<!-- For example: "Closes #1234" -->

## Checks

- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(


Co-authored-by: Alex <alex@anyscale.com>
wuisawesome pushed a commit that referenced this issue Nov 21, 2021
<!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. -->

## Why are these changes needed?

There's one user who has an issue that one of raylets cannot schedule tasks anymore because `num_worker_not_started_by_job_config_not_exist ` > 0.

This PR adds better log messages to figure out if the root cause is the job information is not properly propagated from GCS to raylet through Redis pubsub. 

## Related issue number

<!-- For example: "Closes #1234" -->

## Checks

- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(
wuisawesome pushed a commit that referenced this issue Nov 21, 2021
<!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. -->

## Why are these changes needed?

This pin is needed to fix `test_output` on master, which broke when 4.0.0 was released. 

It may also fix the windows build (unsure). 

## Related issue number

<!-- For example: "Closes #1234" -->

## Checks

- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(
wuisawesome pushed a commit that referenced this issue Nov 21, 2021
<!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. -->

## Why are these changes needed?

The change in #20374 was interpreted as a file redirect, not a "greater than" by docker (strangely enough, differently than bash interprets it locally). 

<!-- Please give a short summary of the change and the problem this solves. -->

## Related issue number

<!-- For example: "Closes #1234" -->

## Checks

- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(


Co-authored-by: Alex <alex@anyscale.com>
rkooo567 added a commit that referenced this issue Nov 23, 2021
…" (#20668)

This reverts commit e9132ed.

<!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. -->

## Why are these changes needed?

Seems to break Windows build. 

```
(07:46:25) ERROR: BUILD.bazel:406:11: Compiling src/ray/common/task/task_spec.cc failed: (Exit 2): cl.exe failed: error executing command
```

<img width="487" alt="Screen Shot 2021-11-23 at 3 09 18 AM" src="https://user-images.githubusercontent.com/18510752/143013973-f157724c-4951-49a9-80c6-158d41aa4295.png">


## Related issue number

<!-- For example: "Closes #1234" -->

## Checks

- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(
rkooo567 added a commit that referenced this issue Jun 1, 2022
This reverts commit 02f220b.

<!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. -->

## Why are these changes needed?

Looks like this commit makes `test_ray_shutdown` way more flaky.  cc @mattip for further investigation after revert
<img width="760" alt="Screen Shot 2022-05-31 at 11 14 48 PM" src="https://user-images.githubusercontent.com/18510752/171339737-f48e6e90-391a-4235-bfac-a0aa0e563eb7.png">


## Related issue number

<!-- For example: "Closes #1234" -->

## Checks

- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(
rkooo567 added a commit that referenced this issue Jan 16, 2023
#31454)

…28)"

This reverts commit a0c894f.

<!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. -->

## Why are these changes needed?

<!-- Please give a short summary of the change and the problem this solves. -->

## Related issue number

<!-- For example: "Closes #1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(
andreapiso pushed a commit to andreapiso/ray that referenced this issue Jan 22, 2023
)" (ray-project#313… (ray-project#31454)

…28)"

This reverts commit a0c894f.

<!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. -->

## Why are these changes needed?

<!-- Please give a short summary of the change and the problem this solves. -->

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

Signed-off-by: Andrea Pisoni <andreapiso@gmail.com>
jcoffi added a commit to jcoffi/ray that referenced this issue Jan 25, 2023
<!-- Thank you for your contribution! Please review
https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before
opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->

## Why are these changes needed?

<!-- Please give a short summary of the change and the problem this
solves. -->

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(
jcoffi added a commit to jcoffi/ray that referenced this issue Jan 26, 2023
<!-- Thank you for your contribution! Please review
https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before
opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->

## Why are these changes needed?

<!-- Please give a short summary of the change and the problem this
solves. -->

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(
rkooo567 pushed a commit that referenced this issue Jan 26, 2023
<!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. -->

## Why are these changes needed?
These flags are no longer useful because the migration has been finished. Delete them.
<!-- Please give a short summary of the change and the problem this solves. -->

## Related issue number

<!-- For example: "Closes #1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(
jcoffi added a commit to jcoffi/ray that referenced this issue Feb 14, 2023
<!-- Thank you for your contribution! Please review
https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before
opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->

## Why are these changes needed?

<!-- Please give a short summary of the change and the problem this
solves. -->

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(
jcoffi added a commit to jcoffi/ray that referenced this issue Feb 15, 2023
<!-- Thank you for your contribution! Please review
https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before
opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->

## Why are these changes needed?

<!-- Please give a short summary of the change and the problem this
solves. -->

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(
JP-sDEV pushed a commit to JP-sDEV/ray that referenced this issue Nov 14, 2024
<!-- Thank you for your contribution! Please review
https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before
opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->

## Why are these changes needed?

<!-- Please give a short summary of the change and the problem this
solves. -->

The TFRecords release tests typically takes around 1680-1750s to
complete. Because the timeout is set to 1800s, if there's minor
variation in the job runtime, the job can timeout.

To avoid flakiness, this PR relaxes the timeout.

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
JP-sDEV pushed a commit to JP-sDEV/ray that referenced this issue Nov 14, 2024
…`iter_rows` (ray-project#48704)

<!-- Thank you for your contribution! Please review
https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before
opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->

## Why are these changes needed?

<!-- Please give a short summary of the change and the problem this
solves. -->

The `prefetch_blocks` and `prefetch_batches` parameters of `iter_rows`
have been deprecated for more than 6 months. In accordance with our API
policy, this PR removes them.

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
JP-sDEV pushed a commit to JP-sDEV/ray that referenced this issue Nov 14, 2024
<!-- Thank you for your contribution! Please review
https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before
opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->

## Why are these changes needed?

<!-- Please give a short summary of the change and the problem this
solves. -->

We recommend `to_tf` over `iter_tf_batches`. To avoid confusion, we
shouldn’t have two similar APIs, especially if we always prefer one.

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
richardliaw added a commit that referenced this issue Nov 15, 2024
## Why are these changes needed?

Adds a Sentinel value for making it possible to sort.

Fixes #42142 

## Related issue number

<!-- For example: "Closes #1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
mohitjain2504 pushed a commit to mohitjain2504/ray that referenced this issue Nov 15, 2024
…either assigning it to a variable or removing it. (ray-project#48118)

## Why are these changes needed?

While running the pre-commit hook of flake8, the following error occurs
if Python version is 3.12. It's because the version of flake8 is too
old.

![image](https://github.com/user-attachments/assets/7c103728-2e48-42f3-8b2f-b47ab93e560b)

version:
- python: 3.12.7
- flake8: 7.1.1
- flake8-bugbear: 24.8.19

<!-- Please give a short summary of the change and the problem this
solves. -->

## Related issue number

Closes ray-project#48065

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [x] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

Signed-off-by: win5923 <ken89@kimo.com>
Signed-off-by: mohitjain2504 <mohit.jain@dream11.com>
mohitjain2504 pushed a commit to mohitjain2504/ray that referenced this issue Nov 15, 2024
…oject#48188)

<!-- Thank you for your contribution! Please review
https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before
opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->

## Why are these changes needed?

<!-- Please give a short summary of the change and the problem this
solves. -->
See ray-project#47991
When running the following `flake8` command to check for errors:
```
flake8 --select E225 --extend-exclude python/ray/core/generated,python/ray/serve/generated/,python/ray/cloudpickle/,python/ray/_private/runtime_env/_clonevirtualenv.py,doc/external/,python/ray/dashboard/client/node_modules
```
the following error occurs:

![image](https://github.com/user-attachments/assets/e595a58e-677d-480f-9490-f52e62e4f0cf)

## Related issue number
Closes ray-project#48059
<!-- For example: "Closes ray-project#1234" -->

## Checks

- [x] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

Signed-off-by: LeoLiao123 <leoyeepaa@gmail.com>
Signed-off-by: mohitjain2504 <mohit.jain@dream11.com>
mohitjain2504 pushed a commit to mohitjain2504/ray that referenced this issue Nov 15, 2024
## Why are these changes needed?

[Java] Upgrade Commons-io to 2.14

This PR upgrades commons-io from 2.7, an older release, to 2.14.0.
commons-io 2.14.0 has been verified over a long period and has no known
direct or indirect CVE issues.

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [x] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

Signed-off-by: Shilun Fan <slfan1989@apache.org>
Co-authored-by: Thomas Desrosiers <681004+thomasdesr@users.noreply.github.com>
Signed-off-by: mohitjain2504 <mohit.jain@dream11.com>
mohitjain2504 pushed a commit to mohitjain2504/ray that referenced this issue Nov 15, 2024
## Why are these changes needed?

<!-- Please give a short summary of the change and the problem this
solves. -->

Adding IsHeadNode tag to node metrics

<img width="1823" alt="Screenshot 2024-10-24 at 6 36 57 PM"
src="https://github.com/user-attachments/assets/855919db-b08e-4966-ae50-79c6de78bd90">
<img width="1818" alt="Screenshot 2024-10-24 at 6 36 47 PM"
src="https://github.com/user-attachments/assets/cb323682-d1c5-451a-98b2-eb99aff938a1">
<img width="1818" alt="Screenshot 2024-10-24 at 6 37 28 PM"
src="https://github.com/user-attachments/assets/f783cd67-e7da-4230-9f02-fa2d625a17e3">
<img width="1824" alt="Screenshot 2024-10-24 at 6 38 08 PM"
src="https://github.com/user-attachments/assets/08998ab1-7702-4fb3-8dea-76e5c8ab5232">

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [x] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [x] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [x] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [x] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [x] Unit tests
   - [x] Release tests
   - [x] This PR is not tested :(

---------

Signed-off-by: Vignesh Hirudayakanth <vignesh@anyscale.com>
Signed-off-by: mohitjain2504 <mohit.jain@dream11.com>
mohitjain2504 pushed a commit to mohitjain2504/ray that referenced this issue Nov 15, 2024
…oing_requests` (ray-project#47681) (ray-project#48274)

## Why are these changes needed?

<!-- Please give a short summary of the change and the problem this
solves. -->
This PR modifies the `actor_options` used when deploying replicas. If
concurrency is not set explicitly, the deployment uses the
`max_ongoing_requests` attribute of the deployment config as the
replica's `max_concurrency`. This prevents the replica's
`max_concurrency` from capping `max_ongoing_requests`.
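
The fallback described above can be sketched as follows. This is a minimal illustration only: `build_replica_actor_options` is a hypothetical helper, not the function Serve actually uses, and the options are modeled as a plain dict.

```python
from typing import Any, Dict, Optional


def build_replica_actor_options(
    user_actor_options: Optional[Dict[str, Any]],
    max_ongoing_requests: int,
) -> Dict[str, Any]:
    """Hypothetical helper: default max_concurrency to max_ongoing_requests."""
    # Copy so the caller's dict is never mutated.
    options = dict(user_actor_options or {})
    # Fill in the default only when concurrency was not set explicitly.
    options.setdefault("max_concurrency", max_ongoing_requests)
    return options


default_options = build_replica_actor_options(None, max_ongoing_requests=100)
explicit_options = build_replica_actor_options(
    {"max_concurrency": 5}, max_ongoing_requests=100
)
```

With this shape, an explicit `max_concurrency` always wins, and the deployment-level `max_ongoing_requests` is used only as a default.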

## Related issue number

<!-- For example: "Closes ray-project#1234" -->
Closes ray-project#47681

Signed-off-by: akyang-anyscale <alexyang@anyscale.com>
Signed-off-by: mohitjain2504 <mohit.jain@dream11.com>
mohitjain2504 pushed a commit to mohitjain2504/ray that referenced this issue Nov 15, 2024
…-project#48299)

## Why are these changes needed?

<!-- Please give a short summary of the change and the problem this
solves. -->
This PR moves `ProxyStatus` out of the `_private` directory, allowing it
to be included in the API docs. This is the final attribute of
`ServeStatus` that needs to be included in the documentation.

## Related issue number

<!-- For example: "Closes ray-project#1234" -->
Closes ray-project#43394

---------

Signed-off-by: akyang-anyscale <alexyang@anyscale.com>
Signed-off-by: mohitjain2504 <mohit.jain@dream11.com>
mohitjain2504 pushed a commit to mohitjain2504/ray that referenced this issue Nov 15, 2024
ray-project#48415)

## Why are these changes needed?

I was initially confused that I couldn't join another paused task while
a debugger was in "continue" mode.

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Philipp Moritz <pcmoritz@gmail.com>
Co-authored-by: bhuang <bhuang@anyscale.com>
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Signed-off-by: mohitjain2504 <mohit.jain@dream11.com>
mohitjain2504 pushed a commit to mohitjain2504/ray that referenced this issue Nov 15, 2024
…block (ray-project#48266)

## Why are these changes needed?

Currently, inside `OutputBlockBuffer` we're

1. Repeatedly copying the remainder of the original block, bringing the
total number of bytes copied to O(N^2) (where N is the size of the
original block)
2. Creating potentially very large blocks (as in
ray-project#48236) that could overflow the
underlying Arrow data types.

This change addresses both issues by establishing the following
protocol:

1. Finalized target blocks *are* copied, while
2. The remainder block is NOT (it continues to reference the original
block)

Addresses ray-project#48236
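
A rough sketch of the protocol above, using plain `bytes` and `memoryview` in place of Arrow blocks. `split_into_blocks` is a hypothetical name for illustration and is not the `OutputBlockBuffer` implementation; it only shows why copying finalized blocks while keeping the remainder as a view brings the total copy cost down to O(N).

```python
from typing import Iterator


def split_into_blocks(data: bytes, target_size: int) -> Iterator[bytes]:
    """Yield blocks of at most target_size bytes from data."""
    # Zero-copy view onto the original block.
    remainder = memoryview(data)
    while len(remainder) > target_size:
        # Finalized target block: copied exactly once.
        yield bytes(remainder[:target_size])
        # Remainder: a new view onto the same buffer, no bytes are copied.
        remainder = remainder[target_size:]
    if len(remainder) > 0:
        yield bytes(remainder)


blocks = list(split_into_blocks(b"abcdefghij", target_size=4))
```

Each byte of the input is copied once, into the block that finally contains it, instead of being re-copied every time the remainder is sliced.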

<!-- Please give a short summary of the change and the problem this
solves. -->

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
Signed-off-by: mohitjain2504 <mohit.jain@dream11.com>
mohitjain2504 pushed a commit to mohitjain2504/ray that referenced this issue Nov 15, 2024
…DEBUG (ray-project#48301)

## Why are these changes needed?

Currently in order to use the distributed debugger, the user has to set
`RAY_DEBUG=1`. This has two disadvantages:

1. It is disruptive to the workflow and much more overhead than just
adding the `breakpoint()` instruction and re-running the program (since
the runtime environment has to be updated and the user needs to make
sure that the driver uses the flag too e.g. by restarting the python
kernel or in the worst case the container).
2. It is very easy to forget this step and then get the impression that
the debugger is not working.

There is no reason to require `RAY_DEBUG=1` to be set (the CLI debugger
works without the flag too and in particular the flag has no impact on
performance unless the debugger is actually entered). The reason this
flag was originally introduced was as a feature flag to switch between
the CLI debugger and the UI debugger. Now that the UI debugger is
getting more mature, it is better to make it the default and let people
who want to use the CLI debugger use a `RAY_DEBUG=legacy` flag.

This PR also renames the `RAY_PDB` flag to `RAY_DEBUG_POST_MORTEM` and
unifies the usage of the flag between the old and new debugger (in
particular, with the new debugger, post mortem debugging is now off
unless the user activates it).
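
The flag semantics described above can be summarized in a small sketch. `resolve_debugger_settings` is a hypothetical helper, not Ray code, and treating `RAY_DEBUG_POST_MORTEM=1` as the "on" value is an assumption; the description only states that post-mortem debugging is off unless the user activates it.

```python
from typing import Dict, Mapping


def resolve_debugger_settings(env: Mapping[str, str]) -> Dict[str, object]:
    """Hypothetical helper mapping environment flags to debugger behavior."""
    return {
        # The UI debugger is the default; RAY_DEBUG=legacy selects the CLI one.
        "debugger": "cli" if env.get("RAY_DEBUG") == "legacy" else "ui",
        # Assumption: "1" activates post-mortem debugging; it is off otherwise.
        "post_mortem": env.get("RAY_DEBUG_POST_MORTEM") == "1",
    }


default_settings = resolve_debugger_settings({})
legacy_settings = resolve_debugger_settings(
    {"RAY_DEBUG": "legacy", "RAY_DEBUG_POST_MORTEM": "1"}
)
```

In real use the mapping passed in would be `os.environ`; a plain dict is used here so the behavior is easy to inspect.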

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Philipp Moritz <pcmoritz@gmail.com>
Signed-off-by: mohitjain2504 <mohit.jain@dream11.com>
mohitjain2504 pushed a commit to mohitjain2504/ray that referenced this issue Nov 15, 2024
…f Kueue (ray-project#48564)

## Why are these changes needed?

Update KubeRay + Kueue guides to use newer versions of Kueue

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [X] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: mohitjain2504 <mohit.jain@dream11.com>
mohitjain2504 pushed a commit to mohitjain2504/ray that referenced this issue Nov 15, 2024
## Why are these changes needed?

Add Project operator to select_columns.

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
Signed-off-by: mohitjain2504 <mohit.jain@dream11.com>
mohitjain2504 pushed a commit to mohitjain2504/ray that referenced this issue Nov 15, 2024
## Why are these changes needed?

<!-- Please give a short summary of the change and the problem this
solves. -->

The TFRecords release test typically takes around 1680-1750s to
complete. Because the timeout is set to 1800s, minor variation in the
job runtime can cause the job to time out.

To avoid flakiness, this PR relaxes the timeout.

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Signed-off-by: mohitjain2504 <mohit.jain@dream11.com>
mohitjain2504 pushed a commit to mohitjain2504/ray that referenced this issue Nov 15, 2024
…`iter_rows` (ray-project#48704)

## Why are these changes needed?

<!-- Please give a short summary of the change and the problem this
solves. -->

The `prefetch_blocks` and `prefetch_batches` parameters of `iter_rows`
have been deprecated for more than 6 months. In accordance with our API
policy, this PR removes them.

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Signed-off-by: mohitjain2504 <mohit.jain@dream11.com>
mohitjain2504 pushed a commit to mohitjain2504/ray that referenced this issue Nov 15, 2024
## Why are these changes needed?

<!-- Please give a short summary of the change and the problem this
solves. -->

We recommend `to_tf` over `iter_tf_batches`. To avoid confusion, we
shouldn’t have two similar APIs, especially if we always prefer one.

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Signed-off-by: mohitjain2504 <mohit.jain@dream11.com>
richardliaw added a commit that referenced this issue Nov 15, 2024
## Why are these changes needed?
Fixed typo

<!-- Please give a short summary of the change and the problem this
solves. -->

## Related issue number

<!-- For example: "Closes #1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: mohitjain2504 <87856435+mohitjain2504@users.noreply.github.com>
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Gene Der Su <gdsu@ucdavis.edu>
can-anyscale pushed a commit that referenced this issue Nov 19, 2024
## Why are these changes needed?

<!-- Please give a short summary of the change and the problem this
solves. -->

Seeing the following errors for ":ray: core: flaky gpu tests" target:

```

[2024-11-15T17:50:08Z] ________ test_torch_tensor_nccl_overlap_timed[ray_start_regular1-True] _________
--
  | [2024-11-15T17:50:08Z]
  | [2024-11-15T17:50:08Z] ray_start_regular = RayContext(dashboard_url='127.0.0.1:8265', python_version='3.9.20', ray_version='3.0.0.dev0', ray_commit='{{RAY_COMMIT_SHA}}')
  | [2024-11-15T17:50:08Z] overlap_gpu_communication = True
  | [2024-11-15T17:50:08Z]
  | [2024-11-15T17:50:08Z]     @pytest.mark.parametrize(
  | [2024-11-15T17:50:08Z]         "ray_start_regular, overlap_gpu_communication",
  | [2024-11-15T17:50:08Z]         [({"num_cpus": 4}, False), ({"num_cpus": 4}, True)],
  | [2024-11-15T17:50:08Z]         indirect=["ray_start_regular"],
  | [2024-11-15T17:50:08Z]     )
  | [2024-11-15T17:50:08Z]     def test_torch_tensor_nccl_overlap_timed(ray_start_regular, overlap_gpu_communication):
  | [2024-11-15T17:50:08Z]         if not USE_GPU:
  | [2024-11-15T17:50:08Z]             pytest.skip("NCCL tests require GPUs")
  | [2024-11-15T17:50:08Z]
  | [2024-11-15T17:50:08Z] >       assert (
  | [2024-11-15T17:50:08Z]             sum(node["Resources"].get("GPU", 0) for node in ray.nodes()) >= 4
  | [2024-11-15T17:50:08Z]         ), "This test requires at least 4 GPUs"
  | [2024-11-15T17:50:08Z] E       AssertionError: This test requires at least 4 GPUs
  | [2024-11-15T17:50:08Z] E       assert 2.0 >= 4
  | [2024-11-15T17:50:08Z] E        +  where 2.0 = sum(<generator object test_torch_tensor_nccl_overlap_timed.<locals>.<genexpr> at 0x7f6c8799e200>)
```

This PR makes the config consistent with ":ray: core: multi gpu tests".

## Related issue number

<!-- For example: "Closes #1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
edoakes pushed a commit that referenced this issue Nov 19, 2024
…change` RPC (#48803)

## Why are these changes needed?

Currently, in the `LongPollHost`/`LongPollClient`, if multiple objects
are updated that a `listen_for_change` request is waiting for *before
the async task in the host can run again*, only one of those updated
objects will be returned. This is inefficient because the
`LongPollClient` will immediately do a `listen_for_change` RPC again,
and that will see outdated snapshot IDs for the updates that weren't
returned and get all of the missed updates.

This is because of an asymmetry between
https://github.com/ray-project/ray/blob/b75cb793e437aa617d61dcb13e5f5d2fcc83ee68/python/ray/serve/_private/long_poll.py#L252-L272
, which looks for *all* outdated keys, and
https://github.com/ray-project/ray/blob/b75cb793e437aa617d61dcb13e5f5d2fcc83ee68/python/ray/serve/_private/long_poll.py#L309
, which only looks at a single complete `Event`, even if multiple events
completed during the
[`wait`](https://github.com/ray-project/ray/blob/b75cb793e437aa617d61dcb13e5f5d2fcc83ee68/python/ray/serve/_private/long_poll.py#L289-L293).

To prove that the `wait` can indeed see multiple completed `Event`s, see
this example:
```python
from asyncio import wait, Event, run, create_task, FIRST_COMPLETED


async def main():
    a = Event()
    b = Event()

    wait_for_a = create_task(a.wait())
    wait_for_b = create_task(b.wait())

    a.set()
    b.set()

    done, pending = await wait([wait_for_a, wait_for_b], return_when=FIRST_COMPLETED)

    print(f"{len(done)=}")
    print(f"{len(pending)=}")

run(main())

# len(done)=2
# len(pending)=0
```

Generally this won't be a big issue because most `listen_for_change`
requests in the current Serve setup are asking for a very small number
of keys and are likely to only get one key update anyway. But, as I've
been discussing with @edoakes and @zcin on Slack, I'd like to group up
the `DeploymentHandle` `listen_for_change` RPCs under a single
`LongPollClient`, which will be requesting many keys and is therefore
more likely to hit this situation.

To complement this change, I also changed `LongPollHost.notify_changed`
so that it takes multiple updates at the same time.
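
The host-side idea can be sketched like this: after `wait` returns, collect every completed task rather than only one. `wait_for_updates` and its dict-of-events input are hypothetical and simplified; this is not the actual `LongPollHost` code.

```python
import asyncio
from typing import Dict, Set


async def wait_for_updates(events: Dict[str, asyncio.Event]) -> Set[str]:
    """Return the keys of every event that is complete once wait() returns."""
    tasks = {asyncio.create_task(ev.wait()): key for key, ev in events.items()}
    done, pending = await asyncio.wait(
        list(tasks), return_when=asyncio.FIRST_COMPLETED
    )
    # Stop waiting on keys that did not change.
    for task in pending:
        task.cancel()
    # Report all completed keys, not just the first one.
    return {tasks[task] for task in done}


async def demo() -> Set[str]:
    events = {key: asyncio.Event() for key in ("a", "b", "c")}
    events["a"].set()
    events["b"].set()
    return await wait_for_updates(events)


updated_keys = asyncio.run(demo())
```

Here both `a` and `b` are reported from a single call, so the client would not need an immediate second RPC to pick up the update it missed.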

## Related issue number

<!-- For example: "Closes #1234" -->

## Checks

- [x] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [x] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [x] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Josh Karpel <josh.karpel@gmail.com>
edoakes pushed a commit that referenced this issue Nov 20, 2024
## Why are these changes needed?

<!-- Please give a short summary of the change and the problem this
solves. -->
Currently, `serve.run` does not pass `logging_config` to the controller.
This PR adds the argument to the function call so that `logging_config`
can be correctly specified for system-level logging.

## Related issue number
Closes #48652 
<!-- For example: "Closes #1234" -->


### Example
```python
from ray import serve
from ray.serve.handle import DeploymentHandle
logging_config = {"log_level": "DEBUG", "logs_dir": "./mimi_debug"}
handle: DeploymentHandle = serve.run(app, logging_config=logging_config)
```

### Before
controller logs aren't saved in the specified logs_dir

<img width="326" alt="image"
src="https://github.com/user-attachments/assets/0d316428-e7a7-48e0-8d9d-1692a3045a4a">

### After
controller logs are correctly configured

<img width="325" alt="image"
src="https://github.com/user-attachments/assets/e05aba0b-75cd-4cd4-9a92-4ef8cdd84cce">

Signed-off-by: Mimi Liao <mimiliao2000@gmail.com>
rickyyx pushed a commit that referenced this issue Nov 21, 2024
…Pod's `ray.io/group` label. (#48840)

## Why are these changes needed?

The value of the `ray.io/group` label in the head Pod is `headgroup`,
whereas `KUBERAY_TYPE_HEAD` is `head-group`.

<img width="502" alt="image"
src="https://github.com/user-attachments/assets/9a06e643-d235-4237-a16a-ce131f3d9666">


## Related issue number

<!-- For example: "Closes #1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

Signed-off-by: kaihsun <kaihsun@anyscale.com>
can-anyscale pushed a commit that referenced this issue Nov 22, 2024
Introduce a new Ray Train example for AWS Trainium. 

![CleanShot 2024-11-16 at 12 48 57@2x](https://github.com/user-attachments/assets/8b7d12d8-846f-497f-ba25-fd8a613f9007)

Marked it as a community example because it is something we are
collaborating on with the AWS Neuron team.

![CleanShot 2024-11-16 at 12 48 37@2x](https://github.com/user-attachments/assets/589d8ff3-fcb6-4b90-865d-006bcb4815a3)

Docs screenshots

<img width="1142" alt="Screenshot 2024-11-20 at 11 19 39 AM"
src="https://github.com/user-attachments/assets/aa3dadf7-96b9-46cc-8b6d-44c3e3bc3e1e">
<img width="1161" alt="Screenshot 2024-11-20 at 11 19 47 AM"
src="https://github.com/user-attachments/assets/859508fd-e47e-4758-a4c7-f15a749ece82">
<img width="1149" alt="Screenshot 2024-11-20 at 11 19 54 AM"
src="https://github.com/user-attachments/assets/28858f36-8cca-4eaa-a8ec-a1f7dda899d0">

---------

## Why are these changes needed?

<!-- Please give a short summary of the change and the problem this
solves. -->

## Related issue number

<!-- For example: "Closes #1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

Signed-off-by: Saihajpreet Singh <c-saihajpreet.singh@anyscale.com>
Co-authored-by: Saihajpreet Singh <c-saihajpreet.singh@anyscale.com>
zcin pushed a commit that referenced this issue Nov 22, 2024
…age and print num retries left (#48531)

## Why are these changes needed?

This change surfaces the replica constructor error as soon as the
constructor fails, for whatever reason. The exception is populated in
the deployment status so that it is viewable from the Ray dashboard.
The error message also reports the number of replica constructor
retries left. This helps users debug a deployment that fails to start
more quickly.
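
As a rough illustration of the behavior described above, here is a minimal Python sketch. It is not the actual Serve implementation: `MAX_CONSTRUCTOR_RETRIES`, `DeploymentStatus`, and `start_replica` are hypothetical names, and the retry limit is an assumed value.

```python
from dataclasses import dataclass

MAX_CONSTRUCTOR_RETRIES = 3  # hypothetical limit, chosen for illustration


@dataclass
class DeploymentStatus:
    """Hypothetical stand-in for the status shown in the dashboard."""
    message: str = ""


def start_replica(constructor, status, max_retries=MAX_CONSTRUCTOR_RETRIES):
    """Try to construct a replica, recording every failure immediately."""
    for attempt in range(1, max_retries + 1):
        try:
            return constructor()
        except Exception as exc:
            retries_left = max_retries - attempt
            # Surface the error right away instead of waiting until all
            # retries are exhausted, and include the retries left.
            status.message = (
                f"Replica constructor failed: {exc!r}. "
                f"Retries left: {retries_left}."
            )
    return None
```

The point of the sketch is that the status message is updated inside the `except` branch on every failed attempt, so the error is visible while retries are still in progress.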

## Related issue number

Closes #35604

Signed-off-by: akyang-anyscale <alexyang@anyscale.com>
MortalHappiness pushed a commit to MortalHappiness/ray that referenced this issue Nov 22, 2024
…Pod's `ray.io/group` label. (ray-project#48840)

## Why are these changes needed?

The value of the `ray.io/group` label in the head Pod is `headgroup`,
whereas `KUBERAY_TYPE_HEAD` is `head-group`.

<img width="502" alt="image"
src="https://github.com/user-attachments/assets/9a06e643-d235-4237-a16a-ce131f3d9666">


Signed-off-by: kaihsun <kaihsun@anyscale.com>
MortalHappiness pushed a commit to MortalHappiness/ray that referenced this issue Nov 22, 2024
…age and print num retries left (ray-project#48531)

## Why are these changes needed?

This change surfaces the replica constructor error as soon as the
constructor fails, for whatever reason. The exception is populated in
the deployment status so that it is viewable from the Ray dashboard.
The error message also reports the number of replica constructor
retries left. This helps users debug a deployment that fails to start
more quickly.

## Related issue number

Closes ray-project#35604

Signed-off-by: akyang-anyscale <alexyang@anyscale.com>
richardliaw pushed a commit that referenced this issue Nov 23, 2024

## Why are these changes needed?

This is a follow-up to a recent change that raised the minimum supported
PyArrow version from 6.0.1 to 9.0.0.

Signed-off-by: Alexey Kudinkin <ak@anyscale.com>
rickyyx added a commit that referenced this issue Nov 24, 2024

## Why are these changes needed?

Adds `idle_timeout_s` as a field to `node_type_configs`, enabling the v2
autoscaler to configure idle termination per worker type.

This PR depends on a change in KubeRay to the RayCluster CRD, since we
want to support passing `idleTimeoutSeconds` to individual worker groups
such that they can specify a custom idle duration:
ray-project/kuberay#2558

## Related issue number

Closes #36888

## Checks

- [x] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [x] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: ryanaoleary <ryanaoleary@google.com>
Signed-off-by: ryanaoleary <113500783+ryanaoleary@users.noreply.github.com>
Co-authored-by: Kai-Hsun Chen <kaihsun@apache.org>
Co-authored-by: Ricky Xu <xuchen727@hotmail.com>
rickyyx pushed a commit that referenced this issue Nov 24, 2024
…r container's stdout (#48905)

## Why are these changes needed?

* The Autoscaler container doesn't display information like `print("The
Ray head is ready. Starting the autoscaler.")` in STDOUT/STDERR for some
reason. To display logs to STDOUT/STDERR, we need to explicitly specify
`flush` in `print()` or use the logging module. I don't know why the
flush isn't triggered. The default end of `print` is `\n`, which should
trigger a line-buffered flush.

* Change `logging.warn` to `logging.warning` because `logging.warn` is
deprecated. See [this
doc](https://docs.python.org/3/library/logging.html#logging.Logger.warning)
for more details.
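
The two options above can be sketched in Python as follows. One likely explanation, not confirmed in this PR, is that Python only line-buffers `sys.stdout` when it is attached to a terminal; when the stream is piped, as is typical in a container, it is block-buffered, so a trailing `\n` alone does not flush. The logger name `autoscaler_sketch` and the helper names are hypothetical.

```python
import io
import logging
import sys

MESSAGE = "The Ray head is ready. Starting the autoscaler."


def announce(stream=sys.stdout):
    # Option 1: flush explicitly so the line is written immediately,
    # even if the stream is block-buffered.
    print(MESSAGE, file=stream, flush=True)


def make_logger(stream=sys.stderr):
    # Option 2: use the logging module. StreamHandler flushes the
    # stream after every emitted record.
    logger = logging.getLogger("autoscaler_sketch")  # hypothetical name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.addHandler(logging.StreamHandler(stream))
    logger.propagate = False
    return logger


announce()
log = make_logger()
log.info(MESSAGE)
# Use warning(); warn() is the deprecated spelling.
log.warning("Example warning emitted through the logging module.")
```

Running the interpreter with `python -u` or setting `PYTHONUNBUFFERED=1` is another common way to get unbuffered output, although this PR does not take that route.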

<img width="794" alt="image"
src="https://github.com/user-attachments/assets/12796aaa-ae7e-4986-96c8-94a0a42591b6">

---------

Signed-off-by: kaihsun <kaihsun@anyscale.com>