-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core][compiled-graphs] Don't persist input_nodes in _CollectiveOperation to avoid wrong understanding about DAGs #48463
Merged
rkooo567
merged 2 commits into
ray-project:master
from
kevin85421:remove-input-nodes-from-collective-nodes
Nov 4, 2024
Merged
Changes from 1 commit
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we only scan certain fields for
dag_node.py::_collect_upstream_nodes()
, say not all dict entries ofother_args_to_resolve
, but excludeCOLLECTIVE_OPERATION_KEY
?I think in any case we should mention in the docstring of
dag_node.py::_collect_upstream_nodes()
its assumptions. Currently the assumption is all nodes appear in the following are considered as upstream nodes:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree on this. It sounds ok to not scan everything in
other_args_to_resolve
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have considered skipping
COLLECTIVE_OPERATION_KEY
. My concern is that it seems a bit odd for a basic class (DAGNode
) to implement logic from other classes built on top of it. Additionally, the code path applies to both DAG and ADAG. I am a bit worried about the complexity in the future if we add more and more ADAG-specific logic inside the shared code path. HDYT?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you two think skipping the key is better, I will update the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or perhaps add a new field
_bound_other_args_not_to_resolve
and avoid scanning it?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the expected upstream nodes for a
CollectiveOutputNode
? Are they all the input nodes from all the actors, or simply the only one input node from the same actor?What are the potential issues?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chatted with Kaihsun a bit yesterday, but +1 to Weixin's question.
I think the key issue is what's the definition of "upstream nodes", especially in the special case of collectives mentioned above. This definition needs to make sense based on how we use them in DAG and ADAG. Once this is clarified, we know what should be the right thing to do. @kevin85421 Can we define that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for now the upstream nodes for a
CollectiveOutputNode
should be theargs
of theDAGNode
so that DAG and ADAG can have the same understanding for the same graph.For example,
compiled_dag_node.py
sets up the upstream/downstream relationship inside preprocess by treatingargs
as aDAGNode
's upstream nodes.However, in
dag_node.py
, allDAGNode
s insideself._bound_args
,self._bound_kwargs
, andself._bound_other_args_to_resolve
are considered as the upstream nodes.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sync offline with @ruisearch42 : update the comments, and open an issue to track the progress #48520.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our conclusion is introducing a new field only when we observe more and more issues are caused by the inconsistency.