Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1/n] Add graph function to list DataPipes from graph #888

Closed
wants to merge 2 commits into from

Conversation

ejguan
Copy link
Contributor

@ejguan ejguan commented Nov 7, 2022

Add a list_dps function to list DataPipes from the graph.

  • It's similar to get_all_graph_pipes from pytorch core
  • An extra argument of exclude_dps to exclude the DataPipe and its prior graph from the result.

Reason to add this function:

  • It's required to set random states differently for DataPipe before/after sharding_filter
graph = traverse_dps(datapipe)
sf_dps = find_dps(graph, ShardingFilter)

# DataPipes prior to `sharding_filter`
p_dps = []
for sf_dp in sf_dps:
    p_dps.extend(list_dps(traverse_dps(sf_dp)))

# DataPipes after `sharding_filter`
a_dps = list_dps(graph, exclude_dps=sf_dps)

Step 1 for #885

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 7, 2022
@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@NivekT NivekT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

torchdata/dataloader2/graph/utils.py Show resolved Hide resolved
for dp_id, (dp, src_graph) in g.items():
if dp_id not in cache:
cache.add(dp_id)
dps.append(dp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note, this doesn't guarantee any order and returns a list. This should suffice for the current use case.

I wonder if people will want a Dict (basically just traverse_dps without the IDs). To be clear, this seems fine to me for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we treat the graph as a tree, there can be in different orders like how do we traverse a tree. Pre-order; post-order, etc. It highly depends on users' preference especially our graph can be more complicated than a tree. I would say let's wait until we have received any use case.

@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ejguan ejguan added the topic: new feature topic category label Nov 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. topic: new feature topic category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants