Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Renumber utility function for sampling output #3707

Merged
merged 24 commits into from
Jul 24, 2023

Conversation

seunghwak
Copy link
Contributor

@seunghwak seunghwak commented Jul 13, 2023

Implements a utility function to renumber sampling function (e.g. uniform_neighbor_sample) outputs based on DGL and PyG requirements.

closes #3718

@seunghwak seunghwak requested a review from a team as a code owner July 13, 2023 21:18
@seunghwak
Copy link
Contributor Author

Define the function API first. Several questions to be resolved.

  1. Does multi-GPU implementation make sense? If we are processing only one batch, sampled output will be small and it will easily fit within a single GPU. If we are processing many batches, different batches will be independently processed in different GPUs. I don't see a practical multi-GPU use case. Any thoughts?
  2. Say edgelist_srcs is [0, 5, 5] and edgelist_hops is [1, 0, 2]. Then, should we associate hop # 0 to vertex 5 or hop # 2 to vertex 5? (so [0, 5] should be renumbered to [1, 0] or [0,1]?). If vertex 5 is associated with hop # 0 & 2, this API assumed that vertex 5 is associated with the minimum hop # (i.e. 0). Any concerns with this?

Besides the questions above, do you have any other concerns?

@ChuckHastings @tingyu66 @VibhuJawa @alexbarghi-nv

@seunghwak seunghwak self-assigned this Jul 13, 2023
@seunghwak seunghwak added feature request New feature or request non-breaking Non-breaking change labels Jul 13, 2023
@seunghwak seunghwak added this to the 23.08 milestone Jul 13, 2023
@alexbarghi-nv
Copy link
Member

Define the function API first. Several questions to be resolved.

  1. Does multi-GPU implementation make sense? If we are processing only one batch, sampled output will be small and it will easily fit within a single GPU. If we are processing many batches, different batches will be independently processed in different GPUs. I don't see a practical multi-GPU use case. Any thoughts?
  2. Say edgelist_srcs is [0, 5, 5] and edgelist_hops is [1, 0, 2]. Then, should we associate hop # 0 to vertex 5 or hop # 2 to vertex 5? (so [0, 5] should be renumbered to [1, 0] or [0,1]?). If vertex 5 is associated with hop # 0 & 2, this API assumed that vertex 5 is associated with the minimum hop # (i.e. 0). Any concerns with this?

Besides the questions above, do you have any other concerns?

@ChuckHastings @tingyu66 @VibhuJawa @alexbarghi-nv

For (1): We are already partitioning, sending all samples for a single batch to a single GPU. So single GPU implementation is ok. There should be no comms/transfer between GPUs. Each worker should just run this function independent of all other workers.

For (2): I think we always want the minimum hop. So this behavior is fine.

@alexbarghi-nv
Copy link
Member

@seunghwak this API looks good to me 👍

@tingyu66
Copy link
Member

tingyu66 commented Jul 13, 2023

Define the function API first. Several questions to be resolved.

  1. Does multi-GPU implementation make sense? If we are processing only one batch, sampled output will be small and it will easily fit within a single GPU. If we are processing many batches, different batches will be independently processed in different GPUs. I don't see a practical multi-GPU use case. Any thoughts?
  2. Say edgelist_srcs is [0, 5, 5] and edgelist_hops is [1, 0, 2]. Then, should we associate hop # 0 to vertex 5 or hop # 2 to vertex 5? (so [0, 5] should be renumbered to [1, 0] or [0,1]?). If vertex 5 is associated with hop # 0 & 2, this API assumed that vertex 5 is associated with the minimum hop # (i.e. 0). Any concerns with this?

Besides the questions above, do you have any other concerns?

@ChuckHastings @tingyu66 @VibhuJawa @alexbarghi-nv

2). Yes, a src vertex should be associated with its minimum hop, so [0, 5] should be renumbered to [1, 0] in that case.

@VibhuJawa
Copy link
Member

2. Say edgelist_srcs is [0, 5, 5] and edgelist_hops is [1, 0, 2]. Then, should we associate hop # 0 to vertex 5 or hop # 2 to vertex 5? (so [0, 5] should be renumbered to [1, 0] or [0,1]?). If vertex 5 is associated with hop # 0 & 2, this API assumed that vertex 5 is associated with the minimum hop # (i.e. 0). Any concerns with this?

Yes, Agreed it should be associated with the minimum hop.

API looks good to me.

@seunghwak
Copy link
Contributor Author

OK, I will drop the multi_gpu flag and implement this.

@seunghwak seunghwak requested a review from a team as a code owner July 17, 2023 23:16
Copy link
Member

@alexbarghi-nv alexbarghi-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, just needs a C API. I assume @ChuckHastings is working on that?

@@ -0,0 +1,49 @@
/*
* Copyright (c) 2022-2023, NVIDIA CORPORATION.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just 2023

Copy link
Member

@alexbarghi-nv alexbarghi-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested, working.

@ChuckHastings
Copy link
Collaborator

/merge

@rapids-bot rapids-bot bot merged commit 4a1537b into rapidsai:branch-23.08 Jul 24, 2023
56 checks passed
@seunghwak seunghwak deleted the fea_mfg branch July 24, 2023 22:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request non-breaking Non-breaking change
Projects
None yet
5 participants