[BUG] dask.uniform_neighbor_sample gives different results depending on the number of GPUs used #2761
Comments
VibhuJawa added the bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) labels on Sep 29, 2022
Will recheck when PR 2751 is ready. Thanks for reporting so I can check this condition as well.
rapids-bot bot pushed a commit that referenced this issue on Sep 30, 2022
This PR fixes rapidsai/graph_dl#27
This PR fixes rapidsai/graph_dl#43
This PR fixes rapidsai/graph_dl#39

**Tests Added:**

Single GPU:
- [x] APIs like num_nodes, num_edges
- [x] test_sampling_basic
- [x] test_sampling_homogeneous_gs_in_dir
- [x] test_sampling_homogeneous_gs_out_dir
- [x] test_sampling_gs_homogeneous_neg_one_fanout
- [x] test_sampling_gs_heterogeneous_in_dir
- [x] test_sampling_gs_heterogeneous_out_dir
- [x] test_sampling_gs_heterogeneous_neg_one_fanout

Multi GPU:
- [x] APIs like num_nodes, num_edges
- [x] test_sampling_basic
- [x] test_sampling_homogeneous_gs_in_dir
- [x] test_sampling_homogeneous_gs_out_dir
- [x] test_sampling_gs_homogeneous_neg_one_fanout
- [x] test_sampling_gs_heterogeneous_in_dir
- [x] test_sampling_gs_heterogeneous_out_dir
- [x] test_sampling_gs_heterogeneous_neg_one_fanout

Bugs to reproduce:
- [ ] Repro heterogeneous single gpu hang outside pytest
- [x] Repro hetrogenous multi gpu incorrect results for with_replace=False #2760
- [x] Repro hetrogenous incorrect results for different amount of GPUs #2761

Tests that depend upon #2523
- [x] Add minimal example to PR to ensure it gets fixed. Added comment here: #2523 (review)
- [x] test_get_node_storage_gs (Failing cause of a PG bug)
- [x] test_get_edge_storage_gs (Failing cause of a PG bug)
- [x] test_get_node_storage_gs (Failing cause of a PG bug)
- [x] test_get_edge_storage_gs (Failing cause of a PG bug)

Authors:
- Vibhu Jawa (https://github.com/VibhuJawa)

Approvers:
- Rick Ratzel (https://github.com/rlratzel)
- Brad Rees (https://github.com/BradReesWork)

URL: #2592
Same analysis as #2760
Closed by PR #2765
Describe the bug
dask.uniform_neighbor_sample gives different results depending on the number of GPUs used. Results for a single GPU are correct, while results for other GPU counts seem to be incorrect. See the MRE below.
Correct
Steps/Code to reproduce bug
MRE:
Cluster Setup Code
Expected behavior
I would expect correct results independent of the number of GPUs being used.
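
The expectation above can be phrased as a partition-invariance check. The following is only a pure-Python sketch of that idea, not the MRE from this issue and not cuGraph code: `partition_edges` and `sample_all_neighbors` are hypothetical helpers, the round-robin split merely stands in for distributing edges across GPUs, and it assumes that a fanout of -1 means "return all neighbors", which makes the result deterministic and easy to compare.

```python
def partition_edges(edges, n_parts):
    """Round-robin split of an edge list (a stand-in for a multi-GPU distribution)."""
    parts = [[] for _ in range(n_parts)]
    for i, edge in enumerate(edges):
        parts[i % n_parts].append(edge)
    return parts


def sample_all_neighbors(parts, start_nodes):
    """Collect every out-edge of start_nodes across all partitions.

    Assumption: this mimics a fanout of -1 (take all neighbors).
    """
    wanted = set(start_nodes)
    result = set()
    for part in parts:
        for src, dst in part:
            if src in wanted:
                result.add((src, dst))
    return result


# Toy graph and seeds, chosen only for illustration.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0), (3, 1)]
start_nodes = [0, 3]

# The single-partition result is the reference.
reference = sample_all_neighbors(partition_edges(edges, 1), start_nodes)

# The same sample must come back for any number of partitions.
results_match = all(
    sample_all_neighbors(partition_edges(edges, n), start_nodes) == reference
    for n in (2, 3, 4)
)
print(results_match)
```

A check of this shape, run against the real sampler with 1, 2, and more workers, would flag the behavior reported here: the sampled edge set should not change with the partition count.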
Environment details
Additional context
Probably related Issue: #2760
Probably related PR that can fix this: #2751
CC: @ChuckHastings
CC: @alexbarghi-nv , @rlratzel