Update docker to Torch 2.1.0+CUDA11.8 to resolve multi-sampler issue #377
Did you test it with multiple samplers?
Can you also update the README and wiki?
@classicsong Yes! I tried it with the ogbn-mag dataset and it worked, but I didn't see any performance improvement, probably because the dataset is too small.
You can put the update of the README and rst files in the same PR.
@classicsong Any reason we are still using DGL 1.0.4 and not the latest release, 1.1.1?
Does GraphStorm work with DGL 1.1.1?
@zheng-da I have been using DGL 1.1.1 for GSF for a while now. Is there any breaking point?
We don't know. Let's run our regression test with DGL 1.1.1.
If this works, could you also revise this rst file to give proper Torch and DGL installation commands?
Please update the rst file too. LGTM
Resolves issue #199
Updating the torch version from torch==1.13 to torch==2.1.0 in the docker file. Torch versions later than 1.12 had a bug which did not allow us to use num_samplers > 0. In the PyTorch 2.1.0 release the bug is resolved. We have verified the solution through the following experiments.
Experiment setup:
Dataset: ogbn-mag (partitioned into 2)
DGL versions: '1.0.4+cu117' and '1.1.1+cu113'
Torch versions: '2.1.0+cu118'
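A quick way to confirm that an environment falls outside the affected Torch range before enabling samplers is a small version check. The sketch below is only illustrative: the helper names parse_release and multi_sampler_ok are hypothetical (not part of GraphStorm, DGL, or PyTorch), and it assumes "later than 1.12" means the 1.13 and 2.0 release series, with 2.1.0 and newer treated as fixed.

```python
import re
from typing import Tuple


def parse_release(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '2.1.0+cu118'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def multi_sampler_ok(torch_version: str) -> bool:
    """Hypothetical check: True if num_samplers > 0 is expected to work.

    Assumption taken from this PR description: releases after 1.12 and
    before 2.1 are affected by the bug; 2.1.0 and later are fixed.
    """
    release = parse_release(torch_version)
    return release <= (1, 12) or release >= (2, 1)


if __name__ == "__main__":
    # Version strings as they would be reported by torch.__version__.
    for candidate in ("1.13.1+cu117", "2.1.0+cu118"):
        print(candidate, multi_sampler_ok(candidate))
```

In a real environment the argument would come from torch.__version__; the local build suffix (for example +cu118) is ignored by the parser.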
Experiment 1:
1 trainer and 4 samplers
Output:
Experiment 2:
4 trainers and 4 samplers
Output:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.