pt: explicitly set device #3307

Merged
merged 8 commits into from
Feb 21, 2024

@njzjz njzjz commented Feb 20, 2024

This PR requires the device to be set explicitly, to env.DEVICE or cpu, for functions like torch.tensor, torch.zeros, torch.ones, torch.rand, torch.eye, LayerNorm, and the data loader.
This ensures that no OP runs on the wrong device. The trick is to call torch.set_default_device("cuda:9999999") in the tests, so an error is thrown whenever the default device is used.
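A minimal sketch of how that trick can catch an implicit device, assuming PyTorch 2.0 or later (where `torch.set_default_device` is available). The helper name `relies_on_default_device` is hypothetical and is not part of this PR or of PyTorch; only the sentinel string comes from the description above.

```python
import torch

# Deliberately nonexistent device, copied from the PR description.
SENTINEL_DEVICE = "cuda:9999999"


def relies_on_default_device(fn) -> bool:
    """Hypothetical helper: return True if fn() fails while the
    default device is the sentinel, i.e. fn used the default device."""
    previous = torch.empty(0).device  # remember the current default device
    torch.set_default_device(SENTINEL_DEVICE)
    try:
        fn()
    except Exception:
        # Allocation on the sentinel device failed.
        return True
    finally:
        # Always restore the previous default so later code is unaffected.
        torch.set_default_device(previous)
    return False


# No device argument: falls back to the sentinel default device.
implicit = relies_on_default_device(lambda: torch.zeros(2))
# Explicit device: the sentinel default is never consulted.
explicit = relies_on_default_device(lambda: torch.zeros(2, device="cpu"))
```

Under these assumptions, the call without a device argument should be flagged and the call with `device="cpu"` should pass, which is the behavior the tests rely on.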

Tips:
(1) Avoid torch.zeros(...).to(device=...). This first allocates memory on the CPU and then copies it to the GPU.
(2) Use the with torch.device(...) context manager for a module that does not accept a device argument (e.g., the data loader).
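The two tips can be sketched as follows, assuming PyTorch 2.0 or later (where `torch.device` works as a context manager). `DEVICE` is a stand-in for `env.DEVICE`, and `third_party_factory` is a hypothetical function representing code that offers no device argument.

```python
import torch

# Stand-in for env.DEVICE (an assumption for this sketch).
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Discouraged: allocates on the default device first, then copies.
copied = torch.zeros(3, 3).to(device=DEVICE)

# Preferred: allocates directly on the target device.
direct = torch.zeros(3, 3, device=DEVICE)


def third_party_factory():
    """Hypothetical code that creates a tensor without a device argument."""
    return torch.ones(4)


# The context manager sets the device for factory calls made inside it.
with torch.device(DEVICE):
    made = third_party_factory()
```

Both `copied` and `direct` hold the same values on the same device; the difference is the intermediate allocation and copy that the first form performs when the target is a GPU.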

Require setting the device to `env.DEVICE` or `cpu` explicitly for functions like `torch.tensor`, `torch.zeros`, `torch.ones`, `torch.rand`, `torch.eye`, and the data loader. This ensures that no OPs that should run on GPUs run on CPUs.
The trick here is `torch.set_default_device("cuda:9999999")` in the tests, so errors will be thrown if no device is set.

Tips:
(1) Avoid `torch.zeros(...).to(device=...)`. This first initializes memory on CPUs and then copies it to GPUs.
(2) Use `with torch.device(...)` for third-party modules.

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>

codecov bot commented Feb 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (8b1ed14) 75.29% compared to head (6ba2914) 75.29%.

Additional details and impacted files
@@           Coverage Diff           @@
##            devel    #3307   +/-   ##
=======================================
  Coverage   75.29%   75.29%           
=======================================
  Files         398      398           
  Lines       33684    33694   +10     
  Branches     1604     1604           
=======================================
+ Hits        25361    25371   +10     
  Misses       7462     7462           
  Partials      861      861           


@wanghan-iapcm wanghan-iapcm added this pull request to the merge queue Feb 21, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Feb 21, 2024
@njzjz njzjz enabled auto-merge February 21, 2024 02:37
@njzjz njzjz added this pull request to the merge queue Feb 21, 2024
Merged via the queue into deepmodeling:devel with commit 139721f Feb 21, 2024
48 checks passed
@njzjz njzjz deleted the pt-set-device branch February 21, 2024 04:02
@njzjz njzjz mentioned this pull request Apr 2, 2024
4 participants