pt: explicitly set device #3307

njzjz · 2024-02-20T04:08:31Z

Require setting the device to `env.DEVICE` or `cpu` explicitly for functions like `torch.tensor`, `torch.zeros`, `torch.ones`, `torch.rand`, `torch.eye`, `LayerNorm`, and the data loader.
This ensures that no OP runs on the wrong device. The trick is to call `torch.set_default_device("cuda:9999999")` in the tests. This points the default device at a CUDA device that does not exist, so an error is thrown whenever an OP falls back to the default device.
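A minimal sketch of the pattern, assuming PyTorch 2.x (for `torch.device` as a context manager). `DEVICE` is a hypothetical stand-in for `env.DEVICE`, and `make_inputs` is a hypothetical helper, not deepmd code. Rather than the `cuda:9999999` trick used in the tests, this CPU-only variant makes `meta` the default device. A factory call that omits `device=` then does not fail; it produces a storage-less meta tensor that is easy to assert on.

```python
import torch

# Hypothetical stand-in for env.DEVICE (chosen at runtime in the real code).
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def make_inputs(natoms: int):
    """Hypothetical helper: every factory call passes device= explicitly."""
    coord = torch.zeros(natoms, 3, dtype=torch.float64, device=DEVICE)
    box = torch.eye(3, dtype=torch.float64, device=DEVICE)
    atype = torch.tensor([0] * natoms, dtype=torch.long, device=DEVICE)
    # Modules that allocate parameters accept device= as well.
    norm = torch.nn.LayerNorm(3, device=DEVICE, dtype=torch.float64)
    # Index tensors that must stay on the host are pinned to CPU explicitly.
    idx = torch.arange(natoms, device="cpu")
    return coord, box, atype, norm, idx


# Inside this block, any factory call without device= allocates on "meta".
with torch.device("meta"):
    coord, box, atype, norm, idx = make_inputs(4)
    forgotten = torch.ones(4)  # no device= -> silently lands on meta

print(coord.device, norm.weight.device, idx.device, forgotten.device)
```

On a CPU-only machine this prints `cpu cpu cpu meta`: only the call that forgot `device=` picked up the default.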

Tips:
(1) Avoid `torch.zeros(...).to(device=...)`. This first allocates memory on the CPU and then copies it to the GPU; pass `device=` to the factory function instead.
(2) Use `with torch.device(...)` for a module that cannot set a device (e.g., the data loader).
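Both tips can be sketched as follows, again assuming PyTorch 2.x and a hypothetical `DEVICE` in place of `env.DEVICE`; the toy dataset is illustrative only. With `shuffle=True`, PyTorch's `RandomSampler` creates tensors internally without a `device` argument, so under a bogus global default (as in the tests) it would follow that default. Wrapping the loader in `with torch.device("cpu")` keeps those internals on the CPU without modifying the loader code.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for env.DEVICE.
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tip (1): allocate directly on the target device.
# Discouraged: allocates on the default device first, then copies to DEVICE.
slow = torch.zeros(1024, 3).to(device=DEVICE)
# Preferred: one allocation on DEVICE and no extra copy.
fast = torch.zeros(1024, 3, device=DEVICE)

# Tip (2): scope the default device around code that takes no device argument.
# Hypothetical toy dataset kept on the CPU, as a data pipeline usually is.
dataset = TensorDataset(torch.arange(8, dtype=torch.float32, device="cpu"))
with torch.device("cpu"):
    # With shuffle=True the sampler builds its permutation (torch.randperm)
    # without passing a device; the context keeps it on the CPU.
    loader = DataLoader(dataset, batch_size=4, shuffle=True)
    batches = [b for (b,) in loader]

# Loaded data starts on the CPU, so each batch is moved to DEVICE once, explicitly.
batches = [b.to(DEVICE) for b in batches]
```

The `with` block only changes the default for calls made inside it; an explicit `device=` argument still takes precedence.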

Require setting the device to `env.DEVICE` or `cpu` explicitly for functions like `torch.tensor`, `torch.zeros`, `torch.ones`, `torch.rand`, `torch.eye`, and the data loader. This ensures that no OPs that should run on GPUs run on CPUs. The trick here is `torch.set_default_device("cuda:9999999")` in the tests, so errors will be thrown if no device is set.

Tips:
(1) Avoid `torch.zeros(...).to(device=...)`. This first initializes memory on CPUs and then copies it to GPUs.
(2) Use `with torch.device(...)` for third-party modules.

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>

codecov · 2024-02-20T04:15:01Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (8b1ed14) 75.29% compared to head (6ba2914) 75.29%.

Additional details and impacted files

@@           Coverage Diff           @@
##            devel    #3307   +/-   ##
=======================================
  Coverage   75.29%   75.29%           
=======================================
  Files         398      398           
  Lines       33684    33694   +10     
  Branches     1604     1604           
=======================================
+ Hits        25361    25371   +10     
  Misses       7462     7462           
  Partials      861      861
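The totals in the diff are consistent: on both sides, lines = hits + misses + partials, and all 10 added lines are hits. The sketch below assumes Codecov's defaults of coverage = hits / (hits + misses + partials) and rounding *down* to two decimals. This explains why the head's 75.298% is reported as 75.29%, the same as the base; ordinary rounding would have shown 75.30%.

```python
from decimal import Decimal, ROUND_DOWN


def coverage_pct(hits: int, misses: int, partials: int) -> Decimal:
    """Coverage in percent, truncated to two decimals (assumed Codecov defaults)."""
    lines = hits + misses + partials
    pct = Decimal(hits) * 100 / Decimal(lines)
    return pct.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


base = coverage_pct(hits=25361, misses=7462, partials=861)  # 33684 lines
head = coverage_pct(hits=25371, misses=7462, partials=861)  # 33694 lines
print(base, head)  # 75.29 75.29
```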

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>

deepmd/pt/train/training.py

deepmd/pt/utils/dataloader.py

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>

njzjz requested review from iProzd, wanghan-iapcm and anyangml February 20, 2024 04:08

github-actions bot added the Python label Feb 20, 2024

njzjz added 4 commits February 19, 2024 23:26

fix trainer.get_data

51ea91e

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>

set cpu

dbd567a

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>

Merge branch 'devel' into pt-set-device

3400329

fix test_polar

884b566

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>

wanghan-iapcm approved these changes Feb 20, 2024

fix tests

77231db

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>

iProzd reviewed Feb 20, 2024

deepmd/pt/train/training.py (resolved)

iProzd reviewed Feb 20, 2024

deepmd/pt/utils/dataloader.py (resolved)

anyangml approved these changes Feb 20, 2024

iProzd approved these changes Feb 20, 2024

wanghan-iapcm added this pull request to the merge queue Feb 21, 2024

github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Feb 21, 2024

Merge branch 'devel' into pt-set-device

2958b73

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>

njzjz enabled auto-merge February 21, 2024 02:37

fix new as_tensor

6ba2914

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>

njzjz added this pull request to the merge queue Feb 21, 2024

Merged via the queue into deepmodeling:devel with commit 139721f Feb 21, 2024
48 checks passed

njzjz deleted the pt-set-device branch February 21, 2024 04:02

njzjz mentioned this pull request Apr 2, 2024

[TYPO] #3635

Closed
