Fix GPU UTs #3203
for more information, see https://pre-commit.ci
Codecov Report
Attention:
Additional details and impacted files
@@ Coverage Diff @@
## devel #3203 +/- ##
==========================================
+ Coverage 74.22% 74.32% +0.09%
==========================================
Files 313 344 +31
Lines 27343 31867 +4524
Branches 908 1592 +684
==========================================
+ Hits 20296 23685 +3389
- Misses 6510 7257 +747
- Partials 537 925 +388
☔ View full report in Codecov by Sentry.
This PR still has problems: when data preprocessing runs on the GPU, some UTs stop (e.g. test_LKF.py, test_saveload_dpa1.py, ...). I'm working on this.
* throw errors when PyTorch CXX11 ABI is different from TensorFlow (deepmodeling#3201)

  If so, throw the following error:

  ```
  -- PyTorch CXX11 ABI: 0
  CMake Error at CMakeLists.txt:162 (message):
    PyTorch CXX11 ABI mismatch TensorFlow: 0 != 1
  ```

  Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>

* allow disabling TensorFlow backend during Python installation (deepmodeling#3200)

  Fix deepmodeling#3120. One can disable building the TensorFlow backend during `pip install` by setting `DP_ENABLE_TENSORFLOW=0`.

  ---------

  Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>

* breaking: pt: add dp model format and refactor pt impl for the fitting net. (deepmodeling#3199)

  - add dp model format (backend independent definition) for the fitting
  - refactor torch support, compatible with dp model format
  - fix mlp issue: the idt should only be used when a skip connection is available.
  - add tools `to_numpy_array` and `to_torch_tensor`.

  ---------

  Co-authored-by: Han Wang <wang_han@iapcm.ac.cn>

* remove duplicated fitting output check. fix codeql (deepmodeling#3202)

  Co-authored-by: Han Wang <wang_han@iapcm.ac.cn>

---------

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
Co-authored-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
Co-authored-by: Han Wang <92130845+wanghan-iapcm@users.noreply.github.com>
Co-authored-by: Han Wang <wang_han@iapcm.ac.cn>
This reverts commit cb4cc67.
I got the following errors on my local machine:
I found it's a PyTorch bug (pytorch/pytorch#110940) that has been fixed in v2.2.0 (released 4 hours ago).
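Since the fix only exists from v2.2.0 onward, a test suite could guard the affected cases with a version check. The following is only a minimal sketch: `version_tuple` and `has_upstream_fix` are hypothetical helper names, not part of this project or of PyTorch, and the parser assumes version strings shaped like `torch.__version__` (for example `2.2.0+cu121`). Pre-release builds such as `2.2.0a0` would be treated as already fixed, which may not hold.

```python
import re


def version_tuple(version: str) -> tuple:
    """Parse the leading 'major.minor[.patch]' numbers, ignoring suffixes such as '+cu121'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def has_upstream_fix(version: str, minimum: tuple = (2, 2, 0)) -> bool:
    """Return True when `version` is at least `minimum` (v2.2.0 per the comment above)."""
    return version_tuple(version) >= minimum


print(has_upstream_fix("2.2.0+cu121"))
print(has_upstream_fix("2.1.2"))
```

In a real test module one would pass `torch.__version__` to the helper and skip the affected tests when it returns `False`.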
This PR has resolved the PT issues. I can run `pytest source/tests/pt` with no problem after upgrading PyTorch to 2.2.
Another issue is that when we run the TF and PT tests together (i.e., `pytest source/tests`), an OOM error is thrown. The reason might be that `set_memory_growth` is not set for some sessions. It can be resolved in another PR.
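For context, enabling memory growth stops TensorFlow from reserving nearly all GPU memory up front, which is what would leave PyTorch without room when both test suites share one process. The sketch below shows one way to turn it on through the public `tf.config` API; `enable_memory_growth` is a hypothetical helper name, and whether this covers the sessions the comment refers to is an assumption, not something the PR confirms.

```python
def enable_memory_growth() -> int:
    """Enable memory growth on every visible GPU; return how many were configured."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed, so there is nothing to configure.
        return 0

    configured = 0
    for gpu in tf.config.list_physical_devices("GPU"):
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
            configured += 1
        except RuntimeError:
            # Raised when the GPU was already initialized; the setting
            # must be applied before any op touches the device.
            pass
    return configured


print(enable_memory_growth())
```

The call has to run before TensorFlow initializes the GPU, so in a test suite it would need to happen early, for example at import time of a shared test configuration module.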
This PR fixes GPU UTs.
Delete `PREPROCESS_DEVICE` in the torch data preprocessing and use the training `DEVICE` instead, which will be removed after the dataset is reformatted.
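The idea of using one device for both preprocessing and training can be illustrated with a toy example. This is only a sketch under stated assumptions: `training_device`, `preprocess`, and the centering step are hypothetical and do not reproduce the project's code; `DEVICE` here is a local stand-in for the project's constant of the same name. The import is guarded so the snippet also loads on machines without PyTorch.

```python
try:
    import torch
except ImportError:  # the sketch degrades to a no-op without PyTorch
    torch = None


def training_device():
    """Return the single device shared by preprocessing and training."""
    if torch is None:
        return None
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


# Hypothetical stand-in for the project's DEVICE constant.
DEVICE = training_device()


def preprocess(coords):
    """Toy preprocessing step: center the coordinates, computed on DEVICE."""
    coords = coords.to(DEVICE)
    return coords - coords.mean(dim=0, keepdim=True)


if torch is not None:
    batch = preprocess(torch.rand(4, 3))
    # The preprocessed batch already lives on the training device.
    print(batch.device.type == DEVICE.type)
```

With a single device constant, tensors produced by preprocessing need no extra transfer before the training step, and there is no second setting that can drift out of sync with the first.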