pt: support dpa2 model parallel inference #3657

Merged
merged 104 commits into from
Apr 30, 2024
Commits (104)
ae0f799 init (CaRoLZhangxy, Apr 7, 2024)
96c9309 Merge branch 'devel' of https://github.com/deepmodeling/deepmd-kit in… (CaRoLZhangxy, Apr 7, 2024)
bd1927f init (CaRoLZhangxy, Apr 8, 2024)
8350372 fix (CaRoLZhangxy, Apr 8, 2024)
28ae599 finish (CaRoLZhangxy, Apr 8, 2024)
1afd8fc [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 8, 2024)
7f6632a Merge branch 'devel' of https://github.com/deepmodeling/deepmd-kit in… (CaRoLZhangxy, Apr 15, 2024)
29d1bec Merge branch 'devel' of https://github.com/deepmodeling/deepmd-kit in… (CaRoLZhangxy, Apr 17, 2024)
2a7db1e use google cuda define (CaRoLZhangxy, Apr 17, 2024)
6af0d63 update forward api (CaRoLZhangxy, Apr 17, 2024)
3020781 remove frozen model (CaRoLZhangxy, Apr 17, 2024)
420868f [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 17, 2024)
c779828 Merge branch 'dis' of https://github.com/CaRoLZhangxy/deepmd-kit into… (CaRoLZhangxy, Apr 17, 2024)
7591dd3 be able to compile without mpi (CaRoLZhangxy, Apr 17, 2024)
3d0f14d [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 17, 2024)
14a5fed type to fix mpich (CaRoLZhangxy, Apr 17, 2024)
313a4b1 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 17, 2024)
a39aff3 remove unused code (CaRoLZhangxy, Apr 17, 2024)
31a4f0d [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 17, 2024)
cbb2916 update model (CaRoLZhangxy, Apr 18, 2024)
05686e8 upload smaller model (CaRoLZhangxy, Apr 18, 2024)
5dcf5f0 hack to resolve border_op problem (njzjz, Apr 18, 2024)
fd9177a update dpa model (CaRoLZhangxy, Apr 19, 2024)
37989c7 use gpu memcpy (CaRoLZhangxy, Apr 19, 2024)
a13934b [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 19, 2024)
06208d2 update ut data (CaRoLZhangxy, Apr 19, 2024)
48b9833 Merge branch 'dis' of https://github.com/CaRoLZhangxy/deepmd-kit into… (CaRoLZhangxy, Apr 19, 2024)
761c1c8 update dpa model (CaRoLZhangxy, Apr 19, 2024)
0ed6116 update ut data (CaRoLZhangxy, Apr 19, 2024)
6df987c [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 19, 2024)
305245c update ut data (CaRoLZhangxy, Apr 19, 2024)
8e5e41c rollback ut and only apply new api to dpa2 model (CaRoLZhangxy, Apr 19, 2024)
44e0e6a [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 19, 2024)
7bc66be update ut data (CaRoLZhangxy, Apr 19, 2024)
3da55f3 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 19, 2024)
50c0f46 add comments (CaRoLZhangxy, Apr 19, 2024)
38bcdd6 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 19, 2024)
b49d91d add ut file (CaRoLZhangxy, Apr 19, 2024)
46911c1 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 19, 2024)
66303a0 fix bug (CaRoLZhangxy, Apr 19, 2024)
c9bc208 Merge branch 'dis' of https://github.com/CaRoLZhangxy/deepmd-kit into… (CaRoLZhangxy, Apr 19, 2024)
dca1202 fix type bug (CaRoLZhangxy, Apr 19, 2024)
60605a9 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 19, 2024)
048c2af try to fix mpich compile error (CaRoLZhangxy, Apr 19, 2024)
1ff60b2 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 19, 2024)
b7a19cf fix ut data (CaRoLZhangxy, Apr 19, 2024)
6eb03f4 low requirement at float (CaRoLZhangxy, Apr 19, 2024)
3dcb4ba Merge branch 'devel' of https://github.com/deepmodeling/deepmd-kit in… (CaRoLZhangxy, Apr 19, 2024)
0b485f9 skip no balance test (CaRoLZhangxy, Apr 19, 2024)
7dc5815 update ut data (CaRoLZhangxy, Apr 21, 2024)
303644f [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 21, 2024)
7867e13 update lmp test data (CaRoLZhangxy, Apr 21, 2024)
e0a08f3 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 21, 2024)
993f05a Update source/op/pt/comm.cc (CaRoLZhangxy, Apr 22, 2024)
535ade4 Update source/op/pt/comm.cc (CaRoLZhangxy, Apr 22, 2024)
f4b4481 Update source/op/pt/comm.cc (CaRoLZhangxy, Apr 22, 2024)
ffbc4db Update source/op/pt/comm.cc (CaRoLZhangxy, Apr 22, 2024)
acf841d Update source/op/pt/comm.cc (CaRoLZhangxy, Apr 22, 2024)
9473606 throw error when compiled with mpi without cuda support (CaRoLZhangxy, Apr 22, 2024)
bc02345 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 22, 2024)
1952784 support mpich (CaRoLZhangxy, Apr 22, 2024)
fc2d61b include errors.h (CaRoLZhangxy, Apr 22, 2024)
67b68aa Merge branch 'devel' of https://github.com/deepmodeling/deepmd-kit in… (CaRoLZhangxy, Apr 22, 2024)
bc5f092 apply memcpy when cuda-aware = 0 (CaRoLZhangxy, Apr 23, 2024)
e534e99 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 23, 2024)
05cdd92 Merge branch 'devel' of https://github.com/deepmodeling/deepmd-kit in… (CaRoLZhangxy, Apr 23, 2024)
ff17514 Merge branch 'dis' of https://github.com/CaRoLZhangxy/deepmd-kit into… (CaRoLZhangxy, Apr 23, 2024)
8b23ebb Merge branch 'devel' of https://github.com/deepmodeling/deepmd-kit in… (CaRoLZhangxy, Apr 24, 2024)
14b43aa [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 24, 2024)
ffd89cd fix no cuda error (CaRoLZhangxy, Apr 24, 2024)
09cc940 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 24, 2024)
e406e6a fix compile error (CaRoLZhangxy, Apr 24, 2024)
3959f19 print log.lammps to screen in test_cuda if failed (njzjz, Apr 25, 2024)
521aa28 skip dpa test on cuda ,add todo and fix codeql (CaRoLZhangxy, Apr 25, 2024)
1704230 Merge branch 'devel' of https://github.com/deepmodeling/deepmd-kit in… (CaRoLZhangxy, Apr 25, 2024)
a06a49c make pre-commit.ci pass (njzjz, Apr 25, 2024)
63123de [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 25, 2024)
f356ca5 add doc (CaRoLZhangxy, Apr 25, 2024)
5baefcc Merge branch 'devel' of https://github.com/deepmodeling/deepmd-kit in… (CaRoLZhangxy, Apr 26, 2024)
51125b1 add doc (CaRoLZhangxy, Apr 26, 2024)
b09e857 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 26, 2024)
32ba778 try runs-on t4 (CaRoLZhangxy, Apr 26, 2024)
f3b55b4 Update deepmd/pt/model/descriptor/repformers.py (CaRoLZhangxy, Apr 26, 2024)
92ffb35 rename (CaRoLZhangxy, Apr 26, 2024)
8ed31c8 Merge branch 'dis' of https://github.com/CaRoLZhangxy/deepmd-kit into… (CaRoLZhangxy, Apr 26, 2024)
0269b81 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 26, 2024)
124c2d6 fix (CaRoLZhangxy, Apr 26, 2024)
189fc6b run c++ test only (CaRoLZhangxy, Apr 26, 2024)
0adc34c deal with mpi not init (CaRoLZhangxy, Apr 26, 2024)
d368ccc [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 26, 2024)
26b63e0 fix doc format (CaRoLZhangxy, Apr 26, 2024)
0c76246 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 26, 2024)
86757bf try to fix (CaRoLZhangxy, Apr 26, 2024)
ecdef3e [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 26, 2024)
47242af fix (CaRoLZhangxy, Apr 26, 2024)
d051e4b Merge branch 'dis' of https://github.com/CaRoLZhangxy/deepmd-kit into… (CaRoLZhangxy, Apr 26, 2024)
8de1785 init mpi_init = 0 (CaRoLZhangxy, Apr 26, 2024)
17b35e0 add world_size (CaRoLZhangxy, Apr 26, 2024)
74b21e4 add low version support (CaRoLZhangxy, Apr 26, 2024)
f784505 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Apr 26, 2024)
5fe1fd1 fix error (CaRoLZhangxy, Apr 26, 2024)
33c2798 add & (CaRoLZhangxy, Apr 26, 2024)
273a446 reset test.yml (CaRoLZhangxy, Apr 26, 2024)
beba142 add doc str in python (CaRoLZhangxy, Apr 28, 2024)
use gpu memcpy
CaRoLZhangxy committed Apr 19, 2024
commit 37989c7ac6587a192e91d1d5b380b9e504cdbaab
source/op/pt/comm.cc: 15 changes (7 additions, 8 deletions)
@@ -1,7 +1,6 @@
 // SPDX-License-Identifier: LGPL-3.0-or-later
 #ifdef GOOGLE_CUDA
-#include <cuda.h>
-#include <cuda_runtime_api.h>
+#include "device.h"
 #endif
 #include <torch/torch.h>
 #ifdef USE_MPI
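With this hunk the file stops including the CUDA runtime headers directly and relies on `device.h` instead, so the names used in the later hunks (`gpuMemcpy`, `gpuMemcpyDeviceToDevice`, `gpuDeviceSynchronize`) have to come from that header. The sketch below is a hypothetical illustration of such a portability layer, not the actual contents of `device.h`. It assumes the names are plain aliases for the CUDA runtime under `GOOGLE_CUDA` and for HIP under `TENSORFLOW_USE_ROCM` (an assumed build flag), and it adds host-only stand-ins, which the PR does not have, so that it compiles on a machine without a GPU toolkit.

```cpp
// Hypothetical sketch of a backend-neutral memcpy/synchronize layer in the
// spirit of device.h; the real header's contents are not shown in this diff.
#include <cassert>
#include <cstddef>
#include <cstring>

#if defined(GOOGLE_CUDA)
#include <cuda_runtime_api.h>
#define gpuMemcpy cudaMemcpy
#define gpuMemcpyDeviceToDevice cudaMemcpyDeviceToDevice
#define gpuDeviceSynchronize cudaDeviceSynchronize
#elif defined(TENSORFLOW_USE_ROCM)  // assumed ROCm build flag
#include <hip/hip_runtime.h>
#define gpuMemcpy hipMemcpy
#define gpuMemcpyDeviceToDevice hipMemcpyDeviceToDevice
#define gpuDeviceSynchronize hipDeviceSynchronize
#else
// Host-only stand-ins added for this sketch so it builds without a GPU SDK.
enum gpuMemcpyKind { gpuMemcpyDeviceToDevice };

inline int gpuMemcpy(void* dst, const void* src, std::size_t bytes,
                     gpuMemcpyKind /*kind*/) {
  std::memcpy(dst, src, bytes);
  return 0;  // 0 mirrors cudaSuccess / hipSuccess
}

inline int gpuDeviceSynchronize() { return 0; }  // nothing to wait for on host
#endif
```

Because every call site is written once against the `gpu*` names, moving between backends touches only the header, which is the usual reason for replacing direct `cudaMemcpy` and `cudaDeviceSynchronize` calls like this. In a real GPU build the pointers passed with `gpuMemcpyDeviceToDevice` must be device allocations.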
@@ -90,8 +89,8 @@ class Border : public torch::autograd::Function<Border> {
 } else {
 #endif
 #ifdef GOOGLE_CUDA
-cudaMemcpy(recv_g1, send_g1, nsend * tensor_size * sizeof(FPTYPE),
-cudaMemcpyDeviceToDevice);
+gpuMemcpy(recv_g1, send_g1, nsend * tensor_size * sizeof(FPTYPE),
+gpuMemcpyDeviceToDevice);
 #else
 memcpy(recv_g1, send_g1, nsend * tensor_size * sizeof(FPTYPE));
 #endif
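In this `} else {` branch the receive buffer appears to be filled by a direct copy rather than an MPI exchange, and both the GPU and host paths pass the same byte count, `nsend * tensor_size * sizeof(FPTYPE)`: `nsend` rows of `tensor_size` values each. The host-only sketch below mirrors the `#else` path. The helper name `local_border_copy` is hypothetical (not a deepmd-kit function), and the sketch assumes the rows to send are already packed contiguously in `send_g1`.

```cpp
// Host-only sketch of the #else branch: copy nsend rows of tensor_size
// values each from a packed send buffer into a receive buffer.
// local_border_copy is a hypothetical name, not a deepmd-kit function.
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

template <typename FPTYPE>
std::size_t local_border_copy(FPTYPE* recv_g1, const FPTYPE* send_g1,
                              int nsend, int tensor_size) {
  assert(nsend >= 0 && tensor_size >= 0);
  // Same byte count as in the diff: rows * values-per-row * bytes-per-value,
  // with the counts widened to std::size_t before multiplying.
  const std::size_t nbytes = static_cast<std::size_t>(nsend) *
                             static_cast<std::size_t>(tensor_size) *
                             sizeof(FPTYPE);
  if (nbytes > 0) {
    std::memcpy(recv_g1, send_g1, nbytes);
  }
  return nbytes;  // number of bytes copied
}
```

For example, with `nsend = 2`, `tensor_size = 3` and 8-byte `double` values, the helper copies 48 bytes. Widening both counts before the multiplication keeps the row-by-column product in `std::size_t` rather than `int`.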
@@ -107,7 +106,7 @@ class Border : public torch::autograd::Function<Border> {
 torch::autograd::AutogradContext* ctx,
 torch::autograd::variable_list grad_output) {
 #ifdef GOOGLE_CUDA
-cudaDeviceSynchronize();
+gpuDeviceSynchronize();
 #endif

 torch::autograd::variable_list saved_variables = ctx->get_saved_variables();
@@ -194,8 +193,8 @@ class Border : public torch::autograd::Function<Border> {
 #endif
 if (nrecv) {
 #ifdef GOOGLE_CUDA
-cudaMemcpy(recv_g1, send_g1, nrecv * tensor_size * sizeof(FPTYPE),
-cudaMemcpyDeviceToDevice);
+gpuMemcpy(recv_g1, send_g1, nrecv * tensor_size * sizeof(FPTYPE),
+gpuMemcpyDeviceToDevice);
 #else
 memcpy(recv_g1, send_g1, nrecv * tensor_size * sizeof(FPTYPE));
 #endif
@@ -209,7 +208,7 @@ class Border : public torch::autograd::Function<Border> {
 }
 }
 #ifdef GOOGLE_CUDA
-cudaDeviceSynchronize();
+gpuDeviceSynchronize();
 #endif

 return {torch::Tensor(), torch::Tensor(), torch::Tensor(),