Error when running LAMMPS in the devel branch #4161
Comments
@wujing81 Apologies for the confusion during installation; I faced the same issue while debugging. The problem arises because DPA2 requires the
But why is this option False by default? @njzjz @CaRoLZhangxy To my understanding, users who want to use the DPA-2 model with LAMMPS need this option. BTW, the documentation for this option may not be clear enough: https://docs.deepmodeling.com/projects/deepmd/en/latest/install/install-from-source.html#envvar-DP_ENABLE_PYTORCH
xref: #3891 (comment) I am not going to change the default option to True until PyTorch fixes pytorch/pytorch#78530.
Fix deepmodeling#4161. Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
Fix #4161.

## Summary by CodeRabbit

- **New Features**
  - Added installation requirements for the DPA-2 model in the documentation, including customized OP library instructions.
- **Improvements**
  - Enhanced error messaging in the `border_op` function for better user guidance.
  - Clarified parameter handling and documentation in the `DescrptBlockRepformers` class.
  - Improved logic for processing input tensors and neighbor lists in the `forward` method.
  - Strengthened input statistics handling in the `compute_input_stats` method.

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
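The comments above point to the `DP_ENABLE_PYTORCH` build option, which is off by default. A minimal sketch of how a source install with that option turned on might be scripted is given below. It assumes that the value `1` enables the option, as the linked installation page describes. The helper names `build_install_command` and `run_install` are hypothetical and are not part of DeePMD-kit.

```python
import os
import subprocess
import sys

# Same source URL as the pip command in the issue report.
DEEPMD_SOURCE = "git+https://github.com/deepmodeling/deepmd-kit.git@devel"


def build_install_command(source=DEEPMD_SOURCE):
    """Return (command, environment) for a pip install with the option set.

    Hypothetical helper: it only prepares the call and installs nothing.
    """
    env = dict(os.environ)
    # Assumption: "1" turns the option on, per the linked documentation.
    env["DP_ENABLE_PYTORCH"] = "1"
    cmd = [sys.executable, "-m", "pip", "install", source]
    return cmd, env


def run_install():
    """Run the install; call this explicitly, since it needs network access."""
    cmd, env = build_install_command()
    subprocess.run(cmd, env=env, check=True)
```

Because the error message refers to the state of the library "when freezing the model", the model would presumably have to be frozen again with `dp --pt freeze` after such a reinstall.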
Summary
I created a container node registry.dp.tech/dptech/deepmd-kit:3.0.0b3-cuda12.1 using the Bohrium platform. Then I installed the devel branch of DeePMD-kit with:
conda create -n deepmd-dev python=3.10
source activate deepmd-dev
pip install git+https://github.com/deepmodeling/deepmd-kit.git@devel
rsync -a --ignore-existing /opt/deepmd-kit-3.0.0b3/envs/deepmd-dev/ /opt/deepmd-kit-3.0.0b3/
The command /opt/deepmd-kit-3.0.0b3/bin/dp --version displays: DeePMD-kit v3.0.0b4.dev56+g0b72dae3.
I trained a model using this version of dp, and the training input file is attached. I used dp --pt freeze to get a .pth file. Then, I used this model to run MD simulations with the command /opt/deepmd-kit-3.0.0b3/bin/lmp -i lammps.in. The input.lammps and conf.lmp files are attached.
An error occurs:
[bohrium-11849-1195151:01982] mca_base_component_repository_open: unable to open mca_btl_openib: librdmacm.so.1: cannot open shared object file: No such file or directory (ignored)
LAMMPS (2 Aug 2023)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
using 1 OpenMP thread(s) per MPI task
DeePMD-kit: Successfully load libcudart.so.11.0
2024-09-24 15:37:29.837816: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-09-24 15:37:29.837871: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-09-24 15:37:29.837882: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
Loaded 1 plugins from /opt/deepmd-kit-3.0.0b3/lib/deepmd_lmp
Reading data file ...
triclinic box = (0 0 0) to (12.4447 12.4447 12.4447) with tilt (0 0 0)
1 by 1 by 1 MPI processor grid
reading atoms ...
192 atoms
read_data CPU = 0.003 seconds
DeePMD-kit WARNING: Environmental variable DP_INTRA_OP_PARALLELISM_THREADS is not set. Tune DP_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable DP_INTER_OP_PARALLELISM_THREADS is not set. Tune DP_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
Summary of lammps deepmd module ...
CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE
Your simulation uses code contributions which should be cited:
The log file lists these citations in BibTeX format.
CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE
Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule
Neighbor list info ...
update: every = 10 steps, delay = 0 steps, check = no
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 6.5
ghost atom cutoff = 6.5
binsize = 3.25, bins = 4 4 4
1 neighbor lists, perpetual/occasional/extra = 1 0 0
(1) pair deepmd, perpetual
attributes: full, newton on
pair build: full/bin/atomonly
stencil: full/bin/3d
bin: standard
Setting up Verlet run ...
Unit style : metal
Current step : 0
Time step : 0.0005
ERROR on proc 0: DeePMD-kit C API Error: DeePMD-kit Error: DeePMD-kit PyTorch backend JIT error: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/torch/deepmd/pt/model/model/ener_model.py", line 56, in forward_lower
comm_dict: Optional[Dict[str, Tensor]]=None) -> Dict[str, Tensor]:
_5 = (self).need_sorted_nlist_for_lower()
model_ret = (self).forward_common_lower(extended_coord, extended_atype, nlist, mapping, fparam, aparam, do_atomic_virial, comm_dict, _5, )
~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
_6 = (self).get_fitting_net()
model_predict = annotate(Dict[str, Tensor], {})
File "code/torch/deepmd/pt/model/model/ener_model.py", line 213, in forward_common_lower
cc_ext, _36, fp, ap, input_prec, = _35
atomic_model = self.atomic_model
atomic_ret = (atomic_model).forward_common_atomic(cc_ext, extended_atype, nlist0, mapping, fp, ap, comm_dict, )
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
_37 = (self).atomic_output_def()
training = self.training
File "code/torch/deepmd/pt/model/atomic_model/energy_atomic_model.py", line 50, in forward_common_atomic
ext_atom_mask = (self).make_atom_mask(extended_atype, )
_3 = torch.where(ext_atom_mask, extended_atype, 0)
ret_dict = (self).forward_atomic(extended_coord, _3, nlist, mapping, fparam, aparam, comm_dict, )
~~~~~~~~~~~~~~~~~~~~ <--- HERE
ret_dict0 = (self).apply_out_stat(ret_dict, atype, )
_4 = torch.slice(torch.slice(ext_atom_mask), 1, None, nloc)
File "code/torch/deepmd/pt/model/atomic_model/energy_atomic_model.py", line 93, in forward_atomic
pass
descriptor = self.descriptor
_16 = (descriptor).forward(extended_coord, extended_atype, nlist, mapping, comm_dict, )
~~~~~~~~~~~~~~~~~~~ <--- HERE
descriptor0, rot_mat, g2, h2, sw, = _16
fitting_net = self.fitting_net
File "code/torch/deepmd/pt/model/descriptor/dpa2.py", line 98, in forward
repformers1 = self.repformers
_17 = nlist_dict[_1(_16, (repformers1).get_nsel(), )]
_18 = (repformers).forward(_17, extended_coord, extended_atype, g13, mapping0, comm_dict0, )
~~~~~~~~~~~~~~~~~~~ <--- HERE
g14, g2, h2, rot_mat, sw, = _18
concat_output_tebd = self.concat_output_tebd
File "code/torch/deepmd/pt/model/descriptor/repformers.py", line 364, in forward
_65 = "border_op is not available since customized PyTorch OP library is not built when freezing the model."
_66 = uninitialized(Tensor)
ops.prim.RaiseException(_65, "builtins.NotImplementedError")
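The decisive part of the traceback is its last frame: the frozen model raises `NotImplementedError` because `border_op` is unavailable. A small sketch of a log check for this message is shown below. The function name `diagnose_log` and the wording of the hint are hypothetical; only the matched message text is taken from the log above.

```python
import re
from typing import Optional

# Message text copied from the traceback in this report.
MISSING_OP_PATTERN = re.compile(
    r"border_op is not available since customized PyTorch OP library is not built"
)


def diagnose_log(log_text: str) -> Optional[str]:
    """Return a hint if the log shows the missing custom OP error, else None."""
    if MISSING_OP_PATTERN.search(log_text):
        return (
            "The model appears to have been frozen without the customized "
            "PyTorch OP library; consider reinstalling DeePMD-kit with "
            "DP_ENABLE_PYTORCH=1 and freezing the model again."
        )
    return None
```

The function could be applied to the captured LAMMPS output, for example `diagnose_log(open("log.lammps").read())`, where the file name is only an assumption about where the output was saved.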