[BUG] TypeError from training NvNMD QNN model (-s s2) with float precision #3961
LiuGroupHNU pushed a commit to LiuGroupHNU/deepmd-kit that referenced this issue on Jul 12, 2024
LiuGroupHNU pushed a commit to LiuGroupHNU/deepmd-kit that referenced this issue on Jul 14, 2024
github-merge-queue bot pushed a commit that referenced this issue on Jul 18, 2024
fix float precision problem of se_atten in line 217. fix the bug: the different energy between qnn and lammps

## Summary by CodeRabbit

- **New Features**
  - Improved energy calculation methods for more accurate results in the `wrap` module.
  - Introduced new parameters for enhanced configurability in energy-related computations.
- **Improvements**
  - Enhanced handling and processing of energy shift arrays for better performance and accuracy.
  - Updated array manipulation and calculation methods for various wrapping functionalities.

---------

Co-authored-by: LiuGroupHNU <liujie123@HNU>
Co-authored-by: MoPinghui <mopinghui1020@gmail.com>
Co-authored-by: Han Wang <92130845+wanghan-iapcm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pinghui Mo <pinghui_mo@outlook.com>
mtaillefumier pushed a commit to mtaillefumier/deepmd-kit that referenced this issue on Sep 18, 2024
… (deepmodeling#3978) fix float precision problem of se_atten in line 217. fix the bug: the different energy between qnn and lammps
Bug summary
When training the NvNMD QNN model (-s s2) with DeePMD-kit v2.2.11 in float precision (export DP_INTERFACE_PREC=low), the log shows that g_t has dtype float64, while every other logged tensor is float32.
DEEPMD DEBUG #u: Tensor("u/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #rji: Tensor("rji/EnsureShape:0", shape=(?, 3), dtype=float32)
DEEPMD DEBUG #s_s: Tensor("s_s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #h_s: Tensor("h_s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #s: Tensor("s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #h: Tensor("h/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #Rxyz: Tensor("Rxyz/FltNvnmd:0", dtype=float32)
DEEPMD INFO use the compressible model with stripped type embedding
DEEPMD DEBUG #g_s: Tensor("filter_type_all/g_s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #g_t: Tensor("filter_type_all/g_t/FltNvnmd:0", dtype=float64)
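It seems that g_t is never converted to the interface precision, so it stays float64 and later fails the dtype check in the MulFltNvnmd op (see the traceback in the error log).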
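A quick way to spot this kind of mismatch is to scan the debug log for tensors whose dtype differs from the expected interface precision. The following is a minimal sketch and not part of DeePMD-kit: the helper name `find_dtype_mismatches` is hypothetical, and it assumes the `DEEPMD DEBUG #name: Tensor(..., dtype=...)` line format shown in the log above.

```python
import re

# Matches debug lines such as:
#   DEEPMD DEBUG #g_t: Tensor("filter_type_all/g_t/FltNvnmd:0", dtype=float64)
# (format assumed from the log excerpt in this report)
_DEBUG_LINE = re.compile(r"DEEPMD DEBUG #(\w+): Tensor\(.*dtype=(\w+)\)")


def find_dtype_mismatches(log_text, expected="float32"):
    """Return (tensor_name, dtype) pairs whose logged dtype differs from `expected`."""
    mismatches = []
    for line in log_text.splitlines():
        match = _DEBUG_LINE.search(line)
        if match and match.group(2) != expected:
            mismatches.append((match.group(1), match.group(2)))
    return mismatches


if __name__ == "__main__":
    sample = (
        'DEEPMD DEBUG #rji: Tensor("rji/EnsureShape:0", shape=(?, 3), dtype=float32)\n'
        'DEEPMD DEBUG #g_s: Tensor("filter_type_all/g_s/FltNvnmd:0", dtype=float32)\n'
        'DEEPMD DEBUG #g_t: Tensor("filter_type_all/g_t/FltNvnmd:0", dtype=float64)\n'
    )
    print(find_dtype_mismatches(sample))
```

Run against the excerpt above with `expected="float32"`, only `g_t` is reported, which matches the tensor named in the final TypeError.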
DeePMD-kit Version
v2.2.11
Backend and its version
TensorFlow v2.14.0
How did you download the software?
docker
Input Files, Running Commands, Error Log, etc.
DEEPMD INFO training without frame parameter
DEEPMD INFO data stating... (this step may take long time)
2024-07-10 02:45:59.699397: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:382] MLIR V1 optimization pass is not enabled
DEEPMD INFO built lr
DEEPMD INFO the range of s is [-0.0, 6.388733386993408]
DEEPMD DEBUG #u: Tensor("u/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #rji: Tensor("rji/EnsureShape:0", shape=(?, 3), dtype=float32)
DEEPMD DEBUG #s_s: Tensor("s_s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #h_s: Tensor("h_s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #s: Tensor("s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #h: Tensor("h/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #Rxyz: Tensor("Rxyz/FltNvnmd:0", dtype=float32)
DEEPMD INFO use the compressible model with stripped type embedding
DEEPMD DEBUG #g_s: Tensor("filter_type_all/g_s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #g_t: Tensor("filter_type_all/g_t/FltNvnmd:0", dtype=float64)
Traceback (most recent call last):
File "/opt/deepmd-kit/lib/python3.11/site-packages/tensorflow/python/framework/op_def_library.py", line 551, in _ExtractInputsAndAttrs
values = ops.convert_to_tensor(
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/tensorflow/python/profiler/trace.py", line 183, in wrapped
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/tensorflow/python/framework/ops.py", line 698, in convert_to_tensor
return tensor_conversion_registry.convert(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/tensorflow/python/framework/tensor_conversion_registry.py", line 209, in convert
return overload(dtype, name) # pylint: disable=not-callable
^^^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/tensorflow/python/framework/tensor.py", line 762, in tf_tensor
raise ValueError(
ValueError: w: Tensor conversion requested dtype float32 for Tensor with dtype float64: <tf.Tensor 'filter_type_all/g_t/EnsureShape:0' shape=(?, 32) dtype=float64>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/deepmd-kit/bin/dp", line 10, in <module>
sys.exit(main())
^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd_utils/main.py", line 657, in main
deepmd_main(args)
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/entrypoints/main.py", line 92, in main
train_nvnmd(**dict_args)
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/nvnmd/entrypoints/train.py", line 187, in train_nvnmd
train(**jdata)
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/entrypoints/train.py", line 168, in train
_do_work(jdata, run_opt, is_compress)
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/entrypoints/train.py", line 280, in _do_work
model.build(train_data, stop_batch, origin_type_map=origin_type_map)
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/train/trainer.py", line 308, in build
self._build_network(data, suffix)
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/train/trainer.py", line 385, in _build_network
self.model_pred = self.model.build(
^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/model/ener.py", line 222, in build
dout = self.build_descrpt(
^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/model/model.py", line 290, in build_descrpt
dout = self.descrpt.build(
^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/descriptor/se_atten.py", line 626, in build
self.dout, self.qmat = self._pass_filter(
^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/descriptor/se_atten.py", line 685, in _pass_filter
layer, qmat = self._filter(
^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/common.py", line 258, in wrapper
returned_tensor = func(
^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/descriptor/se_atten.py", line 1269, in _filter
xyz_scatter_1 = self._filter_lower(
^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/descriptor/se_atten.py", line 1104, in _filter_lower
return filter_lower_R42GR(
^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/nvnmd/descriptor/se_atten.py", line 217, in filter_lower_R42GR
G = op_module.mul_flt_nvnmd(G, two_embd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<string>", line 2276, in mul_flt_nvnmd
File "/opt/deepmd-kit/lib/python3.11/site-packages/tensorflow/python/framework/op_def_library.py", line 778, in _apply_op_helper
_ExtractInputsAndAttrs(op_type_name, op_def, allowed_list_attr_map,
File "/opt/deepmd-kit/lib/python3.11/site-packages/tensorflow/python/framework/op_def_library.py", line 589, in _ExtractInputsAndAttrs
raise TypeError(
TypeError: Input 'w' of 'MulFltNvnmd' Op has type float64 that does not match type float32 of argument 'x'.
Steps to Reproduce
export DP_INTERFACE_PREC=low; export OMP_NUM_THREADS=8; dp train-nvnmd cnn.json --skip-neighbor-stat -s s1 >> train.log 2>&1 ; dp train-nvnmd qnn.json --skip-neighbor-stat -s s2 >> train.log 2>&1
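For context on why `DP_INTERFACE_PREC=low` matters in these steps, the sketch below mirrors in plain Python how the variable is assumed to select the interface precision: `low` selects float32, while `high` (assumed to be the default when the variable is unset or empty) selects float64. The function name `resolve_interface_precision` is hypothetical; the real logic lives inside DeePMD-kit and may differ in detail.

```python
import os


def resolve_interface_precision(environ=None):
    """Map DP_INTERFACE_PREC to a dtype name (assumed mapping, for illustration only)."""
    env = os.environ if environ is None else environ
    value = env.get("DP_INTERFACE_PREC", "high").lower()
    if value in ("high", ""):
        # Assumed default: double precision
        return "float64"
    if value == "low":
        # The setting used in this report: single precision
        return "float32"
    raise ValueError(f"Unsupported DP_INTERFACE_PREC value: {value!r}")


if __name__ == "__main__":
    print(resolve_interface_precision({"DP_INTERFACE_PREC": "low"}))
```

Under this assumption, every tensor in the QNN graph should be float32 when the commands above are run, which is why the lone float64 `g_t` stands out.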
Further Information, Files, and Links
No response