[BUG] TypeError from training NvNMD QNN model (-s s2) with float precision #3961

jiongwalai · 2024-07-10T01:58:23Z

Bug summary

When training an NvNMD QNN model (-s s2) with DeePMD-kit 2.2.11 in float precision (export DP_INTERFACE_PREC=low), the log shows that g_t has dtype float64, while the other tensors in the log are float32.

DEEPMD DEBUG #u: Tensor("u/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #rji: Tensor("rji/EnsureShape:0", shape=(?, 3), dtype=float32)
DEEPMD DEBUG #s_s: Tensor("s_s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #h_s: Tensor("h_s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #s: Tensor("s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #h: Tensor("h/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #Rxyz: Tensor("Rxyz/FltNvnmd:0", dtype=float32)
DEEPMD INFO use the compressible model with stripped type embedding
DEEPMD DEBUG #g_s: Tensor("filter_type_all/g_s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #g_t: Tensor("filter_type_all/g_t/FltNvnmd:0", dtype=float64)

It seems that g_t is not converted to the selected precision.
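
The mismatch can be spotted mechanically from debug output like the lines above. The following is a minimal sketch and not part of DeePMD-kit: the helper names `expected_dtype` and `find_mismatches` are hypothetical, and the mapping of `DP_INTERFACE_PREC=low` to float32 (with `high` or unset meaning float64) is an assumption based on the behaviour described in this report.

```python
import re

# Matches lines such as:
# DEEPMD DEBUG #g_t: Tensor("filter_type_all/g_t/FltNvnmd:0", dtype=float64)
DEBUG_LINE = re.compile(r'#(?P<name>\w+): Tensor\("[^"]*".*dtype=(?P<dtype>\w+)\)')


def expected_dtype(interface_prec):
    """Map a DP_INTERFACE_PREC value to a dtype name (assumed mapping)."""
    prec = (interface_prec or "high").lower()
    if prec == "low":
        return "float32"
    if prec == "high":
        return "float64"
    raise ValueError(f"unsupported precision setting: {interface_prec!r}")


def find_mismatches(log_text, interface_prec):
    """Return (tensor name, dtype) pairs whose dtype differs from the expected one."""
    want = expected_dtype(interface_prec)
    mismatches = []
    for line in log_text.splitlines():
        match = DEBUG_LINE.search(line)
        if match and match.group("dtype") != want:
            mismatches.append((match.group("name"), match.group("dtype")))
    return mismatches


log = '''DEEPMD DEBUG #g_s: Tensor("filter_type_all/g_s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #g_t: Tensor("filter_type_all/g_t/FltNvnmd:0", dtype=float64)'''
print(find_mismatches(log, "low"))
```

Run against the two lines quoted from the log, this reports only g_t, which matches the observation in the summary.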

DeePMD-kit Version

v2.2.11

Backend and its version

TensorFlow v2.14.0

How did you download the software?

docker

Input Files, Running Commands, Error Log, etc.

DEEPMD INFO training without frame parameter
DEEPMD INFO data stating... (this step may take long time)
2024-07-10 02:45:59.699397: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:382] MLIR V1 optimization pass is not enabled
DEEPMD INFO built lr
DEEPMD INFO the range of s is [-0.0, 6.388733386993408]
DEEPMD DEBUG #u: Tensor("u/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #rji: Tensor("rji/EnsureShape:0", shape=(?, 3), dtype=float32)
DEEPMD DEBUG #s_s: Tensor("s_s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #h_s: Tensor("h_s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #s: Tensor("s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #h: Tensor("h/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #Rxyz: Tensor("Rxyz/FltNvnmd:0", dtype=float32)
DEEPMD INFO use the compressible model with stripped type embedding
DEEPMD DEBUG #g_s: Tensor("filter_type_all/g_s/FltNvnmd:0", dtype=float32)
DEEPMD DEBUG #g_t: Tensor("filter_type_all/g_t/FltNvnmd:0", dtype=float64)
Traceback (most recent call last):
File "/opt/deepmd-kit/lib/python3.11/site-packages/tensorflow/python/framework/op_def_library.py", line 551, in _ExtractInputsAndAttrs
values = ops.convert_to_tensor(
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/tensorflow/python/profiler/trace.py", line 183, in wrapped
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/tensorflow/python/framework/ops.py", line 698, in convert_to_tensor
return tensor_conversion_registry.convert(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/tensorflow/python/framework/tensor_conversion_registry.py", line 209, in convert
return overload(dtype, name) # pylint: disable=not-callable
^^^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/tensorflow/python/framework/tensor.py", line 762, in tf_tensor
raise ValueError(
ValueError: w: Tensor conversion requested dtype float32 for Tensor with dtype float64: <tf.Tensor 'filter_type_all/g_t/EnsureShape:0' shape=(?, 32) dtype=float64>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/deepmd-kit/bin/dp", line 10, in
sys.exit(main())
^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd_utils/main.py", line 657, in main
deepmd_main(args)
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/entrypoints/main.py", line 92, in main
train_nvnmd(**dict_args)
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/nvnmd/entrypoints/train.py", line 187, in train_nvnmd
train(**jdata)
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/entrypoints/train.py", line 168, in train
_do_work(jdata, run_opt, is_compress)
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/entrypoints/train.py", line 280, in _do_work
model.build(train_data, stop_batch, origin_type_map=origin_type_map)
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/train/trainer.py", line 308, in build
self._build_network(data, suffix)
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/train/trainer.py", line 385, in _build_network
self.model_pred = self.model.build(
^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/model/ener.py", line 222, in build
dout = self.build_descrpt(
^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/model/model.py", line 290, in build_descrpt
dout = self.descrpt.build(
^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/descriptor/se_atten.py", line 626, in build
self.dout, self.qmat = self._pass_filter(
^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/descriptor/se_atten.py", line 685, in _pass_filter
layer, qmat = self._filter(
^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/common.py", line 258, in wrapper
returned_tensor = func(
^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/descriptor/se_atten.py", line 1269, in _filter
xyz_scatter_1 = self._filter_lower(
^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/descriptor/se_atten.py", line 1104, in _filter_lower
return filter_lower_R42GR(
^^^^^^^^^^^^^^^^^^^
File "/opt/deepmd-kit/lib/python3.11/site-packages/deepmd/nvnmd/descriptor/se_atten.py", line 217, in filter_lower_R42GR
G = op_module.mul_flt_nvnmd(G, two_embd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "", line 2276, in mul_flt_nvnmd
File "/opt/deepmd-kit/lib/python3.11/site-packages/tensorflow/python/framework/op_def_library.py", line 778, in _apply_op_helper
_ExtractInputsAndAttrs(op_type_name, op_def, allowed_list_attr_map,
File "/opt/deepmd-kit/lib/python3.11/site-packages/tensorflow/python/framework/op_def_library.py", line 589, in _ExtractInputsAndAttrs
raise TypeError(
TypeError: Input 'w' of 'MulFltNvnmd' Op has type float64 that does not match type float32 of argument 'x'.
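
The final TypeError names the op, the offending input, and both dtypes. Read together with the earlier ValueError, it indicates that input `w` (the g_t tensor passed as `two_embd`) is float64 while `x` (`G`) is float32. As a rough sketch, and assuming the message keeps the wording shown in this traceback, those fields can be pulled out with a regular expression; `summarize_type_error` is a hypothetical helper, not a TensorFlow or DeePMD-kit function.

```python
import re

# Pattern for the message shown above when two op inputs disagree on dtype.
OP_TYPE_ERROR = re.compile(
    r"Input '(?P<input>\w+)' of '(?P<op>\w+)' Op has type (?P<got>\w+) "
    r"that does not match type (?P<want>\w+) of argument '(?P<ref>\w+)'"
)


def summarize_type_error(message):
    """Extract the op, the offending input, and both dtypes; None if no match."""
    match = OP_TYPE_ERROR.search(message)
    return match.groupdict() if match else None


error = ("TypeError: Input 'w' of 'MulFltNvnmd' Op has type float64 "
         "that does not match type float32 of argument 'x'.")
info = summarize_type_error(error)
print(f"{info['op']}: input {info['input']} is {info['got']}, "
      f"expected {info['want']} to match {info['ref']}")
```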

Steps to Reproduce

export DP_INTERFACE_PREC=low; export OMP_NUM_THREADS=8; dp train-nvnmd cnn.json --skip-neighbor-stat -s s1 >> train.log 2>&1 ; dp train-nvnmd qnn.json --skip-neighbor-stat -s s2 >> train.log 2>&1
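
For repeated runs, the same two steps could be driven from a small script. This is only a sketch of the shell line above: `build_steps`, `build_env`, and `run_all` are hypothetical names, and the script assumes `dp`, `cnn.json`, and `qnn.json` are available in the working directory's environment exactly as in the report.

```python
import os
import subprocess


def build_steps():
    """Return the two training commands from the report, in order."""
    common = ["--skip-neighbor-stat"]
    return [
        ["dp", "train-nvnmd", "cnn.json", *common, "-s", "s1"],
        ["dp", "train-nvnmd", "qnn.json", *common, "-s", "s2"],
    ]


def build_env(base=None):
    """Copy the environment and add the settings exported in the report."""
    env = dict(os.environ if base is None else base)
    env["DP_INTERFACE_PREC"] = "low"
    env["OMP_NUM_THREADS"] = "8"
    return env


def run_all(log_path="train.log"):
    """Run both steps, appending stdout and stderr to one log file."""
    env = build_env()
    with open(log_path, "a") as log:
        for cmd in build_steps():
            # check=False mirrors ';' in the shell: continue even if a step fails.
            subprocess.run(cmd, env=env, stdout=log,
                           stderr=subprocess.STDOUT, check=False)


# Print the commands only; call run_all() to actually start training.
for cmd in build_steps():
    print(" ".join(cmd))
```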

Further Information, Files, and Links

No response



… (deepmodeling#3978)

fix float precision problem of se_atten in line 217. fix the bug: the different energy between qnn and lammps

## Summary by CodeRabbit

- **New Features**
  - Improved energy calculation methods for more accurate results in the `wrap` module.
  - Introduced new parameters for enhanced configurability in energy-related computations.
- **Improvements**
  - Enhanced handling and processing of energy shift arrays for better performance and accuracy.
  - Updated array manipulation and calculation methods for various wrapping functionalities.

---------

Co-authored-by: LiuGroupHNU <liujie123@HNU>
Co-authored-by: MoPinghui <mopinghui1020@gmail.com>
Co-authored-by: Han Wang <92130845+wanghan-iapcm@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pinghui Mo <pinghui_mo@outlook.com>

jiongwalai added the bug label Jul 10, 2024

njzjz assigned LiuGroupHNU Jul 11, 2024

LiuGroupHNU pushed a commit to LiuGroupHNU/deepmd-kit that referenced this issue Jul 12, 2024

nvnmd, fix float precision problem of se_atten in line 217 (deepmodeling#3961)

9d56ee8

LiuGroupHNU pushed a commit to LiuGroupHNU/deepmd-kit that referenced this issue Jul 14, 2024

nvnmd, fix float precision problem of se_atten in line 217 (deepmodeling#3961)

50f57b6

wanghan-iapcm linked a pull request Jul 15, 2024 that will close this issue

fix float precision problem of se_atten in line 217 (#3961) #3978

Merged

njzjz closed this as completed Jul 18, 2024
