one error to compress a model #795

baoqinfu · 2021-06-24T07:31:35Z

Summary
I ran `dp compress -i SiClda1.pb input.json -o SiCldaC1.pb` to compress a model, and it failed with the error below. How can I address this problem?

-----------------------------------------------------------------------------------------------------------------------------------------
wanrun dp_lda1_test $ dp compress -i SiClda1.pb input.json -o SiCldaC1.pb
WARNING:tensorflow:From /home/wanrun/denghui/dp-api/tensorflow_venv/lib/python3.7/site-papython/compat/v2_compat.py:61: disable_resource_variables (from tensorflow.python.ops.varprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
DEEPMD INFO    |-> deepmd.entrypoints.compress                   


DEEPMD INFO    |-> deepmd.entrypoints.compress                   stage 1: train or refinebulation
DEEPMD INFO    |-> deepmd.entrypoints.train                       _____               ___           _     _  _   
DEEPMD INFO    |-> deepmd.entrypoints.train                      |  __ \             |  _\         | |   (_)| |  
DEEPMD INFO    |-> deepmd.entrypoints.train                      | |  | |  ___   ___ | |_ | ______ | | __ _ | |_ 
DEEPMD INFO    |-> deepmd.entrypoints.train                      | |  | | / _ \ / _ \|  _ ||______|| |/ /| || __|
DEEPMD INFO    |-> deepmd.entrypoints.train                      | |__| ||  __/|  __/| |  |        |   < | || |_ 
DEEPMD INFO    |-> deepmd.entrypoints.train                      |_____/  \___| \___||_| /         |_|\_\|_| \__|
DEEPMD INFO    |-> deepmd.entrypoints.train                      Please read and cite:
DEEPMD INFO    |-> deepmd.entrypoints.train                      Wang, Zhang, Han and E, 228, 178-184 (2018)
DEEPMD INFO    |-> deepmd.entrypoints.train                      installed to:         /t0mzgx2n/_skbuild/linux-x86_64-3.7/cmake-install
DEEPMD INFO    |-> deepmd.entrypoints.train                      source :              v1dirty
DEEPMD INFO    |-> deepmd.entrypoints.train                      source brach:         ap
DEEPMD INFO    |-> deepmd.entrypoints.train                      source commit:        5d
DEEPMD INFO    |-> deepmd.entrypoints.train                      source commit at:     20+0800
DEEPMD INFO    |-> deepmd.entrypoints.train                      build float prec:     fl
DEEPMD INFO    |-> deepmd.entrypoints.train                      build with tf inc:    /h/dp-api/tensorflow_venv/lib/python3.7/site-packages/tensorflow/include;/home/wanrun/denghow_venv/lib/python3.7/site-packages/tensorflow/include
DEEPMD INFO    |-> deepmd.entrypoints.train                      build with tf lib:    
DEEPMD INFO    |-> deepmd.run_options                            ---Summary of the traini-----------------------
DEEPMD INFO    |-> deepmd.run_options                            running on:           lo
DEEPMD INFO    |-> deepmd.run_options                            CUDA_VISIBLE_DEVICES: un
DEEPMD INFO    |-> deepmd.run_options                            num_intra_threads:    0
DEEPMD INFO    |-> deepmd.run_options                            num_inter_threads:    0
DEEPMD INFO    |-> deepmd.run_options                            -----------------------------------------------
2021-06-24 14:25:13.080366: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed KNOWN ERROR (303)
/home/wanrun/denghui/dp-api/tensorflow_venv/lib/python3.7/site-packages/deepmd/utils/dataerWarning: system ../data.init/C_atomSpin2/dpmd required batch size is larger than the si../data.init/C_atomSpin2/dpmd/set.000 (32 > 1)
  (self.system_dirs[ii], chk_ret[0], self.batch_size[ii], chk_ret[1]))
/home/wanrun/denghui/dp-api/tensorflow_venv/lib/python3.7/site-packages/deepmd/utils/dataerWarning: system ../data.init/C_atomSpin2/dpmd required test size is larger than the siz./data.init/C_atomSpin2/dpmd/set.000 (2 > 1)
  (self.system_dirs[ii], chk_ret[0], self.test_size[ii], chk_ret[1]))
/home/wanrun/denghui/dp-api/tensorflow_venv/lib/python3.7/site-packages/deepmd/utils/dataerWarning: system ../data.init/Si_atomSpin2/dpmd required batch size is larger than the s ../data.init/Si_atomSpin2/dpmd/set.000 (32 > 1)
  (self.system_dirs[ii], chk_ret[0], self.batch_size[ii], chk_ret[1]))
/home/wanrun/denghui/dp-api/tensorflow_venv/lib/python3.7/site-packages/deepmd/utils/dataerWarning: system ../data.init/Si_atomSpin2/dpmd required test size is larger than the si../data.init/Si_atomSpin2/dpmd/set.000 (2 > 1)
  (self.system_dirs[ii], chk_ret[0], self.test_size[ii], chk_ret[1]))
DEEPMD INFO    |-> deepmd.utils.data_system                      ---Summary of DataSystem------------------------------
DEEPMD INFO    |-> deepmd.utils.data_system                      found 36 system(s):
DEEPMD INFO    |-> deepmd.utils.data_system                                                
DEEPMD INFO    |-> deepmd.utils.data_system                      natoms  bch_sz  n_bch   
DEEPMD INFO    |-> deepmd.utils.data_system                                   ../data.ini       1      32       1       2  0.000
DEEPMD INFO    |-> deepmd.utils.data_system                                  ../data.init       1      32       1       2  0.000
DEEPMD INFO    |-> deepmd.utils.data_system                      -- H-4.02x02x02/02.md/sy      32       1     600       2  0.208
DEEPMD INFO    |-> deepmd.utils.data_system                      -- C-2.03x03x03/02.md/sy      54       1     600       2  0.208
DEEPMD INFO    |-> deepmd.utils.data_system                      -- H-8.02x02x01/02.md/sy      32       1     600       2  0.208
DEEPMD INFO    |-> deepmd.utils.data_system                      -- -12.02x02x01/02.md/sy      48       1     600       2  0.208
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1       7       2  0.002
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      54       1      21       2  0.007
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1      17       2  0.006
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      48       1      19       2  0.007
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      54       1       8       2  0.003
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1       5       2  0.002
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      48       1       8       2  0.003
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1      15       2  0.005
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      54       1      12       2  0.004
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1      10       2  0.003
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      48       1      12       2  0.004
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1      16       2  0.006
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1       6       2  0.002
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1       5       2  0.002
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1      22       2  0.008
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      54       1      34       2  0.012
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1      18       2  0.006
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      48       1      19       2  0.007
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1       6       2  0.002
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      54       1       6       2  0.002
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1       6       2  0.002
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      48       1       5       2  0.002
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1      27       2  0.009
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      54       1      19       2  0.007
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1      15       2  0.005
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      48       1      20       2  0.007
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1      39       2  0.014
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      54       1       9       2  0.003
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1      30       2  0.010
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      48       1      40       2  0.014
DEEPMD INFO    |-> deepmd.utils.data_system                      ------------------------------------------------------

DEEPMD INFO    |-> deepmd.trainer                                training without frame p
2021-06-24 14:25:15.792888: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] : Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was nnt XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To s active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via et the envvar XLA_FLAGS=--xla_hlo_profile.
Traceback (most recent call last):
  File "/home/wanrun/denghui/dp-api/tensorflow_venv/bin/dp", line 8, in <module>
    sys.exit(main())
  File "/home/wanrun/denghui/dp-api/tensorflow_venv/lib/python3.7/site-packages/deepmd/main main
    compress(**dict_args)
  File "/home/wanrun/denghui/dp-api/tensorflow_venv/lib/python3.7/site-packages/deepmd/en.py", line 102, in compress
    log_path=log_path,
  File "/home/wanrun/denghui/dp-api/tensorflow_venv/lib/python3.7/site-packages/deepmd/en", line 211, in train
    _do_work(jdata, run_opt)
  File "/home/wanrun/denghui/dp-api/tensorflow_venv/lib/python3.7/site-packages/deepmd/en", line 291, in _do_work
    model.build(data, stop_batch)
  File "/home/wanrun/denghui/dp-api/tensorflow_venv/lib/python3.7/site-packages/deepmd/tr4, in build
    = self.neighbor_stat.get_stat(data)
  File "/home/wanrun/denghui/dp-api/tensorflow_venv/lib/python3.7/site-packages/deepmd/utpy", line 85, in get_stat
    dt = np.min(dt)
  File "/home/wanrun/denghui/dp-api/tensorflow_venv/lib/python3.7/site-packages/numpy/cor line 2618, in amin
    initial=initial)
  File "/home/wanrun/denghui/dp-api/tensorflow_venv/lib/python3.7/site-packages/numpy/cor line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation minimum which has no identity
-----------------------------------------------------------------------------------------------------------------------------------------
The used deepmd version info:
DEEPMD INFO    |-> deepmd.entrypoints.train                      source :              v1.2.2-382-g5d21c7f-dirty
DEEPMD INFO    |-> deepmd.entrypoints.train                      source brach:         api-summit
DEEPMD INFO    |-> deepmd.entrypoints.train                      source commit:        5d21c7f
DEEPMD INFO    |-> deepmd.entrypoints.train                      source commit at:     2021-04-04 17:57:36 +0800


denghuilu · 2021-06-25T07:19:51Z

Which version of deepmd-kit was used for the model training?

baoqinfu · 2021-06-25T08:29:55Z

> Which version of deepmd-kit was used for the model training?

Sorry for the late reply. The version info of the deepmd-kit used for the model training is:

DEEPMD: ---Summary of the training---------------------------------------

DEEPMD: installed to: /tmp/pip-req-build-gek426j1/_skbuild/linux-x86_64-3.8/cmake-install

DEEPMD: source : v1.2.2-85-gb96112e-dirty

DEEPMD: source brach: devel

DEEPMD: source commit: `b96112e`

DEEPMD: source commit at: 2020-11-26 14:12:51 +0800

baoqinfu · 2021-07-13T10:09:40Z

I used `dp convert-from -I old.pb -o new.pb '1.2'` to convert the old model into the new format (supported by 2.0), then ran `/opt/deepmd-kit-2.0.0.b3/bin/dp compress -i SiClda1_c3.pb -o SiClda1_com.pb input2.json` to compress the model, but I got a similar error:

2021-07-13 17:39:10.288795: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
WARNING:tensorflow:From /opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:root:Environment variable KMP_BLOCKTIME is empty. Use the default value 0
WARNING:root:Environment variable KMP_AFFINITY is empty. Use the default value granularity=fine,verbose,compact,1,0
DEEPMD INFO

DEEPMD INFO stage 1: train or refine the model with tabulation
DEEPMD INFO (DeePMD-kit ASCII-art banner)
DEEPMD INFO Please read and cite:
DEEPMD INFO Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
DEEPMD INFO installed to: /tmp/pip-req-build-qqv2ggzp/_skbuild/linux-x86_64-3.9/cmake-install
DEEPMD INFO source : v2.0.0.b3
DEEPMD INFO source brach: HEAD
DEEPMD INFO source commit: `de428e3`
DEEPMD INFO source commit at: 2021-07-04 22:12:13 +0800
DEEPMD INFO build float prec: double
DEEPMD INFO build with tf inc: /opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/tensorflow/include;/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/tensorflow/include
DEEPMD INFO build with tf lib:
DEEPMD INFO ---Summary of the training---------------------------------------
DEEPMD INFO running on: login0
DEEPMD INFO CUDA_VISIBLE_DEVICES: unset
DEEPMD INFO num_intra_threads: 0
DEEPMD INFO num_inter_threads: 0
DEEPMD INFO -----------------------------------------------------------------
2021-07-13 17:39:18.657117: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-07-13 17:39:18.658367: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:
2021-07-13 17:39:18.658392: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2021-07-13 17:39:18.658412: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (login0): /proc/driver/nvidia/version does not exist
2021-07-13 17:39:18.658425: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/deepmd/utils/data_system.py:156: UserWarning: system ../data/data.init/C_atomSpin2/dpmd required batch size is larger than the size of the dataset ../data/data.init/C_atomSpin2/dpmd/set.000 (32 > 1)
warnings.warn("system %s required batch size is larger than the size of the dataset %s (%d > %d)" %
/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/deepmd/utils/data_system.py:156: UserWarning: system ../data/data.init/Si_atomSpin2/dpmd required batch size is larger than the size of the dataset ../data/data.init/Si_atomSpin2/dpmd/set.000 (32 > 1)
warnings.warn("system %s required batch size is larger than the size of the dataset %s (%d > %d)" %
DEEPMD INFO ---Summary of DataSystem: training -----------------------------------------------
DEEPMD INFO found 36 system(s):
DEEPMD INFO system natoms bch_sz n_bch prob pbc
DEEPMD INFO -- H-4.02x02x02/02.md/sys-0016-0016/deepmd 32 1 600 0.208 T
DEEPMD INFO -- C-2.03x03x03/02.md/sys-0027-0027/deepmd 54 1 600 0.208 T
DEEPMD INFO -- H-8.02x02x01/02.md/sys-0016-0016/deepmd 32 1 600 0.208 T
DEEPMD INFO -- -12.02x02x01/02.md/sys-0024-0024/deepmd 48 1 600 0.208 T
DEEPMD INFO ../data/data.init/C_atomSpin2/dpmd 1 32 1 0.000 T
DEEPMD INFO ../data/data.init/Si_atomSpin2/dpmd 1 32 1 0.000 T
DEEPMD INFO -- a/data.iters/iter.000000/02.fp/data.000 32 1 7 0.002 T
DEEPMD INFO -- a/data.iters/iter.000000/02.fp/data.010 54 1 21 0.007 T
DEEPMD INFO -- a/data.iters/iter.000000/02.fp/data.020 32 1 17 0.006 T
DEEPMD INFO -- a/data.iters/iter.000000/02.fp/data.030 48 1 19 0.007 T
DEEPMD INFO -- a/data.iters/iter.000002/02.fp/data.012 54 1 8 0.003 T
DEEPMD INFO -- a/data.iters/iter.000002/02.fp/data.022 32 1 5 0.002 T
DEEPMD INFO -- a/data.iters/iter.000002/02.fp/data.032 48 1 8 0.003 T
DEEPMD INFO -- a/data.iters/iter.000003/02.fp/data.003 32 1 15 0.005 T
DEEPMD INFO -- a/data.iters/iter.000003/02.fp/data.013 54 1 12 0.004 T
DEEPMD INFO -- a/data.iters/iter.000003/02.fp/data.023 32 1 10 0.003 T
DEEPMD INFO -- a/data.iters/iter.000003/02.fp/data.033 48 1 12 0.004 T
DEEPMD INFO -- a/data.iters/iter.000004/02.fp/data.004 32 1 16 0.006 T
DEEPMD INFO -- a/data.iters/iter.000004/02.fp/data.024 32 1 6 0.002 T
DEEPMD INFO -- a/data.iters/iter.000005/02.fp/data.005 32 1 5 0.002 T
DEEPMD INFO -- a/data.iters/iter.000006/02.fp/data.006 32 1 22 0.008 T
DEEPMD INFO -- a/data.iters/iter.000006/02.fp/data.016 54 1 34 0.012 T
DEEPMD INFO -- a/data.iters/iter.000006/02.fp/data.026 32 1 18 0.006 T
DEEPMD INFO -- a/data.iters/iter.000006/02.fp/data.036 48 1 19 0.007 T
DEEPMD INFO -- a/data.iters/iter.000007/02.fp/data.007 32 1 6 0.002 T
DEEPMD INFO -- a/data.iters/iter.000007/02.fp/data.017 54 1 6 0.002 T
DEEPMD INFO -- a/data.iters/iter.000008/02.fp/data.008 32 1 6 0.002 T
DEEPMD INFO -- a/data.iters/iter.000008/02.fp/data.038 48 1 5 0.002 T
DEEPMD INFO -- a/data.iters/iter.000009/02.fp/data.009 32 1 27 0.009 T
DEEPMD INFO -- a/data.iters/iter.000009/02.fp/data.019 54 1 19 0.007 T
DEEPMD INFO -- a/data.iters/iter.000009/02.fp/data.029 32 1 15 0.005 T
DEEPMD INFO -- a/data.iters/iter.000009/02.fp/data.039 48 1 20 0.007 T
DEEPMD INFO -- a/data.iters/iter.000010/02.fp/data.009 32 1 39 0.014 T
DEEPMD INFO -- a/data.iters/iter.000010/02.fp/data.019 54 1 9 0.003 T
DEEPMD INFO -- a/data.iters/iter.000010/02.fp/data.029 32 1 30 0.010 T
DEEPMD INFO -- a/data.iters/iter.000010/02.fp/data.039 48 1 40 0.014 T
DEEPMD INFO --------------------------------------------------------------------------------------
DEEPMD INFO ---Summary of DataSystem: validation -----------------------------------------------
DEEPMD INFO found 36 system(s):
DEEPMD INFO system natoms bch_sz n_bch prob pbc
DEEPMD INFO -- H-4.02x02x02/02.md/sys-0016-0016/deepmd 32 1 600 0.208 T
DEEPMD INFO -- C-2.03x03x03/02.md/sys-0027-0027/deepmd 54 1 600 0.208 T
DEEPMD INFO -- H-8.02x02x01/02.md/sys-0016-0016/deepmd 32 1 600 0.208 T
DEEPMD INFO -- -12.02x02x01/02.md/sys-0024-0024/deepmd 48 1 600 0.208 T
DEEPMD INFO ../data/data.init/C_atomSpin2/dpmd 1 1 1 0.000 T
DEEPMD INFO ../data/data.init/Si_atomSpin2/dpmd 1 1 1 0.000 T
DEEPMD INFO -- a/data.iters/iter.000000/02.fp/data.000 32 1 7 0.002 T
DEEPMD INFO -- a/data.iters/iter.000000/02.fp/data.010 54 1 21 0.007 T
DEEPMD INFO -- a/data.iters/iter.000000/02.fp/data.020 32 1 17 0.006 T
DEEPMD INFO -- a/data.iters/iter.000000/02.fp/data.030 48 1 19 0.007 T
DEEPMD INFO -- a/data.iters/iter.000002/02.fp/data.012 54 1 8 0.003 T
DEEPMD INFO -- a/data.iters/iter.000002/02.fp/data.022 32 1 5 0.002 T
DEEPMD INFO -- a/data.iters/iter.000002/02.fp/data.032 48 1 8 0.003 T
DEEPMD INFO -- a/data.iters/iter.000003/02.fp/data.003 32 1 15 0.005 T
DEEPMD INFO -- a/data.iters/iter.000003/02.fp/data.013 54 1 12 0.004 T
DEEPMD INFO -- a/data.iters/iter.000003/02.fp/data.023 32 1 10 0.003 T
DEEPMD INFO -- a/data.iters/iter.000003/02.fp/data.033 48 1 12 0.004 T
DEEPMD INFO -- a/data.iters/iter.000004/02.fp/data.004 32 1 16 0.006 T
DEEPMD INFO -- a/data.iters/iter.000004/02.fp/data.024 32 1 6 0.002 T
DEEPMD INFO -- a/data.iters/iter.000005/02.fp/data.005 32 1 5 0.002 T
DEEPMD INFO -- a/data.iters/iter.000006/02.fp/data.006 32 1 22 0.008 T
DEEPMD INFO -- a/data.iters/iter.000006/02.fp/data.016 54 1 34 0.012 T
DEEPMD INFO -- a/data.iters/iter.000006/02.fp/data.026 32 1 18 0.006 T
DEEPMD INFO -- a/data.iters/iter.000006/02.fp/data.036 48 1 19 0.007 T
DEEPMD INFO -- a/data.iters/iter.000007/02.fp/data.007 32 1 6 0.002 T
DEEPMD INFO -- a/data.iters/iter.000007/02.fp/data.017 54 1 6 0.002 T
DEEPMD INFO -- a/data.iters/iter.000008/02.fp/data.008 32 1 6 0.002 T
DEEPMD INFO -- a/data.iters/iter.000008/02.fp/data.038 48 1 5 0.002 T
DEEPMD INFO -- a/data.iters/iter.000009/02.fp/data.009 32 1 27 0.009 T
DEEPMD INFO -- a/data.iters/iter.000009/02.fp/data.019 54 1 19 0.007 T
DEEPMD INFO -- a/data.iters/iter.000009/02.fp/data.029 32 1 15 0.005 T
DEEPMD INFO -- a/data.iters/iter.000009/02.fp/data.039 48 1 20 0.007 T
DEEPMD INFO -- a/data.iters/iter.000010/02.fp/data.009 32 1 39 0.014 T
DEEPMD INFO -- a/data.iters/iter.000010/02.fp/data.019 54 1 9 0.003 T
DEEPMD INFO -- a/data.iters/iter.000010/02.fp/data.029 32 1 30 0.010 T
DEEPMD INFO -- a/data.iters/iter.000010/02.fp/data.039 48 1 40 0.014 T
DEEPMD INFO --------------------------------------------------------------------------------------
DEEPMD INFO training without frame parameter
2021-07-13 17:39:21.475630: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-07-13 17:39:21.476746: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2500000000 Hz
OMP: Info #155: KMP_AFFINITY: Initial OS proc set respected: 0-3
OMP: Info #216: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #216: KMP_AFFINITY: cpuid leaf 11 not supported.
OMP: Info #216: KMP_AFFINITY: decoding legacy APIC ids.
OMP: Info #157: KMP_AFFINITY: 4 available OS procs
OMP: Info #158: KMP_AFFINITY: Uniform topology
OMP: Info #287: KMP_AFFINITY: topology layer "LL cache" is equivalent to "socket".
OMP: Info #192: KMP_AFFINITY: 1 socket x 2 cores/socket x 2 threads/core (2 total cores)
OMP: Info #218: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #172: KMP_AFFINITY: OS proc 0 maps to socket 0 core 0 thread 0
OMP: Info #172: KMP_AFFINITY: OS proc 1 maps to socket 0 core 0 thread 1
OMP: Info #172: KMP_AFFINITY: OS proc 2 maps to socket 0 core 1 thread 0
OMP: Info #172: KMP_AFFINITY: OS proc 3 maps to socket 0 core 1 thread 1
OMP: Info #254: KMP_AFFINITY: pid 3346 tid 5763 thread 0 bound to OS proc set 0
OMP: Info #254: KMP_AFFINITY: pid 3346 tid 6478 thread 1 bound to OS proc set 2
OMP: Info #254: KMP_AFFINITY: pid 3346 tid 6480 thread 3 bound to OS proc set 3
OMP: Info #254: KMP_AFFINITY: pid 3346 tid 6479 thread 2 bound to OS proc set 1
OMP: Info #254: KMP_AFFINITY: pid 3346 tid 5764 thread 4 bound to OS proc set 0
OMP: Info #254: KMP_AFFINITY: pid 3346 tid 6485 thread 7 bound to OS proc set 3
OMP: Info #254: KMP_AFFINITY: pid 3346 tid 6484 thread 6 bound to OS proc set 1
OMP: Info #254: KMP_AFFINITY: pid 3346 tid 6482 thread 5 bound to OS proc set 2
OMP: Info #254: KMP_AFFINITY: pid 3346 tid 3346 thread 8 bound to OS proc set 0
2021-07-13 17:39:22.998743: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
OMP: Info #254: KMP_AFFINITY: pid 3346 tid 5762 thread 9 bound to OS proc set 2
OMP: Info #254: KMP_AFFINITY: pid 3346 tid 7314 thread 10 bound to OS proc set 1
OMP: Info #254: KMP_AFFINITY: pid 3346 tid 7315 thread 11 bound to OS proc set 3
OMP: Info #254: KMP_AFFINITY: pid 3346 tid 7316 thread 12 bound to OS proc set 0
Traceback (most recent call last):
  File "/opt/deepmd-kit-2.0.0.b3/bin/dp", line 10, in <module>
    sys.exit(main())
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/deepmd/entrypoints/main.py", line 429, in main
    compress(**dict_args)
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/deepmd/entrypoints/compress.py", line 97, in compress
    train(
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/deepmd/entrypoints/train.py", line 212, in train
    _do_work(jdata, run_opt)
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/deepmd/entrypoints/train.py", line 262, in _do_work
    model.build(train_data, stop_batch)
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/deepmd/train/trainer.py", line 306, in build
    = self.neighbor_stat.get_stat(data)
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/deepmd/utils/neighbor_stat.py", line 85, in get_stat
    dt = np.min(dt)
  File "<__array_function__ internals>", line 5, in amin
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 2858, in amin
    return _wrapreduction(a, np.minimum, 'min', axis, None, out,
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation minimum which has no identity

nicklin96 · 2021-07-13T10:34:36Z

I noticed the warning:
required batch size is larger than the size of the dataset ../data/data.init/C_atomSpin2/dpmd/set.000 (32 > 1)
maybe try reducing batch size to 1?
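
For reference, a minimal sketch of how the batch size could be changed programmatically before rerunning `dp compress`. It assumes the v2.0-style key layout (`training` → `training_data` → `batch_size`) used by the input file posted later in this thread; the helper name `set_batch_size` is hypothetical and not part of DeePMD-kit.

```python
import json


def set_batch_size(config: dict, batch_size) -> dict:
    """Return a copy of an input dict with training.training_data.batch_size replaced.

    Hypothetical helper: the key layout is an assumption taken from the
    v2.0-style input file shown in this thread.
    """
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    training = updated.setdefault("training", {})
    training.setdefault("training_data", {})["batch_size"] = batch_size
    return updated


if __name__ == "__main__":
    example = {
        "training": {
            "training_data": {"systems": "../data/", "batch_size": "auto"}
        }
    }
    # Print the modified configuration; the original dict is left untouched.
    print(json.dumps(set_batch_size(example, 1), indent=4))
```

In practice the dict would be read from the input file with `json.load` and written back with `json.dump` before running the command again.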

baoqinfu · 2021-07-13T14:14:21Z

> I noticed the warning:
> required batch size is larger than the size of the dataset ../data/data.init/C_atomSpin2/dpmd/set.000 (32 > 1)
> maybe try reducing batch size to 1?

But the batch size was set to "auto" in the input file for dp compress.
I set "batch_size" to 1 and ran it again, and a similar error was shown.

......
OMP: Info #155: KMP_AFFINITY: Initial OS proc set respected: 0-3
OMP: Info #216: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #216: KMP_AFFINITY: cpuid leaf 11 not supported.
OMP: Info #216: KMP_AFFINITY: decoding legacy APIC ids.
OMP: Info #157: KMP_AFFINITY: 4 available OS procs
OMP: Info #158: KMP_AFFINITY: Uniform topology
OMP: Info #287: KMP_AFFINITY: topology layer "LL cache" is equivalent to "socket".
OMP: Info #192: KMP_AFFINITY: 1 socket x 2 cores/socket x 2 threads/core (2 total cores)
OMP: Info #218: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #172: KMP_AFFINITY: OS proc 0 maps to socket 0 core 0 thread 0
OMP: Info #172: KMP_AFFINITY: OS proc 1 maps to socket 0 core 0 thread 1
OMP: Info #172: KMP_AFFINITY: OS proc 2 maps to socket 0 core 1 thread 0
OMP: Info #172: KMP_AFFINITY: OS proc 3 maps to socket 0 core 1 thread 1
OMP: Info #254: KMP_AFFINITY: pid 16836 tid 20922 thread 0 bound to OS proc set 0
OMP: Info #254: KMP_AFFINITY: pid 16836 tid 21800 thread 1 bound to OS proc set 2
OMP: Info #254: KMP_AFFINITY: pid 16836 tid 21801 thread 2 bound to OS proc set 1
OMP: Info #254: KMP_AFFINITY: pid 16836 tid 21802 thread 3 bound to OS proc set 3
OMP: Info #254: KMP_AFFINITY: pid 16836 tid 20920 thread 4 bound to OS proc set 0
OMP: Info #254: KMP_AFFINITY: pid 16836 tid 21804 thread 5 bound to OS proc set 2
OMP: Info #254: KMP_AFFINITY: pid 16836 tid 21805 thread 6 bound to OS proc set 1
OMP: Info #254: KMP_AFFINITY: pid 16836 tid 21806 thread 7 bound to OS proc set 3
OMP: Info #254: KMP_AFFINITY: pid 16836 tid 16836 thread 8 bound to OS proc set 0
2021-07-13 22:03:17.100625: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
Traceback (most recent call last):
  File "/opt/deepmd-kit-2.0.0.b3/bin/dp", line 10, in <module>
    sys.exit(main())
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/deepmd/entrypoints/main.py", line 429, in main
    compress(**dict_args)
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/deepmd/entrypoints/compress.py", line 97, in compress
    train(
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/deepmd/entrypoints/train.py", line 212, in train
    _do_work(jdata, run_opt)
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/deepmd/entrypoints/train.py", line 262, in _do_work
    model.build(train_data, stop_batch)
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/deepmd/train/trainer.py", line 306, in build
    = self.neighbor_stat.get_stat(data)
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/deepmd/utils/neighbor_stat.py", line 85, in get_stat
    dt = np.min(dt)
  File "<__array_function__ internals>", line 5, in amin
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 2858, in amin
    return _wrapreduction(a, np.minimum, 'min', axis, None, out,
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation minimum which has no identity

njzjz · 2021-07-13T17:17:35Z

Can you provide your input file?

baoqinfu · 2021-07-14T01:13:19Z

> Can you provide your input file?

Below is my input file:

{
    "_comment": "v2.0",
    "model": {
        "type_map": [
            "Si",
            "C"
        ],
        "descriptor": {
            "type": "se_e2_a",
            "sel": [
                300,
                300
            ],
            "rcut_smth": 0.5,
            "rcut": 9.0,
            "neuron": [
                25,
                50,
                100
            ],
            "resnet_dt": false,
            "axis_neuron": 12,
            "seed": 678568530
        },
        "fitting_net": {
            "neuron": [
                240,
                240,
                240
            ],
            "resnet_dt": true,
            "seed": 4111181373
        }
    },
    "loss": {
        "start_pref_e": 0.02,
        "limit_pref_e": 2,
        "start_pref_f": 1000,
        "limit_pref_f": 1,
        "start_pref_v": 0.01,
        "limit_pref_v": 1
    },
    "learning_rate": {
        "start_lr": 0.001,
        "decay_steps": 20000,
        "_decay_rate": 0.95
    },
    "training": {
        "training_data": {
            "systems": "../data/",
            "batch_size": "auto"
        },
        "validation_data": {
            "systems": "../data/",
            "batch_size": "auto",
            "numb_btch": 4,
            "_comment": "that's all"
        },
        "numb_steps": 4000000,
        "seed": 2061570774,
        "_comment": "that's all",
        "disp_file": "lcurve.out",
        "disp_freq": 2000,
        "numb_test": 1,
        "save_freq": 2000,
        "save_ckpt": "model.ckpt",
        "disp_training": true,
        "time_training": true,
        "profiling": false,
        "profiling_file": "timeline.json"
    }
}

njzjz · 2021-07-19T18:15:56Z

#869 reported the same error.

njzjz · 2021-07-27T19:50:27Z

Fixed in #882.
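
For readers hitting the same traceback: the failure comes from calling `np.min` on an array with no elements. The sketch below reproduces that `ValueError` and shows one possible guard. It is only an illustration under assumptions, not the change actually made in #882; the helper name `safe_min` and the `np.inf` fallback are hypothetical choices.

```python
import numpy as np


def safe_min(distances, default=np.inf):
    """Return the minimum of `distances`, or `default` if the array is empty.

    Hypothetical guard for illustration; the fallback value is an assumption.
    """
    arr = np.asarray(distances, dtype=float)
    if arr.size == 0:
        # np.min would raise ValueError here, so return the fallback instead.
        return default
    return float(np.min(arr))


if __name__ == "__main__":
    try:
        np.min(np.array([]))  # same failure mode as in the traceback
    except ValueError as err:
        print("unguarded:", err)
    print("guarded, empty input:", safe_min([]))
    print("guarded, non-empty input:", safe_min([2.5, 1.0]))
```

NumPy's reductions also accept an `initial` argument (for example `np.min(arr, initial=np.inf)`), which avoids the exception for empty input in a similar way.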


baoqinfu added the bug label Jun 24, 2021

amcadmus assigned nicklin96 Jul 5, 2021

njzjz linked a pull request Jul 23, 2021 that will close this issue

Fix the empty neighbor distance array in neighbor_stat.py #882

Merged

njzjz closed this as completed Jul 27, 2021

