one error to compress a model #795

Closed
baoqinfu opened this issue Jun 24, 2021 · 9 comments · Fixed by #882

@baoqinfu

baoqinfu commented Jun 24, 2021

Summary
I ran "dp compress -i SiClda1.pb input.json -o SiCldaC1.pb" to compress a model, and it failed with the error below. How can I address this problem?

-----------------------------------------------------------------------------------------------------------------------------------------
wanrun dp_lda1_test $ dp compress -i SiClda1.pb input.json -o SiCldaC1.pb
WARNING:tensorflow:From /home/wanrun/denghui/dp-api/tensorflow_venv/lib/python3.7/site-papython/compat/v2_compat.py:61: disable_resource_variables (from tensorflow.python.ops.varprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
DEEPMD INFO    |-> deepmd.entrypoints.compress                   


DEEPMD INFO    |-> deepmd.entrypoints.compress                   stage 1: train or refinebulation
DEEPMD INFO    |-> deepmd.entrypoints.train                       _____               ___           _     _  _   
DEEPMD INFO    |-> deepmd.entrypoints.train                      |  __ \             |  _\         | |   (_)| |  
DEEPMD INFO    |-> deepmd.entrypoints.train                      | |  | |  ___   ___ | |_ | ______ | | __ _ | |_ 
DEEPMD INFO    |-> deepmd.entrypoints.train                      | |  | | / _ \ / _ \|  _ ||______|| |/ /| || __|
DEEPMD INFO    |-> deepmd.entrypoints.train                      | |__| ||  __/|  __/| |  |        |   < | || |_ 
DEEPMD INFO    |-> deepmd.entrypoints.train                      |_____/  \___| \___||_| /         |_|\_\|_| \__|
DEEPMD INFO    |-> deepmd.entrypoints.train                      Please read and cite:
DEEPMD INFO    |-> deepmd.entrypoints.train                      Wang, Zhang, Han and E, 228, 178-184 (2018)
DEEPMD INFO    |-> deepmd.entrypoints.train                      installed to:         /t0mzgx2n/_skbuild/linux-x86_64-3.7/cmake-install
DEEPMD INFO    |-> deepmd.entrypoints.train                      source :              v1dirty
DEEPMD INFO    |-> deepmd.entrypoints.train                      source brach:         ap
DEEPMD INFO    |-> deepmd.entrypoints.train                      source commit:        5d
DEEPMD INFO    |-> deepmd.entrypoints.train                      source commit at:     20+0800
DEEPMD INFO    |-> deepmd.entrypoints.train                      build float prec:     fl
DEEPMD INFO    |-> deepmd.entrypoints.train                      build with tf inc:    /h/dp-api/tensorflow_venv/lib/python3.7/site-packages/tensorflow/include;/home/wanrun/denghow_venv/lib/python3.7/site-packages/tensorflow/include
DEEPMD INFO    |-> deepmd.entrypoints.train                      build with tf lib:    
DEEPMD INFO    |-> deepmd.run_options                            ---Summary of the traini-----------------------
DEEPMD INFO    |-> deepmd.run_options                            running on:           lo
DEEPMD INFO    |-> deepmd.run_options                            CUDA_VISIBLE_DEVICES: un
DEEPMD INFO    |-> deepmd.run_options                            num_intra_threads:    0
DEEPMD INFO    |-> deepmd.run_options                            num_inter_threads:    0
DEEPMD INFO    |-> deepmd.run_options                            -----------------------------------------------
2021-06-24 14:25:13.080366: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed KNOWN ERROR (303)
/home/wanrun/denghui/dp-api/tensorflow_venv/lib/python3.7/site-packages/deepmd/utils/dataerWarning: system ../data.init/C_atomSpin2/dpmd required batch size is larger than the si../data.init/C_atomSpin2/dpmd/set.000 (32 > 1)
  (self.system_dirs[ii], chk_ret[0], self.batch_size[ii], chk_ret[1]))
/home/wanrun/denghui/dp-api/tensorflow_venv/lib/python3.7/site-packages/deepmd/utils/dataerWarning: system ../data.init/C_atomSpin2/dpmd required test size is larger than the siz./data.init/C_atomSpin2/dpmd/set.000 (2 > 1)
  (self.system_dirs[ii], chk_ret[0], self.test_size[ii], chk_ret[1]))
/home/wanrun/denghui/dp-api/tensorflow_venv/lib/python3.7/site-packages/deepmd/utils/dataerWarning: system ../data.init/Si_atomSpin2/dpmd required batch size is larger than the s ../data.init/Si_atomSpin2/dpmd/set.000 (32 > 1)
  (self.system_dirs[ii], chk_ret[0], self.batch_size[ii], chk_ret[1]))
/home/wanrun/denghui/dp-api/tensorflow_venv/lib/python3.7/site-packages/deepmd/utils/dataerWarning: system ../data.init/Si_atomSpin2/dpmd required test size is larger than the si../data.init/Si_atomSpin2/dpmd/set.000 (2 > 1)
  (self.system_dirs[ii], chk_ret[0], self.test_size[ii], chk_ret[1]))
DEEPMD INFO    |-> deepmd.utils.data_system                      ---Summary of DataSystem------------------------------
DEEPMD INFO    |-> deepmd.utils.data_system                      found 36 system(s):
DEEPMD INFO    |-> deepmd.utils.data_system                                                
DEEPMD INFO    |-> deepmd.utils.data_system                      natoms  bch_sz  n_bch   
DEEPMD INFO    |-> deepmd.utils.data_system                                   ../data.ini       1      32       1       2  0.000
DEEPMD INFO    |-> deepmd.utils.data_system                                  ../data.init       1      32       1       2  0.000
DEEPMD INFO    |-> deepmd.utils.data_system                      -- H-4.02x02x02/02.md/sy      32       1     600       2  0.208
DEEPMD INFO    |-> deepmd.utils.data_system                      -- C-2.03x03x03/02.md/sy      54       1     600       2  0.208
DEEPMD INFO    |-> deepmd.utils.data_system                      -- H-8.02x02x01/02.md/sy      32       1     600       2  0.208
DEEPMD INFO    |-> deepmd.utils.data_system                      -- -12.02x02x01/02.md/sy      48       1     600       2  0.208
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1       7       2  0.002
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      54       1      21       2  0.007
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1      17       2  0.006
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      48       1      19       2  0.007
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      54       1       8       2  0.003
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1       5       2  0.002
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      48       1       8       2  0.003
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1      15       2  0.005
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      54       1      12       2  0.004
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1      10       2  0.003
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      48       1      12       2  0.004
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1      16       2  0.006
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1       6       2  0.002
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1       5       2  0.002
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1      22       2  0.008
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      54       1      34       2  0.012
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1      18       2  0.006
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      48       1      19       2  0.007
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1       6       2  0.002
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      54       1       6       2  0.002
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1       6       2  0.002
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      48       1       5       2  0.002
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1      27       2  0.009
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      54       1      19       2  0.007
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1      15       2  0.005
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      48       1      20       2  0.007
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1      39       2  0.014
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      54       1       9       2  0.003
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      32       1      30       2  0.010
DEEPMD INFO    |-> deepmd.utils.data_system                        ../data.iters/iter.000      48       1      40       2  0.014
DEEPMD INFO    |-> deepmd.utils.data_system                      ------------------------------------------------------

DEEPMD INFO    |-> deepmd.trainer                                training without frame p
2021-06-24 14:25:15.792888: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] : Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was nnt XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To s active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via et the envvar XLA_FLAGS=--xla_hlo_profile.
Traceback (most recent call last):
  File "/home/wanrun/denghui/dp-api/tensorflow_venv/bin/dp", line 8, in <module>
    sys.exit(main())
  File "/home/wanrun/denghui/dp-api/tensorflow_venv/lib/python3.7/site-packages/deepmd/main main
    compress(**dict_args)
  File "/home/wanrun/denghui/dp-api/tensorflow_venv/lib/python3.7/site-packages/deepmd/en.py", line 102, in compress
    log_path=log_path,
  File "/home/wanrun/denghui/dp-api/tensorflow_venv/lib/python3.7/site-packages/deepmd/en", line 211, in train
    _do_work(jdata, run_opt)
  File "/home/wanrun/denghui/dp-api/tensorflow_venv/lib/python3.7/site-packages/deepmd/en", line 291, in _do_work
    model.build(data, stop_batch)
  File "/home/wanrun/denghui/dp-api/tensorflow_venv/lib/python3.7/site-packages/deepmd/tr4, in build
    = self.neighbor_stat.get_stat(data)
  File "/home/wanrun/denghui/dp-api/tensorflow_venv/lib/python3.7/site-packages/deepmd/utpy", line 85, in get_stat
    dt = np.min(dt)
  File "/home/wanrun/denghui/dp-api/tensorflow_venv/lib/python3.7/site-packages/numpy/cor line 2618, in amin
    initial=initial)
  File "/home/wanrun/denghui/dp-api/tensorflow_venv/lib/python3.7/site-packages/numpy/cor line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation minimum which has no identity
-----------------------------------------------------------------------------------------------------------------------------------------
The used deepmd version info:
DEEPMD INFO    |-> deepmd.entrypoints.train                      source :              v1.2.2-382-g5d21c7f-dirty
DEEPMD INFO    |-> deepmd.entrypoints.train                      source brach:         api-summit
DEEPMD INFO    |-> deepmd.entrypoints.train                      source commit:        5d21c7f
DEEPMD INFO    |-> deepmd.entrypoints.train                      source commit at:     2021-04-04 17:57:36 +0800

Deepmd-kit version, installation way, input file, running commands, error log, etc.

Steps to Reproduce

Further Information, Files, and Links

@baoqinfu baoqinfu added the bug label Jun 24, 2021
@denghuilu
Member

Which version of deepmd-kit was used for the model training?

@baoqinfu
Author

Which version of deepmd-kit was used for the model training?

Sorry for the late reply. The version info of the deepmd-kit used for the model training is:

DEEPMD: ---Summary of the training---------------------------------------

DEEPMD: installed to: /tmp/pip-req-build-gek426j1/_skbuild/linux-x86_64-3.8/cmake-install

DEEPMD: source : v1.2.2-85-gb96112e-dirty

DEEPMD: source brach: devel

DEEPMD: source commit: b96112e

DEEPMD: source commit at: 2020-11-26 14:12:51 +0800


@baoqinfu
Author

I used 'dp convert-from -I old.pb -o new.pb ‘1.2’' to convert the old model into the new format (supported by 2.0), then used "/opt/deepmd-kit-2.0.0.b3/bin/dp compress -i SiClda1_c3.pb -o SiClda1_com.pb input2.json" to compress the model, but I got a similar error:

2021-07-13 17:39:10.288795: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
WARNING:tensorflow:From /opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:root:Environment variable KMP_BLOCKTIME is empty. Use the default value 0
WARNING:root:Environment variable KMP_AFFINITY is empty. Use the default value granularity=fine,verbose,compact,1,0
DEEPMD INFO

DEEPMD INFO stage 1: train or refine the model with tabulation
DEEPMD INFO [DeePMD-kit ASCII-art banner]
DEEPMD INFO Please read and cite:
DEEPMD INFO Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
DEEPMD INFO installed to: /tmp/pip-req-build-qqv2ggzp/_skbuild/linux-x86_64-3.9/cmake-install
DEEPMD INFO source : v2.0.0.b3
DEEPMD INFO source brach: HEAD
DEEPMD INFO source commit: de428e3
DEEPMD INFO source commit at: 2021-07-04 22:12:13 +0800
DEEPMD INFO build float prec: double
DEEPMD INFO build with tf inc: /opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/tensorflow/include;/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/tensorflow/include
DEEPMD INFO build with tf lib:
DEEPMD INFO ---Summary of the training---------------------------------------
DEEPMD INFO running on: login0
DEEPMD INFO CUDA_VISIBLE_DEVICES: unset
DEEPMD INFO num_intra_threads: 0
DEEPMD INFO num_inter_threads: 0
DEEPMD INFO -----------------------------------------------------------------
2021-07-13 17:39:18.657117: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-07-13 17:39:18.658367: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:
2021-07-13 17:39:18.658392: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2021-07-13 17:39:18.658412: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (login0): /proc/driver/nvidia/version does not exist
2021-07-13 17:39:18.658425: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/deepmd/utils/data_system.py:156: UserWarning: system ../data/data.init/C_atomSpin2/dpmd required batch size is larger than the size of the dataset ../data/data.init/C_atomSpin2/dpmd/set.000 (32 > 1)
warnings.warn("system %s required batch size is larger than the size of the dataset %s (%d > %d)" %
/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/deepmd/utils/data_system.py:156: UserWarning: system ../data/data.init/Si_atomSpin2/dpmd required batch size is larger than the size of the dataset ../data/data.init/Si_atomSpin2/dpmd/set.000 (32 > 1)
warnings.warn("system %s required batch size is larger than the size of the dataset %s (%d > %d)" %
DEEPMD INFO ---Summary of DataSystem: training -----------------------------------------------
DEEPMD INFO found 36 system(s):
DEEPMD INFO system natoms bch_sz n_bch prob pbc
DEEPMD INFO -- H-4.02x02x02/02.md/sys-0016-0016/deepmd 32 1 600 0.208 T
DEEPMD INFO -- C-2.03x03x03/02.md/sys-0027-0027/deepmd 54 1 600 0.208 T
DEEPMD INFO -- H-8.02x02x01/02.md/sys-0016-0016/deepmd 32 1 600 0.208 T
DEEPMD INFO -- -12.02x02x01/02.md/sys-0024-0024/deepmd 48 1 600 0.208 T
DEEPMD INFO ../data/data.init/C_atomSpin2/dpmd 1 32 1 0.000 T
DEEPMD INFO ../data/data.init/Si_atomSpin2/dpmd 1 32 1 0.000 T
DEEPMD INFO -- a/data.iters/iter.000000/02.fp/data.000 32 1 7 0.002 T
DEEPMD INFO -- a/data.iters/iter.000000/02.fp/data.010 54 1 21 0.007 T
DEEPMD INFO -- a/data.iters/iter.000000/02.fp/data.020 32 1 17 0.006 T
DEEPMD INFO -- a/data.iters/iter.000000/02.fp/data.030 48 1 19 0.007 T
DEEPMD INFO -- a/data.iters/iter.000002/02.fp/data.012 54 1 8 0.003 T
DEEPMD INFO -- a/data.iters/iter.000002/02.fp/data.022 32 1 5 0.002 T
DEEPMD INFO -- a/data.iters/iter.000002/02.fp/data.032 48 1 8 0.003 T
DEEPMD INFO -- a/data.iters/iter.000003/02.fp/data.003 32 1 15 0.005 T
DEEPMD INFO -- a/data.iters/iter.000003/02.fp/data.013 54 1 12 0.004 T
DEEPMD INFO -- a/data.iters/iter.000003/02.fp/data.023 32 1 10 0.003 T
DEEPMD INFO -- a/data.iters/iter.000003/02.fp/data.033 48 1 12 0.004 T
DEEPMD INFO -- a/data.iters/iter.000004/02.fp/data.004 32 1 16 0.006 T
DEEPMD INFO -- a/data.iters/iter.000004/02.fp/data.024 32 1 6 0.002 T
DEEPMD INFO -- a/data.iters/iter.000005/02.fp/data.005 32 1 5 0.002 T
DEEPMD INFO -- a/data.iters/iter.000006/02.fp/data.006 32 1 22 0.008 T
DEEPMD INFO -- a/data.iters/iter.000006/02.fp/data.016 54 1 34 0.012 T
DEEPMD INFO -- a/data.iters/iter.000006/02.fp/data.026 32 1 18 0.006 T
DEEPMD INFO -- a/data.iters/iter.000006/02.fp/data.036 48 1 19 0.007 T
DEEPMD INFO -- a/data.iters/iter.000007/02.fp/data.007 32 1 6 0.002 T
DEEPMD INFO -- a/data.iters/iter.000007/02.fp/data.017 54 1 6 0.002 T
DEEPMD INFO -- a/data.iters/iter.000008/02.fp/data.008 32 1 6 0.002 T
DEEPMD INFO -- a/data.iters/iter.000008/02.fp/data.038 48 1 5 0.002 T
DEEPMD INFO -- a/data.iters/iter.000009/02.fp/data.009 32 1 27 0.009 T
DEEPMD INFO -- a/data.iters/iter.000009/02.fp/data.019 54 1 19 0.007 T
DEEPMD INFO -- a/data.iters/iter.000009/02.fp/data.029 32 1 15 0.005 T
DEEPMD INFO -- a/data.iters/iter.000009/02.fp/data.039 48 1 20 0.007 T
DEEPMD INFO -- a/data.iters/iter.000010/02.fp/data.009 32 1 39 0.014 T
DEEPMD INFO -- a/data.iters/iter.000010/02.fp/data.019 54 1 9 0.003 T
DEEPMD INFO -- a/data.iters/iter.000010/02.fp/data.029 32 1 30 0.010 T
DEEPMD INFO -- a/data.iters/iter.000010/02.fp/data.039 48 1 40 0.014 T
DEEPMD INFO --------------------------------------------------------------------------------------
DEEPMD INFO ---Summary of DataSystem: validation -----------------------------------------------
DEEPMD INFO found 36 system(s):
DEEPMD INFO system natoms bch_sz n_bch prob pbc
DEEPMD INFO -- H-4.02x02x02/02.md/sys-0016-0016/deepmd 32 1 600 0.208 T
DEEPMD INFO -- C-2.03x03x03/02.md/sys-0027-0027/deepmd 54 1 600 0.208 T
DEEPMD INFO -- H-8.02x02x01/02.md/sys-0016-0016/deepmd 32 1 600 0.208 T
DEEPMD INFO -- -12.02x02x01/02.md/sys-0024-0024/deepmd 48 1 600 0.208 T
DEEPMD INFO ../data/data.init/C_atomSpin2/dpmd 1 1 1 0.000 T
DEEPMD INFO ../data/data.init/Si_atomSpin2/dpmd 1 1 1 0.000 T
DEEPMD INFO -- a/data.iters/iter.000000/02.fp/data.000 32 1 7 0.002 T
DEEPMD INFO -- a/data.iters/iter.000000/02.fp/data.010 54 1 21 0.007 T
DEEPMD INFO -- a/data.iters/iter.000000/02.fp/data.020 32 1 17 0.006 T
DEEPMD INFO -- a/data.iters/iter.000000/02.fp/data.030 48 1 19 0.007 T
DEEPMD INFO -- a/data.iters/iter.000002/02.fp/data.012 54 1 8 0.003 T
DEEPMD INFO -- a/data.iters/iter.000002/02.fp/data.022 32 1 5 0.002 T
DEEPMD INFO -- a/data.iters/iter.000002/02.fp/data.032 48 1 8 0.003 T
DEEPMD INFO -- a/data.iters/iter.000003/02.fp/data.003 32 1 15 0.005 T
DEEPMD INFO -- a/data.iters/iter.000003/02.fp/data.013 54 1 12 0.004 T
DEEPMD INFO -- a/data.iters/iter.000003/02.fp/data.023 32 1 10 0.003 T
DEEPMD INFO -- a/data.iters/iter.000003/02.fp/data.033 48 1 12 0.004 T
DEEPMD INFO -- a/data.iters/iter.000004/02.fp/data.004 32 1 16 0.006 T
DEEPMD INFO -- a/data.iters/iter.000004/02.fp/data.024 32 1 6 0.002 T
DEEPMD INFO -- a/data.iters/iter.000005/02.fp/data.005 32 1 5 0.002 T
DEEPMD INFO -- a/data.iters/iter.000006/02.fp/data.006 32 1 22 0.008 T
DEEPMD INFO -- a/data.iters/iter.000006/02.fp/data.016 54 1 34 0.012 T
DEEPMD INFO -- a/data.iters/iter.000006/02.fp/data.026 32 1 18 0.006 T
DEEPMD INFO -- a/data.iters/iter.000006/02.fp/data.036 48 1 19 0.007 T
DEEPMD INFO -- a/data.iters/iter.000007/02.fp/data.007 32 1 6 0.002 T
DEEPMD INFO -- a/data.iters/iter.000007/02.fp/data.017 54 1 6 0.002 T
DEEPMD INFO -- a/data.iters/iter.000008/02.fp/data.008 32 1 6 0.002 T
DEEPMD INFO -- a/data.iters/iter.000008/02.fp/data.038 48 1 5 0.002 T
DEEPMD INFO -- a/data.iters/iter.000009/02.fp/data.009 32 1 27 0.009 T
DEEPMD INFO -- a/data.iters/iter.000009/02.fp/data.019 54 1 19 0.007 T
DEEPMD INFO -- a/data.iters/iter.000009/02.fp/data.029 32 1 15 0.005 T
DEEPMD INFO -- a/data.iters/iter.000009/02.fp/data.039 48 1 20 0.007 T
DEEPMD INFO -- a/data.iters/iter.000010/02.fp/data.009 32 1 39 0.014 T
DEEPMD INFO -- a/data.iters/iter.000010/02.fp/data.019 54 1 9 0.003 T
DEEPMD INFO -- a/data.iters/iter.000010/02.fp/data.029 32 1 30 0.010 T
DEEPMD INFO -- a/data.iters/iter.000010/02.fp/data.039 48 1 40 0.014 T
DEEPMD INFO --------------------------------------------------------------------------------------
DEEPMD INFO training without frame parameter
2021-07-13 17:39:21.475630: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:196] None of the MLIR optimization passes are enabled (registered 0 passes)
2021-07-13 17:39:21.476746: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2500000000 Hz
OMP: Info #155: KMP_AFFINITY: Initial OS proc set respected: 0-3
OMP: Info #216: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #216: KMP_AFFINITY: cpuid leaf 11 not supported.
OMP: Info #216: KMP_AFFINITY: decoding legacy APIC ids.
OMP: Info #157: KMP_AFFINITY: 4 available OS procs
OMP: Info #158: KMP_AFFINITY: Uniform topology
OMP: Info #287: KMP_AFFINITY: topology layer "LL cache" is equivalent to "socket".
OMP: Info #192: KMP_AFFINITY: 1 socket x 2 cores/socket x 2 threads/core (2 total cores)
OMP: Info #218: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #172: KMP_AFFINITY: OS proc 0 maps to socket 0 core 0 thread 0
OMP: Info #172: KMP_AFFINITY: OS proc 1 maps to socket 0 core 0 thread 1
OMP: Info #172: KMP_AFFINITY: OS proc 2 maps to socket 0 core 1 thread 0
OMP: Info #172: KMP_AFFINITY: OS proc 3 maps to socket 0 core 1 thread 1
OMP: Info #254: KMP_AFFINITY: pid 3346 tid 5763 thread 0 bound to OS proc set 0
OMP: Info #254: KMP_AFFINITY: pid 3346 tid 6478 thread 1 bound to OS proc set 2
OMP: Info #254: KMP_AFFINITY: pid 3346 tid 6480 thread 3 bound to OS proc set 3
OMP: Info #254: KMP_AFFINITY: pid 3346 tid 6479 thread 2 bound to OS proc set 1
OMP: Info #254: KMP_AFFINITY: pid 3346 tid 5764 thread 4 bound to OS proc set 0
OMP: Info #254: KMP_AFFINITY: pid 3346 tid 6485 thread 7 bound to OS proc set 3
OMP: Info #254: KMP_AFFINITY: pid 3346 tid 6484 thread 6 bound to OS proc set 1
OMP: Info #254: KMP_AFFINITY: pid 3346 tid 6482 thread 5 bound to OS proc set 2
OMP: Info #254: KMP_AFFINITY: pid 3346 tid 3346 thread 8 bound to OS proc set 0
2021-07-13 17:39:22.998743: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
OMP: Info #254: KMP_AFFINITY: pid 3346 tid 5762 thread 9 bound to OS proc set 2
OMP: Info #254: KMP_AFFINITY: pid 3346 tid 7314 thread 10 bound to OS proc set 1
OMP: Info #254: KMP_AFFINITY: pid 3346 tid 7315 thread 11 bound to OS proc set 3
OMP: Info #254: KMP_AFFINITY: pid 3346 tid 7316 thread 12 bound to OS proc set 0
Traceback (most recent call last):
  File "/opt/deepmd-kit-2.0.0.b3/bin/dp", line 10, in <module>
    sys.exit(main())
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/deepmd/entrypoints/main.py", line 429, in main
    compress(**dict_args)
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/deepmd/entrypoints/compress.py", line 97, in compress
    train(
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/deepmd/entrypoints/train.py", line 212, in train
    _do_work(jdata, run_opt)
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/deepmd/entrypoints/train.py", line 262, in _do_work
    model.build(train_data, stop_batch)
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/deepmd/train/trainer.py", line 306, in build
    = self.neighbor_stat.get_stat(data)
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/deepmd/utils/neighbor_stat.py", line 85, in get_stat
    dt = np.min(dt)
  File "<__array_function__ internals>", line 5, in amin
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 2858, in amin
    return _wrapreduction(a, np.minimum, 'min', axis, None, out,
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation minimum which has no identity
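
Editorial note: the final frame of the traceback is plain NumPy behavior and can be reproduced outside deepmd-kit. The sketch below only shows that np.min raises this exact ValueError on an empty array, and one generic way to guard such a reduction with the `initial` argument. The helper name `safe_min` is made up for illustration, and nothing here is claimed to be what neighbor_stat.py does or how #882 changed it.

```python
import numpy as np


def safe_min(values, default=np.inf):
    """Return the minimum of `values`, or `default` when `values` is empty.

    Hypothetical helper for illustration only; not part of deepmd-kit.
    """
    arr = np.asarray(values, dtype=float)
    # np.min accepts `initial` (NumPy >= 1.15). For an empty input the
    # result is simply `initial`, so no ValueError is raised.
    return float(np.min(arr, initial=default))


# An empty array stands in for "no values were collected".
empty = np.array([])

try:
    np.min(empty)
    raised = False
    message = ""
except ValueError as err:
    raised = True
    message = str(err)

print("np.min on empty array raised ValueError:", raised)
print("message:", message)
print("guarded minimum of empty input:", safe_min(empty))
print("guarded minimum of [3.0, 1.5]:", safe_min([3.0, 1.5]))
```

Whether a default such as infinity is a sensible result depends on what the caller does with it; in some code an explicit size check with a clear error message would be the better choice.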

@nicklin96
Collaborator

I noticed the warning:
required batch size is larger than the size of the dataset ../data/data.init/C_atomSpin2/dpmd/set.000 (32 > 1)
maybe try reducing batch size to 1?
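
Editorial note: the "(32 > 1)" in the warning compares the requested batch size with the number of frames in set.000. A batch size of 32 for the single-atom systems is consistent with the bch_sz column of the data summary if one assumes that "auto" picks the smallest batch size whose product with the number of atoms is at least 32. That rule is an assumption here and should be checked against the deepmd-kit documentation for the version in use; the function name `auto_batch_size` is hypothetical.

```python
import math


def auto_batch_size(natoms, rule=32):
    """Smallest batch size with batch_size * natoms >= rule.

    Assumed reading of batch_size "auto"; not taken from deepmd-kit source.
    """
    return max(1, math.ceil(rule / natoms))


# natoms values that appear in the data summary of this issue.
for natoms in (1, 32, 48, 54):
    print(f"natoms={natoms:3d} -> batch size {auto_batch_size(natoms)}")
```

Under that assumption the single-atom systems get a batch size of 32 and all others get 1, which matches the log.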

@baoqinfu
Author

I noticed the warning:
required batch size is larger than the size of the dataset ../data/data.init/C_atomSpin2/dpmd/set.000 (32 > 1)
maybe try reducing batch size to 1?

But the batch size was set to "auto" in the input file for dp compress.
I set "batch_size" to 1 and ran it again, and a similar error was shown.

......
OMP: Info #155: KMP_AFFINITY: Initial OS proc set respected: 0-3
OMP: Info #216: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #216: KMP_AFFINITY: cpuid leaf 11 not supported.
OMP: Info #216: KMP_AFFINITY: decoding legacy APIC ids.
OMP: Info #157: KMP_AFFINITY: 4 available OS procs
OMP: Info #158: KMP_AFFINITY: Uniform topology
OMP: Info #287: KMP_AFFINITY: topology layer "LL cache" is equivalent to "socket".
OMP: Info #192: KMP_AFFINITY: 1 socket x 2 cores/socket x 2 threads/core (2 total cores)
OMP: Info #218: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #172: KMP_AFFINITY: OS proc 0 maps to socket 0 core 0 thread 0
OMP: Info #172: KMP_AFFINITY: OS proc 1 maps to socket 0 core 0 thread 1
OMP: Info #172: KMP_AFFINITY: OS proc 2 maps to socket 0 core 1 thread 0
OMP: Info #172: KMP_AFFINITY: OS proc 3 maps to socket 0 core 1 thread 1
OMP: Info #254: KMP_AFFINITY: pid 16836 tid 20922 thread 0 bound to OS proc set 0
OMP: Info #254: KMP_AFFINITY: pid 16836 tid 21800 thread 1 bound to OS proc set 2
OMP: Info #254: KMP_AFFINITY: pid 16836 tid 21801 thread 2 bound to OS proc set 1
OMP: Info #254: KMP_AFFINITY: pid 16836 tid 21802 thread 3 bound to OS proc set 3
OMP: Info #254: KMP_AFFINITY: pid 16836 tid 20920 thread 4 bound to OS proc set 0
OMP: Info #254: KMP_AFFINITY: pid 16836 tid 21804 thread 5 bound to OS proc set 2
OMP: Info #254: KMP_AFFINITY: pid 16836 tid 21805 thread 6 bound to OS proc set 1
OMP: Info #254: KMP_AFFINITY: pid 16836 tid 21806 thread 7 bound to OS proc set 3
OMP: Info #254: KMP_AFFINITY: pid 16836 tid 16836 thread 8 bound to OS proc set 0
2021-07-13 22:03:17.100625: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
Traceback (most recent call last):
  File "/opt/deepmd-kit-2.0.0.b3/bin/dp", line 10, in <module>
    sys.exit(main())
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/deepmd/entrypoints/main.py", line 429, in main
    compress(**dict_args)
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/deepmd/entrypoints/compress.py", line 97, in compress
    train(
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/deepmd/entrypoints/train.py", line 212, in train
    _do_work(jdata, run_opt)
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/deepmd/entrypoints/train.py", line 262, in _do_work
    model.build(train_data, stop_batch)
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/deepmd/train/trainer.py", line 306, in build
    = self.neighbor_stat.get_stat(data)
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/deepmd/utils/neighbor_stat.py", line 85, in get_stat
    dt = np.min(dt)
  File "<__array_function__ internals>", line 5, in amin
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 2858, in amin
    return _wrapreduction(a, np.minimum, 'min', axis, None, out,
  File "/opt/deepmd-kit-2.0.0.b3/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation minimum which has no identity

@njzjz
Member

njzjz commented Jul 13, 2021

Can you provide your input file?

@baoqinfu
Author

Can you provide your input file?

Below is my input file:


{
    "_comment": "v2.0",
    "model": {
        "type_map": [
            "Si",
            "C"
        ],
        "descriptor": {
            "type": "se_e2_a",
            "sel": [
                300,
                300
            ],
            "rcut_smth": 0.5,
            "rcut": 9.0,
            "neuron": [
                25,
                50,
                100
            ],
            "resnet_dt": false,
            "axis_neuron": 12,
            "seed": 678568530
        },
        "fitting_net": {
            "neuron": [
                240,
                240,
                240
            ],
            "resnet_dt": true,
            "seed": 4111181373
        }
    },
    "loss": {
        "start_pref_e": 0.02,
        "limit_pref_e": 2,
        "start_pref_f": 1000,
        "limit_pref_f": 1,
        "start_pref_v": 0.01,
        "limit_pref_v": 1
    },
    "learning_rate": {
        "start_lr": 0.001,
        "decay_steps": 20000,
        "_decay_rate": 0.95
    },
    "training": {
        "training_data": {
            "systems": "../data/",
            "batch_size": "auto"
        },
        "validation_data": {
            "systems": "../data/",
            "batch_size": "auto",
            "numb_btch": 4,
            "_comment": "that's all"
        },
        "numb_steps": 4000000,
        "seed": 2061570774,
        "_comment": "that's all",
        "disp_file": "lcurve.out",
        "disp_freq": 2000,
        "numb_test": 1,
        "save_freq": 2000,
        "save_ckpt": "model.ckpt",
        "disp_training": true,
        "time_training": true,
        "profiling": false,
        "profiling_file": "timeline.json"
    }
}
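
Editorial note: for anyone repeating the batch-size experiment described earlier in the thread, the following is a minimal sketch of overriding "batch_size" in both data sections of an input like the one above. It uses only the key names visible in that file; the function name `with_batch_size` and the trimmed `example` dictionary are made up for illustration. As reported above, setting the batch size to 1 did not remove the error in this case.

```python
import copy
import json


def with_batch_size(config, batch_size):
    """Return a copy of `config` with batch_size set in both data sections.

    Hypothetical helper; key names follow the input file in this issue.
    """
    patched = copy.deepcopy(config)  # leave the caller's dict untouched
    training = patched.get("training", {})
    for section in ("training_data", "validation_data"):
        if section in training:
            training[section]["batch_size"] = batch_size
    return patched


# Trimmed stand-in for the full input file.
example = {
    "training": {
        "training_data": {"systems": "../data/", "batch_size": "auto"},
        "validation_data": {"systems": "../data/", "batch_size": "auto"},
        "numb_steps": 4000000,
    }
}

patched = with_batch_size(example, 1)
print(json.dumps(patched, indent=4))
```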


@njzjz
Member

njzjz commented Jul 19, 2021

#869 reported the same error.

@njzjz njzjz linked a pull request Jul 23, 2021 that will close this issue
@njzjz
Member

njzjz commented Jul 27, 2021

Fixed in #882.

@njzjz njzjz closed this as completed Jul 27, 2021
njzjz pushed a commit to njzjz/deepmd-kit that referenced this issue Sep 21, 2023