fix_bug simplify.py #1114
This fixes the break of dpgen simplify with "labeled" set to true, which happens when all candidates are picked but there is still a non-zero number of failed frames. In that case the output files are:

```
ls iter.000002/01.model_devi/data.rest/*
...
iter.000002/01.model_devi/data.rest/my_elements:
box.raw  coord.raw  energy.raw  force.raw  type_map.raw  type.raw  virial.raw
...
```

There is no `set.000` inside, so model_devi of iter.000003 breaks because no labeled systems can be loaded. Here `iter.000003/01.model_devi/data.rest.old` is a symlink to `iter.000002/01.model_devi/data.rest`:

https://github.com/deepmodeling/dpgen/blob/355f8eda0c212fe3a072f0865d1ac0d0d7c753b1/dpgen/simplify/simplify.py#L171

**clues**

`rest_idx` only contains the indices of the not-picked candidates of this iter:

```
# /dpgen/simplify/simplify.py
def post_model_devi(iter_index, jdata, mdata):
    ...
    counter = {"candidate": sys_candinate.get_nframes(), "accurate": sys_accurate.get_nframes(), "failed": sys_failed.get_nframes()}  # <<<---
    ....
    # candinate: pick up randomly
    iter_pick_number = jdata['iter_pick_number']
    idx = np.arange(counter['candidate'])  # <<---
    assert(len(idx) == len(labels))
    np.random.shuffle(idx)
    pick_idx = idx[:iter_pick_number]
    rest_idx = idx[iter_pick_number:]  # <---
```

The line marked `<---` is https://github.com/deepmodeling/dpgen/blob/355f8eda0c212fe3a072f0865d1ac0d0d7c753b1/dpgen/simplify/simplify.py#L292

But "rest_systems" for the next iter should contain both the "not-picked candidates" and "sys_failed" of this iter:

```
for j in rest_idx:
    sys_name, sys_id = labels[j]
    rest_systems.append(sys_candinate[sys_name][sys_id])
rest_systems += sys_failed  # <---
```

The line marked `<---` is https://github.com/deepmodeling/dpgen/blob/355f8eda0c212fe3a072f0865d1ac0d0d7c753b1/dpgen/simplify/simplify.py#L314

Thus the size passed to `set_size` should be `rest_systems.get_nframes()`. I did not find the size check in the deleted "if line" necessary; I think it was only insurance against `set_size = 0` when the size of `rest_idx` was passed.

**when it breaks, the errors look like this**:

std output of dpgen:

```
INFO:dpgen:-------------------------iter.000003 task 05--------------------------
Traceback (most recent call last):
  File "/opt/anaconda3/bin/dpgen", line 8, in <module>
    sys.exit(main())
  File "/opt/anaconda3/lib/python3.8/site-packages/dpgen/main.py", line 185, in main
    args.func(args)
  File "/opt/anaconda3/lib/python3.8/site-packages/dpgen/simplify/simplify.py", line 535, in gen_simplify
    run_iter(args.PARAM, args.MACHINE)
  File "/opt/anaconda3/lib/python3.8/site-packages/dpgen/simplify/simplify.py", line 508, in run_iter
    post_model_devi(ii, jdata, mdata)
  File "/opt/anaconda3/lib/python3.8/site-packages/dpgen/simplify/simplify.py", line 250, in post_model_devi
    sys_entire = dpdata.MultiSystems(type_map = type_map).from_deepmd_npy(os.path.join(work_path, rest_data_name + ".old"), labeled=labeled)
  File "/opt/anaconda3/lib/python3.8/site-packages/dpdata/system.py", line 1465, in from_format
    return self.from_fmt_obj(ff(), file_name, **kwargs)
  File "/opt/anaconda3/lib/python3.8/site-packages/dpdata/system.py", line 1188, in from_fmt_obj
    system = LabeledSystem().from_fmt_obj(fmtobj, dd, **kwargs)
  File "/opt/anaconda3/lib/python3.8/site-packages/dpdata/system.py", line 1078, in from_fmt_obj
    data = fmtobj.from_labeled_system(file_name, **kwargs)
  File "/opt/anaconda3/lib/python3.8/site-packages/dpdata/plugins/deepmd.py", line 60, in from_labeled_system
    return dpdata.deepmd.comp.to_system_data(file_name, type_map=type_map, labels=True)
  File "/opt/anaconda3/lib/python3.8/site-packages/dpdata/deepmd/comp.py", line 50, in to_system_data
    data['cells'] = np.concatenate(all_cells, axis = 0)
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: need at least one array to concatenate
```

iter.000003/01.model_devi/model_devi.log:

```
WARNING:tensorflow:From /opt/deepmd-kit-2.1.5/lib/python3.10/site-packages/tensorflow/python/compat/v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:root:To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, and TF_INTER_OP_PARALLELISM_THREADS.
WARNING:root:Environment variable KMP_BLOCKTIME is empty. Use the default value 0
WARNING:root:Environment variable KMP_AFFINITY is empty. Use the default value granularity=fine,verbose,compact,1,0
/opt/deepmd-kit-2.1.5/lib/python3.10/importlib/__init__.py:169: UserWarning: The NumPy module was reloaded (imported a second time). This can in some cases result in small but subtle issues and is discouraged.
  _bootstrap._exec(spec, module)
Traceback (most recent call last):
  File "/opt/deepmd-kit-2.1.5/bin/dp", line 10, in <module>
    sys.exit(main())
  File "/opt/deepmd-kit-2.1.5/lib/python3.10/site-packages/deepmd/entrypoints/main.py", line 576, in main
    make_model_devi(**dict_args)
  File "/opt/deepmd-kit-2.1.5/lib/python3.10/site-packages/deepmd/infer/model_devi.py", line 199, in make_model_devi
    dp_data = DeepmdData(system, set_prefix, shuffle_test=False, type_map=tmap)
  File "/opt/deepmd-kit-2.1.5/lib/python3.10/site-packages/deepmd/utils/data.py", line 51, in __init__
    self.mixed_type = self._check_mode(self.dirs[0])  # mixed_type format only has one set
IndexError: list index out of range
```

Signed-off-by: Wanrun Jiang <58099845+Vibsteamer@users.noreply.github.com>
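The counting mismatch described in this PR can be illustrated with a minimal sketch. It is not the dpgen code: frames are represented only by counts, `split_candidates` is a hypothetical helper, and the numbers 50, 7, and 100 are made up. It only shows that `rest_idx` can be empty while the rest data still holds the failed frames.

```python
import numpy as np


def split_candidates(n_candidate, n_failed, iter_pick_number, seed=0):
    """Mimic the pick/rest split from the PR description using counts only.

    Hypothetical helper for illustration; not part of dpgen.
    """
    rng = np.random.default_rng(seed)
    idx = np.arange(n_candidate)
    rng.shuffle(idx)
    pick_idx = idx[:iter_pick_number]
    rest_idx = idx[iter_pick_number:]
    # Frames carried to the next iteration: not-picked candidates plus failed frames.
    n_rest_frames = len(rest_idx) + n_failed
    return len(pick_idx), len(rest_idx), n_rest_frames


# All 50 candidates are picked (iter_pick_number exceeds the candidate count),
# yet 7 failed frames still have to be written out as rest data.
n_pick, n_rest_idx, n_rest_frames = split_candidates(
    n_candidate=50, n_failed=7, iter_pick_number=100
)
print(n_pick, n_rest_idx, n_rest_frames)  # 50 0 7
```

A set size derived from `len(rest_idx)` would be 0 here, while one derived from the total rest frame count would be 7, which is the distinction the fix relies on.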
Codecov Report

Base: 46.06% // Head: 46.07% // Increases project coverage by

Additional details and impacted files

Coverage Diff

| | devel | #1114 | +/- |
|---|---|---|---|
| Coverage | 46.06% | 46.07% | |
| Files | 82 | 82 | |
| Lines | 14452 | 14451 | -1 |
| Hits | 6658 | 6658 | |
| Misses | 7794 | 7793 | -1 |
If there is no data picked, how do you continue the next training procedure?

The code decomposes it into two cases:
dpgen/dpgen/simplify/simplify.py Lines 277 to 278 in 355f8ed
dpgen/dpgen/simplify/simplify.py Lines 293 to 294 in 355f8ed
That case seems to need an adjustment of the training scheme or of simplify_param, and does not seem related to the bug discussed above.

I see, you mean no unpicked data, not no picked data.
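To check whether an existing run is affected by the problem discussed in this thread, something like the following sketch might help. It assumes a system directory can be recognized by its `type.raw` file (as in the listing in the PR description) and reports those that contain no `set.*` subdirectory. The helper name `systems_without_sets` and the directory name `other` are hypothetical; only the Python standard library is used.

```python
import tempfile
from pathlib import Path


def systems_without_sets(root):
    """Return system directories under root that hold type.raw but no set.* directory.

    Hypothetical diagnostic helper; not part of dpgen or dpdata.
    """
    missing = []
    for type_file in sorted(Path(root).rglob("type.raw")):
        system_dir = type_file.parent
        has_set = any(p.is_dir() for p in system_dir.glob("set.*"))
        if not has_set:
            missing.append(system_dir)
    return missing


# Demo on a throwaway directory tree: one system without set.000, one with it.
with tempfile.TemporaryDirectory() as tmp:
    broken = Path(tmp) / "data.rest" / "my_elements"
    broken.mkdir(parents=True)
    (broken / "type.raw").write_text("0\n")

    healthy = Path(tmp) / "data.rest" / "other"
    (healthy / "set.000").mkdir(parents=True)
    (healthy / "type.raw").write_text("0\n")

    demo_result = [p.name for p in systems_without_sets(Path(tmp) / "data.rest")]
    print(demo_result)  # ['my_elements']
```

On a real run, the function would be pointed at a path such as `iter.000002/01.model_devi/data.rest`.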