Enable setting test size individually for each system #267

marian-code · 2020-10-07T13:29:58Z

When dealing with multiple systems that have very different numbers of frames, setting the number of tests as a single integer that is the same for all systems is rather restrictive. For example, consider two systems:

system with 10000 frames
system with 100 frames

Setting the number of tests to 10 is ideal for the second system but far too low for the first one. There is no good solution to this situation, since increasing the number of tests to, say, 100 would be appropriate for the first system but wrong for the second, which has only 100 frames in total.

The change I propose allows the numb_test parameter to be set either as a list with a separate value for each system, or as a string specifying the percentage of each system's frames that should be used for testing.

In JSON, the number of tests could then be specified either as "numb_test": [1000, 10] or as "numb_test": "10%". For the two systems above, both settings use 10% of each system's frames for testing.
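To illustrate the accepted forms described above, here is a minimal sketch of how a numb_test value might be resolved into one test size per system. The helper name `resolve_numb_test`, its signature, and the choice to round percentages up are assumptions made for illustration only; this is not the code from the PR.

```python
import math
from typing import List, Sequence, Union

# Accepted forms: a single integer, a percentage string, or a per-system list.
NumbTest = Union[int, str, Sequence[int]]


def resolve_numb_test(numb_test: NumbTest, nframes: Sequence[int]) -> List[int]:
    """Return one test size per system.

    Hypothetical helper for illustration; not part of deepmd-kit.
    `nframes` holds the number of frames of each system.
    """
    nsystems = len(nframes)

    if isinstance(numb_test, str):
        # Percentage form, e.g. "10%".
        if not numb_test.endswith("%"):
            raise ValueError("string value must end with '%', e.g. '10%'")
        percent = float(numb_test[:-1])
        if not 0 <= percent <= 100:
            raise ValueError("percentage must be between 0 and 100")
        # Assumption: round up so that small systems still get a test frame.
        return [int(math.ceil(n * percent / 100)) for n in nframes]

    if isinstance(numb_test, int):
        # Original behaviour: the same test size for every system.
        return [numb_test] * nsystems

    # List form: one explicit value per system.
    sizes = list(numb_test)
    if len(sizes) != nsystems:
        raise ValueError("list length must match the number of systems")
    return sizes


if __name__ == "__main__":
    frames = [10000, 100]  # the two example systems from the description
    print(resolve_numb_test("10%", frames))
    print(resolve_numb_test([1000, 10], frames))
    print(resolve_numb_test(10, frames))
```

For the two example systems, the percentage form and the list form give the same per-system sizes, while the single integer applies the same value to both.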

This requires adding only a few lines of code and minimal new logic; all changes are commented in the code.

I am pushing this to the devel branch, since I don't know your workflow, but if you accept it, it could be merged directly to master too, as it would be nice to see it in conda and docker as soon as possible.

I also made a few minor changes to the README, which I think better clarify the purpose of the JSON settings.

njzjz

The unit test was not passing here:

deepmd-kit/source/tests/test_deepmd_data_sys.py

Lines 79 to 100 in 59d780c

    
    def test_get_test(self):
        batch_size = 3
        test_size = 2
        ds = DeepmdDataSystem(self.sys_name, batch_size, test_size, 2.0)
        ds.add('test', self.test_ndof, atomic = True, must = True)
        ds.add('null', self.test_ndof, atomic = True, must = False)
        sys_idx = 0
        data = ds.get_test(sys_idx=sys_idx)
        self.assertEqual(list(data['type'][0]), list(np.sort(self.atom_type[sys_idx])))
        self._in_array(np.load('sys_0/set.002/coord.npy'),
                       ds.get_sys(sys_idx).idx_map,
                       3,
                       data['coord'])
        self._in_array(np.load('sys_0/set.002/test.npy'),
                       ds.get_sys(sys_idx).idx_map,
                       self.test_ndof,
                       data['test'])
        self.assertAlmostEqual(np.linalg.norm(np.zeros([self.nframes[sys_idx]+2,
                                                        self.natoms[sys_idx]*self.test_ndof])
                                              -
                                              data['null']
        ), 0.0)

marian-code · 2020-10-07T19:40:16Z

I am taking a look at it, just haven't figured out where the problem is yet.

marian-code · 2020-10-08T10:20:51Z

Everything should be in order now.

amcadmus · 2020-10-09T09:13:49Z

The new features of the code will be merged into the devel branch.

README.md

source/train/DataSystem.py

clear the confusion caused by adding python style comments to json file

njzjz

missing comma

README.md

Co-authored-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>

enable setting test size individually for each system

2039b43

njzjz requested changes Oct 7, 2020

marian-code added 2 commits October 7, 2020 22:55

fix failing test_get_test

af3867a

some small alterations to better preserve the original logic of the code

265c559

trigger new travis build

373b0ae

marian-code requested a review from njzjz October 8, 2020 12:59

njzjz approved these changes Oct 9, 2020

amcadmus requested changes Oct 16, 2020

README.md (outdated, resolved)

source/train/DataSystem.py (resolved)

resolve requested changes

44b889c

clear the confusion caused by adding python style comments to json file

marian-code requested a review from amcadmus October 16, 2020 14:34

njzjz requested changes Oct 17, 2020

README.md (outdated, resolved)

README.md (outdated, resolved)

amcadmus and others added 2 commits October 19, 2020 10:11

Update README.md

289e532

Co-authored-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>

Update README.md

b7c523c

Co-authored-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>

amcadmus approved these changes Oct 19, 2020

amcadmus changed the base branch from master to devel October 19, 2020 02:55

amcadmus merged commit d81eb57 into deepmodeling:devel Oct 19, 2020

gzq942560379 pushed a commit to HPC-AI-Team/deepmd-kit that referenced this pull request Sep 1, 2021

Merge pull request deepmodeling#267 from marian-code/variable_n_tests

3998ddb

Enable setting test size individually for each system

njzjz pushed a commit to njzjz/deepmd-kit that referenced this pull request Sep 21, 2023

Merge pull request deepmodeling#267 from dingzhaohan/devel

2ce42d5

Devel
