debug deeprank for models with no feature value #242
Pull Request Test Coverage Report for Build 1468853462
💛 - Coveralls |
I have left a few comments but overall it looks good :) Thanks !
maxv = self.feature_mean[ic] + w * self.feature_std[ic]
if minv != maxv:
    feature[ic] = np.clip(feature[ic], minv, maxv)
    #feature[ic] = self._mad_based_outliers(feature[ic],minv,maxv)
Why is that line commented out? Should we simply remove it?
No idea; this line was already commented out in the code.
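The clipping rule in the hunk above can be sketched in plain Python. This is only an illustration, not the DeepRank implementation: `clip_feature` is a hypothetical helper, the list comprehension stands in for `np.clip`, and the lower bound `mean - w * std` is assumed by symmetry with the `maxv` line shown in the diff.

```python
def clip_feature(values, mean, std, w):
    """Clip values to [mean - w*std, mean + w*std] (hypothetical helper).

    mean and std play the role of feature_mean[ic] and feature_std[ic];
    w is the clipping factor.
    """
    minv = mean - w * std  # assumed lower bound, mirroring maxv
    maxv = mean + w * std
    if minv != maxv:
        # Plain-Python equivalent of np.clip(values, minv, maxv)
        return [min(max(v, minv), maxv) for v in values]
    # Bounds coincide (std == 0): leave the values untouched
    return list(values)


print(clip_feature([-10.0, 0.0, 10.0], mean=0.0, std=1.0, w=2.0))
print(clip_feature([1.0, 5.0], mean=0.0, std=0.0, w=3.0))
```

Without the `minv != maxv` guard, a feature whose standard deviation is zero would have every value collapsed onto its mean.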
compute the norm values if `self.clip_features` exists
Remove the benchmarking mode: the prediction/benchmarking mode can be detected by simply checking for the presence of target values in a given input data set when `plot == True`. Added the required controls. Note that the hitrate function is only adapted to IRMSD input and should be modified.
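The detection described in this commit message could look roughly like the sketch below. It is an assumption-laden illustration: nested dicts stand in for the HDF5 molecule groups, and `has_target` and `should_plot` are hypothetical names that do not come from the DeepRank code base.

```python
def has_target(molecule, target_name):
    """True when the molecule entry stores a value for target_name."""
    return target_name in molecule.get("targets", {})


def should_plot(molecules, target_name, plot=True):
    """Plot only when requested and every molecule carries the target."""
    return bool(plot) and all(has_target(m, target_name) for m in molecules)


# Benchmark-like input: targets are present
benchmark_set = [{"targets": {"IRMSD": 1.2}}, {"targets": {"IRMSD": 7.5}}]
# Prediction-like input: no target information
prediction_set = [{"features": {}}, {"features": {}}]

print(should_plot(benchmark_set, "IRMSD"))
print(should_plot(prediction_set, "IRMSD"))
```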
I modified the |
Modify Hitrate so that it can handle any type of target values (not limited to irmsd anymore)
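A hit rate that is not tied to IRMSD could be sketched as follows. This is a guess at the general idea, not the function from the PR: the input is assumed to be the target values already sorted by the model's ranking, `hit_cutoff` is the threshold that defines a hit, and the `lower_is_better` flag is a hypothetical addition to cover targets where larger values are better.

```python
def hit_rate(targets_ranked, hit_cutoff, lower_is_better=True):
    """Cumulative fraction of all hits recovered at each rank.

    targets_ranked: target values ordered by the model's ranking (best first).
    hit_cutoff: threshold that decides whether a value counts as a hit.
    lower_is_better: hypothetical flag; True suits distance-like targets.
    """
    if lower_is_better:
        hits = [t <= hit_cutoff for t in targets_ranked]
    else:
        hits = [t >= hit_cutoff for t in targets_ranked]

    total = sum(hits)
    if total == 0:
        # No hit in the set: the rate stays at zero for every rank
        return [0.0] * len(hits)

    rates, found = [], 0
    for is_hit in hits:
        found += is_hit
        rates.append(found / total)
    return rates


print(hit_rate([1.0, 6.0, 3.0, 8.0], hit_cutoff=4.0))
```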
deeprank/learn/NeuralNet.py
Outdated
targ = self.data[l]['targets'].flatten()
try:
    targ = self.data[l]['targets']
Just to confirm: the `targets` data does not need `flatten()`, right? I see the old code uses `targ = self.data[l]['targets'].flatten()`.
Good catch! It must be flattened.
deeprank/learn/NeuralNet.py
Outdated
for fname, mol in data['mol']:
    f5 = h5py.File(fname, 'r')
    irmsd.append(f5[mol + '/targets/IRMSD'][()])
    targets.append(f5[mol + f'/targets/self.data_set.select_target'][()])
Need to change `f'/targets/self.data_set.select_target'` to `f'/targets/{self.data_set.select_target}'`; without the braces the attribute name is kept as literal text.
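The difference is easy to reproduce outside the project. In the sketch below, `NetStub`, its two methods, and the molecule and target names are made up for illustration; only the two f-string forms come from the review comment.

```python
from types import SimpleNamespace


class NetStub:
    """Minimal stand-in exposing self.data_set.select_target."""

    def __init__(self, select_target):
        self.data_set = SimpleNamespace(select_target=select_target)

    def broken_path(self, mol):
        # No braces: the attribute name is kept as literal text
        return mol + f'/targets/self.data_set.select_target'

    def fixed_path(self, mol):
        # Braces: the attribute value is interpolated into the path
        return mol + f'/targets/{self.data_set.select_target}'


net = NetStub("DOCKQ")    # hypothetical target name
print(net.broken_path("mol1"))
print(net.fixed_path("mol1"))
```

With the broken form, the HDF5 lookup would ask for a dataset literally named `self.data_set.select_target`, which is unlikely to exist.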
Thanks @manonreau, the changes look good. I left a few comments for you to check.
1. Change field self.grid_shape to self._grid_shape
2. Update DataSet class docstring for grid_info
3. Update get_grid_shape method
1. Rename target_thr to hit_cutoff
2. Update hit_cutoff docstring
3. Replace print with logger
4. Shorten long lines
Hi @manonreau, I pushed two commits (since it's not easy to comment on all the details). Please take a look :-)
Thanks @CunliangGeng, everything looks fine. I will merge this PR once you approve it.
Thanks @manonreau, I approve now.
BTW, please also close the related issues after merging the PR :-)
Compute grid_shape only if it is not provided as an input
Compute the feature_mean if `self.clip_features == True`. feature_mean was computed when `self.normalize_features == True`; this is not required anymore.
Format the logger message in the `compute_norm` function
Allow feature clipping in the `_clip_feature` function only if values exist for that feature
Add a condition not to transform feature values when they correspond to an empty vector in the mapping process
Set `save_hit_rate` to False by default, since it calls the IRMSD target, which users may not be using
Make plots optional in the `test()` function, since users in principle have no target information for the test set, except in benchmark conditions