added z-scoring method for structured data (e.g., time series) #597
Is this ready for review?
oops i should've written here: no not yet:
otherwise yes
Regarding You do not have to apply the changes made in this PR to
good to review now @michaeldeistler
Great, thanks a lot! Left minor nitpicks regarding docstrings in the comments.
One more request: could you modify one of the snpe tests to use "structured" z-scoring? E.g. this one. And make sure to put a comment like "Test whether SNPE works properly with structured z-scoring".
sbi/neural_nets/classifier.py
Outdated
- `none`, None: do not z-score
- `independent`: z-score each dimension independently
- `structured`: treat dimensions as related, therefore compute mean and std
  over the entire batch, instead of per-dimension.
Maybe add something like: "Should be used when the data are a time series or an image."
sbi/neural_nets/classifier.py
Outdated
- `independent`: z-score each dimension independently
- `structured`: treat dimensions as related, therefore compute mean and std
  over the entire batch, instead of per-dimension.
z_score_y: Whether to z-score ys passing into the network, same as z_score_x.
same options as z_score_x
sbi/utils/get_nn_models.py
Outdated
- `none`, None: do not z-score
- `independent`: z-score each dimension independently
- `structured`: treat dimensions as related, therefore compute mean and std
  over the entire batch, instead of per-dimension.
The docstring comments i made above especially apply to these user-facing methods
sbi/utils/sbiutils.py
Outdated
Args:
z_score_flag: str flag for z-scoring method stating whether the data
dimensions are "structured" or "independent", or does not require z-scoring
indent second and third line
sbi/utils/sbiutils.py
Outdated
if type(z_score_flag) is bool:
    # Raise warning if boolean was passed.
    warnings.warn(
        """Boolean flag for z-scoring is accepted for backwards compatibility only.
Boolean flag for z-scoring is deprecated as of sbi v0.18.0. It will be removed in a future release. Use 'none', 'independent', or 'structured' to indicate z-scoring option.
structured_data = True if z_score_flag == "structured" else False

else:
    # Return warning due to invalid option, defaults to not z-scoring.
Please raise a `ValueError` here.
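The two review requests above (a deprecation warning for boolean flags and a `ValueError` for invalid options) could be combined roughly as in the sketch below. This is not the code merged in the PR: the helper name `parse_z_score_flag`, its `(z_score, structured)` return value, the choice of `DeprecationWarning`, and the mapping of `True` to independent z-scoring are all assumptions made for illustration.

```python
import warnings


def parse_z_score_flag(z_score_flag):
    """Hypothetical helper: map a z-scoring flag to (z_score, structured)."""
    if isinstance(z_score_flag, bool):
        # Booleans are still accepted, but warn as the review suggests.
        warnings.warn(
            "Boolean flag for z-scoring is deprecated as of sbi v0.18.0. It will "
            "be removed in a future release. Use 'none', 'independent', or "
            "'structured' to indicate z-scoring option.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Assumption: True keeps the old per-dimension behaviour.
        return z_score_flag, False
    if z_score_flag is None or z_score_flag == "none":
        return False, False
    if z_score_flag == "independent":
        return True, False
    if z_score_flag == "structured":
        return True, True
    # Invalid option: fail loudly instead of silently skipping z-scoring.
    raise ValueError(
        f"Invalid z-scoring option {z_score_flag!r}; use 'none', "
        "'independent', or 'structured'."
    )
```

Raising instead of warning means a typo such as `"structred"` stops the run immediately rather than quietly training without z-scoring.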
@@ -416,7 +416,7 @@ def simulator(theta):
 else:
 return linear_gaussian(theta, -likelihood_shift, likelihood_cov)

-net = utils.posterior_nn("maf", hidden_features=20)
+net = utils.posterior_nn("maf", z_score_x="structured", hidden_features=20)
@michaeldeistler I changed this one to use structured z-scoring, is this sufficient?
hm the `sample_conditional` might not be the perfect place for that. Maybe use some other function that actually tests the posterior (not the conditional posterior)
sounds good, i put it in `def test_c2st_multi_round_snpe_on_linearGaussian()` because it already has a `posterior_nn` call, instead of `test_c2st_snpe_on_linearGaussian_different_dims()`, which calls `SNPE_C()` directly
just realized: when you call SNPE_C directly, for example, it takes the default value for z-scoring, which is now 'independent' and behaves the same as before, but there's no way to specify 'structured'. Is that fine?
Codecov Report
@@            Coverage Diff             @@
##             main     #597      +/-   ##
==========================================
+ Coverage   66.77%   68.37%   +1.60%
==========================================
  Files          67       67
  Lines        4199     4285      +86
==========================================
+ Hits         2804     2930     +126
+ Misses       1395     1355      -40
Yes that's fine
…h structured or independent dimensions
… calls to use the new options
`standardizing_net()` and `standardizing_transform()` now both have options to perform z-scoring for structured data, i.e., to compute `mean` and `std` for each sample first, then take the global mean to be used for z-scoring the batch (instead of z-scoring each dimension independently):

    x_mean = torch.mean(x)
    x_std = torch.mean(torch.std(x, dim=1))
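To make the difference between the two modes concrete, here is a small sketch in plain Python (standard library only, so it runs without PyTorch). It mirrors the two formulas above on a toy batch; the function names `independent_stats` and `structured_stats` and the toy data are made up for illustration and are not part of sbi.

```python
from statistics import mean, stdev


def independent_stats(x):
    """Per-dimension mean and std over the batch (one pair per column)."""
    columns = list(zip(*x))
    return [mean(c) for c in columns], [stdev(c) for c in columns]


def structured_stats(x):
    """One shared mean and std for the whole batch.

    Mirrors x_mean = torch.mean(x) and
    x_std = torch.mean(torch.std(x, dim=1)). statistics.stdev uses the
    sample (n - 1) estimator, which is also the default of torch.std.
    """
    x_mean = mean(v for row in x for v in row)
    x_std = mean(stdev(row) for row in x)
    return x_mean, x_std


# Toy batch: two "time series" with three time steps each.
x = [[0.0, 1.0, 2.0], [10.0, 12.0, 14.0]]

# Structured: a single scalar mean (6.5) and a single scalar std (1.5).
x_mean, x_std = structured_stats(x)
z_structured = [[(v - x_mean) / x_std for v in row] for row in x]

# Independent: one mean and one std per time step.
col_means, col_stds = independent_stats(x)
z_independent = [
    [(v - m) / s for v, m, s in zip(row, col_means, col_stds)] for row in x
]
```

With structured z-scoring every time step is shifted and scaled by the same two numbers, so the shape of each series is preserved; independent z-scoring rescales each time step separately.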
Re: #570