Remove automatic normalization in Multinomial and Categorical #5331
@ricardoV94 I'm happy to take this. Just to make sure I understand the ask: the normalisation step in …

Yes, the normalization in both should be done only in the …

I've assigned the issue to you. Thanks for volunteering!
@ricardoV94 I'm close to being done on this. My solution is to insert this code section in the … method:

```python
if isinstance(p, np.ndarray) or isinstance(p, list):
    p_sum = np.sum([p], axis=-1)
    if (p_sum != 1.0).any():
        warnings.warn(
            f"p values sum up to {p_sum}, instead of 1.0. They will be automatically rescaled. "
            "You can rescale them directly to get rid of this warning.",
            UserWarning,
        )
    p = p / at.sum(p, axis=-1, keepdims=True)
```

This works fine for Multinomial. However, for Categorical, when I move the normalisation step outside of …, the following test fails:

```
_____________________________ TestMatchesScipy.test_categorical_valid_p[p2] _____________________________

self = <pymc.tests.test_distributions.TestMatchesScipy object at 0x7fc1bf3826a0>
p = array([-1, -1, 0, 0])

    @aesara.config.change_flags(compute_test_value="raise")
    @pytest.mark.parametrize(
        "p",
        [
            np.array([-0.2, 0.3, 0.5]),
            # A model where p sums to 1 but contains negative values
            np.array([-0.2, 0.7, 0.5]),
            # Hard edge case from #2082
            # Early automatic normalization of p's sum would hide the negative
            # entries if there is a single negative value or an even number of
            # them and the rest are zero
            np.array([-1, -1, 0, 0]),
        ],
    )
    def test_categorical_valid_p(self, p):
        with Model():
            x = Categorical("x", p=p)
            with pytest.raises(ParameterValueError):
>               logp(x, 2).eval()
E               Failed: DID NOT RAISE <class 'aeppl.logprob.ParameterValueError'>
```

This appears to be testing an edge case from a previous issue (#2082) involving normalisation of the invalid input `[-1, -1, 0, 0]`. I can fix it by just leaving the normalisation in …. Note that

```python
with pm.Model() as m:
    x = pm.Multinomial('x', n=5, p=[0.5, 0.5, 0.5, 1])
```

raises a UserWarning, but the same call for Categorical does not. Any suggestions on how to proceed here?
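The warn-and-rescale behavior discussed in this comment can be sketched in plain NumPy (`normalize_p` is a hypothetical stand-in helper, not PyMC's actual API):

```python
import warnings
import numpy as np

def normalize_p(p):
    # Hypothetical helper mirroring the warn-and-rescale idea above;
    # not PyMC's actual implementation.
    p = np.asarray(p, dtype=float)
    p_sum = p.sum(axis=-1, keepdims=True)
    if not np.allclose(p_sum, 1.0):
        warnings.warn(
            f"p values sum up to {p_sum.squeeze()}, instead of 1.0. "
            "They will be automatically rescaled.",
            UserWarning,
        )
    return p / p_sum

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    q = normalize_p([0.5, 0.5, 0.5, 1.0])

print(q)            # rescaled probabilities, summing to 1
print(len(caught))  # 1: the UserWarning fired
```

This captures the proposed split: concrete (numeric) inputs get rescaled with a loud warning, rather than silently.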
If some or all of those test conditions no longer make sense in light of the new behavior, they can be removed.
You can also replace the inputs to be TensorVariables so that the …

Finally, for the negative case we should probably raise a ValueError in the …
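The negative-value check suggested here could look something like the following sketch (`check_p` is a hypothetical helper, not PyMC code):

```python
import numpy as np

def check_p(p):
    # Hypothetical sketch of the suggestion above: reject negative entries
    # outright instead of letting normalization hide them.
    p = np.asarray(p, dtype=float)
    if (p < 0).any():
        raise ValueError(f"p contains negative entries: {p}")
    return p / p.sum(axis=-1, keepdims=True)

# The hard edge case from the failing test: normalizing [-1, -1, 0, 0]
# would divide by -2 and yield [0.5, 0.5, 0, 0], hiding the bad input,
# so an explicit check is needed before normalization.
rejected = False
try:
    check_p([-1, -1, 0, 0])
except ValueError as err:
    rejected = True
    print("rejected:", err)
```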
Apologies, I'm not completely clear on what you are suggesting here. To clarify, are you saying that we just check for negative values in …?
@LukeLB Yeah, that was my suggestion.

Great, I'll get on that!
@ricardoV94 I have put a PR in for this issue now.
Closed via #5370 |
Discussed in #5246
Originally posted by ricardoV94 December 8, 2021
This cropped up in #5234
Should we stop doing automatic normalization of the p parameter? This can hide very wrong inputs such as …
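For instance (a plain NumPy illustration, not PyMC code), automatic normalization can map a clearly invalid vector onto one that looks valid:

```python
import numpy as np

# Invalid input: probabilities cannot be negative.
p = np.array([-1.0, -1.0, 0.0, 0.0])

# Automatic normalization divides by the sum (-2), flipping the signs
# and producing something that looks like a valid probability vector.
normalized = p / p.sum()
print(normalized)  # [0.5, 0.5, 0, 0] (up to signed zeros)
```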
If we decide to keep the automatic normalization, we can at least remove some checks in the logp definition, since they cannot be triggered in this case.
I think this caused some problems, e.g. if a user specifies `[.3, .3, .3]`, where things almost line up.

Originally posted by @twiecki in #5246 (comment)
Whenever the user creates a Multinomial / Categorical distribution with concrete (numpy) values, we check if they are valid and, if not, we normalize them but also issue a UserWarning along the lines of: …
In the logp, or whenever we have symbolic inputs, we don't do any invisible normalization; we let it evaluate to -inf, as with invalid parameters in other distributions.

I think this covers most of the use cases and does not have big backwards-compatibility issues.
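As a rough stand-in for that behavior (plain NumPy, not PyMC's actual logp), the logp path would validate p and evaluate to -inf instead of normalizing:

```python
import numpy as np

def categorical_logp(value, p):
    # Sketch of the proposed behavior, not PyMC's implementation:
    # invalid p evaluates to -inf rather than being silently rescaled.
    p = np.asarray(p, dtype=float)
    if (p < 0).any() or not np.isclose(p.sum(), 1.0):
        return -np.inf
    return np.log(p[value])

print(categorical_logp(2, [0.2, 0.3, 0.5]))  # log(0.5), about -0.693
print(categorical_logp(2, [-1, -1, 0, 0]))   # -inf: invalid p
```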
Originally posted by @ricardoV94 in #5246 (comment)