Refactor basic Distributions #4508
Merged · 3 commits · Mar 16, 2021

Conversation

kc611 (Contributor) commented Mar 7, 2021

This PR refactors a few distributions according to the new RandomVariable class setup.

Refactored the following distributions:

  • Beta
  • Cauchy
  • Exponential
  • Half Cauchy
  • Half Normal
  • Inverse Gamma
  • Bernoulli
  • NegativeBinomial
  • Poisson
  • MvNormal
  • Multinomial

@michaelosthege added this to the vNext (4.0.0) milestone (Mar 8, 2021)
michaelosthege (Member) commented:

@kc611 can you rebase on the latest v4 branch? Then you'll get many tests enabled in the GitHub Actions runs.
We treat XPASS as failures now, so you'll find out which tests got fixed by your distribution refactoring. You can then remove the xfail decorator from those.
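
For context, a minimal sketch of the pytest mechanics being described; the xfail_strict option is standard pytest, while the test name below is purely illustrative:

    # With strict xfail, an unexpectedly passing test (XPASS) is reported as
    # a failure, so stale xfail markers surface immediately.
    #
    # pytest.ini:
    #   [pytest]
    #   xfail_strict = true

    import pytest

    # Hypothetical test: before the refactor it was expected to fail.
    @pytest.mark.xfail(reason="not yet refactored for the new RandomVariable setup")
    def test_beta_logp():
        ...

    # Once the refactor makes this test pass, the decorator must be removed,
    # otherwise the XPASS is itself reported as a failure.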

@kc611 changed the title from "Refactored Beta, Exponential and HalfNormal Distribution" to "Refactored distributions in distributions.continuous" (Mar 10, 2021)
kc611 (Contributor, Author) commented Mar 10, 2021

> so you'll find out which tests got fixed by your distribution refactoring.

Actually, there still seems to be a precision issue with the tests (the distributions give values as expected, but they don't match exactly with the values the tests expect), which is causing even the initially refactored normal distribution test to fail. So I guess some changes beyond refactoring are needed for the tests to actually pass.

@brandonwillard might know more about this.

ricardoV94 (Member) commented Mar 10, 2021

We should merge the changes from PR #4497 into the V4 branch.

In that PR we changed the check_logcdf precision of the normal distribution as we found it was not passing on all possible values in float32 runs.

In addition, to ensure deterministic behavior, we should set the optional n_samples argument to -1 in the check_logcdf and check_logp calls of all the distributions marked xfail due to precision issues. This will make those tests fail deterministically on float32 even after refactoring, as in the sketch below.
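
A minimal sketch of what that change looks like inside TestMatchesScipy (the domain names R/Rplus and the sp alias follow the conventions in pymc3.tests.test_distributions; the exact call signature is assumed):

    # n_samples=-1 makes check_logcdf evaluate every combination of values in
    # the parameter domains instead of a random subset, so a float32
    # precision failure shows up deterministically.
    self.check_logcdf(
        Normal,
        R,
        {"mu": R, "sigma": Rplus},
        lambda value, mu, sigma: sp.norm.logcdf(value, mu, sigma),
        n_samples=-1,
    )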

However we cannot do this to any of the tests mentioned in #4420, or else the runs will take way too long to complete.

Alternatively, we can temporarily re-seed the TestMatchesScipy class, reverting the effect of PR #4461.

brandonwillard (Contributor) commented:

> Actually, there still seems to be a precision issue with the tests (the distributions give values as expected, but they don't match exactly with the values the tests expect), which is causing even the initially refactored normal distribution test to fail. So I guess some changes beyond refactoring are needed for the tests to actually pass.

Which ones exactly?

In a few of the tests I went through that involved sampling, some had issues caused by implicit expectations regarding test values. The root of this difference is the logic in v3 that sets a random variable's corresponding TensorVariable's test value (i.e. var.tag.test_value) to a distribution-specific value (e.g. a mean or mode). v4 no longer does this, so tests that implicitly relied on very specific default values would fail.
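
A minimal sketch of that behavioral difference (assuming a Normal, whose v3 default test value was its mean):

    import pymc3 as pm

    with pm.Model():
        x = pm.Normal("x", mu=5.0, sigma=1.0)

    # v3: x.tag.test_value was populated with a distribution-specific default
    # (here the mean, 5.0), so tests could implicitly depend on that value.
    # v4: no such default is attached, so any test that silently relied on
    # x.tag.test_value being the mean (or mode) now fails.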

@brandonwillard changed the title from "Refactored distributions in distributions.continuous" to "Refactore distributions in distributions.continuous" (Mar 12, 2021)
brandonwillard (Contributor) left a comment:

Can you try (re)enabling the tests for our currently converted Distributions in pymc3.tests.test_distributions_random?

@brandonwillard changed the title from "Refactore distributions in distributions.continuous" to "Refactor distributions in distributions.continuous" (Mar 12, 2021)
kc611 (Contributor, Author) commented Mar 13, 2021

> Which ones exactly?

The ones in test_distribution.py: test_normal fails with the same error as all the others. Strangely, test_uniform runs without a hitch.

> Can you try (re)enabling the tests for our currently converted Distributions in pymc3.tests.test_distributions_random?

Looks like the module you specified is not yet updated for the latest RV changes. The main testing functions and classes will need to be updated accordingly before we can actually run those tests.

brandonwillard (Contributor) commented:

> Looks like the module you specified is not yet updated for the latest RV changes. The main testing functions and classes will need to be updated accordingly before we can actually run those tests.

Yes, it's a whole other task, but it's a critical one, because we need to reinstate the tests so that we can demonstrate—to ourselves and others—which things are and aren't working in v4.

kc611 (Contributor, Author) commented Mar 14, 2021

In pymc3.tests.test_distributions_random, the BaseTestCase class originally tested the shapes returned by the .random() method. Is it still necessary to do that here? Now that the RVs have been moved to Aesara, test_basic in aesara.tests.tensor.random seems to be doing the same thing (testing the output shapes of random samples for different distributions), or am I missing something?

The other two functions, pymc3_random and pymc3_random_discrete, test the actual output values of the samples, and keeping them here makes much more sense than BaseTestCase.

brandonwillard (Contributor) commented:

> Is it still necessary to do that here? Now that the RVs have been moved to Aesara, test_basic in aesara.tests.tensor.random seems to be doing the same thing (testing the output shapes of random samples for different distributions), or am I missing something?

Those tests are almost necessarily doing the same thing as the tests in aesara.tests.tensor.random. With that in mind, we can remove the redundant tests (i.e. ones for which the corresponding RandomVariables are already tested in Aesara) and later convert the remaining tests into tests for the new RandomVariables that we'll implement in PyMC3.

Review comment on the following diff lines:

    # beta = draw_values([self.beta], point=point, size=size)[0]
    # return generate_samples(self._random, beta, dist_shape=self.shape, size=size)
    @_logp.register(HalfCauchyRV)
    def half_cauchy_logp(op, value, beta, alpha):

kc611 (Contributor, Author) commented:

I'm a bit sceptical as to why this function requires the extra alpha input argument even though it does not use it. I assumed it was something related to HalfCauchy being a special case of the Cauchy distribution above. The argument passed here was a TensorVariable with a constant value of 1 while running test_distribution.

ricardoV94 (Member) commented Mar 15, 2021:

Where do you find it requires alpha?

Is it possible it's coming from the new random method? The scipy method allows for loc and scale, but we only used scale in pymc3. We should make sure alpha/beta are not being confused in the random calls, since beta corresponds to the second optional argument (the scale) in the scipy-style signature: https://github.com/pymc-devs/aesara/blob/b852bd24472e13ae2a405b36eaad462830c89228/aesara/tensor/random/basic.py#L228
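
A minimal sketch of the mismatch, assuming the Aesara op's inputs are (loc, scale) as in the linked halfcauchy signature; HalfCauchyRV and _logp come from the diff above, and the density expression is the standard half-Cauchy log-pdf:

    import numpy as np
    import aesara.tensor as at

    # The logp dispatch receives the op's parameters in the op's own order,
    # (loc, scale). If pymc3 passes only beta, the op slots it into loc and
    # fills scale with its default, which would explain the constant fourth
    # argument observed above: it is the default scale, not a Cauchy alpha.
    @_logp.register(HalfCauchyRV)
    def half_cauchy_logp(op, value, loc, scale):
        return at.switch(
            at.ge(value, loc),
            at.log(2.0) - at.log(np.pi) - at.log(scale)
            - at.log1p(((value - loc) / scale) ** 2),
            -np.inf,
        )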

kc611 (Contributor, Author) commented Mar 15, 2021:

On running the test in test_distribution with only the beta parameter, the logp method got 4 parameters instead of 3. I assumed that had something to do with the mathematics involved, so I put in an extra argument. Strangely, the value of beta is being passed first; the other one (the fourth argument) always remains a constant.

> Is it possible it's coming from the new random method?

Probably not, since the random method doesn't have any say in what happens in the logp. The RVs just exist for dispatch purposes in the case of logp.

ricardoV94 (Member) commented:

Still, it seems that if a single variable is passed to the random op, it will be taken as the loc parameter, while the logp will take it as the scale parameter, given the order of the arguments in the Aesara op, no?

ricardoV94 (Member) commented:

Not to distract from the extra parameter, which is certainly a problem.

brandonwillard (Contributor) commented:

I've added a commit to this PR that allows for a much simpler Distribution refactoring process by automating the dispatch registrations. Now, any relevant log* or transform functions found within a Distribution class will have dispatch entries created for them automatically.

If the relevant Distribution tests pass, we should merge this.
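
A minimal sketch of that idea (not the actual commit; rv_op and _logcdf are assumed names, while _logp appears in the diff above):

    # Scan a Distribution subclass for log* methods and register each one
    # with the matching dispatcher, keyed on the type of the class's
    # RandomVariable op, so no manual @_logp.register(...) boilerplate is
    # needed per distribution class.
    def _auto_register_dispatch(dist_cls):
        rv_type = type(dist_cls.rv_op)  # e.g. HalfCauchyRV
        for name, dispatcher in (("logp", _logp), ("logcdf", _logcdf)):
            fn = dist_cls.__dict__.get(name)
            if fn is not None:
                dispatcher.register(rv_type)(fn)
        return dist_cls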
