Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add .random method to MvGaussianRandomWalk (Issue #4337) #4388

Merged
merged 15 commits into from
Jan 3, 2021
Merged

Add .random method to MvGaussianRandomWalk (Issue #4337) #4388

merged 15 commits into from
Jan 3, 2021

Conversation

awray3
Copy link
Contributor

@awray3 awray3 commented Dec 27, 2020

This PR addresses Issue #4337. So far I have implemented the .random method for MvGaussianRandomWalk and updated a few of the docstrings in this file. The implementation is similar to GaussianRandomWalk.random.

Currently I am having issues with adding this method to test_distributions_random.py; I get many failing tests and I am not sure if the base test class is appropriate with its current settings for this time series.

@codecov
Copy link

codecov bot commented Dec 27, 2020

Codecov Report

Merging #4388 (b67b25c) into master (acb8da0) will increase coverage by 0.04%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4388      +/-   ##
==========================================
+ Coverage   88.13%   88.17%   +0.04%     
==========================================
  Files          88       88              
  Lines       14550    14563      +13     
==========================================
+ Hits        12823    12841      +18     
+ Misses       1727     1722       -5     
Impacted Files Coverage Δ
pymc3/distributions/timeseries.py 92.39% <100.00%> (+5.68%) ⬆️
pymc3/parallel_sampling.py 85.80% <0.00%> (-0.95%) ⬇️

Copy link
Member

@Sayam753 Sayam753 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a great start @awray3 . I addressed a few things below.
Let's iterate over it and get it ready for merge.

pymc3/distributions/timeseries.py Outdated Show resolved Hide resolved
pymc3/distributions/timeseries.py Outdated Show resolved Hide resolved
@Sayam753
Copy link
Member

I get many failing tests

Do you get failing tests in existing test cases?

@awray3
Copy link
Contributor Author

awray3 commented Dec 28, 2020

I get many failing tests

Do you get failing tests in existing test cases?

No, just for new tests created when I include a new test called TestMvGaussianRandomWalk.

@awray3
Copy link
Contributor Author

awray3 commented Dec 29, 2020

How should MvGaussianRandomWalk be added to the testing scripts? I think the way I added it in doesn't make much sense with many of the shapes supplied by BaseTestCases.

@Sayam753
Copy link
Member

Sayam753 commented Dec 29, 2020

I think the way I added it in doesn't make much sense with many of the shapes supplied by BaseTestCases.

Yes. The BaseTestCases class have parametrized shape as an argument that gets passed to the distribution and that can have more than 2 dimensions as well.

You can add tests very similar to how MvNormal random method is tested with np arrays here -
https://github.com/pymc-devs/pymc3/blob/3cfee77fda041483e86d9c61fa1d1ada9380cdb3/pymc3/tests/test_distributions_random.py#L1596-L1605

If this works, then we can add a test where a parameter can be a random variable.

Copy link
Member

@Sayam753 Sayam753 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great progress @awray3 . Minor nitpicks below -

Also, the test cases pass with numpy arrays 🎉 . Last step would be to add a test where mu and any one of chol, cov or tau are random variables. Similar to how MvNormal is tested.

pymc3/distributions/timeseries.py Outdated Show resolved Hide resolved
pymc3/distributions/timeseries.py Outdated Show resolved Hide resolved
pymc3/distributions/timeseries.py Outdated Show resolved Hide resolved
@Sayam753
Copy link
Member

There is a conflicting file. Can you rebase to current master to start the GitHub Actions?

@awray3
Copy link
Contributor Author

awray3 commented Dec 31, 2020

No problem, I just pushed after rebasing. Currently working out an error from test_with_chol_rv when mu and others are RVs and dist_shape = (10, 3) and mu_shape = (10, 3). Looks like another bug with indexing inside of MvGaussianRandomWalk.logp. Below is the traceback; I would welcome any pointers in troubleshooting it.

...
   test_with_chol_rv(sample_shape=None, dist_shape=(10, 3), mu_shape=(10, 3))
  File "my_test.py", line 31, in test_with_chol_rv
    mv = pm.MvGaussianRandomWalk("mv", mu, chol=chol, shape=dist_shape)
  File "/home/jovyan/pymc3/distributions/distribution.py", line 121, in __new__
    return model.Var(name, dist, data, total_size, dims=dims)
  File "/home/jovyan/pymc3/model.py", line 1135, in Var
    var = FreeRV(name=name, distribution=dist, total_size=total_size, model=self)
  File "/home/jovyan/pymc3/model.py", line 1668, in __init__
    self.logp_elemwiset = distribution.logp(self)
  File "/home/jovyan/pymc3/distributions/timeseries.py", line 466, in logp
    return self.init.logp_sum(x[0]) + self.innov.logp_sum(x_i - x_im1)
  File "/home/jovyan/pymc3/distributions/distribution.py", line 266, in logp_sum
    return tt.sum(self.logp(*args, **kwargs))
  File "/home/jovyan/pymc3/distributions/multivariate.py", line 311, in logp
    quaddist, logdet, ok = self._quaddist(value)
  File "/home/jovyan/pymc3/distributions/multivariate.py", line 127, in _quaddist
    delta = value - mu
  File "/opt/conda/envs/pymc3-dev-py37/lib/python3.7/site-packages/theano/tensor/var.py", line 121, in __sub__
    return theano.tensor.basic.sub(self, other)
  File "/opt/conda/envs/pymc3-dev-py37/lib/python3.7/site-packages/theano/gof/op.py", line 642, in __call__
    compute_test_value(node)
  File "/opt/conda/envs/pymc3-dev-py37/lib/python3.7/site-packages/theano/gof/op.py", line 111, in compute_test_value
    required = thunk()
  File "/opt/conda/envs/pymc3-dev-py37/lib/python3.7/site-packages/theano/gof/op.py", line 880, in rval
    thunk()
  File "/opt/conda/envs/pymc3-dev-py37/lib/python3.7/site-packages/theano/gof/cc.py", line 1848, in __call__
    raise exc_value.with_traceback(exc_trace)
ValueError: Input dimension mis-match. (input[0].shape[0] = 9, input[1].shape[0] = 10)

@Sayam753
Copy link
Member

Oh. This is very much similar to bug described in #4010 .
Random walk has some weird shape handling for logp when it comes to 2 dimensional inputs.

@Sayam753
Copy link
Member

Sayam753 commented Dec 31, 2020

Let's think of the case, dist_shape = (10, 3) and mu_shape = (10, 3)
10 time steps and 3 time series to model.

Let's consider an arbitrary tensor with shape (10, 3) following MvGaussianRandomWalk. Its first row (or first time step) comes from init distribution. And that initial distribution is Flat one in our case. The remaining 9 steps will come from adding (9, 3) multivariate samples as noise to the initial row.

I think the error is because we initialize MvNormal with shape (10, 3) and ask for logp from MvNormal with (9, 3) shaped tensor. And there is a mismatch.

@Sayam753
Copy link
Member

A good solution for me will be to open an issue for logp miscalculations. Write the failing test cases here and mark them as xfail.

Another thing I notice, is that after the rebase, this PR contains all the changes from #4368 . But #4368 is now merged into master. Can you lookup what went wrong in the rebase?

@awray3
Copy link
Contributor Author

awray3 commented Dec 31, 2020

Yes, I can look up the issue. I'm new to a lot of the git workflow, so I might have messed something up 😅

@awray3
Copy link
Contributor Author

awray3 commented Dec 31, 2020

Okay, I royally messed something up (probably by not rebasing as often I should have), but I think I resolved my rebase issue.

@awray3
Copy link
Contributor Author

awray3 commented Dec 31, 2020

Actually, I'm still not sure I fixed my rebase issue, and it's causing a headache! I need to step away and come back with fresh eyes.

@Sayam753
Copy link
Member

Sayam753 commented Dec 31, 2020

Oh. This is getting mixed with commits. Can you give this a try?

git checkout 79e47b19  # Let's go to a cleaner commit
git checkout -b add_random_2
git push -f origin add_random

Another option can be resetting back and force push

git reset --hard 3200a84a
git push -f origin add_random

@awray3
Copy link
Contributor Author

awray3 commented Dec 31, 2020

Another option can be resetting back and force push

git reset --hard 3200a84a
git push -f origin add_random

That definitely worked. I appreciate the lending of git expertise! I definitely have more tricks to learn. 🙂

@twiecki
Copy link
Member

twiecki commented Jan 1, 2021

@Sayam753 Feel free to merge when you think this is ready.

@awray3
Copy link
Contributor Author

awray3 commented Jan 1, 2021

Hooray! First pull request to be merged! 🥳 I also want to implement random methods for MvStudentsTRandomWalk and others listed in #4337. Should I open a new pull request for those when I finish them?

Also, I notice that MvNormal, MvGaussianRandomWalk, and MvStudentsTRandomWalk will likely all be tested in similar ways. Would it be sensible to refactor some of the tests to have a shared base class, such as BaseMvTests?

@Sayam753
Copy link
Member

Sayam753 commented Jan 2, 2021

MvStudentTRandomWalk random walk is inherited from MvGaussianRandomWalk. So, it reuses the random method implemented in this PR. We need to make sure the entire self.shape is passed during initialization.

I had not paid attention to the reason behind starting the random draws from zero. But looking at the #3985 (comment) , I am thinking this way -

  1. When we have 2 dimensional input shape, we need to make first row to all zeros (making all time series start at 0). This PR includes that.
  2. When we have 1 dimensional input shape, we need to make first element of each row to zero, thereby matching the random draws from init distribution. This PR does not handle that. We were thinking doing so, will make all random draws to zero.
  3. On top of it, a user can pass any init distribution. So, the random methods have to adapt initial values from that distribution. Feel free to ask any questions on these.

Would it be sensible to refactor some of the tests to have a shared base class, such as BaseMvTests?

That's a good point.

For the next PR, I have these ideas -

  • Making sure MvStudentTRandomWalk is correctly initialized.
  • Designing a case class BaseMvTests for all the similar test cases.
  • Refactoring the random methods of GaussianRandomWalk and MvGaussianRandomWalk to start from random values drawn from initial distribution.
    Let's polish out all RandomWalk random methods before going on to AutoRegressive random methods.

@Sayam753
Copy link
Member

Sayam753 commented Jan 2, 2021

Last step would be wrapping the docstring examples in code-block directive (similar to Normal distribution example here) and also a mention in RELEASE-NOTES.md. Then it will be ready :)

awray3 and others added 12 commits January 2, 2021 21:28
Improves the implementation by using MvNormal.random as suggested by
@Sayam753. Also updated its docstring to add more examples.
This commit rewrites the random method to pass the entire shape
into MvNormal. MvGaussianRandomWalk also now supports an integer shape
that must match the dimensionality of mu. In this case it is assumed
that there are no time steps, and a random sample from the multivariate
normal distribution will be returned.
Remove equality since the shape parameter cannot have more than 2 dimensions.

Co-authored-by: Sayam Kumar <sayamkumar753@yahoo.in>
Co-authored-by: Sayam Kumar <sayamkumar753@yahoo.in>
Adds a more explicit conditional on the time_axis statement.

Co-authored-by: Sayam Kumar <sayamkumar753@yahoo.in>
Added tests for both chol and cov to be random variables.
Moves the brittle comment in MvRandom._random's docstring to the actual location in the code for clarity.
@awray3
Copy link
Contributor Author

awray3 commented Jan 3, 2021

For the next PR, I have these ideas

These all sound like great ideas. I also like the idea of polishing the random walks before moving on.

@Sayam753 Sayam753 changed the title [WIP] Add .random methods to timeseries (Issue #4337) Add .random method to MvGaussianRandomWalk (Issue #4337) Jan 3, 2021
@Sayam753 Sayam753 merged commit 2824027 into pymc-devs:master Jan 3, 2021
@Sayam753
Copy link
Member

Sayam753 commented Jan 3, 2021

Thanks @awray3 and congrats on your first Pull Request 🎉

@awray3
Copy link
Contributor Author

awray3 commented Jan 3, 2021

Thank you @Sayam753 and @twiecki for helping me get up to speed! I appreciate all the help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants