Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand pm.Data capacities #3925

Merged
merged 16 commits into from
May 18, 2020

Conversation

AlexAndorra
Copy link
Contributor

@AlexAndorra AlexAndorra commented May 17, 2020

This PR expands on @hottwaj's work in #3816. Now, pm.Data can:

  • Be used with integers as well as floats, which allows using it for index variables:
with pm.Model() as model:
    index = pm.Data('index', np.array([2, 0, 1, 0, 2]))
    alpha = pm.Normal('alpha', 0, 1.5, shape=3)
    pm.Normal('obs', alpha[index], 1., observed=y)
with pm.Model():
    x = pm.Data("x", [1.0, 2.0, 3.0])
    y = pm.Normal("y", mu=x, shape=3)

This is ready for review -- thanks a lot in advance for the reviews 🖖

  • No breaking changes (that I know of)
  • Changes are covered by tests and docstrings
  • Mentioned in the RELEASE-NOTES.md

Copy link
Contributor

@lucianopaz lucianopaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @AlexAndorra! I left some comments throughout the changed you made. What I think is the most important point that should be addressed is the default dtype. I think that it should be theano.config.floatXfor floats and the corresponding intX dtype for ints. You can take a look at our intX function to find the mapping that we use to set the integer dtype based on theano's floatX.

RELEASE-NOTES.md Outdated Show resolved Hide resolved
pymc3/data.py Outdated Show resolved Hide resolved
pymc3/data.py Outdated Show resolved Hide resolved
@michaelosthege
Copy link
Member

I have nothing to add - Lucianos point on intX/float is important and the rest looks good 👍

@AlexAndorra
Copy link
Contributor Author

Thanks for the reviews @lucianopaz and @michaelosthege !
Actually, while trying to implement Luciano's suggestions, I realized, quite surprisingly, that no type setting was needed to implement the "pm.Data as index variables" feature. Even more, explicitly setting the type with floatX and intX was failing everything 😮
You can see that in the updated code and tests -- please tell me if I'm mistaken because I'm really surprised by that result. Otherwise, I think this is ready to merge 😊

@AlexAndorra AlexAndorra requested a review from lucianopaz May 18, 2020 10:23
@michaelosthege
Copy link
Member

Thanks for the reviews @lucianopaz and @michaelosthege !
Actually, while trying to implement Luciano's suggestions, I realized, quite surprisingly, that no type setting was needed to implement the "pm.Data as index variables" feature. Even more, explicitly setting the type with floatX and intX was failing everything 😮
You can see that in the updated code and tests -- please tell me if I'm mistaken because I'm really surprised by that result. Otherwise, I think this is ready to merge 😊

You mean the pipeline for commit 0d07347 ?

It fails on a test that doesn't sound related:
FAILED pymc3/tests/test_model_helpers.py::TestHelperFunc::test_pandas_to_array

@AlexAndorra
Copy link
Contributor Author

Ok, this was more complicated than I expected, but I think I understood the problem and figured out a more parcimonious implementation in the process. Tests pass locally -- if they pass on Travis too, this means it's ready for review 🍾

Copy link
Contributor

@lucianopaz lucianopaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch @AlexAndorra!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants