Internal refactor #804

tomicapretto · 2024-04-14T03:17:12Z

This started as a refactor with the goal to change how we name the parameters of the response distribution. So far, we named the parent parameter with f"{response_name}_mean" and all the other parameters with f"{response_name}_{param_name}". I no longer think appending _mean is a sensible approach as the parent parameter can be something different than the mean. And also, I realized prepending the name of the response resulted in very long and confusing names. For that reason, I decided it's better to just use the name of the parameter (i.e. mu for the mean in many families, p for probability of success, kappa, sigma, and so on).

But that is not the only change. Many other changes come with this PR. I summarize them below, and they will be added to the changelog. After this is merged, we can have a 0.14.0 release.

Summary of changes

Use "__obs__" instead of f"{response_name}_obs" as the dimension name for the observation index. This makes it easier to avoid errors in our code base and results in a clear an unique name for all cases. Previously we had to worry about the possible different names of the response and their aliases.
Use param_name instead of f"{response_name}_{param_name" (mentioned above).
Use the name of the parent parameter instead of f"{response_name}_{mean}" (mentioned above).
kind in Model.predict() now use "params" and "response" instead of "mean" and "pps".
- The old versions still work, with a future warning.
include_mean has been replaced by include_params in Model.fit(). The old version still works, with a future warning.
Wrap all the parameters of the response distribution (the likelihood) with a pm.Deterministic. The benefit is that the model graphs are clearer and we don't incur in any penalty related to the computation and storage of deterministics as they are not computed and stored when sampling. This is thanks to some recent developments in PyMC.
Keep bayeux-ml as the single direct JAX-related dependency. Bayeux itself requires all the other libraries providing sampling backends for us so we don't list them directly. This may change in the future if we need to pin specific versions but I hope that's not the case.
Implemented a function called create_posterior_bayeux that creates the xarray.Dataset that holds the samples from the posterior when doing inference via bayeux. This makes sure we use the right dim/coord names and coord levels.
The response component only holds response information about the response. Previously, the response component used to hold information about the parent parameter and the response component. There's a clearer separation of concerns now. As an example, to access things related to the linear predictor of the parent parameter in a normal model now we do model.components["mu"]. Before we needed model.response_component. Now, that component still exists but it doesn't hold information about mu.

Other important notes

I've reran all the examples.
I've adapted the tests and made sure they pass.
I'm pining PyMC from main, this is because truncated responses need a fix in there. We can change it once a new version (5.16?) is released.

EDIT Bayeux based inferences don't include the observed_data group. This is problematic if we want to do ppchecks. We need to add them. Done

EDIT: It also closes #814

tomicapretto · 2024-04-15T15:01:34Z

This is temporarily suspended because the var_names parameter in pm.sample() is not working as expected pymc-devs/pymc#7258, and we need it for the modifications here.

tomicapretto · 2024-05-13T13:13:55Z

Now there's a new blocker which is pymc-devs/pymc#7312

And I would also like to have pymc-devs/pymc#7290 merged. Currently, the progressbar has many updates which slows down the sampler in jupyter notebooks.

tomicapretto · 2024-05-25T20:54:20Z

Everything should be fixed now. The pyproject.toml file is pointing to the dev version of PyMC. This should be good to go once PyMC releases a new version.

Now I'm going to re-run examples.

review-notebook-app · 2024-05-26T03:01:10Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

aloctavodia · 2024-05-31T14:06:10Z

I like the changes, but I find include_params and kind="params" ambiguous. Could we use something like linear_predictor, linear_term or something like that?

…mbi into rename_likelihood_params

tomicapretto · 2024-06-02T15:30:46Z

include_params

Thanks for the review! I'm not sure I understand what is the suggestion (i.e. if you want argument(s) called linear_predictor, linear_term or those should be argument values). I also see the current names are not the best as it's as "params" can be many things in a model.

aloctavodia · 2024-06-02T21:15:47Z

I am saying that the name "params" is very vague.

GStechschulte

I have been holding this PR back for too long. I reviewed the changes, but paid closer attention to the changes in interpret and with bayeux.

I really like the changes, and everything LGTM.

bambi/backend/pymc.py

tomicapretto · 2024-06-14T14:03:23Z

@GStechschulte thanks for the review! Do you have any ideas for the issue @aloctavodia mentioned? I agree the current approach is a bit vague. But I don't have lots of ideas right now :D

GStechschulte · 2024-06-21T05:40:12Z

@tomicapretto @aloctavodia apologies for the delayed response. This notification slipped through my GitHub 😞

Before, include_mean would compute the "posterior of the mean response" which technically may not always be a mean as you stated above. This computation is $g^{-1}(\eta)$ which relates the linear predictor to the response. Since it is a response parameter, but not always the mean, maybe include_response_params. It's a longer name, but less vague?

To be consistent in model.predict:

kind="response_params" as calling model.fit(include_response_params=True) will result in the same InferenceData as the sequence of calls: (1) model.fit(include_response_params=False), then (2) model.predict(kind="response_params", data=None)
kind="response_obs" or response so users "know" posterior predictive samples are returned and the samples are on the response scale.

… response distribution

tomicapretto added 7 commits April 11, 2024 16:24

Start migration

f176be8

Update prior scaler

6e0d0db

More updates. The backend needs futher updates

1469084

First implementation working

aad34c2

Fix aliases

358bce1

More fixes thanks to tests

e7e0a6d

Fix more tests. Detected problems with PyMC

aaf6676

tomicapretto mentioned this pull request Apr 16, 2024

Release after pymc 5.13.1 #805

Closed

tomicapretto added 2 commits May 13, 2024 10:06

Adapt code to multivarite-like families

8a327bc

Adapt code to multivarite-like families

eb0f864

tomicapretto changed the title ~~Rename likelihood params [WIP]~~ Rename likelihood params May 13, 2024

Pull from main

2a54df7

Update dependencies

7449869

Re-run several example notebooks

b7d6a6c

tomicapretto added 2 commits May 26, 2024 12:16

add function to create xr.Dataset for bayeux based inferences

da6a466

rerun more examples

750e639

tomicapretto changed the title ~~Rename likelihood params~~ Internal refactor May 27, 2024

tomicapretto marked this pull request as ready for review May 27, 2024 01:18

Rerun more examples

f5f49d9

tomicapretto requested review from aloctavodia and GStechschulte May 27, 2024 11:18

tomicapretto added 2 commits June 2, 2024 12:08

bump pymc version to latest release

d87ef79

Merge branch 'rename_likelihood_params' of github.com:tomicapretto/ba…

55aac85

…mbi into rename_likelihood_params

Add function to add observed_data group when using bayeux

e8510ea

GStechschulte approved these changes Jun 13, 2024

View reviewed changes

bambi/backend/pymc.py Show resolved Hide resolved

tomicapretto added 2 commits July 4, 2024 12:05

Rename arguments and values to indicate we want the parameters of the…

73211ed

… response distribution

fix broken link

d2904f1

tomicapretto mentioned this pull request Jul 4, 2024

Broken link in 1D HGSP example #814

Closed

Add dim name to families with categorical response levels

20acf42

tomicapretto merged commit 4cc3103 into bambinos:main Jul 4, 2024
4 checks passed

tomicapretto deleted the rename_likelihood_params branch July 4, 2024 12:27

GStechschulte mentioned this pull request Jul 5, 2024

Update docstring and args #823

Merged

digicosmos86 mentioned this pull request Jul 10, 2024

Fix graphing after bambi 0.14 is released lnccbrown/HSSM#491

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Internal refactor #804

Internal refactor #804

tomicapretto commented Apr 14, 2024 •

edited

Loading

tomicapretto commented Apr 15, 2024

tomicapretto commented May 13, 2024

tomicapretto commented May 25, 2024

review-notebook-app bot commented May 26, 2024

aloctavodia commented May 31, 2024

tomicapretto commented Jun 2, 2024

aloctavodia commented Jun 2, 2024

GStechschulte left a comment

tomicapretto commented Jun 14, 2024

GStechschulte commented Jun 21, 2024 •

edited

Loading

Internal refactor #804

Internal refactor #804

Conversation

tomicapretto commented Apr 14, 2024 • edited Loading

tomicapretto commented Apr 15, 2024

tomicapretto commented May 13, 2024

tomicapretto commented May 25, 2024

review-notebook-app bot commented May 26, 2024

aloctavodia commented May 31, 2024

tomicapretto commented Jun 2, 2024

aloctavodia commented Jun 2, 2024

GStechschulte left a comment

Choose a reason for hiding this comment

tomicapretto commented Jun 14, 2024

GStechschulte commented Jun 21, 2024 • edited Loading

tomicapretto commented Apr 14, 2024 •

edited

Loading

GStechschulte commented Jun 21, 2024 •

edited

Loading