Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

data io - change from_cmdstanpy to work with CmdStanPy version 0.9.68 #1558

Merged
merged 12 commits into from
Feb 14, 2021

Conversation

mitzimorris
Copy link
Contributor

Description

CmdStanPy version 0.9.68 changed several properties and methods on the CmdStanMCMC object;
critically, the protected property _num_warmup was replaced with property num_draws_warmup
and corresponding num_draws_sampling. Also, the CmdStanMCMC object properties
stan_vars_cols and sampler_vars_cols provide a mapping from variable names to the output columns.

To fix from_cmdstanpy, I added a few helper functions which manipulate lists of variable names.
Going forward, the method _unpack_fit can be used instead of _unpack_frame.

Checklist

@mitzimorris
Copy link
Contributor Author

I didn't add any new unit tests; existing tests work.

Copy link
Contributor

@ahartikainen ahartikainen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

I think the latest release is the default way to handle (and we can drop support for the old code when time goes forward; 1year?)

@@ -393,6 +438,105 @@ def to_inference_data(self):
},
)

@requires("posterior")
def posterior_to_xarray_v68(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we change the logic so this is "default" and the other function is "old"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes!

@codecov
Copy link

codecov bot commented Feb 13, 2021

Codecov Report

Merging #1558 (989fe73) into main (6fa1ce8) will decrease coverage by 0.89%.
The diff coverage is 59.86%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1558      +/-   ##
==========================================
- Coverage   91.07%   90.17%   -0.90%     
==========================================
  Files         105      105              
  Lines       11361    11420      +59     
==========================================
- Hits        10347    10298      -49     
- Misses       1014     1122     +108     
Impacted Files Coverage Δ
arviz/data/io_cmdstanpy.py 60.70% <59.86%> (-37.50%) ⬇️
arviz/data/io_cmdstan.py 91.89% <0.00%> (-0.08%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6fa1ce8...989fe73. Read the comment docs.

@mitzimorris
Copy link
Contributor Author

hi @ahartikainen, refactored - added a bunch of helper functions that do the data munging, specifically:

  • parsing base variable names from the column labels
  • munging sampler stats column names, data types

Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks great, thanks!

Regarding developing and maintaining the converters, do you think it would be easier to have a to_arviz in cmdstanpy instead?

It will probably be a bit more complicated in terms of setting up tests and ci in the inference libraries repos, but I feel like all the converters are becoming increasingly complicated due to having to support multiple versions: from_pystan is basically 2 converters in one to work with pystan2 and 3, from_pymc3 also has multiple checks, which get more complicated every time and it already gets to the point of monkeypatching pymc3.

@mitzimorris
Copy link
Contributor Author

It will probably be a bit more complicated in terms of setting up tests and ci in the inference libraries repos, but I feel like all the converters are becoming increasingly complicated due to having to support multiple versions: from_pystan is basically 2 converters in one to work with pystan2 and 3, from_pymc3 also has multiple checks, which get more complicated every time and it gets to the point of monkeypatching pymc3.

I can see how this is a problem. are you suggesting that to_arviz would return an InferenceData object?

@OriolAbril
Copy link
Member

I can see how this is a problem. are you suggesting that to_arviz would return an InferenceData object?

Yes, we can discuss that on the lab meeting this Friday, I have added a point about it. Currently ArviZ has cmdstanpy as a runtime dependency for from_cmdstanpy to work, same for pystan, pymc3... we could change the approach so that cmdstanpy has arviz as a runtime dependency for to_arviz to work (and we could keep from_cmdstanpy in ArviZ as an alias to to_arviz in cmdstanpy).

It may also require some changes to ArviZ (making sure all functions used by the converters are not private methods to begin with, maybe documenting how to create converters) but in the long run it this approach will probably serve ArviZ's goals better and ease ArviZ integration with new libraries. In fact, mcx supports InferenceData natively: https://github.com/rlouf/mcx/blob/master/mcx/trace.py using this approach, and it basically uses dict_to_dataset and InferenceData.

@mitzimorris
Copy link
Contributor Author

PR is now failing because of code coverage - we need tests for CmdStanPy 0.9.65 and 0.9.68.
what is the way to set this up?
other option, merge as is.

@mitzimorris
Copy link
Contributor Author

Yes, we can discuss that on the lab meeting this Friday, I have added a point about it. Currently ArviZ has cmdstanpy as a runtime dependency for from_cmdstanpy to work, same for pystan, pymc3... we could change the approach so that cmdstanpy has arviz as a runtime dependency for to_arviz to work (and we could keep from_cmdstanpy in ArviZ as an alias to to_arviz in cmdstanpy).

on further consideration, this isn't possible because the dependency is going in the wrong direction - CmdStanPy is upstream. also, desiderata for CmdStanPy is minimal dependencies on other packages.

@OriolAbril
Copy link
Member

I think it's no problem that the converters for old cmdstanpy versions are not tested. I am merging like this, it can always be added in a follow up pr.

@OriolAbril OriolAbril merged commit 43e7999 into main Feb 14, 2021
@OriolAbril OriolAbril deleted the update/from_cmdstanpy branch February 14, 2021 04:43
@mitzimorris mitzimorris mentioned this pull request Feb 14, 2021
2 tasks
utkarsh-maheshwari pushed a commit to utkarsh-maheshwari/arviz that referenced this pull request May 27, 2021
…68 (arviz-devs#1558)

* from_cmdstanpy for v0.9.68

* from_cmdstanpy for v0.9.68

* lint fix

* lint fix

* echanges per code review, also, refactor

* lint fix

* added docstrings

* docstrings lint fix

* docstrings lint fix

* lintfix, baby one more time
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants