xarray usage #1
(qs["truth"] > qs[var_name].sel(quantile=0.1))
)
qs = qs.sortby(qs[var_name].sel(quantile=0.9))
y = np.linspace(*ax.get_ylim(), qs.dims["skater_name"])
This is probably the piece that is more xarray specific and can be confusing. `qs.dims["skater_name"]` returns the length of the dimension as an integer. The key is the difference between a dimension and a coordinate. Here both have the same name, as in all ArviZ-generated datasets, but this is not a requirement in xarray. `qs.skater_name` or `qs.coords.skater_name` will return a DataArray with all the skater name values.
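To make the dimension/coordinate distinction concrete, here is a minimal sketch on a toy dataset. The skater names and values are made up for illustration, and it uses `sizes`, which gives the same dimension-to-length mapping and is the spelling newer xarray releases recommend over indexing `Dataset.dims`.

```python
import numpy as np
import xarray as xr

# Toy dataset; names and values are invented for illustration only.
names = ["ana", "bea", "cai"]
qs = xr.Dataset(
    {"truth": ("skater_name", np.array([0.2, 0.5, 0.9]))},
    coords={"skater_name": names},
)

# Length of the dimension: a plain integer.
n_skaters = qs.sizes["skater_name"]

# Coordinate values: a DataArray holding the labels themselves.
skater_labels = qs.coords["skater_name"]

# A dimension does not need a coordinate with the same name.
no_coord = xr.Dataset({"truth": ("skater_name", np.array([0.2, 0.5, 0.9]))})
has_coord = "skater_name" in no_coord.coords

print(n_skaters, skater_labels.values, has_coord)
```

In the second dataset the dimension still has a length, but there are no labels to look up, which is why the two concepts are kept separate.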
Thanks a lot for the contribution! Loads of nice xarray things in there that I hadn't seen before, and definitely an improvement in explicitness. I've never put the effort into properly learning how to interact with ArviZ InferenceData objects as xarray objects, because I found I often wanted to do a multi-dimensional groupby and I think xarray doesn't currently support this. For example, here I needed to count the number of posterior draws in each outcome for each measurement and did `infd.posterior_predictive["yrep"].to_series() ... .groupby(["measurement", "yrep"]).size()`. Do you by any chance know an equivalent xarray way of doing this?
This one I'd have to take a look and play around a bit before getting somewhere, I would actually love to do it but I don't currently have much time. I can see two ways going forward:
One thing I don't see clearly is the conversion from counts to probability, is this done by
Neither of the two above is ideal: xarray should allow multilevel groupbys, and as far as I know that is a work in progress (pydata/xarray#324) but not yet available. Even if not ideal, one of these could be a bit better than converting to pandas, but I'm not sure I'd bet on that 😅
Extra comment, I noticed you have
Thanks very much for the suggestions - I'll have a go. The conversion from counts to probability is done at this line:
It's not very explicit as you need to know what axes 0 and 1 represent, and what the axis arguments to sum and div do. I hadn't actually noticed that
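To illustrate what the axis arguments are doing in this kind of counts-to-probability step, here is a small sketch on a hypothetical counts table. The layout (measurements as rows, outcomes as columns) and the numbers are assumptions for illustration, not the actual table from the post.

```python
import pandas as pd

# Hypothetical counts: rows are measurements, columns are outcomes.
counts = pd.DataFrame(
    [[3, 1], [2, 2]],
    index=pd.Index(["m0", "m1"], name="measurement"),
    columns=pd.Index([0, 1], name="yrep"),
)

# sum(axis=1) adds across the columns, giving one total per row.
row_totals = counts.sum(axis=1)

# div(..., axis=0) matches those totals to rows via the index,
# so each row is divided by its own total.
probs = counts.div(row_totals, axis=0)

print(probs)
```

With labeled dimensions the same step would name the dimension being summed over rather than relying on axis positions.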
We did something that could be similar to what needs to happen here back at pymc3 examples: https://nbviewer.jupyter.org/github/pymc-devs/pymc-examples/blob/main/examples/case_studies/rugby_analytics.ipynb (see towards the end on posterior predictive section). This ends up with a loop and uses np.unique to make the "histogram"/get rank counts of each team, but there are other alternatives. IMO, the ideal situation (for the rugby case) would be to have an implementation of np.bincount as a ufunc, so we can choose which dimensions it should reduce on and which should be batched. I see 2 quick ways of doing that. These two snippets should go between the computation of Option 1:
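As an illustration of the batched `np.bincount` idea, and not necessarily either of the two options mentioned, here is a minimal sketch on synthetic data. The array `yrep` stands in for `infd.posterior_predictive["yrep"]`; its shape, the number of outcomes, and the measurement labels are all assumptions. It counts draws per outcome for each measurement in two ways: by broadcasting against the outcome values, and by wrapping `np.bincount` with `xr.apply_ufunc`.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
n_outcomes = 4  # assumed number of discrete outcomes

# Synthetic stand-in for infd.posterior_predictive["yrep"].
yrep = xr.DataArray(
    rng.integers(0, n_outcomes, size=(2, 50, 3)),
    dims=("chain", "draw", "measurement"),
    coords={"measurement": ["m0", "m1", "m2"]},
)

# First way: broadcast against the outcome values and count matches.
outcome = xr.DataArray(
    np.arange(n_outcomes), dims="outcome",
    coords={"outcome": np.arange(n_outcomes)},
)
counts = (yrep == outcome).sum(dim=("chain", "draw"))

# Second way: np.bincount reduces over chain and draw, and
# apply_ufunc loops it over the remaining "measurement" dimension.
counts_ufunc = xr.apply_ufunc(
    lambda x: np.bincount(x.ravel(), minlength=n_outcomes),
    yrep,
    input_core_dims=[["chain", "draw"]],
    output_core_dims=[["outcome"]],
    vectorize=True,
)

# Counts to probabilities, naming the dimension being normalized.
probs = counts / counts.sum("outcome")
```

The broadcasting version allocates a boolean array with an extra `outcome` dimension, so it trades memory for simplicity; the `apply_ufunc` version avoids that at the cost of a Python-level loop over measurements.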
Hi, great post!
I saw the code, and the first plot is quite similar to other plots I have done, so I already more or less have a template for doing them with xarray. It is more verbose than pandas (xarray in general is quite verbose, as everything is labeled), but I also find it clearer and more explicit.
Basically what is happening is that we are working with `qs` as an xarray dataset with 3 variables: `var_name`, `"truth"` and `"truth_in_interval"`. All 3 variables have the skater name dimension and `var_name` has the extra quantile dimension.
Disclaimer: I have not tried to run this! ⚠️
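For readers who want to see that structure, here is a rough sketch that builds a dataset of the same shape from synthetic draws. The variable name `"ability"`, the skater names, and the random data are all hypothetical; only the layout (three variables sharing `skater_name`, one with an extra `quantile` dimension) follows the description above.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(1)
var_name = "ability"  # hypothetical variable name
names = ["ana", "bea", "cai", "dov"]  # hypothetical skaters

# Synthetic posterior draws with ArviZ-style chain/draw dimensions.
draws = xr.DataArray(
    rng.normal(size=(2, 100, len(names))),
    dims=("chain", "draw", "skater_name"),
    coords={"skater_name": names},
)

# Reducing over chain and draw adds a "quantile" dimension.
qs = xr.Dataset({var_name: draws.quantile([0.1, 0.9], dim=("chain", "draw"))})
qs["truth"] = ("skater_name", rng.normal(size=len(names)))

# drop=True removes the scalar "quantile" coordinate after selection.
lower = qs[var_name].sel(quantile=0.1, drop=True)
upper = qs[var_name].sel(quantile=0.9, drop=True)
qs["truth_in_interval"] = (qs["truth"] > lower) & (qs["truth"] < upper)

# Order the skaters by their upper quantile.
qs = qs.sortby(upper)
print(qs)
```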
Feel free to do whatever you want with the code and the PR, I just wanted to share this possibility to avoid xarray-pandas conversion.