-
-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Coding matrix/vector variables in tidy form #61
Comments
Unless I'm misunderstanding that is pretty close to what spread_draws / gather_draws are for in tidybayes (amongst other things). |
Sorry for the late reply - I apparently had logged onto a different Github account. Yeah that looks perfect! Do you think it's worth wrapping it to jump from the matrix form to this in one step? |
Yeah! Long term I'd like to replace A related question: is there particular functionality in gather/spread_draws that could be abstracted out or which it might be desirable to have exposed in a form that is easier to program against? Currently those functions take in specs for what transformation you want in a non-standard evaluation form, convert them internally to a simpler spec format, then do all the ugly data munging using those. This makes for a nice UI for interactive use but might be a pain to program against. If there's a need/desire for a more programmable variant that shouldn't be a hard change. The second question is if some of that moves to posterior or not. This might be related to bigger questions of "what to do with tidybayes" --- some time back I mentioned to Jonah that I was contemplating splitting off the geoms from tidybayes into a separate package (maybe "ggdist"). It has evolved from a package just about munging posteriors into a sort of two-headed beast with a bunch of uncertainty vis geoms and stats. |
One way of differentiating between tidybayes and posterior functions could be whether data or ploting is involved in addition to posterior draws. So whatever only handles draws could (potentially) go into posterior, but anything including data (such as |
Yeah definitely don't want to feature overlap here! Do you mean that this is functionality that would be useful for me to work on integrating @mjskay? |
Hmm, that's a criterion for splitting the functionality I hadn't considered. My working theory for how to split things was based on whether the functionality is specifically designed for tidy data frames of draws or not, which would suggest to me that the spread_draws/gather_draws stuff might live in tidybayes --- unless there's an analog to the kind of output from those functions in the other formats. I suppose it's a little bit on the line between the two worlds though. @lauken13 I'd be happy for you to work on this! Personally I am likely to have zero time for intense coding between now and late Spring due to teaching, but could definitely provide advice and whatnot. Would it be helpful to have a chat about what is needed and where it should live? |
@mjskay this is a good point and likely better way to differentiate the packages. I am happy to be involved in a chat if needed. |
Is there anything to be done for this issue still? I think time has basically told quite well what should be in posterior and what should be in tidybayes. @mjskay do you agree? |
I agree the split between posterior / tidybayes is clear now. Re: the original purpose of this issue (array reshaping utility functions), I wonder if @statscats @lauken13 @jgabry have thoughts about what might be useful based on the new rvar stuff. There is some internal code in posterior now for going between flat and multidimensional array formats used by rvar. So my question would be: are transformations done by rvar/as_draws_rvars sufficient for the needs in bayesplot, or would it be useful to expose some lower level functions for doing those transformations on base arrays? |
@mjskay here's my attempt to factor out the dimension parsing stuff. Mind having a look whenever you get a chance? I've tried to return an object that will be versatile enough for whatever people might need it for, including if Stan ever supports ragged arrays, etc.
|
Thanks for starting this!
Also, unless I've missed something the More generally, I think that the returned format is probably too specific to the original needs of as_draws_rvars, and for a more generic helper function should be revisited a bit. E.g. current output is: > get_indices(c("x[0]","x[1]","mu[1]", "mu[2]","mu[5]","sigma[a,b]","sigma[b,b]"))
[[1]]
V1 index
1 0 1
2 1 2
[[2]]
V1 index
1 1 1
2 2 2
4 3 NA
5 4 NA
3 5 3
[[3]]
V2 V1 index
1 b a 1
2 b b 2 Some thoughts about this format:
|
Hey @mjskay, > get_indices(c("x[-1,1]", "x[3,3]"))
[[1]]
V2 V1 index
1 1 -1 1
5 1 3 NA
3 2 -1 NA
6 2 3 NA
4 3 -1 NA
2 3 3 2 Note that it treats the columns independently in the sense that, despite that the first column is not one-indexed (and therefore doesn't get filled in with the full sequence of integers from one to max), it still fills in the second column with integers from one to max. Likewise, if one column is non-numeric and the other column is integer, it treats the second column like integers rather than factor labels. What do you think the general-purpose behavior should be here? Does it matter? |
Good questions. I think that it should treat columns independently --- as a user I would expect from You're right that the behavior of integer indices < 1 feels inconsistent. Probably it does make sense to fill those in with NAs as well. In any case I expect this to be a rare occurrence, but that the most likely situation it would happen is if someone is importing from something that is 0-indexed, in which case filling in missing indices would be the right behavior, so we should probably just do it. If we don't have a test for that we should add it. |
Cool. When you get a chance, check out the new |
Awesome, thanks! Will do |
When variables are vector/matrix they are stored in a tibble named as a[i,j].
@jgabry and I were talking about whether there was some utility of creating a convenience function that splits the indices in long form i.e.,
param idx1 idx2
a i j
It would be useful for getting things in the right form for Bayesplot.
The text was updated successfully, but these errors were encountered: