extract_variable_array and the with_indices parameter for variables()
#342
Codecov Report

Coverage diff, master vs. #342:

| | master | #342 | +/- |
|---|---|---|---|
| Coverage | 95.14% | 95.31% | +0.17% |
| Files | 47 | 50 | +3 |
| Lines | 3745 | 3840 | +95 |
| Hits | 3563 | 3660 | +97 |
| Misses | 182 | 180 | -2 |
    # select matched all_variables maintaining the input variables order
    variables <- all_variables[all_var_matched_ixs[order(input_ixs, all_var_matched_ixs)]]
    } else {
      missing_variables <- setdiff(variables, all_variables)
This line is not hit by test coverage (see here), but as far as I can tell it cannot be, as `scalar_only` is always `FALSE` (I assume this is a parameter that was used by some old code?).
I have not changed this function except to use the new `split_variable_names()` on line 263. It only appears as all new because I moved it here from `R/draws-index.R`.
I don't remember `scalar_only`. The way the variables are named suggests that I have not written it. Was it previously used in rvars perhaps? I am fine with removing it as long as you don't see any value in keeping it for future use in rvars or similar functionality.
Hmm, I didn't write `check_existing_variables`... Looking back through git blame led me to this issue: #71
That issue suggests that `scalar_only` was going to be exposed in the API and then it was decided not to expose it, pending future use cases.
There is also an interesting conversation there about argument naming that is very reminiscent of questions I've had about the naming of the `with_indices` argument. I'd be curious if folks have strong opinions about its name. I chose it to parallel the naming of `rvar(with_chains =)`, though I wonder if it is ambiguous to people whether it means the resulting variables have indices or their names do.
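
To make the naming question concrete, here is a minimal base R sketch of the distinction between names "with indices" and "without indices". The helpers `split_names_sketch()` and `parse_indices_sketch()` are hypothetical names made up for illustration; this is not the PR's implementation of `split_variable_names()` or `split_indices()`, only a rough picture of the idea.

```r
# Hypothetical helpers for illustration; not the PR's split_variable_names().
split_names_sketch <- function(names) {
  # Drop a trailing "[...]" group to get the base name.
  base <- sub("\\[.*\\]$", "", names)
  # Whatever follows the base name is the index string ("" if there is none).
  indices <- substring(names, nchar(base) + 1)
  data.frame(base = base, indices = indices, stringsAsFactors = FALSE)
}

parse_indices_sketch <- function(indices) {
  # "[2,1]" -> c("2", "1"): strip the brackets, then split on commas.
  strsplit(gsub("^\\[|\\]$", "", indices), ",", fixed = TRUE)
}

vars <- c("mu", "theta[1]", "theta[2]", "Sigma[1,1]", "Sigma[2,1]")
parts <- split_names_sketch(vars)

vars                                # names "with indices"
unique(parts$base)                  # names "without indices"
parse_indices_sketch("[2,1]")[[1]]  # parsed indices for one name
```

In this reading, the argument controls only whether the names carry the bracketed index part, which is one of the two interpretations raised above.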
Thank you! Please let me know once this PR is ready for review!
Should be ready now
The PR looks good to me. Thank you!!
@avehtari could you check if `as_variable_array` does what you would like to do? If you give your okay too, I think this is ready for merging.
Yes, it's helpful as it works the same way for any draws object type, and I can write my example in the issue as
or if I want to get all draws array
(I can also drop |
Great! @mjskay anything to change from your side? Otherwise, feel free to merge (or tell me so that I merge).
Thanks for the quick review @paul-buerkner! Maybe let's wait a day or two and see if @jgabry @MansMeg @avehtari or others have objections/suggestions for the |
Agreed.
I'm fine with |
Great, thanks @avehtari! I will take that as a strong endorsement and merge :)
Summary

This PR:

- Addresses "… `draws_rvars` format?" (#208) by giving `variables()`, `variables<-()`, `set_variables()`, and `nvariables()` a `with_indices` argument, which determines whether variable names are retrieved/set with (`"x[1]"`, `"x[2]"`, ...) or without (`"x"`) indices.
- Adds the `extract_variable_array()` function to extract variables with indices into arrays of iterations x chains x any remaining dimensions. @avehtari does this do what you need?

It also makes a few other related changes:

- For types that support `factor` variables (`draws_df`, `draws_list`, and `draws_rvars`), `extract_variable()` and `extract_variable_matrix()` now return a `factor`/`ordered` if the variable requested is a `factor` or `ordered`.
- It factors out a bunch of internal code for manipulating variable indices into `R/variable-indices.R`. Most important are:
  - `split_variable_names()` and `split_indices()`, which I suggest as the "canonical" functions for (1) splitting a variable into a string representing base name and indices and (2) parsing those indices if needed. The code for `split_variable_names()` was repeated in several places in the code base (implemented slightly differently). I tried several implementations on large vectors and the one I put there should be fast.
  - `variable_names()` and `variable_names<-()`, which are functions with a `with_indices` parameter that can manipulate character vectors of variable names. These use `split_variable_names()` under the hood, and are the basis of the implementation of `with_indices` in `variables()` and `variables<-()`.
  - `flatten_indices()` and `flatten_array()`, which are used to turn arrays into vectors where indices are embedded into variable names (used when converting from `draws_rvars` to other formats that embed indices into variable names).

  Factoring these functions out means that if we later add support for more specialized parsing of variable names, we only have to do that in one place. It also gets us a bit closer to implementations of index parsing we could consider exporting (e.g. for "Coding matrix/vector variables in tidy form", #61), which could perhaps be a `parse_variable_names()` function built on `split_variable_names()` + `split_indices()`.
- Moves the code for `variables()`, `variables<-()`, `set_variables()`, and `nvariables()` from `R/draws-index.R` into `R/variables.R` and gives them their own documentation pages (`R/draws-index.R` and the combined doc page for variables/iterations/chains/draws were getting bloated, especially with the extra args on variables functions).
- Minor refactor of `while_preserving_dims()` -> `copy_dims()` and `while_preserving_levels()` -> `copy_levels()`.
- Minor fix for a change in how the package doc page is specified in the latest version of roxygen2.
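
As a rough usage sketch of the two user-facing changes described above, assuming a version of posterior that includes this PR and using the package's built-in `example_draws()` data (which is not mentioned in this PR and is chosen here only for illustration):

```r
library(posterior)

# Example draws shipped with posterior; includes the indexed variable theta.
x <- example_draws()

# Names with indices ("theta[1]", "theta[2]", ...), the usual behaviour
# for this draws format.
variables(x)

# Base names only, with the bracketed indices dropped.
base_names <- variables(x, with_indices = FALSE)
nvariables(x, with_indices = FALSE)

# All elements of theta as one array:
# iterations x chains x any remaining dimensions.
theta <- extract_variable_array(x, "theta")
dim(theta)
```

The same calls are meant to work across draws formats, which is the property the review thread above asks about.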
Copyright and Licensing
By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the following licenses: