-
Notifications
You must be signed in to change notification settings - Fork 272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Strict pluck #482 #522
Strict pluck #482 #522
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looks good, although I think it would be better to move the checking a bit further down in order to get better error messages (which will require some test tweaking).
src/extract.c
Outdated
@@ -66,10 +66,17 @@ int find_offset(SEXP x, SEXP index, int i) { | |||
} | |||
} | |||
|
|||
SEXP extract_vector(SEXP x, SEXP index_i, int i) { | |||
SEXP extract_vector(SEXP x, SEXP index_i, int i, int strict) { | |||
int offset = find_offset(x, index_i, i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be better to pass strict
down to find_offset()
— that's obviously a lot more work because there are so many more returns, but I think it would lead to considerably more informative error messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've adjusted all of this so it gives specific messages for each type of problem in find_offset()
; however, for the error messages, which way do you think is better out of one or two line errors?
chuck(1:10, Inf)
#> Error: Each accessor must be finite:
#> * Accessor 1 is Inf
chuck(1:10 Inf)
#> Error: Accessor 1 must be finite, not Inf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second - I slightly prefer the wording, and the former suggests we might warn for any Inf, not just the first we encounter (which while nice, I don't think that would be worth the implementation complexity)
src/extract.c
Outdated
} | ||
|
||
SEXP extract_attr(SEXP x, SEXP index_i, int i) { | ||
SEXP extract_attr(SEXP x, SEXP index_i, int i, int strict) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe should rename to extract_s4()
to make the intent more clear?
src/extract.c
Outdated
Rf_errorcall(R_NilValue, | ||
"Can't find slot `%s`.", | ||
Rf_translateCharUTF8(Rf_asChar(index_i)) | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like you could use an else
block here, just to make clear what the default is (instead of relying on Rf_getAttrib()
)
Sorry for the delay; this latest push should have more informative error messages as I moved it further down as you suggested. I did change some of existing error messages (i.e. the ones that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! A couple more comments/questions
src/extract.c
Outdated
} else if (val >= n) { | ||
if (strict) | ||
Rf_errorcall(R_NilValue, | ||
"Index %i exceeds the length of object being plucked (%.0f > %i).", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why %.0f
and not %i
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of double val = REAL(index)[0];
, using %i
gives nvalid index (1): index must be greater than 0, not 536870913.
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah got it.
src/extract.c
Outdated
if (Rf_length(index) > 1) { | ||
Rf_errorcall(R_NilValue, "Index %i must have length 1", i + 1); | ||
Rf_errorcall(R_NilValue, | ||
"Index %i must have length 1, not %i.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these messages are a bit easier to write/read if they all start in the same way, either with "Invalid object:" or "Invalid index (%i):".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also can you replace all instances of "must not" by "can't" please? For instance this:
Object being plucked must not be NULL.
reads better as:
Plucked object can't be NULL.
cf https://style.tidyverse.org/error-messages.html
(just to be clear, the affirmative "must" cases are good!)
Feel free to start on the documentation - I think you should document |
src/extract.c
Outdated
if (val < 0 || val >= n) | ||
return -1; | ||
if (val < 0) { | ||
if (strict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General comment: can you add curly braces even to single expressions please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you want me to add those for the existing code that I haven't yet modified too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lionel- I don't think we use that style anymore, and no parens is fine for simple expressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I wasn' aware, Jim, Gabor and I still use it. It's consistent with our R style and easier to read since we use 2-level indents. Especially with multi-line single expressions such as here. But nevermind then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can also have implications for covr, I think? That is, the presence of the brackets helps covr do its job in some weird edge cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the edge cases were fixed and this is officially blessed @jimhester style
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I am behind the times then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The officially blessed jimhester style is to always use braces for single expressions.
The edge cases in covr are fixed, but I think it is still better to use braces, it is too easy to accidentally forget to add the braces in the future and write something like the following, then not realize the second if is unconditionally evaluated.
if (val < 0)
print(val)
if (strict)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am SHOOK
@daniel-barnett do you mind updating your code to use |
Don't worry about the latest comments @daniel-barnett, I'm going to finish the PR because I need it merged before working on other pluck bugs and features. |
The latter wrongly suggests something is going on with different literal types
Refactored the code to put all the checking logic at the end of the file. Merged several code paths to get more consistent error checking and make it easier to maintain. Improved consistency of error messages. @hadley Could you review the last commit which adds documentation please? I have moved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just reviewed the docs
R/pluck.R
Outdated
#' accessor functions, i.e. functions that take an object and return | ||
#' some internal piece. | ||
#' `pluck()` and `chuck()` implement a generalised form of `[[` that | ||
#' allow you to index deeply and flexibly into data structures. While |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove "while"
R/pluck.R
Outdated
#' accessors because it reads linearly and is free of syntactic | ||
#' cruft. Compare: \code{accessor(x[[1]])$foo} to `pluck(x, 1, | ||
#' accessor, "foo")`. | ||
#' * You can pluck or chuck with standard accessors like integer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe these should move down to details?
R/pluck.R
Outdated
#' | ||
#' @details | ||
#' | ||
#' Since it handles arbitrary accessor functions, `pluck()` is a type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this paragraph. Maybe it can be removed?
Thanks again @daniel-barnett ! |
This is an implementation of strict pluck, so-called
chuck
from #482.Once we have agreed on everything, I'll also add some examples and descriptions to the help.
Since this is based on pluck, it seems like #480 is involved here too.