fcase / case_when function for data.table #4021
Codecov Report
@@ Coverage Diff @@
## master #4021 +/- ##
==========================================
+ Coverage 99.41% 99.41% +<.01%
==========================================
Files 72 72
Lines 13769 13904 +135
==========================================
+ Hits 13688 13823 +135
Misses 81 81
So as not to leave it too late, I would encourage thinking about how to make evaluation lazy, as in switch(1, "a", stop("a"))
#[1] "a"
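For context, base R's switch evaluates only the branch it selects, which is why the stop() above never fires. A minimal sketch using base R only; the variable names res and err are just for illustration:

```r
# switch() evaluates only the selected branch, so stop("a") is never reached here.
res <- switch(1, "a", stop("a"))
print(res)

# Selecting the second branch does evaluate stop(); capture the error message to show it.
err <- tryCatch(
  switch(2, "a", stop("a")),
  error = function(e) conditionMessage(e)
)
print(err)
```

The same behaviour is what a lazy fcase would need: a value expression should only be evaluated when its condition is actually used.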
I also wanted to explain how the algorithm works because maybe there is a better way to do it?
I hope it makes sense. If not, excuse my French! :-)
Thinking out loud. Another way could be to split the
@jangorecki , I think I will probably have to change the code to make fcase lazy. The only way I see to do that would be to use
If there is no other option, it is always possible to substitute the dots and pass them along with the evaluation frame.
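A rough pure-R sketch of that idea, assuming the dots are captured unevaluated with substitute() and evaluated one at a time in the caller's frame. lazy_case is a hypothetical helper written for illustration; it is not the fcase implementation from this PR:

```r
# Hypothetical helper for illustration only; not the fcase implementation.
lazy_case <- function(..., default = NA) {
  exprs <- as.list(substitute(list(...)))[-1L]  # unevaluated when/value expressions
  env <- parent.frame()                         # frame in which to evaluate them
  stopifnot(length(exprs) >= 2L, length(exprs) %% 2L == 0L)
  out <- NULL
  todo <- NULL
  for (i in seq(1L, length(exprs), by = 2L)) {
    when <- eval(exprs[[i]], env)
    if (is.null(out)) {
      out <- rep(default, length(when))
      todo <- rep(TRUE, length(when))           # elements not matched yet
    }
    hit <- todo & !is.na(when) & when
    if (any(hit)) {
      value <- eval(exprs[[i + 1L]], env)       # value is evaluated only when needed
      out[hit] <- if (length(value) == 1L) value else value[hit]
      todo[hit] <- FALSE
    }
  }
  out
}

x <- 1:10
# The stop() is never evaluated because no element is left for the third condition.
res <- lazy_case(x < 5L, "low", x >= 5L, "high", x == 5L, stop("never evaluated"))
print(res)
```

In this sketch the third condition is still evaluated, only its value is skipped; doing the capture at the R level and passing the expressions plus the frame down to C would follow the same pattern.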
@jangorecki , please see if you are happy with the lazy evaluation. See test 2124.72. If you need anything else please let me know. Thank you.
Formatting comments for now. The lazy evaluation approach you tried looks very promising. Any idea how it compares in speed to the previous one?
To answer your question about speed: lazy evaluation runs at the same speed. Note that I have added a break at the end of the first loop, which prevents the evaluation of further expressions once every element has already been matched with a value. This might speed up the computation a little when the user has passed unnecessary when conditions. Finally, I have no idea how to apply OpenMP to this function. I think it is possible to go below one second for a 1GB vector in very specific cases that are equivalent to fswitch(x, when, value).
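The early exit described above can be sketched in plain R. This is only an illustration under stated assumptions: first_match is a hypothetical function, and the conditions are passed as zero-argument functions so that skipping them is observable; the actual loop in the PR is written in C.

```r
# Hypothetical sketch of the early-exit loop; conditions are zero-argument functions.
first_match <- function(whens, values, n) {
  out <- rep(NA_integer_, n)
  todo <- rep(TRUE, n)          # elements that have not been matched yet
  evaluated <- 0L               # how many conditions were actually evaluated
  for (i in seq_along(whens)) {
    when <- whens[[i]]()
    evaluated <- evaluated + 1L
    hit <- todo & !is.na(when) & when
    out[hit] <- values[[i]]
    todo[hit] <- FALSE
    if (!any(todo)) break       # everything matched: skip the remaining conditions
  }
  list(out = out, evaluated = evaluated)
}

x <- 1:6
r <- first_match(
  whens = list(
    function() x <= 3L,
    function() x > 3L,
    function() stop("never reached")  # skipped thanks to the break
  ),
  values = list(1L, 2L, 3L),
  n = length(x)
)
print(r$out)
print(r$evaluated)
```

Here the third condition is never called, because the first two already cover every element.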
Thanks for all the adjustments. The more difficult part of the review remains: investigating whether the tests cover all necessary cases (and eventually confirming that all their output is as expected).
Any idea if it is do-able to make
We can do it but as you said we use
Some further benchmarks:
man/fcase.Rd
}
\arguments{
\item{...}{ A sequence consisting of logical condition (\code{when})-resulting value (\code{value}) \emph{pairs} in the following order \code{when1, value1, when2, value2, ..., whenN, valueN}. Logical conditions \code{when1, when2, ..., whenN} must all have the same length, type and attributes. Each \code{value} may either share length with \code{when} or be length 1. Please see Examples section for further details.}
\item{default}{ Default return value, \code{NA} by default, for when all of the logical conditions \code{when1, when2, ..., whenN} are \code{FALSE} for some entries. }
when all of the logical conditions when1, when2, ..., whenN are FALSE or missing, right?
Right
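A small sketch of that behaviour, assuming a data.table version that already exports fcase is installed (the call is guarded so the snippet does nothing otherwise). The input x and the labels are made up for illustration:

```r
# Assumes a data.table release that exports fcase(); skipped if it is not installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  x <- c(1L, NA, 5L, 9L)
  # For x = NA both conditions are missing; for x = 5 both are FALSE.
  # In both cases the default is expected to be used.
  res_default <- data.table::fcase(
    x < 5L, "low",
    x > 5L, "high",
    default = "other"
  )
  print(res_default)
}
```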
When posting the benchmark, you can also include an optimistic-scenario benchmark (here on GitHub, not in the code) where the laziness feature can be observed in timings rather than just through an error.
\seealso{
\code{\link{fifelse}}
}
\examples{
it would be helpful to have an example where the order of conditions matters
I am not sure I understand. The order of conditions always matters (if I am not mistaken).
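One way such an example could look, assuming a data.table version that exports fcase is installed (guarded otherwise). The conditions overlap on purpose, so the first matching one wins; the data and labels are illustrative only:

```r
# Assumes a data.table release that exports fcase(); skipped if it is not installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  x <- c(2L, 8L)
  # x = 8 satisfies both conditions; whichever is listed first decides the result.
  a <- data.table::fcase(x > 0L, "positive", x > 5L, "large")
  b <- data.table::fcase(x > 5L, "large", x > 0L, "positive")
  print(a)
  print(b)
}
```

With the broad condition first, the narrower one never gets a chance; swapping the pairs changes the result for x = 8.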
Looks great. I guess a "cheating" way to show utility of laziness is to have one of the conditions/values set to
@jangorecki, @MichaelChirico , please see the benchmark below with laziness in action:
Let me know if you need anything else. Thanks.
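The original benchmark output is not reproduced here. As a hedged sketch of how the effect could be timed, assuming a data.table version that exports fcase is installed: the last pair below contains a deliberately slow value, which should never be evaluated because the first two conditions already cover every element. No timings are claimed.

```r
# Assumes a data.table release that exports fcase(); skipped if it is not installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  x <- 1:10
  timing <- system.time(
    res_lazy <- data.table::fcase(
      x < 5L, 1L,
      x >= 5L, 3L,
      # Illustrative slow branch: with lazy evaluation this pair is skipped.
      x == 5L, { Sys.sleep(2); 0L }
    )
  )
  print(res_lazy)
  print(timing[["elapsed"]])
}
```

If the elapsed time stays well below the two-second sleep, the third pair was skipped.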
Awesome work everyone!
Thank you. Please let me know your view on issue 4114. I also sent a message to the team regarding issue 1336. Not sure if you received it. I am interested in looking into these file-backed features, but I need someone to tell me exactly what is required. Happy to work with anyone here! In the meantime I wish you all a happy festive end of year.
Seasonal greetings to you too! I just replied on 4114. On 1336, yes that's probably the highest priority after >2bn rows. I think we're all interested in working on it and somebody just needs to make a start. My approach was to do >2bn first as a way to prepare for 1336, since there's not much point in having 1336 if it can't do >2bn rows. We haven't tried to work together from the start on new features before; it probably needs one person to take the lead by proposing specific code and a plan.
Closes #3823
I just wanted to share my code.
If you think it is worth it, I can move the code to a branch so you can work on it?