warn or optimize usage of `ifelse()` #2677

mattdowle · 2018-03-15T22:26:51Z

Not sure the status of ifelse() speed in R these days. Years ago it used to be really slow and I avoided it. Is that still the case?
I've seen this idiom quite a bit in the wild :

DT[, newCol := ifelse(existingCol == 42,  1L, 2L)]

under verbose=TRUE there's a message about the plonk. It's slow and RAM inefficient I assume (haven't tested).

Better would be using i, as per #2676 :

DT[existingCol==42, newCol:=1L, fill=2L]

Nested ifelse(ifelse(ifelse())) also quite common. Either detect and warn about ifelse() usage and point to a document about how to avoid, or optimize it automatically. And if ifelse() in base is still slow, see why and try to improve it there.

The text was updated successfully, but these errors were encountered:

MichaelChirico · 2018-03-16T10:12:22Z

Just re-ran Ricardo's benchmark here and yes, it's still slow:

https://stackoverflow.com/questions/16275149/does-ifelse-really-calculate-both-of-its-vectors-every-time-is-it-slow

Isn't this just another use case for implementing more flexible nomatch (#857)?

The multiple-ifelse case should probably correspond to a join, IMO. Not sure if there's a concise way we can kludge that into existing [ code intuitively.

HughParsonage · 2018-03-16T10:18:54Z

I wrote an alternative to dplyr::if_else (itself quite a bit faster than ifelse) that might be a more stringent benchmark.

library(microbenchmark)
library(magrittr)
cnd <- sample(c(TRUE, FALSE), size = 100e3, replace = TRUE)
yes <- sample(letters, size = 100e3, replace = TRUE)
no <- sample(letters, size = 100e3, replace = TRUE)
na <- sample(letters, size = 100e3, replace = TRUE)

microbenchmark(dplyr =  dplyr::if_else(cnd, yes, no, na), base = ifelse(cnd, yes, no),
               hutils = hutils::if_else(cnd, yes, no, na),
               check = my_check) %T>%
    print %>%
    autoplot

Unit: milliseconds
   expr       min        lq      mean    median        uq       max neval cld
  dplyr  6.202732  6.931278  7.584452  7.541011  8.322263 10.836334   100  b 
   base 27.949490 28.777124 29.843440 29.234310 30.110884 47.372671   100   c
 hutils  1.816998  2.041676  2.192027  2.135944  2.211239  5.622667   100 a

mattdowle · 2018-03-16T23:48:43Z

Hugh - that's fantastic! Would you consider proposing your hutils::ifelse() as a patch to base R?

HughParsonage · 2018-03-16T23:54:00Z

Really? Haha I only wrote it because I liked dplyr::if_else safety but dplyr's compile times for it were too long to justify importing the whole package. I actually never expected it to be faster.

There are a few limitations -- namely I'm not sure how it works on more exotic classes like if_else(matrix, list, list) for example. If you think it's worth pursuing, I'll definitely offer it as a patch.

mattdowle · 2018-03-17T00:02:13Z

It's really really worth pursuing -- yes. R is kinda famous for its slow ifelse() and desperately needs to be fixed. But yes indeed -- to have any chance of being accepted it would have to be a complete drop-in replacement. Which is the reason fread and fwrite haven't been proposed, for example. And in data.table of course, the reason is we wanted [.data.frame to work differently to be not-the-same as base R. But ifelse() speed is more like sorting ... it should just be in base. And how many of the 10,000 packages use base::ifelse(). Probably quite a lot. Any automatic improvement there with no code changes needed by package maintainers is sure to be welcomed as it could speed up CRAN testing.

Scale up the data size though so that the times are many seconds or minutes. As you know, I tend to dismiss any chart that is in milliseconds scale.

One of the R manuals explains how to build R and run its tests. If you've done that already in your very first communication, it has a much better chance.

HughParsonage · 2018-03-17T04:09:39Z

Longer version with a 2 billion-length cnd etc.

> system.time(hutils::if_else(cnd, yes, no))
   user  system elapsed 
  55.26    4.19   59.75 
> system.time(hutils::if_else(cnd, yes, no))
   user  system elapsed 
  55.47    4.07   59.55 
> system.time(dplyr::if_else(cnd, yes, no))
   user  system elapsed 
 105.97   52.11  159.17 
> system.time(dplyr::if_else(cnd, yes, no))
   user  system elapsed 
 109.05   59.56  168.74 
> system.time(ifelse(cnd, yes, no))
   user  system elapsed 
 467.92   51.15  520.66

While I never intended it to be faster when I wrote it, I'm confident hutils::if_else is always faster than dplyr::if_else and include that assertion as a unit test, only skipping on CRAN because it takes a while to run (https://github.com/HughParsonage/hutils/blob/15e62bdf235dc6084f1006824a20a42fab12ce9c/tests/testthat/test_1-0-0_if_else.R#L149).

Similar with hutils::coalesce v dplyr::coalesce. (Again, nothing wrong or even slow with the original -- just needed it in situations where compile time was important).

HughParsonage · 2018-03-17T04:17:22Z

Regarding the inclusion in base R, it seems a discussion on R-dev crops up every so often (e.g. ifelse() woes ... can we agree on a ifelse2() ? ), but good points are raised. Hadley's point that even if we didn't care about backwards-compatibility, the "correct" result of ifelse(c(TRUE, FALSE), factor(c("a", "b")), factor(c("c", "b")) is unclear. That gives me a bit of pause about implementation, but my current thought is to use hutils::if_else as the superstructure but instead of raising an error when there is some mismatch, falling back to base::ifelse, thus preserving backwards-compatibility but getting performance and a missing argument.

jangorecki · 2019-08-05T21:25:24Z

We could optimize ifelse into fifelse
To reduce a risk we could "optimize" it to call both functions, compare output, if differs then ask user to report. Then after some time we could reduce extra overhead of 2 calls and leave only fifelse.

MichaelChirico · 2024-12-09T14:56:37Z

Corresponding PR #3800 was closed as waiting for some strong user request for this. Closing here now too.

mattdowle added the enhancement label Mar 15, 2018

MichaelChirico mentioned this issue Feb 20, 2019

data.table-style coalesce #3424

Closed

MichaelChirico mentioned this issue Jul 1, 2019

data.table::if_else() or data.table::iif() #3657

Closed

MichaelChirico mentioned this issue Aug 29, 2019

detect and redirect ifelse usage in j to fifelse #3800

Closed

mattdowle added this to the 1.12.4 milestone Aug 30, 2019

MichaelChirico added the pr_available label Sep 3, 2019

mattdowle modified the milestones: 1.12.4, 1.13.0 Sep 17, 2019

MichaelChirico removed the pr-available label Oct 4, 2019

MichaelChirico mentioned this issue Nov 28, 2019

Print Option: Print columns that fit in single console without wrapping #4074

Merged

mattdowle modified the milestones: 1.12.7, 1.12.9 Dec 8, 2019

mattdowle modified the milestones: 1.13.1, 1.13.3 Oct 17, 2020

jangorecki modified the milestones: 1.14.3, 1.14.5 Jul 19, 2022

jangorecki modified the milestones: 1.14.11, 1.15.1 Oct 29, 2023

MichaelChirico modified the milestones: 1.16.0, 1.17.0 Jul 10, 2024

MichaelChirico closed this as completed Dec 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

warn or optimize usage of `ifelse()` #2677

warn or optimize usage of `ifelse()` #2677

mattdowle commented Mar 15, 2018 •

edited

Loading

MichaelChirico commented Mar 16, 2018

HughParsonage commented Mar 16, 2018 •

edited

Loading

mattdowle commented Mar 16, 2018 •

edited

Loading

HughParsonage commented Mar 16, 2018

mattdowle commented Mar 17, 2018 •

edited

Loading

HughParsonage commented Mar 17, 2018

HughParsonage commented Mar 17, 2018 •

edited

Loading

jangorecki commented Aug 5, 2019

MichaelChirico commented Dec 9, 2024

warn or optimize usage of ifelse() #2677

warn or optimize usage of ifelse() #2677

Comments

mattdowle commented Mar 15, 2018 • edited Loading

MichaelChirico commented Mar 16, 2018

HughParsonage commented Mar 16, 2018 • edited Loading

mattdowle commented Mar 16, 2018 • edited Loading

HughParsonage commented Mar 16, 2018

mattdowle commented Mar 17, 2018 • edited Loading

HughParsonage commented Mar 17, 2018

HughParsonage commented Mar 17, 2018 • edited Loading

jangorecki commented Aug 5, 2019

MichaelChirico commented Dec 9, 2024

warn or optimize usage of `ifelse()` #2677

warn or optimize usage of `ifelse()` #2677

mattdowle commented Mar 15, 2018 •

edited

Loading

HughParsonage commented Mar 16, 2018 •

edited

Loading

mattdowle commented Mar 16, 2018 •

edited

Loading

mattdowle commented Mar 17, 2018 •

edited

Loading

HughParsonage commented Mar 17, 2018 •

edited

Loading