Progress bars #149

Closed
hadley opened this issue Dec 10, 2015 · 43 comments
Labels
feature (a feature request or enhancement), map 🗺️
Comments

@hadley
Member

hadley commented Dec 10, 2015

Would be nice to have support for progress bars in all map functions. This is a nice feature of plyr.

Could use https://github.com/gaborcsardi/progress, although we might need to ask @gaborcsardi to also provide a C API.

@gaborcsardi
Member

There is a header-only C++ API, isn't that good enough? Although I have to say it is not very well tested and has fewer features. https://github.com/gaborcsardi/progress#c-api

@gaborcsardi
Member

It would also make sense to put the C++ part in another package, as it is completely independent.

@lionel-
Member

lionel- commented Dec 10, 2015

This will not work with mapping functions because we eval R functions from the C code :/ Any user interruption or R error will cause a long jump that bypasses all C++ destructors.

Thus, if you have any data on the heap, you'll get leaks. For example, all STL containers, and even a simple std::string, allocate memory dynamically and so need to be destructed properly. See the discussion in e2def88

@gaborcsardi
Member

Well, there is an R API and a C++ API. It seems reasonable that you would be able to use at least one of them. :)

@vixr

vixr commented Jan 22, 2016

+1 for this feature

@gaborcsardi
Member

FWIW I wanted to note that I am adding some new progress bar API, which has the nice feature of having (almost) zero overhead when the progress bars are not shown (e.g. non-interactive use), in addition to ease of use. This is how it will look:

progress %~~% lapply(seq, fun, ...)

If progress bars are turned off, then it simply runs the lapply. If progress bars are on, then it appends the progress bar ticks to fun.

I am saying this, because it would be great to use it for purrr functions as well.

@lionel-
Member

lionel- commented Mar 24, 2016

It'd probably be more purrr-like to have an adverb functional or function operator that takes mapping functionals and adds progress bars to them. With a functional:

# ..f must be another functional that takes a .x and a .f
# ..f must have the usual purrr signature ..f(.x, .f, ...)
with_progress <- function(..x, ..f, .f, ...) {
  .f <- add_progress(.f, length(..x))
  ..f(..x, .f, ...)
}

mtcars %>% with_progress(map, as.character)
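The snippet above calls add_progress() without defining it. A minimal sketch of what such a helper could look like follows, assuming base R's utils::txtProgressBar() as the display (the thread itself discusses the progress package instead). The body of add_progress() is hypothetical and only illustrates the idea.

```r
# Hypothetical helper: wrap .f so that every call advances a text progress bar.
# Assumption: base R's txtProgressBar stands in for a real progress backend.
add_progress <- function(.f, n) {
  if (n == 0) return(.f)  # txtProgressBar() requires max > min
  pb <- utils::txtProgressBar(min = 0, max = n, style = 3)
  i <- 0
  function(...) {
    out <- .f(...)
    i <<- i + 1
    utils::setTxtProgressBar(pb, i)
    if (i >= n) close(pb)  # finish the bar after the last element
    out
  }
}

# Same definition as in the comment above.
with_progress <- function(..x, ..f, .f, ...) {
  .f <- add_progress(.f, length(..x))
  ..f(..x, .f, ...)
}

# Works with any functional that takes a vector first and a function second.
res <- with_progress(1:5, lapply, function(x) x^2)
```

Because with_progress() builds a fresh wrapper on every call, the bar starts from zero each time.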

@lionel-
Member

lionel- commented Mar 24, 2016

A quick thought: I think it's natural to have adverb functionals when we're modifying another functional, like in the example above. But otoh it's natural to have adverb function operators when we're modifying a regular function, e.g. safely(), lift(), etc.

@gaborcsardi
Member

@lionel- Hmmm, maybe I misunderstand something, but why not

mtcars %>% with_progress(map)(as.character)

then? Or is this what you mean in your second comment?

@lionel-
Member

lionel- commented Mar 24, 2016

> Or is this what you mean in your second comment?

Yes, this is what I mean. Maybe @hadley has another opinion though.

@gaborcsardi
Member

Hmmm, actually I quite like this, no extra operator needed. Maybe I should do it with lapply as well. Unfortunately I cannot really do it with for loops. They have to use an operator:

with_pb %~~% for (i in 1:100) { }

@lionel-
Member

lionel- commented Mar 24, 2016

> Maybe I should do it with lapply as well

lapply(), vapply() etc should work for free with this approach since they take a vector as first argument and a function as second :)

mtcars %>% with_progress(vapply, sum, numeric(1))

> Unfortunately I cannot really do it with for loops. They have to use an operator:

There is some discussion about function-like looping in #168 and #135.

@hadley
Member Author

hadley commented Mar 24, 2016

I think that progress bars are so useful that there should be minimal friction to use them in purrr. That makes me think that they should be an option (like plyr), or possibly even be displayed automatically given some condition (e.g. the loop has run for 2 seconds and has at least two more to go, like dplyr).
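The automatic-display condition can be sketched in plain R. The function below is a hypothetical illustration, not purrr's or dplyr's implementation: it maps with a for loop, estimates the remaining time from the average time per element, and only creates a bar once both the elapsed and the estimated remaining time exceed `delay` seconds. The name map_with_delayed_progress and the `delay` argument are assumptions.

```r
# Hypothetical sketch: show a bar only for loops that are already slow
# and still have a while to go.
map_with_delayed_progress <- function(.x, .f, ..., delay = 2) {
  n <- length(.x)
  out <- vector("list", n)
  start <- proc.time()[["elapsed"]]
  pb <- NULL
  for (i in seq_len(n)) {
    out[i] <- list(.f(.x[[i]], ...))  # list() keeps NULL results in place
    elapsed <- proc.time()[["elapsed"]] - start
    remaining <- elapsed / i * (n - i)  # naive linear estimate
    if (is.null(pb) && elapsed > delay && remaining > delay) {
      pb <- utils::txtProgressBar(min = 0, max = n, style = 3)
    }
    if (!is.null(pb)) utils::setTxtProgressBar(pb, i)
  }
  if (!is.null(pb)) close(pb)
  names(out) <- names(.x)
  out
}

# Fast loops never draw anything.
res <- map_with_delayed_progress(1:3, function(x) x + 1)
```

The per-iteration cost when no bar is shown is one proc.time() call and a comparison.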

@lionel-
Member

lionel- commented Mar 24, 2016

How about automatically displaying them unless (a) this is not an interactive session or (b) a global option is set to disable them?

This is a case where it makes sense to have a global option since this is a side effect for user convenience that shouldn't have an impact on the return value. Also it'll still be possible to use withr::with_options() in case it's important to control the option on a case by case basis (though I don't see when that would be useful). I think that's preferable to a plyr-like option that would clutter the function signatures.

Also it's still nice to have a functional to add progress bars to lapply() etc.

@hadley
Member Author

hadley commented Mar 24, 2016

Yes, agreed about non-interactive use + global option to turn off. That's what dplyr has too.
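The rule agreed on here (show bars only in interactive sessions, unless a global option turns them off) could be checked with a small predicate. This is a sketch only: the option name "mypkg.show_progress" is made up for illustration, and the `is_interactive` argument exists purely so the logic can be exercised outside an interactive session.

```r
# Hypothetical predicate; "mypkg.show_progress" is an invented option name.
show_progress <- function(is_interactive = interactive()) {
  is_interactive && isTRUE(getOption("mypkg.show_progress", TRUE))
}

# Case-by-case control by temporarily setting the option, then restoring it.
old <- options(mypkg.show_progress = FALSE)
disabled <- show_progress(is_interactive = TRUE)  # FALSE while the option is off
options(old)
enabled <- show_progress(is_interactive = TRUE)   # TRUE again once restored
```

withr::with_options(), mentioned above, wraps the same set-and-restore pattern.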

@psolymos

Will have to look at whether pbapply could help here. It uses a global option and is turned off in non-interactive sessions.

@hadley added the feature (a feature request or enhancement) label on Mar 3, 2017
@hadley
Member Author

hadley commented Apr 24, 2017

Also useful to display names if they're present.

@sillasgonzaga

This comment has been minimized.

@chris-billingham

This comment has been minimized.

@tiernanmartin

This comment has been minimized.

@cderv
Contributor

cderv commented Jan 21, 2018

@sillasgonzaga, @chris-billingham, @tiernanmartin, I am not part of the tidyverse team, but I happen to know that they work through their packages in phases. There will be a purrr phase, don't worry!
So I don't think it helps to ask for a status update every two days or every week.

Since you seem pretty interested in progress bars: in case you don't already know, you can create a progress bar in the tidyverse today, even though it is not built into purrr.
Here is a dummy example you can run in your session, and it will display a progress bar.

# you can also load all the tidyverse 
library(dplyr)
library(purrr)

# dummy list of 10 elements with random numbers
dummy_list <- rerun(10, runif(5))
# create the progress bar with a dplyr function. 
pb <- progress_estimated(length(dummy_list))
res <- dummy_list %>%
  map(~{
    # update the progress bar (tick()) and print progress (print())
    pb$tick()$print()
    Sys.sleep(0.5)
    sum(.x)
  })

As you can see, it takes just two extra lines of code. Pretty simple.
The first creates the progress bar with dplyr::progress_estimated(), which returns an R6 object (pb here). You can call its methods with pb$<method>. The second updates the bar and prints the progress with pb$tick()$print(), as in the example. You should read the help: help("progress_estimated", package = "dplyr")

It works very well with purrr functions. The only drawback: it makes your piped code a little less concise.

Hope it helps and tides you over until there is better integration in purrr.

@jtrecenti

jtrecenti commented Feb 13, 2018

I think we have 3 options to integrate progress bars functionality in purrr

  1. create an adverb to modify the user function, adding tickers on it.
  2. add a .progress= parameter inside the map functions.
  3. create an adverb to modify the map functions.

(1) is easier to code but will force the user to learn a new adverb that depends on the original function and the input (at least the input length). (2) is harder to code but is straightforward for the user. (3) is the most general but also the hardest to understand.

To solve (1), I was thinking of something like this adverb, using @gaborcsardi's progress package

progressively <- function(.f, .n, ...) {
  pb <- progress::progress_bar$new(total = .n, ...)
  function(...) {
    pb$tick()
    .f(...)
  }
}

Simple example:

input <- 1:5
fun <- function(x) {
  Sys.sleep(.2)
  sample(x)
}
progress_fun <- progressively(fun, length(input))
purrr::map(input, progress_fun)

The problem is that if we run this two times the progress bar is not shown, because pb is already complete. But I think it is easy to find a way to restart it when this happens using some environment tricks.
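One way the restart could work is sketched below. It is not the author's code: it swaps the progress package for base R's utils::txtProgressBar() so it runs without extra dependencies, and it recreates the bar lazily whenever the previous one has finished. It assumes .n is at least 1.

```r
# Hypothetical variant of progressively() that can be reused across runs.
# Assumption: base R's txtProgressBar replaces progress::progress_bar.
progressively <- function(.f, .n) {
  pb <- NULL
  i <- 0
  function(...) {
    if (is.null(pb)) {
      # First call of a run: start a fresh bar and reset the counter.
      pb <<- utils::txtProgressBar(min = 0, max = .n, style = 3)
      i <<- 0
    }
    out <- .f(...)
    i <<- i + 1
    utils::setTxtProgressBar(pb, i)
    if (i >= .n) {
      close(pb)
      pb <<- NULL  # the next call starts a new bar
    }
    out
  }
}

progress_fun <- progressively(function(x) x * 2, 3)
first <- lapply(1:3, progress_fun)
second <- lapply(1:3, progress_fun)  # the bar is drawn again on the second run
```

If a run is interrupted before .n calls, the counter is not reset, so a more careful version would also need a way to detect an abandoned run.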

If (1) is not enough, I think that (2) - add .progress= option - is the best option, because (3) - modify map functions - is hard to understand. But I also think it will be difficult to code.

@jtrecenti

There's a fourth option, as suggested by @lionel- and @hadley

  4. Add a progress bar by default if the loop takes more than s seconds and the length of the input is greater than n. Control this through global options.

That's better than (2) so it's the best approach. Would it require big changes in map functions?

@lionel-
Member

lionel- commented Feb 13, 2018

This needs to be tackled at the same time as parallelism support, which we'll start working on soon.

@sdanielzafar

@jtrecenti my vote is toward option 2, .progress = T. Also progressively is just too many characters IMO.

@johncassil

Can't wait! :)

@jtrecenti

jtrecenti commented May 1, 2018

We've been using the furrr package for a while now. It uses the future package to do the hard work. @ctlente created a function in the abjutils package, abjutils::pvec(), which maps a function over a vector safely, in parallel, and with progress bars. It still has many bugs, but I have found it really useful.

@maxheld83

Just wanted to chime in to say that I really dig @gaborcsardi's progress package; I much prefer its greater customisability over the simpler dplyr::progress_estimated() (which already works with purrr, as per the example above).
So if purrr could support progress, that'd be great.

@lionel- added this to the parallel milestone on Dec 11, 2018
psychelzh added a commit to psychelzh/num-comp-nonsym that referenced this issue Jul 31, 2019
@TylerGrantSmith

I probably should have checked here first, but I have produced wrapped versions of the purrr iterators which produce progress bars using the progress package. You can find my very early version here: purrrgress, with the caveat that nothing has been tested yet outside my own use cases.

@kongdd

kongdd commented Oct 15, 2019

call for this feature too

@6884

6884 commented Jul 4, 2020

Yes please!!!!

@bbolker

bbolker commented Aug 22, 2020

+1

@fghjorth

fghjorth commented Oct 3, 2020

+1

@brancengregory

Hi, I just want to add that, in my experience, integration with progress would be great for purrr. As for the method, I believe the adverb option is cumbersome, and that progress bars should be the default (given the conditions y'all mentioned), with an option to disable them via .progress = F

@jarobyte91

This comment has been minimized.

@kongdd

kongdd commented May 14, 2021

call for this function too

@hathawayj

This comment has been minimized.

@ratnanil

The solution provided by @cderv in #149 (comment) no longer works as written, since progress_estimated() has been deprecated in dplyr 1.0.0. However, the progress package works nicely with purrr.

@gaborcsardi
Member

Or use cli like so:

up <- function(x) { Sys.sleep(0.2); toupper(x) }
LETTERS <- purrr::map(cli::cli_progress_along(letters), ~ up(letters[[.x]]))

[Image: cli-purrr]

With the caveat that the iterating function will receive the indices instead of the list entries.

@cboettig

@gaborcsardi 🎉 that's fantastic! Any general pointers on whether devs should be using progress or cli to add progress bars these days, or are they considered equally viable but different options?

@gaborcsardi
Member

Whichever you like, but progress will not receive any new features.

@JZL

JZL commented Jun 27, 2022

Hi,
Longtime lurker on this issue since I really love using purrr and find progress bars helpful while waiting for long-running maps. This doesn't help the actual purrr defaults, but my interim solution was to create new map wrappers which internally take care of the progressr boilerplate. So I have map -> promap, map_lgl -> promap_lgl, etc. It's maybe a little verbose and the code can be cleaned up, but I just put it in my prelude. The only one I couldn't figure out was map_depth since it recurses. This is rare but I sometimes nest maps, so it's nice to control which map uses progress bars

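# Needs purrr (as_mapper) and progressr (with_progress, progressor)
library(purrr)
library(progressr)

promap <- function(.x, .f, ...) {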
  with_progress({
    .f <- as_mapper(.f, ...)
    p <- progressor(steps = length(.x))
    everyN = 1
    if(length(.x) > 5000){
      everyN = 50
    }
    new.f <- function(...){
      ret = .f(...)
      # Hacky, but for large maps, only do a progressbar every so often.
      # Probably has minimal impact and maybe `runif` is more expensive
      if(everyN == 1){
        p()
      }else if(ceiling(runif(1, 0, everyN)) == 1){
        p(amount=everyN)
      }
      ret
    }
    .Call(purrr:::map_impl, environment(), ".x", "new.f", "list")
  })
}


promap_lgl <- function(.x, .f, ...) {
  with_progress({
    .f <- as_mapper(.f, ...)
    p <- progressor(steps = length(.x))
    everyN = 1
    if(length(.x) > 5000){
      everyN = 50
    }
    new.f <- function(...){
      ret = .f(...)
      if(everyN == 1){
        p()
      }else if(ceiling(runif(1, 0, everyN)) == 1){
        p(amount=everyN)
      }
      ret
    }
    .Call(purrr:::map_impl, environment(), ".x", "new.f", "logical")
  })
}
# ...
# Copying from https://github.com/tidyverse/purrr/blob/5aca9df41452f272fcef792dbc6d584be8be7167/R/map.R
# (I can post my full version on a gist, it's pretty much just a mechanical replacement skipping map_depth)
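The random throttling above reports progress on roughly one call in everyN. A deterministic alternative is sketched here, independent of progressr: a counter that reports every `every` calls and flushes the remainder on the last call, so the reported total always equals `n`. The names make_throttled_tick and `report` are hypothetical; with progressr, `report` could be something like function(k) p(amount = k).

```r
# Hypothetical helper: call report(k) once per `every` ticks, plus a final flush.
make_throttled_tick <- function(report, n, every = 50) {
  i <- 0        # ticks seen so far
  pending <- 0  # ticks not yet reported
  function() {
    i <<- i + 1
    pending <<- pending + 1
    if (pending >= every || i == n) {
      report(pending)
      pending <<- 0
    }
    invisible(i)
  }
}

# Demo with a plain counter standing in for a progress backend.
reported <- 0
n_reports <- 0
tick <- make_throttled_tick(
  function(k) {
    reported <<- reported + k
    n_reports <<- n_reports + 1
  },
  n = 120, every = 50
)
for (j in seq_len(120)) tick()
```

In this demo the backend is updated three times (after 50, 100, and 120 ticks) instead of 120 times, and no random numbers are drawn.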

@millermc38

millermc38 commented Jul 27, 2022

A simple and natural extension of @gaborcsardi's post is the following:

up <- function(x) { Sys.sleep(0.2); toupper(x) }

purrr::pmap_chr(.l = list(progress_bar=cli::cli_progress_along(letters),
                      to_capitalize=letters),
            .f = ~up(..2))

I could have used just map2, but I wanted to make it clear that we can keep adding as many inputs to the list as we need.

Recommendation: add this to your code snippets!

@hadley changed the milestone from parallel to 0.4.0 on Aug 24, 2022
@hadley closed this as completed in 8316d94 on Sep 12, 2022