みらい 未来
Minimalist Async Evaluation Framework for R
→ Designed for simplicity, a ‘mirai’ evaluates an R expression asynchronously in a parallel process, locally or distributed over the network.
→ Modern networking and concurrency, built on nanonext and NNG (Nanomsg Next Gen), ensures reliable and efficient scheduling over fast inter-process communications or TCP/IP secured by TLS. Launch remote resources via SSH or cluster managers for distributed computing.
→ The queued architecture scales to millions of tasks over thousands of processes, requiring no storage on the file system. Innovative features include event-driven promises, asynchronous parallel map, and automatic serialization of otherwise non-exportable reference objects.
mirai is Japanese for ‘future’ and is an implementation of futures in R.
→ mirai(): Sends an expression to be evaluated asynchronously in a separate R process and returns a mirai object immediately. Creation of a mirai is never blocking.

The result of a mirai m will be available at m$data once evaluation is complete and its return value is received. m[] may be used to wait for and collect the value.
library(mirai)
m <- mirai({
  # slow operation
  Sys.sleep(2)
  sample(1:100, 1)
})
m
#> < mirai [] >
m$data
#> 'unresolved' logi NA
# do other work
m[]
#> [1] 9
m$data
#> [1] 9
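As the expression is evaluated in a separate process, objects it requires are passed in by name via mirai()’s ‘...’ argument rather than captured from the calling environment. A minimal sketch:

x <- 2
y <- 3

# pass 'x' and 'y' into the evaluation environment of the mirai
m <- mirai(x + y, x = x, y = y)
m[]
#> [1] 5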
→ daemons(): Sets persistent background processes (daemons) where mirai are evaluated.
To launch 6 local daemons:
daemons(6)
#> [1] 6
Launching daemons over the network for distributed computing is supported via:
- SSH
- HPC cluster resource managers (Slurm, SGE, Torque, PBS, LSF)
See the reference vignette for further details.
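As a rough sketch only (the host addresses below are placeholders; the reference vignette documents the authoritative interface), daemons may be launched on remote machines over SSH along these lines:

# listen on the host and launch 2 daemons on remote machines over SSH
# (the addresses are hypothetical)
daemons(
  n = 2,
  url = host_url(),
  remote = ssh_config(remotes = c("ssh://10.75.32.90", "ssh://10.75.32.91"))
)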
→ mirai_map(): Maps a function over a list or vector, with each element processed as a mirai. For a dataframe or matrix, it automatically performs a multiple map over the rows.

A ‘mirai_map’ object is returned immediately, and is always non-blocking. Its value may be retrieved at any time using its [] method, returning a list. The [] method also provides options for flatmap, early stopping and/or progress indicators.
df <- data.frame(
  fruit = c("melon", "grapes", "coconut"),
  price = c(3L, 5L, 2L)
)

m <- df |>
  mirai_map(\(...) sprintf("%s: $%d", ...))
m
#> < mirai map [0/3] >
m[.flat]
#> [1] "melon: $3" "grapes: $5" "coconut: $2"
mirai is designed from the ground up to provide a production-grade experience.
→ Fast
- 1,000x more responsive vs. common alternatives [1]
- Built for low-latency applications e.g. real time inference & Shiny apps
→ Reliable
- No reliance on global options or variables for consistent behaviour
- Explicit evaluation for transparent and predictable results
→ Scalable
- Launch millions of tasks over thousands of connections
- Proven track record for heavy-duty workloads in the life sciences industry
mirai features the following core integrations, with usage examples in the package vignettes:
→ parallel: Provides the first official alternative communications backend for R, implementing the ‘MIRAI’ parallel cluster type, a feature request by R-Core at R Project Sprint 2023 (a minimal sketch follows this list).
→ purrr: Powers parallel map for the purrr functional programming toolkit, a core tidyverse package.
→ promises: Implements next generation, event-driven promises. ‘mirai’ and ‘mirai_map’ objects are readily convertible to ‘promises’, and may be used directly with the promise pipe.
→ Shiny: The primary async backend for Shiny, supporting ExtendedTask and the next level of responsiveness and scalability for Shiny apps.
→ Plumber: The built-in async evaluator behind the @async tag in Plumber 2; also provides an async backend for Plumber.
→ torch: Allows Torch tensors and complex objects such as models and optimizers to be used seamlessly across parallel processes.
→ Arrow: Allows queries using the Apache Arrow format to be handled seamlessly over ADBC database connections hosted in background processes.
→ targets: Targets, a make-like pipeline tool, has adopted crew as its default high-performance computing backend. Crew is a distributed worker-launcher extending mirai to different distributed computing platforms, from traditional clusters to cloud services.
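As a brief sketch of the parallel integration above, make_cluster() returns a ‘miraiCluster’ that may be used with functions from the parallel package (the node count here is arbitrary):

library(mirai)

# create a cluster of 2 mirai daemons and use it like any parallel cluster
cl <- make_cluster(2)
parallel::parLapply(cl, 1:4, function(x) x ^ 2)
stop_cluster(cl)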
We would like to thank in particular:
Will Landau for being instrumental in shaping development of the package, from initiating the original request for persistent daemons, through to orchestrating robustness testing for the high performance computing requirements of crew and targets.
Joe Cheng for integrating the ‘promises’ method to work seamlessly within Shiny, and prototyping event-driven promises.
Luke Tierney of R Core, for discussion on L’Ecuyer-CMRG streams to ensure statistical independence in parallel processing, and making it possible for mirai to be the first ‘alternative communications backend for R’.
Travers Ching for a novel idea in extending the original custom serialization support in the package.
Henrik Bengtsson for valuable insights leading to the interface accepting broader usage patterns.
Daniel Falbel for discussion around an efficient solution to serialization and transmission of torch tensors.
Kirill Müller for discussion on using parallel processes to host Arrow database connections.
Install the latest release from CRAN:
install.packages("mirai")
The current development version is available from R-universe:
install.packages("mirai", repos = "https://r-lib.r-universe.dev")
◈ mirai R package: https://mirai.r-lib.org/
◈ nanonext R package: https://nanonext.r-lib.org/
mirai is listed in the CRAN High Performance Computing Task View: https://cran.r-project.org/view=HighPerformanceComputing
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.