
mirai


ミライ


みらい 未来

Minimalist Async Evaluation Framework for R

→ Designed for simplicity, a ‘mirai’ evaluates an R expression asynchronously in a parallel process, locally or distributed over the network.

→ Modern networking and concurrency, built on nanonext and NNG (Nanomsg Next Gen), ensure reliable and efficient scheduling over fast inter-process communications or TCP/IP secured by TLS. Launch remote resources via SSH or cluster managers for distributed computing.

→ The queued architecture scales to millions of tasks over thousands of processes, requiring no storage on the file system. Innovative features include event-driven promises, asynchronous parallel map, and automatic serialization of otherwise non-exportable reference objects.

Quick Start

mirai is Japanese for ‘future’ and is an implementation of futures in R.

mirai():

Sends an expression to be evaluated asynchronously in a separate R process and returns a mirai object immediately. Creation of a mirai is never blocking.

The result of a mirai m will be available at m$data once evaluation is complete and its return value is received. m[] may be used to wait for and collect the value.

library(mirai)

m <- mirai(
  {
    # slow operation
    Sys.sleep(2)
    sample(1:100, 1)
  }
)

m
#> < mirai [] >
m$data
#> 'unresolved' logi NA

# do other work

m[]
#> [1] 9
m$data
#> [1] 9

daemons():

Sets persistent background processes (daemons) where mirai are evaluated.

To launch 6 local daemons:

daemons(6)
#> [1] 6
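
Once daemons are set, subsequent mirai are evaluated on those background processes rather than in newly-spawned ones. As a minimal sketch (the PID shown is illustrative), with daemons(0) resetting back to no daemons:

m <- mirai(Sys.getpid())
m[]
#> [1] 68905

daemons(0)
#> [1] 0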

Launching daemons over the network for distributed computing is supported via:

  • SSH
  • HPC cluster resource managers (Slurm, SGE, Torque, PBS, LSF)

See the reference vignette for further details.
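
For illustration only (the host names below are hypothetical, and the full set of arguments is described in the vignette), daemons may be launched on remote machines over SSH by supplying a remote configuration:

daemons(
  n = 2,
  url = host_url(),
  remote = ssh_config(remotes = c("ssh://nodeone", "ssh://nodetwo"))
)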

mirai_map():

Maps a function over a list or vector, with each element processed as a mirai. For a dataframe or matrix, it automatically performs a multiple map over the rows.

A ‘mirai_map’ object is returned immediately, and is always non-blocking.

Its value may be retrieved at any time using its [] method, returning a list. The [] method also provides options for flatmap, early stopping and/or progress indicators.

df <- data.frame(
  fruit = c("melon", "grapes", "coconut"),
  price = c(3L, 5L, 2L)
)

m <- df |>
  mirai_map(\(...) sprintf("%s: $%d", ...))
m
#> < mirai map [0/3] >
m[.flat]
#> [1] "melon: $3"   "grapes: $5"  "coconut: $2"

Design Concepts

mirai is designed from the ground up to provide a production-grade experience.

→ Fast

  • 1,000x more responsive vs. common alternatives [1]
  • Built for low-latency applications e.g. real time inference & Shiny apps

→ Reliable

  • No reliance on global options or variables for consistent behaviour
  • Explicit evaluation for transparent and predictable results

→ Scalable

  • Launch millions of tasks over thousands of connections
  • Proven track record for heavy-duty workloads in the life sciences industry

Powering the Ecosystem

mirai features the following core integrations, with usage examples in the linked vignettes:

R parallel   Provides the first official alternative communications backend for R, implementing the ‘MIRAI’ parallel cluster type, a feature request by R-Core at R Project Sprint 2023.
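
As a minimal sketch, a 'miraiCluster' created with make_cluster() may be used wherever a parallel cluster object is accepted:

cl <- make_cluster(2)
parallel::parSapply(cl, 1:4, \(x) x^2)
#> [1]  1  4  9 16
stop_cluster(cl)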

purrr   Powers parallel map for the purrr functional programming toolkit, a core tidyverse package.

promises   Implements next generation, event-driven promises. ‘mirai’ and ‘mirai_map’ objects are readily convertible to ‘promises’, and may be used directly with the promise pipe.
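
For example (a brief sketch; the callback runs once the later event loop processes it, as within a Shiny app or at the interactive prompt):

library(promises)

p <- mirai({
  Sys.sleep(1)
  42
}) %...>% print()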

Shiny   The primary async backend for Shiny, supporting ExtendedTask and the next level of responsiveness and scalability for Shiny apps.
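
A minimal hedged sketch of the ExtendedTask pattern (assumes the shiny and bslib packages; see the Shiny vignette for complete examples):

library(shiny)
library(bslib)
library(mirai)

daemons(2)

ui <- page_fluid(
  input_task_button("btn", "Draw a number"),
  textOutput("num")
)

server <- function(input, output, session) {
  # the task function returns a mirai, which Shiny treats as a promise
  task <- ExtendedTask$new(function() mirai({
    Sys.sleep(2)
    sample(1:100, 1)
  })) |> bind_task_button("btn")

  observeEvent(input$btn, task$invoke())

  output$num <- renderText(task$result())
}

shinyApp(ui, server)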

Plumber   The built-in async evaluator behind the @async tag in Plumber 2; also provides an async backend for Plumber.

torch   Allows Torch tensors and complex objects such as models and optimizers to be used seamlessly across parallel processes.

Arrow   Allows queries using the Apache Arrow format to be handled seamlessly over ADBC database connections hosted in background processes.

targets   targets, a make-like pipeline tool, has adopted crew as its default high-performance computing backend. crew is a distributed worker-launcher extending mirai to different distributed computing platforms, from traditional clusters to cloud services.

Thanks

We would like to thank in particular:

Will Landau for being instrumental in shaping development of the package, from initiating the original request for persistent daemons, through to orchestrating robustness testing for the high performance computing requirements of crew and targets.

Joe Cheng for integrating the ‘promises’ method to work seamlessly within Shiny, and prototyping event-driven promises.

Luke Tierney of R Core, for discussion on L’Ecuyer-CMRG streams to ensure statistical independence in parallel processing, and making it possible for mirai to be the first ‘alternative communications backend for R’.

Travers Ching for a novel idea in extending the original custom serialization support in the package.

Henrik Bengtsson for valuable insights leading to the interface accepting broader usage patterns.

Daniel Falbel for discussion around an efficient solution to serialization and transmission of torch tensors.

Kirill Müller for discussion on using parallel processes to host Arrow database connections.

Installation

Install the latest release from CRAN:

install.packages("mirai")

The current development version is available from R-universe:

install.packages("mirai", repos = "https://r-lib.r-universe.dev")

Links & References

◈ mirai R package: https://mirai.r-lib.org/
◈ nanonext R package: https://nanonext.r-lib.org/

mirai is listed in CRAN High Performance Computing Task View:
https://cran.r-project.org/view=HighPerformanceComputing

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
