The polars package for R gives users access to a lightning fast Data Frame library written in Rust. Polars’ embarrassingly parallel execution, cache efficient algorithms and expressive API makes it perfect for efficient data wrangling, data pipelines, snappy APIs, and much more besides. Polars also supports “streaming mode” for out-of-memory operations. This allows users to analyze datasets many times larger than RAM.
Examples of common operations:
- read CSV, JSON, Parquet, and other file formats;
- filter rows and select columns;
- modify and create new columns;
- group by and aggregate;
- reshape data;
- join and concatenate different datasets;
- sort data;
- work with dates and times;
- handle missing values;
- use the lazy execution engine for maximum performance and memory-efficient operations
Note that this package is rapidly evolving and there are a number of
breaking changes at each version. Be sure to check the
changelog when updating
polars
.
The recommended way to install this package is via R-multiverse:
Sys.setenv(NOT_CRAN = "true")
install.packages("polars", repos = "https://community.r-multiverse.org")
The “Install”
vignette
(vignette("install", "polars")
) gives more details on how to install
this package and other ways to install it.
To avoid conflicts with other packages and base R function names,
polars’s top level functions are hosted in the pl
namespace, and
accessible via the pl$
prefix. This means that polars
queries
written in Python and in R are very similar.
For example, rewriting the Python example from https://github.com/pola-rs/polars in R:
library(polars)
df = pl$DataFrame(
A = 1:5,
fruits = c("banana", "banana", "apple", "apple", "banana"),
B = 5:1,
cars = c("beetle", "audi", "beetle", "beetle", "beetle")
)
# embarrassingly parallel execution & very expressive query language
df$sort("fruits")$select(
"fruits",
"cars",
pl$lit("fruits")$alias("literal_string_fruits"),
pl$col("B")$filter(pl$col("cars") == "beetle")$sum(),
pl$col("A")$filter(pl$col("B") > 2)$sum()$over("cars")$alias("sum_A_by_cars"),
pl$col("A")$sum()$over("fruits")$alias("sum_A_by_fruits"),
pl$col("A")$reverse()$over("fruits")$alias("rev_A_by_fruits"),
pl$col("A")$sort_by("B")$over("fruits")$alias("sort_A_by_B_by_fruits")
)
#> shape: (5, 8)
#> ┌────────┬────────┬───────────────────────┬─────┬───────────────┬─────────────────┬─────────────────┬───────────────────────┐
#> │ fruits ┆ cars ┆ literal_string_fruits ┆ B ┆ sum_A_by_cars ┆ sum_A_by_fruits ┆ rev_A_by_fruits ┆ sort_A_by_B_by_fruits │
#> │ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
#> │ str ┆ str ┆ str ┆ i32 ┆ i32 ┆ i32 ┆ i32 ┆ i32 │
#> ╞════════╪════════╪═══════════════════════╪═════╪═══════════════╪═════════════════╪═════════════════╪═══════════════════════╡
#> │ apple ┆ beetle ┆ fruits ┆ 11 ┆ 4 ┆ 7 ┆ 4 ┆ 4 │
#> │ apple ┆ beetle ┆ fruits ┆ 11 ┆ 4 ┆ 7 ┆ 3 ┆ 3 │
#> │ banana ┆ beetle ┆ fruits ┆ 11 ┆ 4 ┆ 8 ┆ 5 ┆ 5 │
#> │ banana ┆ audi ┆ fruits ┆ 11 ┆ 2 ┆ 8 ┆ 2 ┆ 2 │
#> │ banana ┆ beetle ┆ fruits ┆ 11 ┆ 4 ┆ 8 ┆ 1 ┆ 1 │
#> └────────┴────────┴───────────────────────┴─────┴───────────────┴─────────────────┴─────────────────┴───────────────────────┘
The Get Started
vignette
(vignette("polars")
) provides a more detailed introduction to
polars.
While one can use polars as-is, other packages build on it to provide different syntaxes:
- polarssql provides a polars backend for DBI and dbplyr.
- tidypolars allows one to use the tidyverse syntax while using the power of polars.
The online documentation can be found at https://pola-rs.github.io/r-polars/.
If you encounter a bug, please file an issue with a minimal reproducible example on GitHub.
Consider joining our Discord subchannel for additional help and discussion.