README.Rmd

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# dance <img src="man/figures/logo.png" align="right" />

<!-- badges: start -->
[![Lifecycle Status](https://img.shields.io/badge/lifecycle-experimental-blue.svg)](https://www.tidyverse.org/lifecycle/)
[![Travis build status](https://travis-ci.org/romainfrancois/dance.svg?branch=master)](https://travis-ci.org/romainfrancois/dance)
<!-- badges: end -->

![](https://media.giphy.com/media/mnLGTXoWVzAfm/giphy.gif)

Dancing `r emo::ji("woman_dancing")` with the stats, aka `tibble()` dancing `r emo::ji("man_dancing")`. 
`dance` is a sort of reinvention of `dplyr` classic verbs, with a more modern stack 
underneath, i.e. it leverages a lot from `vctrs` and `rlang`. 

# Installation

You can install the development version from GitHub.

```{r, eval=FALSE}
# install.packages("pak")
pak::pkg_install("romainfrancois/dance")
```

# Usage

We'll illustrate tibble dancing with `iris` grouped by `Species`. 

```{r example}
library(dance)
g <- iris %>% group_by(Species)

```

### waltz(), polka(), tango(), charleston()

These are in the neighborhood of `dplyr::summarise()`. 

`waltz()` takes a grouped tibble and a list of formulas and returns a tibble with: 
as many columns as supplied formulas, one row per group. It does not prepend the grouping 
variables (see `tango` for that). 
  
```{r}
g %>% 
  waltz(
    Sepal.Length = ~mean(Sepal.Length), 
    Sepal.Width  = ~mean(Sepal.Width)
  )
```

`polka()` deals with peeling off one layer of grouping: 

```{r}
g %>% 
  polka()
```

`tango()` binds the results of `polka()` and `waltz()` so is the closest to 
`dplyr::summarise()` 

```{r}
g %>% 
  tango(
    Sepal.Length = ~mean(Sepal.Length), 
    Sepal.Width  = ~mean(Sepal.Width)
  )
```

`charleston()` is like `tango` but it packs the new columns in a tibble: 

```{r}
g %>% 
  charleston(
    Sepal.Length = ~mean(Sepal.Length), 
    Sepal.Width  = ~mean(Sepal.Width)
  )
```


### swing, twist

There is no `waltz_at()`, `tango_at()`, etc ... but instead we can use 
either the same function on a set of columns or a set of functions on the same column. 

For this, we need to learn new dance moves: 

`swing()` and `twist()` are for applying the same function to a set 
of columns: 

```{r}
library(tidyselect)

g %>% 
  tango(swing(mean, starts_with("Petal")))

g %>% 
  tango(data = twist(mean, starts_with("Petal")))
```

They differ in the type of column is created and how to name them: 

 - `swing()` makes as many new columns as are selected by the tidy selection, and 
   the columns are named using a `.name` glue pattern, this way we might `swing()`
   several times. 
   
```{r}
g %>% 
  tango(
    swing(mean, starts_with("Petal"), .name = "mean_{var}"), 
    swing(median, starts_with("Petal"), .name = "median_{var}"), 
  )
```

 - `twist()` instead creates a single data frame column. 
 
```{r}
g %>% 
  tango(
    mean   = twist(mean, starts_with("Petal")), 
    median = twist(median, starts_with("Petal")), 
  )
```

The first arguments of `swing()` and `twist()` are either a function or a 
formula that uses `.` as a placeholder. Subsequent arguments are 
tidyselect selections. 

You can combine `swing()` and `twist()` in the same `tango()` or `waltz()`: 

```{r}
g %>% 
  tango(
    swing(mean, starts_with("Petal"), .name = "mean_{var}"), 
    median = twist(median, contains("."))
  )
```

### rumba, zumba

Similarly `rumba()` can be used to apply several functions to a single column. 
`rumba()` creates single columns and `zumba()` packs them into a data frame column. 

```{r}
g %>% 
  tango(
    rumba(Sepal.Width, mean = mean, median = median, .name = "Sepal_{fun}"), 
    Petal = zumba(Petal.Width, mean = mean, median = median)
  )
```

### salsa, chacha, samba, madison

Now we enter the realms of `dplyr::mutate()` with: 

 - `salsa()` : to create new columns
 - `chacha()`: to reorganize a grouped tibble so that data for each group is contiguous
 - `samba()` : `chacha()` + `salsa()` 

```{r}
g %>% 
  salsa(
    Sepal = ~Sepal.Length * Sepal.Width, 
    Petal = ~Petal.Length * Petal.Width
  )
```

You can `swing()`, `twist()`, `rumba()` and `zumba()` here too, and if you
want the original data, you can use `samba()` instead of `salsa()`: 

```{r}
g %>% 
  samba(centered = twist(~ . - mean(.), everything(), -Species))
```

`madison()` packs the columns `salsa()` would have created

```{r}
g %>% 
  madison(swing(~ . - mean(.), starts_with("Sepal")))
```


### bolero and mambo

`bolero()` is similar to `dplyr::filter()`. 
The formulas may be made by `mambo()` if you want to apply the same 
predicate to a tidyselection of columns: 

```{r}
g %>% 
  bolero(~Sepal.Width > 4)

g %>% 
  bolero(mambo(~. > 4, starts_with("Sepal")))

g %>% 
  bolero(mambo(~. > 4, starts_with("Sepal"), .op = or))
```