WIP: IN DEVELOPMENT
A ggplot2-based implementation of tabplot (github repo, paper)
tabplot offers a fast way to eyeball data frames (it has been my go-to tool for years). It sorts the rows by one variable and shows all columns side by side, which can reveal possible relationships between the sorting variable and the other variables. This builds intuition for any further modeling.
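The sort-then-summarise idea described above can be sketched with plain dplyr. This is only an illustration of the general technique, not the code used by tabplot or dfvis; the toy data frame `df` and its columns `x`, `y` and `g` are made up for the example.

```r
library(dplyr)

# Hypothetical toy data: y depends on x, and g is derived from x
set.seed(1)
df <- tibble(
  x = runif(1000),
  y = 2 * x + rnorm(1000, sd = 0.1),
  g = factor(ifelse(x > 0.5, "high", "low"))
)

# 1. sort the rows by one column
# 2. cut the sorted rows into 10 equally sized bins
# 3. summarise every column within each bin
binned <- df %>%
  arrange(x) %>%
  mutate(bin = ntile(row_number(), 10)) %>%
  group_by(bin) %>%
  summarise(
    mean_x = mean(x),
    mean_y = mean(y),
    share_high = mean(g == "high"),
    .groups = "drop"
  )

binned
```

Reading `binned` from top to bottom, columns that move together with the sorting column (here `mean_y` and `share_high`) stand out, which is the pattern a table plot makes visible at a glance.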
- Adds out-of-the-box support for grouped tibbles (tidy data frames)
- Built on ggplot2, which allows flexible geoms for different variable types
- dfvis might not be as fast as tabplot
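To illustrate what "different geoms for different variable types" can look like, here is a rough sketch that draws one panel for a numeric column and one for a factor column using plain ggplot2. It is an assumption-laden illustration and not how dfvis is implemented; the data frame `df` and the plot objects `p_num` and `p_fct` are hypothetical names.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data with one numeric and one factor column
set.seed(1)
df <- tibble(
  x = runif(500),
  g = factor(sample(c("a", "b"), 500, replace = TRUE))
)

# Sort by x and assign each row to one of 10 equally sized bins
binned <- df %>%
  arrange(x) %>%
  mutate(bin = ntile(row_number(), 10))

# Numeric column: one horizontal bar per bin showing the bin mean
p_num <- binned %>%
  group_by(bin) %>%
  summarise(mean_x = mean(x), .groups = "drop") %>%
  ggplot(aes(x = bin, y = mean_x)) +
  geom_col() +
  scale_x_reverse() +
  coord_flip()

# Factor column: stacked bars showing the share of each level per bin
p_fct <- binned %>%
  count(bin, g) %>%
  ggplot(aes(x = bin, y = n, fill = g)) +
  geom_col(position = "fill") +
  scale_x_reverse() +
  coord_flip()
```

Printing `p_num` and `p_fct` next to each other gives a crude two-column table plot; the reversed bin axis puts the first bin at the top.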
pacman::p_load("dplyr", "tabplot", "dfvis")
data("attrition", package = "modeldata")
attrition = as_tibble(attrition)
attrition_6 = attrition[, 1:6]
skimr::skim(attrition_6)
Name | attrition_6 |
---|---|
Number of rows | 1470 |
Number of columns | 6 |
Column type frequency: | |
factor | 3 |
numeric | 3 |
Group variables | None |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
Attrition | 0 | 1 | FALSE | 2 | No: 1233, Yes: 237 |
BusinessTravel | 0 | 1 | FALSE | 3 | Tra: 1043, Tra: 277, Non: 150 |
Department | 0 | 1 | FALSE | 3 | Res: 961, Sal: 446, Hum: 63 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Age | 0 | 1 | 36.92 | 9.14 | 18 | 30 | 36 | 43 | 60 | ▂▇▇▃▂ |
DailyRate | 0 | 1 | 802.49 | 403.51 | 102 | 465 | 802 | 1157 | 1499 | ▇▇▇▇▇ |
DistanceFromHome | 0 | 1 | 9.19 | 8.11 | 1 | 2 | 7 | 14 | 29 | ▇▅▂▂▂ |
autoplot(attrition_6, sort_column_name = "DistanceFromHome")
suppressWarnings(
attrition_6 %>%
group_by(Attrition) %>%
autoplot(sort_column_name = "DistanceFromHome")
)
## Adding missing grouping variables: `Attrition`
## Adding missing grouping variables: `Attrition`
tabplot::tableplot(attrition_6, sortCol = "DistanceFromHome", nBins = 10)
- Contributions are welcome!
- Create interactive version (with shiny?)