-
Notifications
You must be signed in to change notification settings - Fork 7
/
README.Rmd
77 lines (52 loc) · 2.56 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "tools/README-"
)
```
# xray
[![Travis-CI Build Status](https://travis-ci.org/sicarul/xray.svg?branch=master)](https://travis-ci.org/sicarul/xray) [![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/xray)](https://cran.r-project.org/package=xray) [![CRAN_Downloads_Badge](https://cranlogs.r-pkg.org/badges/xray)](https://cran.r-project.org/package=xray)
The R Package to have X Ray vision on your datasets. This package lets you analyze the variables of a dataset, to evaluate how is your data shaped. Consider this the first step when you have your data for modeling, you can use this package to analyze all variables and check if there is anything weird worth transforming or even avoiding the variable altogether.
## Installation
You can install the stable version of xray from CRAN with:
```{r cran-installation, eval = FALSE}
install.packages("xray")
```
Or the latest dev version from Github:
```{r gh-installation, eval = FALSE}
# install.packages("devtools")
devtools::install_github("sicarul/xray")
```
## Usage
### Anomaly detection
`xray::anomalies` analyzes all your columns for anomalies, whether they are NAs, Zeroes, Infinite, etc, and warns you if it detects variables with at least 80% of rows with those anomalies. It also warns you when all rows have the same value.
Example:
```{r example-anomalies}
data(longley)
badLongley=longley
badLongley$GNP=NA
xray::anomalies(badLongley)
```
### Distributions
`xray::distributions` tries to analyze the distribution of your variables, so you can understand how each variable is statistically structured. It also returns a percentiles table of numeric variables as a result, which can inform you of the shape of the data.
```{r example-distributions}
distrLongley=longley
distrLongley$testCategorical=c(rep('One',7), rep('Two', 9))
xray::distributions(distrLongley)
```
### Distributions along a time axis
`xray::timebased` also investigates into your distributions, but shows you the change over time, so if there is any change in the distribution over time (For example a variable stops or starts being collected) you can easily visualize it.
```{r example-timebased}
dateLongley=longley
dateLongley$Year=as.Date(paste0(dateLongley$Year,'-01-01'))
dateLongley$Data='Original'
ndateLongley=dateLongley
ndateLongley$GNP=dateLongley$GNP+10
ndateLongley$Data='Offseted'
xray::timebased(rbind(dateLongley, ndateLongley), 'Year')
```