---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r echo=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "figures/"
)
options(width = 100)
library(Rpriori)
```
[![Build Status](https://travis-ci.org/titoeb/Rpriori.svg?branch=master)](https://travis-ci.org/titoeb/Rpriori)


# Rpriori

The goal of `Rpriori` is to create association rules of the form X => Y using the Apriori algorithm. For a more detailed demonstration of how the package works, see the `examples.R` file in the `examples` folder. This `README` covers only the most important features.

## Installation

You can install `Rpriori` from GitHub with:

```{r gh-installation, eval = FALSE}
# install.packages("devtools")
devtools::install_github("abuchmueller/Rpriori")
```

## Example: Creating association rules

This basic example shows how to create association rules with `Rpriori` using the `Groceries` dataset.

Before creating rules, it is useful to take a look at the data.
Use the `summary()` function of `Rpriori` to do so:

```{r}
summary(Groceries)
```

The data contain 9835 transactions over 169 distinct items, with a density of 0.026. The average basket holds about 4 items.
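
For intuition, density is the share of non-zero cells in the binary transaction-by-item matrix, so density times the number of items gives the average basket size (0.026 * 169 is roughly 4.4 here). The following is a minimal base-R sketch on made-up data, independent of `Rpriori`; the object `toy_ta`, its contents, and its orientation (rows as transactions) are assumptions for illustration only.

```{r}
# Hypothetical toy data: 4 transactions (rows) x 5 items (columns).
toy_ta <- matrix(
  c(1, 1, 0, 0, 0,
    1, 0, 1, 0, 0,
    1, 1, 1, 0, 0,
    0, 0, 0, 1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("t", 1:4), c("milk", "bread", "butter", "beer", "chips"))
) == 1

toy_density <- mean(toy_ta)          # share of TRUE cells
toy_basket  <- mean(rowSums(toy_ta)) # average number of items per transaction

toy_density
toy_basket

# The two are linked: density * number of items = average basket size.
all.equal(toy_density * ncol(toy_ta), toy_basket)
```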

Now we can try to find frequent itemsets with the `FindFrequentItemsets()` function:

```{r}
Frequent <- FindFrequentItemsets(Groceries, minsupport = 0.1)
show(Frequent)
```

There are 8 itemsets that occur in at least one out of ten transactions.
To take a look at these itemsets, use the `print()` function:

```{r}
print(Frequent)
```
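
The support of an itemset is the fraction of transactions that contain all of its items; an itemset counts as frequent when that fraction reaches `minsupport`. The base-R sketch below computes this by hand on made-up baskets. It illustrates the definition only and makes no claim about how `Rpriori` implements the search; `toy_baskets` and `toy_support()` are hypothetical names.

```{r}
# Hypothetical toy baskets, one character vector per transaction.
toy_baskets <- list(
  c("milk", "bread"),
  c("milk", "butter"),
  c("milk", "bread", "butter"),
  c("beer", "chips")
)

# Support of an itemset = share of baskets containing every item in it.
toy_support <- function(itemset, baskets) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# Support of every single item.
toy_items <- sort(unique(unlist(toy_baskets)))
toy_item_support <- vapply(toy_items, toy_support, numeric(1), baskets = toy_baskets)
toy_item_support

# Keep only the items that are frequent at a 50% threshold.
toy_item_support[toy_item_support >= 0.5]

# Support of the pair {milk, bread}.
toy_support(c("milk", "bread"), toy_baskets)
```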

To create rules, supply `AssociationRules()` with a transaction database and a minimum support threshold.
You can additionally set a minimum confidence threshold; its default value is 0.

```{r}
Rules <- AssociationRules(Groceries, minsupport = 0.01, minconfidence = 0.2)
show(Rules)
```
We found 231 association rules with the specified parameters.
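
For reference, the confidence of a rule X => Y is the support of X and Y together divided by the support of X, so it is not symmetric in X and Y. Here is a small base-R sketch of that definition on made-up data; `toy_tx`, `toy_supp()`, and `toy_conf()` are hypothetical helpers and not part of `Rpriori`.

```{r}
# Hypothetical toy data: rows are transactions, columns are items.
toy_tx <- rbind(
  c(milk = TRUE, bread = TRUE,  butter = FALSE),
  c(milk = TRUE, bread = FALSE, butter = TRUE),
  c(milk = TRUE, bread = TRUE,  butter = TRUE),
  c(milk = TRUE, bread = FALSE, butter = FALSE)
)

# Share of transactions that contain all of the given items.
toy_supp <- function(items) {
  mean(rowSums(toy_tx[, items, drop = FALSE]) == length(items))
}

# Confidence of lhs => rhs: supp(lhs and rhs together) / supp(lhs).
toy_conf <- function(lhs, rhs) toy_supp(c(lhs, rhs)) / toy_supp(lhs)

toy_conf("bread", "milk")               # every bread basket also contains milk
toy_conf("milk", "bread")               # only half of the milk baskets contain bread
toy_conf(c("milk", "bread"), "butter")  # rules may have several items on the left
```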

## Example: Inspecting data

To get summary statistics on rules, simply call `summary()`:
```{r}
summary(Rules)
```

If you want to take a look at the underlying data used in rule creation, there are multiple ways. One is to use the `extract()` function:
```{r}
Frequent <- extract(Rules)
summary(Frequent)
```

This extracts the frequent itemsets used to calculate the association rules. You can also create a frequent itemset matrix directly:
```{r}
Frequent2 <- FindFrequentItemsets(Groceries, minsupport = 0.01)
Frequent2
```
Since frequent itemset generation takes much longer than rule creation, it can be better to create the frequent itemset matrix first and then pass it to `AssociationRules()` to calculate rules.

```{r}
fRules <- AssociationRules(Groceries, Frequent, minsupport = 0.03, minconfidence = 0.4)
```
In this case `AssociationRules()` does not need to recalculate the frequent itemsets, as long as you do not lower the support threshold.
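
The reason reuse can work: every itemset that is frequent at a higher threshold is also frequent at any lower one, so raising the threshold only requires filtering an existing result, whereas lowering it requires a new search. The toy illustration below uses made-up support values; the data frame `toy_fi` is hypothetical and not an `Rpriori` object, and the sketch does not describe the package's internals.

```{r}
# Hypothetical itemsets mined at minsupport = 0.01, with their supports.
toy_fi <- data.frame(
  itemset = c("{milk}", "{bread}", "{milk, bread}", "{beer, chips}"),
  support = c(0.25, 0.18, 0.07, 0.015),
  stringsAsFactors = FALSE
)

# Raising the threshold to 0.03: a simple filter is enough.
toy_fi[toy_fi$support >= 0.03, ]

# Lowering it to 0.005 cannot be answered from toy_fi, because itemsets
# with support between 0.005 and 0.01 were never stored.
min(toy_fi$support) >= 0.01
```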

If you want to take a look at the transaction matrix used to calculate the frequent itemsets, you need to create a `TAMatrix` object first:
```{r}
Transactions <- makeTAMatrix(Groceries)
summary(Transactions)
```

## Example: Visualizing data with `plot()` or `qplot()`

All classes come with base plotting and `ggplot2` methods. Both `plot()` and `qplot()` only need a valid object to work. However, `qplot()` is more flexible and in some cases accepts additional arguments such as `col` or `alpha`.

### Plotting transactions
```{r}
plot(Transactions)
qplot(Transactions)
```

### Plotting frequent items
```{r}
plot(Frequent)
qplot(Frequent, type = "scatter", col = "red", alpha = 0.1)
```

### Plotting frequent items (as a histogram)
```{r}
hist(Frequent)
qplot(Frequent, type = "hist")
```

### Plotting rules
```{r}
plot(Rules)
qplot(Rules)
```

## Example: Using convenience functions like `support()`

A set of convenience functions gives direct access to the quality measures of rules and frequent itemsets.
```{r}
support(Frequent)[1:5]
support(Rules)[1:5]
confidence(Rules)[1:5]
lift(Rules)[1:5]
leverage(Rules)[1:5]
```
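
As a reminder of what these measures mean, the sketch below computes them by hand from made-up supports, using the standard textbook definitions. It is assumed, not verified here, that `Rpriori` follows the same definitions; all `toy_` names and values are illustrative.

```{r}
# Hypothetical supports for a rule X => Y.
toy_supp_x  <- 0.20   # supp(X)
toy_supp_y  <- 0.25   # supp(Y)
toy_supp_xy <- 0.10   # supp(X and Y together)

# Confidence: how often Y occurs in transactions that contain X.
toy_confidence <- toy_supp_xy / toy_supp_x

# Lift: observed co-occurrence relative to what independence would predict.
toy_lift <- toy_supp_xy / (toy_supp_x * toy_supp_y)

# Leverage: the same comparison as a difference instead of a ratio.
toy_leverage <- toy_supp_xy - toy_supp_x * toy_supp_y

c(confidence = toy_confidence, lift = toy_lift, leverage = toy_leverage)
```

A lift above 1 (or a leverage above 0) means X and Y occur together more often than they would if they were independent.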

## Example: Pruning rules

In this example we will use the `Epub` dataset, which contains the download history of
documents from the electronic publication platform of the Vienna University of Economics and
Business Administration.

To find association rules with a minimum support of 0.0009 and a minimum confidence of 0.1, run:
```{r}
rules <- AssociationRules(Epub, minsupport = 0.0009, minconfidence = 0.1)
```

If you use `qplot()`, you can see that some rules have extremely high lift values.

```{r}
qplot(rules)
```


Now imagine you are only interested in rules with a lift value above 300:
```{r}
rules_pruned <- prune(rules, Lift = 300)
print(rules_pruned)
```

Similarly, you can prune by confidence and look at the rules with a confidence
of 0.75 or above:
```{r}
print(prune(rules, Confidence = 0.75))
```
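
Conceptually, pruning keeps only the rules whose quality measure reaches a given threshold. The base-R sketch below shows that idea on a made-up rule table. It assumes inclusive thresholds (the text above leaves this open for lift) and does not reproduce the behavior of `prune()` itself; `toy_rules` and its values are hypothetical.

```{r}
# Hypothetical rule table; all values are made up for illustration.
toy_rules <- data.frame(
  rule       = c("{a} => {b}", "{c} => {d}", "{a, c} => {d}"),
  confidence = c(0.80, 0.40, 0.75),
  lift       = c(350, 120, 310),
  stringsAsFactors = FALSE
)

# Keep rules with a lift of at least 300.
subset(toy_rules, lift >= 300)

# Keep rules with a confidence of at least 0.75.
subset(toy_rules, confidence >= 0.75)

# Thresholds can also be combined.
subset(toy_rules, lift >= 300 & confidence >= 0.8)
```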