---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r echo=FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "figures/"
)
options(width = 100)
library(Rpriori)
```
[![Build Status](https://travis-ci.org/titoeb/Rpriori.svg?branch=master)](https://travis-ci.org/titoeb/Rpriori)
# Rpriori
The goal of `Rpriori` is to create association rules of the form X => Y using the Apriori algorithm. For a more detailed demonstration of how the package works, see the `examples.R` file in the examples folder. This `README` only covers the most important features.
## Installation
You can install `Rpriori` from GitHub with:
```{r gh-installation, eval = FALSE}
# install.packages("devtools")
devtools::install_github("abuchmueller/Rpriori")
```
## Example: Creating association rules
This basic example shows how to create association rules with `Rpriori` using the `Groceries` dataset.
Before creating rules, it is useful to take a look at the data.
Use the `summary()` function of `Rpriori` to do so:
```{r}
summary(Groceries)
```
We have 9835 transactions recorded with a total of 169 items and a density of 0.026. The average basket has 4 items in it.
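As a rough illustration of what these summary figures mean, the following base R sketch computes density and average basket size for a small made-up transaction matrix. The object `toy_baskets` and its item names are hypothetical and not part of `Rpriori`. Density is taken here to be the share of non-zero cells, which is the usual definition but an assumption about how `summary()` computes it.

```{r}
# Hypothetical toy data: rows are transactions, columns are items.
toy_baskets <- matrix(
  c(1, 1, 0, 0,
    1, 0, 1, 0,
    0, 1, 1, 1,
    1, 1, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("milk", "bread", "butter", "beer"))
)

# Density: share of cells that are 1 (assumed definition).
toy_density <- mean(toy_baskets)

# Average basket size: mean number of items per transaction.
toy_basket_size <- mean(rowSums(toy_baskets))

toy_density
toy_basket_size
```

Under this definition, the average basket size equals density times the number of items. For `Groceries` that gives 0.026 × 169 ≈ 4.4, which is consistent with the roughly 4 items per basket reported above.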
Now we can try to find frequent itemsets with the `FindFrequentItemsets()` function:
```{r}
Frequent <- FindFrequentItemsets(Groceries, minsupport = 0.1)
show(Frequent)
```
There are 8 frequent itemsets that occur in at least one out of ten transactions.
To take a look at the itemsets themselves, use the `print()` function:
```{r}
print(Frequent)
```
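To make the role of the minimum support threshold more concrete, here is a minimal base R sketch of how support can be computed by hand. The data `toy_tx` and the helper `toy_support()` are hypothetical and only illustrate the idea; they are not how `FindFrequentItemsets()` is implemented.

```{r}
# Hypothetical toy data: rows are transactions, columns are items.
toy_tx <- matrix(
  c(1, 1, 0, 0,
    1, 0, 1, 0,
    0, 1, 1, 1,
    1, 1, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("milk", "bread", "butter", "beer"))
) == 1

# Support of an itemset: share of transactions containing all of its items.
toy_support <- function(items, tx) {
  mean(rowSums(tx[, items, drop = FALSE]) == length(items))
}

# Support of every single item and of one pair.
sapply(colnames(toy_tx), toy_support, tx = toy_tx)
toy_support(c("milk", "bread"), toy_tx)

# Single items that are frequent at a minimum support of 0.5.
toy_min_support <- 0.5
names(which(colMeans(toy_tx) >= toy_min_support))
```

An itemset can never have higher support than any of its subsets. Apriori uses this property to skip candidates that contain an infrequent subset.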
To create rules, supply `AssociationRules()` with a transaction database and a minimum support threshold.
You can additionally set a minimum confidence threshold; its default value is 0.
```{r}
Rules <- AssociationRules(Groceries, minsupport = 0.01, minconfidence = 0.2)
show(Rules)
```
We found 231 association rules with the specified parameters.
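The confidence threshold can be illustrated with a small base R sketch. It uses the standard definition, where the confidence of X => Y is the support of X and Y together divided by the support of X. The transactions in `toy_list` and the helpers `toy_supp()` and `toy_conf()` are hypothetical and not part of `Rpriori`.

```{r}
# Hypothetical transactions as a list of item vectors.
toy_list <- list(
  c("milk", "bread"),
  c("milk", "butter"),
  c("bread", "butter", "beer"),
  c("milk", "bread", "butter")
)

# Share of transactions that contain every item in `items`.
toy_supp <- function(items) {
  mean(vapply(toy_list, function(tr) all(items %in% tr), logical(1)))
}

# Confidence of lhs => rhs: supp(lhs and rhs together) / supp(lhs).
toy_conf <- function(lhs, rhs) toy_supp(c(lhs, rhs)) / toy_supp(lhs)

toy_conf("milk", "bread")   # 0.5 / 0.75
toy_conf("beer", "butter")  # 0.25 / 0.25
```

With a minimum confidence of 0.7, the rule milk => bread would be dropped in this toy data, while beer => butter would be kept if it also met the minimum support.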
## Example: Inspecting data
To get summary statistics on rules, simply call `summary()`:
```{r}
summary(Rules)
```
If you want to take a look at the underlying data used in rule creation, there are multiple ways to do so. One way is to use the `extract()` function:
```{r}
Frequent <- extract(Rules)
summary(Frequent)
```
This extracts the frequent itemsets used to calculate the association rules. You can also create a frequent item matrix directly:
```{r}
Frequent2 <- FindFrequentItemsets(Groceries, 0.01)
Frequent2
```
Since frequent itemset generation takes a lot longer than rule creation, it can be better to create a frequent item matrix first and then pass it to `AssociationRules()` to calculate rules.
```{r}
fRules <- AssociationRules(Groceries, Frequent, minsupport = 0.03, minconfidence = 0.4)
```
In this case `AssociationRules()` does not need to recalculate the frequent itemsets, as long as you do not lower the support threshold.
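The reason a higher threshold can reuse existing itemsets is sketched below in base R. The supports in `toy_itemsets` are made-up numbers, and the filtering shown is only an illustration of the idea, not the package's internal code.

```{r}
# Hypothetical itemset supports, as if mined at a minimum support of 0.01.
toy_itemsets <- c(
  "{milk}"        = 0.25,
  "{bread}"       = 0.18,
  "{milk, bread}" = 0.04,
  "{beer, soda}"  = 0.012
)

# Raising the threshold to 0.03 only requires filtering the stored supports.
toy_reused <- toy_itemsets[toy_itemsets >= 0.03]
toy_reused

# Lowering the threshold to 0.005 cannot be answered from this table:
# itemsets with support between 0.005 and 0.01 were never stored.
```

Every itemset that is frequent at the higher threshold is already contained in the set mined at the lower one, so no new pass over the transactions is needed.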
If you want to take a look at the transaction matrix used to calculate the frequent itemsets, you need to create a `TAMatrix` object first:
```{r}
Transactions <- makeTAMatrix(Groceries)
summary(Transactions)
```
## Example: Visualizing data with `plot()` or `qplot()`
All classes come with base plotting and `ggplot2` methods. Both `plot()` and `qplot()` only need a valid object to work. However, `qplot()` is more flexible and can sometimes be supplied with additional arguments like `col` or `alpha`.
### Plotting transactions
```{r}
plot(Transactions)
qplot(Transactions)
```
### Plotting frequent items
```{r}
plot(Frequent)
qplot(Frequent, type = "scatter", col = "red", alpha = 0.1)
```
### Plotting frequent items (as a histogram)
```{r}
hist(Frequent)
qplot(Frequent, type = "hist")
```
### Plotting rules
```{r}
plot(Rules)
qplot(Rules)
```
## Example: Using convenience functions like `support()`
There is a set of convenience functions to access the quality measures of itemsets and rules directly.
```{r}
support(Frequent)[1:5]
support(Rules)[1:5]
confidence(Rules)[1:5]
lift(Rules)[1:5]
leverage(Rules)[1:5]
```
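For readers unfamiliar with these measures, the following base R sketch computes them from made-up supports using their standard textbook definitions. That `Rpriori` uses exactly these formulas is an assumption, and all `toy_` names are hypothetical.

```{r}
# Hypothetical supports for a rule X => Y (made-up numbers).
toy_supp_x  <- 0.20  # share of transactions containing X
toy_supp_y  <- 0.25  # share of transactions containing Y
toy_supp_xy <- 0.10  # share of transactions containing both X and Y

# Standard definitions of the rule quality measures.
toy_confidence <- toy_supp_xy / toy_supp_x
toy_lift       <- toy_supp_xy / (toy_supp_x * toy_supp_y)
toy_leverage   <- toy_supp_xy - toy_supp_x * toy_supp_y

c(confidence = toy_confidence, lift = toy_lift, leverage = toy_leverage)
```

A lift above 1 (or a leverage above 0) means that X and Y occur together more often than would be expected if they were independent.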
## Example: Pruning rules
In this example we will use the `Epub` dataset, which contains the download history of
documents from the electronic publication platform of the Vienna University of Economics and
Business Administration.
To find association rules with a minimum support of 0.0009 and a minimum confidence of 0.1, run:
```{r}
rules <- AssociationRules(Epub, minsupport = 0.0009, minconfidence = 0.1)
```
If you use `qplot()`, you can see that some rules have extremely high lift values.
```{r}
qplot(rules)
```
Now imagine you are only interested in rules with a lift value above 300:
```{r}
rules_pruned <- prune(rules, Lift = 300)
print(rules_pruned)
```
Similarly, you can also prune by confidence and have a look at the rules with confidence
of 0.75 or above:
```{r}
print(prune(rules, Confidence = 0.75))
```
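Conceptually, pruning is a filter on the rule quality measures. The base R sketch below shows the idea on a made-up rule table. The data frame `toy_rules` is hypothetical, and whether `prune()` treats its thresholds as strict or inclusive bounds is an assumption taken from the wording above.

```{r}
# Hypothetical rule table; the values are made up for illustration.
toy_rules <- data.frame(
  lhs        = c("A", "B", "C", "D"),
  rhs        = c("B", "C", "D", "A"),
  confidence = c(0.80, 0.30, 0.75, 0.10),
  lift       = c(350, 12, 40, 900)
)

# Keep rules with a lift above 300.
toy_high_lift <- toy_rules[toy_rules$lift > 300, ]
toy_high_lift

# Keep rules with a confidence of 0.75 or above.
toy_high_conf <- toy_rules[toy_rules$confidence >= 0.75, ]
toy_high_conf
```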