-
Notifications
You must be signed in to change notification settings - Fork 0
/
cf-tutorial.Rmd
75 lines (57 loc) · 2.05 KB
/
cf-tutorial.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
---
title: "CF Tutorial"
subtitle: "Some Notes on Collaborative Filtering"
author: "Ralph Schlosser"
date: "February 2017"
output:
beamer_presentation:
incremental: false
includes:
in_header: [mystyle.tex, settings.tex]
keep_tex: true
---
## Overview
* Introduction
* Model-based Collaborative Filtering
* Alternating Least Squares: Intuition
## Introduction
* Want to predict user ratings for movies which they haven't rated.
* This can be achieved with \HIGHLIGHT{Collaborative Filtering}.
* CF is not *one* algorithm, it's a broad set of different techniques.
* Here: Model-based approach using \HIGHLIGHT{Alternating Least Squares}.
| Users | Movie 1 | Movie 2 | Movie 3 | ... |
|-------- |--------- |--------- |--------- |--------- |
| User 1 | \orange{?}| 5.9 | 2.6 | ... |
| User 2 | 1.4 | 5.8 | \orange{?}| ... |
| User 3 | 1.5 | 5.8 | \orange{?}| ... |
| ... | ... | ... | ... | ... |
## Model-based Collaborative Filtering
* Have a *sparse* and usually large matrix $\mathbf{R} \in \mathbb{R}^{u \times v}$ of user ratings.
* $u$ - Number of users (rows)
* $v$ - Number of movies (columns)
* $R_{ij}$ - Rating of user $i$ for movie $j$.
```{r, echo=FALSE, eval=FALSE}
# Create random matrix.
replicate(5, round(runif(5), 2) * 10)
```
* Example:
$$
\mathbf{R} =
\begin{pmatrix}
2.2 & 2.7 & 7.7 & 2.9 & 3.3\\
? & ? & 2.6 & 1.2 & 8.9\\
7.0 & ? & 3.5 & 0.7 & 2.1\\
9.1 & 0.6 & ? & 1.8 & ?\\
? & ? & 7.4 & 3.1 & 5.9
\end{pmatrix}
$$
## Alternating Least Squares: Intuition
* Model-based: Try to uncover \HIGHLIGHT{latent factors} that model the data.
* Can be achieved through approximate matrix decomposition: $\mathbf{R} \approx \mathbf{U} \times \mathbf{V}^{T}$
\begin{center}
\includegraphics[scale=0.5]{fig/mat_decomp.pdf}
\end{center}
* The $k$ columns in $\mathbf{U}$ and $\mathbf{V}^{T}$ correspond to the latent, i.e. *unobserved* factors.
* \orange{ALS}: Approximate $\mathbf{U}$ and $\mathbf{V}$ through linear regression.
## Links
\FIXME{Add links}