The Recursive Neural Tensor Network (RNTN) was state of the art for sentiment analysis in 2013.
This is an old GPU implementation (written in 2015, predating TensorFlow and PyTorch) of the RNTN described by Socher et al. (2013) in *Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank*.
The model is trained on the Stanford Sentiment Treebank. Download and extract `train.txt` and `vocabulary.txt` to `./data/`. `RNTN.py` loads the data and trains the model.
The only dependencies are PyCUDA and NumPy.
- $d$ - Length of word vector
- $n$ - Node/layer
- $x$ - Activation/output of neuron ($x \in \mathbb{R}^{d}$; $x = \tanh z$)
- $z$ - Input to neuron ($z \in \mathbb{R}^{d}$; $z = Wx$)
- $t$ - Target vector ($t \in \mathbb{R}^5$; 0-1 coded)
- $y$ - Prediction ($y \in \mathbb{R}^5$; output of the softmax layer, $\mathrm{softmax}(z)$)
- $W_s$ - Classification matrix ($W_s \in \mathbb{R}^{5 \times d}$)
- $W$ - Weight matrix ($W \in \mathbb{R}^{d \times 2d}$)
- $V$ - Weight tensor ($V^{[1:d]} \in \mathbb{R}^{2d \times 2d \times d}$)
- $L$ - Word embedding matrix ($L \in \mathbb{R}^{d \times |V|}$, where $|V|$ is the size of the vocabulary)
- $\theta$ - All weight parameters ($\theta = (W_s, W, V, L)$)
- $E$ - The cost as a function of $\theta$
- $\delta_l$ - Error going to the left child node ($\delta_r$: error to the right child node)
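With this notation, a node's activation is the RNTN composition $x_n = \tanh\big(c^T V^{[1:d]} c + Wc\big)$ with $c = [x^l_{n-1}; x^r_{n-1}]$, and the prediction is $y = \mathrm{softmax}(W_s x_n)$. A minimal NumPy sketch of the forward pass, assuming $V$ is stored as a `(d, 2d, 2d)` array; the shapes follow the list above, but the layout and helper names are illustrative, not the ones used by `RNTN.py`:

```python
import numpy as np

d = 4                                        # word-vector length (illustrative)
rng = np.random.default_rng(0)

# Parameters, shaped as in the notation list above
W = rng.standard_normal((d, 2 * d)) * 0.1    # W ∈ R^{d×2d}
V = rng.standard_normal((d, 2 * d, 2 * d)) * 0.1  # V^{[1:d]}, slice k is V^{[k]} ∈ R^{2d×2d}
Ws = rng.standard_normal((5, d)) * 0.1       # W_s ∈ R^{5×d}

def compose(x_l, x_r):
    """Composition x_n = tanh(c^T V^{[1:d]} c + W c), c = [x_l; x_r]."""
    c = np.concatenate([x_l, x_r])           # c ∈ R^{2d}
    tensor_term = np.einsum('i,kij,j->k', c, V, c)  # k-th entry: c^T V^[k] c
    return np.tanh(tensor_term + W @ c)

def predict(x):
    """y = softmax(W_s x) over the 5 sentiment classes."""
    z = Ws @ x
    e = np.exp(z - z.max())                  # shift for numerical stability
    return e / e.sum()
```

The `einsum` call contracts $c$ against both $2d$ axes of every tensor slice at once, which is the vectorized form of computing $c^T V^{[k]} c$ in a loop over $k$.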
$$ \frac{\partial y_k}{\partial z^{s}_{i}} = y_{i}(\delta_{ik} - y_{k}) $$
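In matrix form this Jacobian is $\mathrm{diag}(y) - yy^T$, which can be checked against finite differences. A standalone NumPy verification (not part of `RNTN.py`):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())                  # shift for numerical stability
    return e / e.sum()

z = np.array([0.5, -1.0, 2.0, 0.0, 1.5])
y = softmax(z)

# Analytic Jacobian: J[k, i] = ∂y_k/∂z_i = y_i(δ_ik − y_k), i.e. diag(y) − y y^T
J = np.diag(y) - np.outer(y, y)

# Central finite differences for comparison
eps = 1e-6
J_num = np.empty((5, 5))
for i in range(5):
    dz = np.zeros(5)
    dz[i] = eps
    J_num[:, i] = (softmax(z + dz) - softmax(z - dz)) / (2 * eps)

assert np.allclose(J, J_num, atol=1e-6)
```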
$$ \begin{split} \delta^s &= W_s^T(y - t) \otimes (1 - x_{n}^2) \\ \frac{\partial E_n}{\partial V^{[k]}} &= \delta^s_{k} \begin{bmatrix} x^l_{n-1} \\ x^r_{n-1} \end{bmatrix} \begin{bmatrix} x^l_{n-1} \\ x^r_{n-1} \end{bmatrix}^T \end{split} $$
$$ \begin{split} \delta^{n} &= \delta^{s,n} \\ \delta^{n-1} &= \big( W^T \delta^n + S \big) \otimes f' \left( \begin{bmatrix} x^l_{n-1} \\ x^r_{n-1} \end{bmatrix} \right) \\ S &= \sum\limits_{k = 1}^d \delta^n_{k} \big( V^{[k]} + (V^{[k]})^T \big) \begin{bmatrix} x^l_{n-1} \\ x^r_{n-1} \end{bmatrix} \\ \delta^{n-1}_l &= \delta_l^{s,n-1} + \delta^{n-1}[1:d] \\ \frac{\partial E_{n-1}}{\partial V^{[k]}} &= \frac{\partial E_n}{\partial V^{[k]}} + \delta^{n-1}_{l,k} \begin{bmatrix} x^l_{n-2} \\ x^r_{n-2} \end{bmatrix} \begin{bmatrix} x^l_{n-2} \\ x^r_{n-2} \end{bmatrix}^T \end{split} $$
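The core of this backward pass, propagating $\delta^n$ down through a node via $W^T\delta^n + S$ scaled by $f'$, can be sketched in NumPy. As before, the `(d, 2d, 2d)` storage of $V$ and the function name are assumptions for illustration:

```python
import numpy as np

d = 4                                        # word-vector length (illustrative)
rng = np.random.default_rng(1)
W = rng.standard_normal((d, 2 * d)) * 0.1
V = rng.standard_normal((d, 2 * d, 2 * d)) * 0.1  # slice k is V^{[k]}

def backprop_delta(delta_n, c):
    """Propagate delta_n ∈ R^d through node n.

    c = [x_l; x_r] ∈ R^{2d} stacks the children's activations (tanh outputs).
    Returns delta^{n-1} ∈ R^{2d}: the first d components go to the left
    child, the last d to the right child.
    """
    # S = Σ_k δ^n_k (V^[k] + (V^[k])^T) c
    S = np.einsum('k,kij,j->i', delta_n, V + V.transpose(0, 2, 1), c)
    # f = tanh, so f'(z) = 1 − tanh(z)^2 = 1 − c^2 (c already holds tanh values)
    return (W.T @ delta_n + S) * (1 - c ** 2)
```

Summing $V^{[k]} + (V^{[k]})^T$ inside the `einsum` reflects that $c$ appears on both sides of the quadratic form $c^T V^{[k]} c$, so differentiating with respect to $c$ produces both the slice and its transpose.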
Reference: R. Socher, A. Perelygin, J.Y. Wu, J. Chuang, C.D. Manning, A.Y. Ng and C. Potts. 2013. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. In EMNLP.