UPD January 10th, 2021: these scripts have mostly become a part of the AREkit-0.22.0 demo and examples! [demo-readme]
This repository is an application of the AREkit framework neural networks to the sentiment attitude extraction task [initial-paper], applied to document contexts:
Figure: an example of a context with attitudes mentioned in it; the named entities «Russia» and «NATO» hold a negative attitude towards each other, with other named entities additionally indicated.
It provides applications for:
- Data serialization;
- Training neural networks from the following list of models:
- Aspect-based Attentive encoders:
  - Multilayer Perceptron (MLP) [code] / [github:nicolay-r];
- Self-based Attentive encoders:
  - P. Zhou et al. [code] / [github:SeoSangwoo];
  - Z. Yang et al. [code] / [github:ilivans];
- Single Sentence Based Architectures:
  - CNN [code] / [github:roomylee];
  - CNN + Aspect-based MLP Attention [code];
  - PCNN [code] / [github:nicolay-r];
  - PCNN + Aspect-based MLP Attention [code];
  - RNN (LSTM/GRU/RNN) [code] / [github:roomylee];
  - IAN (frames based) [code] / [github:lpq29743];
  - RCNN (BiLSTM + CNN) [code] / [github:roomylee];
  - RCNN + Self Attention [code];
  - BiLSTM [code] / [github:roomylee];
  - BiLSTM + Aspect-based MLP Attention [code];
  - BiLSTM + Self Attention [code] / [github:roomylee];
- Multi Sentence Based Encoder Architectures:
Dependencies:
- Python 2.7;
- AREkit == 0.20.5.
AREkit repository:
# Clone the AREkit repository next to the current project folder.
git clone -b 0.20.5-rc https://github.com/nicolay-r/AREkit ../arekit
# Install its dependencies.
pip install -r ../arekit/requirements.txt
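The scripts expect the cloned framework to be importable from Python. Below is a minimal sanity check; a sketch that assumes the folder cloned as `../arekit` itself acts as the `arekit` package, so its parent directory must be on `PYTHONPATH`:

```bash
# Sketch (assumption): make the cloned ../arekit folder importable as the `arekit` package.
export PYTHONPATH="${PYTHONPATH}:$(pwd)/.."
python -c "import arekit" && echo "AREkit is importable"
```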
We utilize the RusVectores news-2015 embedding:
mkdir -p data
curl http://rusvectores.org/static/models/rusvectores2/news_mystem_skipgram_1000_20_2015.bin.gz -o "data/news_rusvectores2.bin.gz"
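The archive is passed to the serialization step as-is, so no unpacking is needed. An optional sanity check of the download (a generic sketch, nothing project-specific is assumed):

```bash
# Check that the gzip archive is complete and show its size.
gunzip -t data/news_rusvectores2.bin.gz && echo "embedding archive looks OK"
ls -lh data/news_rusvectores2.bin.gz
```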
Use run_serialization.sh in order to prepare data for a particular experiment:
python run_serialization.py \
    --cv-count 3 --frames-version v2_0 \
    --experiment rsr+ra --labels-count 3 --ra-ver v1_0 \
    --emb-filepath data/news_rusvectores2.bin.gz \
    --entity-fmt rus-simple --balance-samples True
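When several RuAttitudes versions have to be serialized, the same call can be wrapped into a loop. This is a sketch: the version identifiers are taken from the `--ra-ver` values listed below, and everything else repeats the command above:

```bash
# Sketch: serialize the rsr+ra experiment for several RuAttitudes versions.
for ra in v1_2 v2_0_base v2_0_large; do
    python run_serialization.py \
        --cv-count 3 --frames-version v2_0 \
        --experiment rsr+ra --labels-count 3 --ra-ver "$ra" \
        --emb-filepath data/news_rusvectores2.bin.gz \
        --entity-fmt rus-simple --balance-samples True
done
```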
Use run_train_classifier.sh to run an experiment:
CUDA_VISIBLE_DEVICES=0 python run_training.py --do-eval \
    --bags-per-minibatch 32 --dropout-keep-prob 0.80 --cv-count 3 \
    --labels-count 3 --experiment rsr+ra --model-input-type ctx --ra-ver v1_0 \
    --model-name cnn --test-every-k-epoch 5 --learning-rate 0.1 \
    --balanced-input True --train-acc-limit 0.99 --epochs 100
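To compare several single-context encoders, the same call can be repeated per model. A sketch: only `cnn` appears in the command above; the other model names (`pcnn`, `rcnn`) are assumed spellings based on the architecture list:

```bash
# Sketch: train several single-context models sequentially on GPU 0.
# NOTE: model names other than `cnn` are assumptions; adjust to the actual --model-name values.
for model in cnn pcnn rcnn; do
    CUDA_VISIBLE_DEVICES=0 python run_training.py --do-eval \
        --bags-per-minibatch 32 --dropout-keep-prob 0.80 --cv-count 3 \
        --labels-count 3 --experiment rsr+ra --model-input-type ctx --ra-ver v1_0 \
        --model-name "$model" --test-every-k-epoch 5 --learning-rate 0.1 \
        --balanced-input True --train-acc-limit 0.99 --epochs 100
done
```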
Common flags (an example command that combines them is given after this list):
- `--experiment` -- the experiment to run:
  - `rsr` -- supervised learning + evaluation within the RuSentRel collection;
  - `ra` -- pretraining with the RuAttitudes collection;
  - `rsr+ra` -- combined training within RuSentRel and RuAttitudes, with evaluation;
- `--cv-count` -- data folding mode:
  - `1` -- predefined docs separation onto TRAIN/TEST (RuSentRel);
  - `k` -- CV-based folding onto `k` folds (`k=3` is supported);
- `--frames-version` -- RuSentiFrames collection version:
  - `v2_0` -- RuSentiFrames-2.0;
- `--ra-ver` -- RuAttitudes version, if the collection is applicable (`ra` or `rsr+ra` experiments):
  - `v1_2` -- RuAttitudes-1.0 (paper);
  - `v2_0_base`;
  - `v2_0_large`;
  - `v2_0_base_neut`;
  - `v2_0_large_neut`;
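For instance, a RuSentRel-only serialization with the predefined TRAIN/TEST split could look as follows. A sketch: it assumes that `--ra-ver` may be omitted when RuAttitudes is not involved:

```bash
# Sketch: RuSentRel-only setup with the predefined TRAIN/TEST split (--cv-count 1).
# Assumption: --ra-ver is not required for the rsr experiment.
python run_serialization.py \
    --cv-count 1 --frames-version v2_0 \
    --experiment rsr --labels-count 3 \
    --emb-filepath data/news_rusvectores2.bin.gz \
    --entity-fmt rus-simple --balance-samples True
```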
Training specific flags:
- `--model-name` -- the model to train (see the [list] above);
- `--do-eval` -- activates evaluation during the training process;
- `--bags-per-minibatch` -- number of bags per mini-batch;
- `--balanced-input` -- flag that enables the use of a balanced collection for model training;
- `--emb-filepath` -- path to the Word2Vec model;
- `--entity-fmt` -- entity formatting type (see the example after this list):
  - `rus-simple` -- Russian masks: `объект` (object), `субъект` (subject), `сущность` (entity);
  - `sharp-simple` -- BERT-related notation for meta tokens: `#O` (object), `#S` (subject), `#E` (entity);
- `--balance-samples` -- activates sample balancing.
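As an illustration of the `sharp-simple` entity format, the serialization command from above can be rerun with the BERT-related meta tokens; a sketch with all other parameters unchanged:

```bash
# Sketch: serialize with #O / #S / #E meta tokens instead of the Russian masks.
python run_serialization.py \
    --cv-count 3 --frames-version v2_0 \
    --experiment rsr+ra --labels-count 3 --ra-ver v1_0 \
    --emb-filepath data/news_rusvectores2.bin.gz \
    --entity-fmt sharp-simple --balance-samples True
```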