Text pair classification toolkit.

- Transform the dataset to the standard format. We currently support SNLI, QNLI and Quora QP; please write your own transformation script for other datasets.

  ```
  python lion/data/dataset_utils/quoraqp.py convert-dataset --indir INDIR --outdir OUTDIR
  ```
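The exact standard format is defined by the provided scripts; as a hedged sketch of what a conversion script for your own dataset might look like (the TSV input and the sentA/sentB/label field names here are assumptions, not the toolkit's confirmed schema; check lion/data/dataset_utils/quoraqp.py for the real one):

```python
import csv
import json

def convert_tsv_to_jsonl(in_path, out_path):
    """Hypothetical converter: TSV with sentence1/sentence2/label columns
    to one JSON object per line. Adapt field names to the real schema."""
    with open(in_path, newline='', encoding='utf-8') as fin, \
         open(out_path, 'w', encoding='utf-8') as fout:
        reader = csv.DictReader(fin, delimiter='\t')
        for row in reader:
            record = {
                'sentA': row['sentence1'],
                'sentB': row['sentence2'],
                'label': row['label'],
            }
            fout.write(json.dumps(record) + '\n')
```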
- Preprocess the dataset.

  ```
  python lion/data/processor.py process-dataset --in_dir IN_DIR --out_dir OUT_DIR --splits ['train'|'dev'|'test'] --tokenizer_name [spacy/bert/xlnet] --vocab_file FILE_PATH --max_length SEQUENCE_LENGTH
  ```
- Create a directory for saving the model and put the config file in it.
- Edit the config file, modifying the train file and dev file paths.
- Run lion/training/trainer.py. For example:

  ```
  python lion/training/trainer.py --train --output_dir experiments/QQP/esim/
  ```
- Create a directory for saving the model and put the config file in it.
- Edit the config file, modifying the train file and dev file paths.
- Edit tuned_params.yaml with the values you want to search over. For example:

  ```
  hidden_size:
  - 100
  - 200
  - 300
  dropout:
  - 0.1
  - 0.2
  ```

- Run:

  ```
  python lion/training/search_parameter.py --parent_dir experiments/QQP/esim/hidden_dim/
  ```
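The search script presumably expands the value lists in tuned_params.yaml into a Cartesian grid and trains one model per combination. A minimal sketch of that expansion (the function name is illustrative, not the toolkit's API):

```python
from itertools import product

def expand_grid(tuned_params):
    """Expand a dict of hyperparameter lists (as in tuned_params.yaml)
    into one config dict per point on the Cartesian grid."""
    names = sorted(tuned_params)
    for values in product(*(tuned_params[n] for n in names)):
        yield dict(zip(names, values))

grid = list(expand_grid({'hidden_size': [100, 200, 300],
                         'dropout': [0.1, 0.2]}))
# 3 x 2 = 6 combinations, e.g. {'dropout': 0.1, 'hidden_size': 100}
```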
To evaluate on a dev set or predict on a test set:

```
python lion/training/trainer.py --evaluate --output_dir experiments/QQP/esim/ --dev_file your_dev_path
python lion/training/trainer.py --predict --output_dir experiments/QQP/esim/ --test_file your_test_file
```
Model | Quora QP | SNLI | QNLI |
---|---|---|---|
BiMPM | 86.9 | 86.0 | 80.5 |
ESIM | 88.4 | 87.4 | 81.4 |
BERT | 91.3 | 91.1 | 91.7 |
XLNet | 91.5 | 91.6 | 91.9 |
Note: all results in the table above are measured on the dev sets. The hyperparameters we used for these models are in the experiments/DATASET/MODEL directory.
To use ELMo embeddings, write this in your config file: use_elmo: concat or use_elmo: only, and remember to set word_dim correctly. For example, if you use the ELMo embedding only, set word_dim: 1024; set word_dim: 1324 if you use ELMo and GloVe together.
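The 1324 figure is just the concatenation of the two vectors per token; a quick check of the arithmetic (assuming the common 300-dimensional GloVe vectors):

```python
ELMO_DIM = 1024   # dimensionality of ELMo embeddings
GLOVE_DIM = 300   # dimensionality of 300-d GloVe vectors (assumed)

# With use_elmo: concat, the two vectors are concatenated per token,
# so word_dim must be the sum of the two dimensionalities.
word_dim = ELMO_DIM + GLOVE_DIM
```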