Multitasking Framework for Unsupervised Simple Definition Generation

Source code for the paper "Multitasking Framework for Unsupervised Simple Definition Generation", published at ACL 2022.

Requirements

Training Environment

PyTorch
fairseq
blingfire

To install them, run the following command:

pip install -r requirements-train.txt

Evaluation Environment

PyTorch
Sentence-Transformers
Jieba
NLTK
Pandas
scipy
xlrd
EASSE

To install them, run the following commands:

pip install -r requirements-eval.txt
git clone https://github.com/feralvam/easse.git
cd easse
pip install .
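
After installation, it can help to confirm that each evaluation dependency is importable before running any metrics. The following is a minimal sketch, not part of this repository: it assumes the interpreter is available as `python3` (override with the `PYTHON` variable) and that the packages use their usual import names (`torch`, `sentence_transformers`, `easse`, and so on). It only reports what it finds and installs nothing.

```shell
#!/usr/bin/env bash
# Sketch: report which evaluation dependencies can be imported.
# Assumption: the interpreter is "python3"; set PYTHON to use another one.
PYTHON="${PYTHON:-python3}"

check_module() {
    # Succeeds only if the interpreter can import the module named in $1.
    "$PYTHON" -c "import $1" >/dev/null 2>&1
}

# Build a one-line-per-module report: "ok <name>" or "missing <name>".
REPORT="$(
    for module in torch sentence_transformers jieba nltk pandas scipy xlrd easse; do
        if check_module "$module"; then
            echo "ok       $module"
        else
            echo "missing  $module"
        fi
    done
)"

echo "$REPORT"
```

Any module reported as `missing` should be reinstalled from `requirements-eval.txt` (or, for `easse`, from the cloned repository) before evaluation.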

Usage

All data, including the Chinese and English definition generation (DG) datasets and the simple text corpora mentioned in the paper, are in the "data" folder.
Please download the pretrained MASS model parameters from [en|zh], unzip them, and put the unzipped files into the folders "pretrained_model/MASS" (English) and "pretrained_model/MASS-zh" (Chinese), respectively.
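
The directory layout described above can be prepared roughly as follows. This is a sketch under stated assumptions: the archive names `mass-en.zip` and `mass-zh.zip` are hypothetical placeholders (use whatever file names the downloads actually produce), and the script is meant to be run from the repository root. It creates the two target folders and extracts an archive only if it is present.

```shell
#!/usr/bin/env bash
# Sketch: place the pretrained MASS parameters where the scripts expect them.
set -eu

# Hypothetical archive names; replace with the real downloaded file names.
EN_ARCHIVE="${EN_ARCHIVE:-mass-en.zip}"
ZH_ARCHIVE="${ZH_ARCHIVE:-mass-zh.zip}"

# Target folders named in this README.
mkdir -p pretrained_model/MASS pretrained_model/MASS-zh

extract_if_present() {
    # $1 = archive path, $2 = target directory
    if [ -f "$1" ]; then
        unzip -o -q "$1" -d "$2"
    else
        echo "skipping: $1 not found" >&2
    fi
}

extract_if_present "$EN_ARCHIVE" pretrained_model/MASS
extract_if_present "$ZH_ARCHIVE" pretrained_model/MASS-zh
```

If an archive contains its own top-level folder, the files may end up one level deeper than intended; in that case, move them up so that they sit directly inside "pretrained_model/MASS" or "pretrained_model/MASS-zh".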
To preprocess the dataset, please run the following command:

bash run/data_process.sh # for English
# or
bash run/data_process_zh.sh # for Chinese

To train a SimpDefiner model that can simultaneously generate complex and simple definitions, run the following command:

bash run/train_oxford_oald_multi_task.sh # for English
# or
bash run/train_cwn_textbook_multi_task.sh # for Chinese

Model checkpoints will be saved in a checkpoint dir.

If you want to evaluate the trained model and generate definitions (both complex and simple) using this model, please run the following command:

bash run/evaluate_oxford_oald.sh --model_dir [model-dir] # for English
# or
bash run/evaluate_cwn_textbook.sh --model_dir [model-dir] # for Chinese

The generated definitions will be saved in the same checkpoint dir.

If you want to run automatic metrics for the generated definitions, please run the following command:

bash metrics/calc_metrics.sh [model-dir] [oxford|oald|cwn|textbook] [GPU_ID]
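
To score one model on more than one test set, the command above can be wrapped in a small loop. The sketch below makes several assumptions that are not confirmed by this README: that an English model is scored on `oxford` and `oald` (inferred from the evaluation script name), that GPU 0 is used, and that `checkpoints/example` is a placeholder for the real model directory. By default it runs in dry-run mode and only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
# Sketch: run metrics/calc_metrics.sh over several test sets for one model.
set -eu

MODEL_DIR="${MODEL_DIR:-checkpoints/example}"   # hypothetical path
GPU_ID="${GPU_ID:-0}"                           # assumed GPU index
DRY_RUN="${DRY_RUN:-1}"                         # 1 = print only, 0 = execute

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Assumption: oxford and oald are the English test sets.
# For a Chinese model, the list would presumably be: cwn textbook
for test_set in oxford oald; do
    run bash metrics/calc_metrics.sh "$MODEL_DIR" "$test_set" "$GPU_ID"
done
```

Keeping the dry-run default makes it easy to check the argument order against `[model-dir] [oxford|oald|cwn|textbook] [GPU_ID]` before starting a long evaluation.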

Cite

@inproceedings{kong-etal-2022-simpdefiner,
    title = "Multitasking Framework for Unsupervised Simple Definition Generation",
    author = "Kong, Cunliang and
      Chen, Yun and
      Zhang, Hengyuan and
      Yang, Liner and
      Yang, Erhong",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics",
    year = "2022"
}

Contact

If you have questions, suggestions, or bug reports, please email cunliang.kong@outlook.com.

