Reference code for the ACL 2019 paper "Zero-shot Word Sense Disambiguation using Sense Definition Embeddings". EWISE [1] (Extended WSD Incorporating Sense Embeddings) is a principled framework for learning WSD from a combination of sense-annotated data, dictionary definitions, and lexical knowledge bases.
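At a high level, EWISE scores each candidate sense of a target word by comparing a projected context representation against a continuous embedding of the sense's definition, which is what enables prediction for senses never seen in the annotated training data. Below is a minimal PyTorch sketch of that scoring step; all dimensions and tensor names are illustrative only (the actual model in `wsd_main.py` uses a BiLSTM context encoder and ConvE-trained sense embeddings).

```python
import torch

# Illustrative dimensions, not the paper's hyperparameters.
hidden_dim, sense_dim, n_candidates = 256, 200, 5

# Context representation for one target word (in EWISE this comes
# from a BiLSTM encoder over the sentence).
context = torch.randn(hidden_dim)
projection = torch.nn.Linear(hidden_dim, sense_dim)

# Embeddings of the candidate senses' definitions (in EWISE these are
# learned over WordNet definitions and relations, e.g. with ConvE).
sense_embeddings = torch.randn(n_candidates, sense_dim)

# Score each candidate sense by a dot product in the shared space.
# Because scores come from definition embeddings rather than a fixed
# output layer, unseen senses can still be ranked at test time.
scores = sense_embeddings @ projection(context)
predicted = scores.argmax().item()
```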
We use the WSD evaluation framework [2] for training and evaluation.
The code was written with, or depends on:
- Python 3.6
- PyTorch 1.4.0
- NLTK 3.4.5
- WSD evaluation framework [2]
- Create a virtualenv and install dependencies:

```
virtualenv -p python3.6 env
source env/bin/activate
pip install -r requirements.txt
python -m nltk.downloader wordnet
python -m spacy download en
```
- Fetch data and pre-process. This creates pre-processed files in the `data` folder. (In case there is an issue handling large files, the processed input word embeddings `i_id_embedding_glove.p` are also provided; see the inspection sketch after this step.)

```
bash fetch_data.sh
bash preprocess.sh data
```
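If preprocessing of the large embedding files fails and you fall back to the provided `i_id_embedding_glove.p`, you can sanity-check it before training. The file is a Python pickle; the exact layout of the pickled object is an assumption in this sketch, so inspect and adapt as needed.

```python
import pickle

# Load the provided pre-processed GloVe embeddings and report what is
# inside. The object layout is assumed, not documented here.
with open("data/i_id_embedding_glove.p", "rb") as f:
    embeddings = pickle.load(f)

print(type(embeddings))
try:
    print(len(embeddings))  # size, if the object is a dict or sequence
except TypeError:
    print("object has no len()")
```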
- To train ConvE embeddings, change directory to the `conve` folder and refer to the README in that folder. Then generate embeddings for the WSD task:

```
python generate_output_embeddings.py ./conve/saved_embeddings/embeddings.npz data conve_embeddings
```

- Alternatively, to use pre-trained embeddings, copy the pre-trained ConvE embeddings (`o_id_embedding_conve_embeddings.npz`) to the `data` folder. Either way, the resulting `.npz` file can be sanity-checked with the sketch below.
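Whether you generated the sense embeddings yourself or downloaded the pre-trained file, a quick NumPy check confirms the archive is readable. The key names inside the `.npz` archive depend on how it was written, so this sketch lists them rather than assuming any.

```python
import numpy as np

# Open the sense-embedding archive produced by generate_output_embeddings.py
# (or the downloaded o_id_embedding_conve_embeddings.npz).
archive = np.load("data/o_id_embedding_conve_embeddings.npz")

# List the stored arrays and their shapes.
for name in archive.files:
    print(name, archive[name].shape)
```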
- Train a WSD model. This saves the model with the best dev-set score at `./saved_models/model.pt`:

```
CUDA_VISIBLE_DEVICES=0 python wsd_main.py --cuda --dropout 0.5 --epochs 200 --input_directory ./data --scorer ./ --output_embedding customnpz-o_id_embedding_conve_embeddings.npz --train semcor --val semeval2007 --lr 0.0001 --predict_on_unseen --save ./saved_models/model.pt
```
- Test a WSD model (the model is assumed to be saved at `./saved_models/model.pt`):

```
CUDA_VISIBLE_DEVICES=0 python wsd_main.py --cuda --dropout 0.5 --epochs 0 --input_directory ./data --scorer ./ --output_embedding customnpz-o_id_embedding_conve_embeddings.npz --train semcor --val semeval2007 --lr 0.0001 --predict_on_unseen --evaluate --pretrained ./saved_models/model.pt
```
All files are shared at https://drive.google.com/drive/folders/1NSrOx4ZY9Zx957RANFO90RX9daqIDElR. Uncompress model files using gunzip before use. Files A and B suffice if you only want to train or evaluate a WSD model.
A. Pre-trained ConvE embeddings: `o_id_embedding_conve_embeddings.npz`
B. Pre-trained model: `model.pt.gz` (F1 score on the ALL dataset: 72.1)
C. Pre-trained ConvE model: `WN18RR_conve_0.2_0.3__defn.model.gz`
D. Processed input word embeddings: `i_id_embedding_glove.p` (needed only if there are issues handling large files during preprocessing)
An earlier version contained code for a weighted cross-entropy loss (now enabled only by the `--weighted_loss` flag). The scheme was not helpful in practice and is not recommended. However, a pre-trained model for it is shared as `model_weighted.pt.gz` (F1 score on the ALL dataset: 72.1).
If you use this code, please consider citing:
[1] Sawan Kumar, Sharmistha Jat, Karan Saxena, and Partha Talukdar. 2019. Zero-shot word sense disambiguation using sense definition embeddings. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5670–5681. Association for Computational Linguistics.
[2] Alessandro Raganato, Jose Camacho-Collados, and Roberto Navigli. 2017. Word sense disambiguation: A unified evaluation framework and empirical comparison. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 99–110, Valencia, Spain. Association for Computational Linguistics.
For any clarification, comments, or suggestions, please create an issue or contact sawankumar@iisc.ac.in.