Official code repository: Word Sense Disambiguation of French Lexicographical Examples Using Lexical Networks
- /data/: contains instructions for organizing the data/ folder
- /scripts/: contains the individual script modules
- Clone the repo:
git clone https://github.com/ATILF-UMR7118/GraphWSD.git
- Create the virtualenv:
python3 -m venv wsdvenv
. wsdvenv/bin/activate
pip3 install --upgrade pip
cd GraphWSD/
pip3 install -r requirements.txt
- Follow the instructions provided in the data/ folder.
- To run the models for NOUN/VERB WSD:
(a.1) STRUCT model
python3 ~/GraphWSD/scripts/wsd_ewiser.py \
--data ~/GraphWSD/data/ortolang/nountmp/ \
--save_dir ~/GraphWSD/scripts/ortolog/ \
--num 100 --model_num onoun_ewiser_29061156 --mtype ewiser --save-model \
--learning 0.001 --hidden 8000 --batch 64 --device cuda --embed 768 --lm camembert-base
(a.2) STRUCT* model
python3 ~/GraphWSD/scripts/wsd_ewiser.py \
--data ~/GraphWSD/data/ortolang/nountmp/ \
--save_dir ~/GraphWSD/scripts/ortolog/ \
--num 100 --model_num onoun_ewiser_29061156 --mtype ewiser --save-model \
--learning 0.001 --hidden 8000 --batch 64 --device cuda --embed 768 --lm camembert-base --trainable
(a.3) STRUCT** model
python3 ~/GraphWSD/scripts/wsd_ewiser.py \
--data ~/GraphWSD/data/ortolang/nountmp/ \
--save_dir ~/GraphWSD/scripts/ortolog/ \
--num 100 --model_num onoun_ewiser_29061156 --mtype ewiser --save-model \
--learning 0.001 --hidden 8000 --batch 64 --device cuda --embed 768 --lm camembert-base --fragment --trainable
(b.1) SEM model
python3 ~/GraphWSD/scripts/wsd_ewiser.py \
--data ~/GraphWSD/data/ortolang/nountmp/ --num 100 \
--save_dir ~/GraphWSD/scripts/ortolog/ --model_num onoun_seml_29061522 \
--mtype ewiserc --save-model --batch 64 --device cuda --semantics \
--hidden-dim 8000 --embed 768 --lm camembert-base
(b.2) SEM* model
python3 ~/GraphWSD/scripts/wsd_ewiser.py \
--data ~/GraphWSD/data/ortolang/nountmp/ --num 100 \
--save_dir ~/GraphWSD/scripts/ortolog/ --model_num onoun_seml_29061522 \
--mtype ewiserc --save-model --batch 64 --device cuda --semantics \
--hidden-dim 8000 --embed 768 --lm camembert-base --trainable
(b.3) SEM** model
python3 ~/GraphWSD/scripts/wsd_ewiser.py \
--data ~/GraphWSD/data/ortolang/nountmp/ --num 100 \
--save_dir ~/GraphWSD/scripts/ortolog/ --model_num onoun_seml_29061522 \
--mtype ewiserc --save-model --batch 64 --device cuda --semantics \
--hidden-dim 8000 --embed 768 --lm camembert-base --fragment --trainable
STRUCT and SEM are two strategies to initialize A:
- STRUCT: counts the number of edges.
- SEM: counts the weight (strength) of the edges. The SEM model requires --semantics.
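As a toy illustration of the difference between the two strategies, here is a hypothetical sketch (not the repository's implementation). It assumes the lexical network is given as an edge list with one `source target weight` triple per line. Under that assumption, a STRUCT-style entry counts the edges linking a pair of senses, while a SEM-style entry sums their weights. The file format, the sense identifiers, and the function name `init_matrix` are all illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: builds sparse matrix entries "source target value"
# from an edge list with one "source target weight" triple per line.
#   mode=struct -> value = number of edges between the pair
#   mode=sem    -> value = sum of the edge weights between the pair
init_matrix() {
  local mode="$1" edges="$2"
  awk -v mode="$mode" '
    NF == 3 {
      key = $1 " " $2
      if (mode == "struct") val[key] += 1   # count edges
      else                  val[key] += $3  # accumulate edge strength
    }
    END { for (k in val) print k, val[k] }
  ' "$edges" | sort
}

# Toy edge list (made-up sense identifiers and weights).
edges=$(mktemp)
printf '%s\n' \
  'chat_1 animal_1 0.5' \
  'chat_1 animal_1 0.25' \
  'chat_1 felin_1 2' > "$edges"

init_matrix struct "$edges"   # chat_1 animal_1 2 / chat_1 felin_1 1
init_matrix sem "$edges"      # chat_1 animal_1 0.75 / chat_1 felin_1 2
rm -f "$edges"
```

With unit weights the two strategies coincide; they diverge only when edges carry different strengths.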
| config | interpretation | command |
|---|---|---|
| STRUCT/SEM | A is frozen | |
| STRUCT/SEM * | | `--trainable` |
| STRUCT/SEM ** | | `--trainable --fragment` |
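One option for comparing the six configurations without retyping each command is a small dry-run wrapper. The sketch below is hypothetical and not shipped with the repository. `wsd_command` prints, but does not execute, the NOUN command for a given configuration. It reuses only the flags and values from the commands above, and it appends `--trainable` for the * variants and `--fragment --trainable` for the ** variants. It emits the flags in a fixed order, which assumes (unverified) that `wsd_ewiser.py` parses options by name.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper (not part of the repository): prints, without
# executing, the command for one configuration from the table above.
# All flags and values are copied from the NOUN commands in this README.
wsd_command() {
  local config="$1" variant extra=""
  local base="python3 ~/GraphWSD/scripts/wsd_ewiser.py --data ~/GraphWSD/data/ortolang/nountmp/ --save_dir ~/GraphWSD/scripts/ortolog/ --num 100 --save-model --batch 64 --device cuda --embed 768 --lm camembert-base"

  # Strategy-specific flags: STRUCT vs SEM (SEM requires --semantics).
  case "$config" in
    STRUCT|'STRUCT*'|'STRUCT**')
      variant="--model_num onoun_ewiser_29061156 --mtype ewiser --learning 0.001 --hidden 8000" ;;
    SEM|'SEM*'|'SEM**')
      variant="--model_num onoun_seml_29061522 --mtype ewiserc --semantics --hidden-dim 8000" ;;
    *)
      echo "unknown config: $config" >&2
      return 1 ;;
  esac

  # Trainability flags: '**' adds --fragment --trainable, '*' adds --trainable.
  case "$config" in
    *'**') extra=" --fragment --trainable" ;;
    *'*')  extra=" --trainable" ;;
  esac

  echo "$base $variant$extra"
}

# Print every configuration for inspection.
for c in STRUCT 'STRUCT*' 'STRUCT**' SEM 'SEM*' 'SEM**'; do
  wsd_command "$c"
done
```

Once the printed command looks right, it could be launched with `eval "$(wsd_command 'SEM*')"`. Printing first makes it easy to check that the intended flags are present before starting a GPU run.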
For any questions related to this repository, contact: asinha@atilf.fr
@inproceedings{sinha2022word,
title={Word sense disambiguation of french lexicographical examples using lexical networks},
author={Sinha, Aman and Ollinger, Sandrine and Constant, Mathieu},
booktitle={TextGraphs-16: Graph-based Methods for Natural Language Processing},
pages={70--76},
year={2022}
}