Code for the IJCAI 2022 paper: Enhancing Entity Representations with Prompt Learning for Biomedical Entity Linking.
We propose a two-stage entity linking algorithm that enhances entity representations through prompt learning. The first stage performs a coarse-grained retrieval from a representation space defined by a bi-encoder that independently embeds the surface forms of mentions and entities. Unlike previous one-model-fits-all systems, each candidate is then re-ranked by a finer-grained encoder based on prompt-tuning, which takes the concatenation of the mention context and the entity information as input. Extensive experiments show that our model achieves promising performance improvements over several state-of-the-art techniques on MedMentions, the largest public biomedical dataset, and on the NCBI disease corpus. Case studies also show that the proposed prompt-tuning strategy is effective in addressing both the variety and the ambiguity challenges of the linking task.
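The re-ranking stage described above can be pictured with a minimal sketch. The template wording, the helper names `build_prompt` and `rerank`, and the `score_fn` callback are all hypothetical and are not taken from this repository; in practice the scorer would be the prompt-tuned language model.

```python
from typing import Callable, List, Tuple

MASK = "[MASK]"  # mask token of BERT-style models


def build_prompt(mention: str, context: str, entity_name: str, entity_type: str) -> str:
    """Concatenate mention context and entity information into one cloze-style input.

    The template text is an illustrative assumption, not the paper's template.
    """
    return (
        f"{context} [SEP] "
        f"{entity_name} ( {entity_type} ) [SEP] "
        f"{mention} and {entity_name} are {MASK} ."
    )


def rerank(
    mention: str,
    context: str,
    candidates: List[Tuple[str, str]],  # (entity name, entity type) pairs
    score_fn: Callable[[str], float],   # hypothetical scorer, e.g. a prompt-tuned model
) -> List[str]:
    """Score one prompt per candidate and return entity names, best first."""
    scored = [
        (score_fn(build_prompt(mention, context, name, etype)), name)
        for name, etype in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]
```

Each retrieved candidate gets its own prompt, so the scorer sees the mention context and the entity information together, in contrast to the bi-encoder, which embeds them separately.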
Python: 3.8
PyTorch: 1.9.0
transformers: 4.10.0
openprompt: 0.1.1
See the OpenPrompt project documentation to learn more about OpenPrompt.
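To confirm that the environment matches the versions listed above, a small check along the following lines may help. The helper name `find_mismatches` is hypothetical, and the sketch assumes the packages are installed under the distribution names `torch`, `transformers`, and `openprompt`.

```python
import sys
from importlib import metadata  # part of the standard library since Python 3.8

# Versions listed in this README.
REQUIRED = {"torch": "1.9.0", "transformers": "4.10.0", "openprompt": "0.1.1"}


def find_mismatches(required, get_version=metadata.version):
    """Return {package: installed version or None} for packages that differ."""
    mismatches = {}
    for package, wanted in required.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        # startswith tolerates local build suffixes such as "1.9.0+cu111"
        if installed is None or not installed.startswith(wanted):
            mismatches[package] = installed
    return mismatches


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for package, installed in find_mismatches(REQUIRED).items():
        print(f"{package}: expected {REQUIRED[package]}, found {installed}")
```

An empty result means every listed package is installed with the expected version.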
1. Download the PyTorch version of the pretrained PubMedBERT model and put it in the folder "pretrain".
2. Generate the training and test samples with data_process/data_process.py. Following the descriptions in the code, provide the entity dictionary file, the entity type information file, and the sample files of mentions and their gold entities.
3. Run prompt_ranking/prompt_medicine_train.py to train the model.
4. Run prompt_ranking/prompt_medicine_predict.py to predict the results.
5. Run prompt_retrieval/prompt_entity_vector.py to generate the mention and entity vectors with the prompt model.
6. Run prompt_retrieval/vector_search.py to search for the top N candidates with the prompt model.
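The retrieval in steps 5 and 6 amounts to a nearest-neighbour search over the generated vectors. The following is a minimal sketch of that idea with NumPy, assuming cosine similarity and brute-force search; the function names are hypothetical, and the actual vector_search.py may use a different similarity measure or an approximate index.

```python
import numpy as np


def top_n_candidates(mention_vecs: np.ndarray, entity_vecs: np.ndarray, n: int) -> np.ndarray:
    """For each mention, return the indices of the n most similar entities."""
    # L2-normalise the rows so that the dot product equals cosine similarity.
    m = mention_vecs / np.linalg.norm(mention_vecs, axis=1, keepdims=True)
    e = entity_vecs / np.linalg.norm(entity_vecs, axis=1, keepdims=True)
    scores = m @ e.T  # shape: (num_mentions, num_entities)
    # Sort by descending similarity and keep the first n columns.
    return np.argsort(-scores, axis=1)[:, :n]


def recall_at_n(candidates: np.ndarray, gold: list) -> float:
    """Fraction of mentions whose gold entity index is among the candidates."""
    hits = [g in row for row, g in zip(candidates, gold)]
    return sum(hits) / len(hits)
```

Recall at N is a useful check for this stage, because the re-ranker can only select the gold entity if retrieval placed it among the candidates.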