BioNLP-2016

Here are the scripts, code and vectors for the ACL BioNLP 2016 workshop paper:

Chiu et al. How to Train Good Word Embeddings for Biomedical NLP

API Package

word2vec: original word2vec from Mikolov: https://code.google.com/archive/p/word2vec/
wvlib: library for reading word2vec files: https://github.com/spyysalo/wvlib
geniass: library for splitting biomedical text into sentences: http://www.nactem.ac.uk/y-matsu/geniass/
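
For readers who want to inspect a vector file without installing anything, the following is a minimal sketch of a reader for the word2vec binary layout (an ASCII header with vocabulary size and dimension, then each word followed by a space and its float32 values). It is not wvlib's implementation; the function name `read_word2vec_bin` is hypothetical, and little-endian floats are an assumption that holds for files written on typical x86 machines.

```python
import io
import struct


def read_word2vec_bin(stream):
    """Read word vectors in the word2vec binary layout from a binary stream.

    Hypothetical helper for illustration; assumes little-endian float32 values.
    """
    # Header line: "<vocab_size> <dimension>\n" in ASCII.
    vocab_size, dim = (int(x) for x in stream.readline().split())
    vectors = {}
    for _ in range(vocab_size):
        # Each word is stored as bytes terminated by a single space.
        chars = bytearray()
        while True:
            ch = stream.read(1)
            if not ch:
                raise EOFError("unexpected end of file while reading a word")
            if ch == b" ":
                break
            if ch != b"\n":  # skip the newline that closes the previous entry
                chars.extend(ch)
        # The vector follows as `dim` consecutive 4-byte floats.
        raw = stream.read(4 * dim)
        vectors[chars.decode("utf-8")] = struct.unpack(f"<{dim}f", raw)
    return vectors


# Tiny in-memory file with two made-up 3-dimensional vectors.
demo = io.BytesIO(
    b"2 3\n"
    + b"gene " + struct.pack("<3f", 1.0, 0.0, 0.5) + b"\n"
    + b"protein " + struct.pack("<3f", 0.25, 2.0, -1.0) + b"\n"
)
vectors = read_word2vec_bin(demo)
```

For a real file, open it with `open(path, "rb")` and pass the file object in place of `demo`. For large vocabularies, wvlib or a similar library is likely the better choice.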

Scripts

pre-process.sh: segment and tokenize input text (e.g. raw PubMed or PMC text)
create_shf_low_text.sh: create lowercased, sentence-shuffled text (input: tokenized text)
createModel.sh: create word2vec .bin files with different parameters
intrinsicEva.sh: run intrinsic evaluation on the UMNSRS and Mayo datasets (input: directory containing the vectors to test)
ExtrinsicEva.sh: run extrinsic evaluation
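
The lowercasing and sentence-shuffling step can be sketched in a few lines of Python. This is only an illustration of the idea, assuming one tokenized sentence per line; it is not the content of create_shf_low_text.sh, and the function name `lowercase_and_shuffle` and the `seed` parameter are hypothetical.

```python
import random


def lowercase_and_shuffle(sentences, seed=0):
    """Return the sentences lowercased and in shuffled order.

    Hypothetical helper; a fixed seed keeps the shuffle reproducible.
    """
    lowered = [s.lower() for s in sentences]
    random.Random(seed).shuffle(lowered)  # shuffle in place with a private RNG
    return lowered


# Made-up tokenized sentences, one per list item.
corpus = [
    "BRCA1 is a tumour suppressor gene .",
    "Aspirin inhibits COX-1 and COX-2 .",
    "The patient presented with dyspnoea .",
]
shuffled = lowercase_and_shuffle(corpus)
```

Shuffling at the sentence level changes the order in which the training tool sees the text while keeping each sentence intact.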

Code

Pre-processing:
tokenize_Text.py: tokenize text (requires NLTK)
geniass: split text into sentences

Intrinsic evaluation:
evaluate.py: perform intrinsic evaluation
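
Word-pair benchmarks such as UMNSRS are conventionally scored by correlating the cosine similarity of each word pair with its human rating. The sketch below shows that convention using Spearman's rank correlation in plain Python. It is an assumption about the general approach, not a copy of evaluate.py; all function names, the toy vectors and the ratings are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate(vectors, gold_pairs):
    """Correlate model similarities with gold ratings, skipping unknown words."""
    predicted, gold = [], []
    for w1, w2, score in gold_pairs:
        if w1 in vectors and w2 in vectors:
            predicted.append(cosine(vectors[w1], vectors[w2]))
            gold.append(score)
    return spearman(predicted, gold)


# Toy 2-dimensional vectors and invented ratings, for illustration only.
toy_vectors = {
    "heart": (1.0, 0.0),
    "cardiac": (0.9, 0.1),
    "lung": (0.0, 1.0),
    "aspirin": (-1.0, 0.2),
}
toy_gold = [
    ("heart", "cardiac", 9.0),
    ("heart", "lung", 4.0),
    ("heart", "aspirin", 1.0),
    ("heart", "unknownword", 5.0),  # skipped: not in the vocabulary
]
rho = evaluate(toy_vectors, toy_gold)
```

How out-of-vocabulary pairs are handled affects the score, so comparisons between vector sets are only meaningful when the same pairs are evaluated for each.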

Extrinsic evaluation (in the keras folder; requires either TensorFlow or Theano):
mlp.py: simple feed-forward neural network
setting.py: parameters for the neural network

Word vectors

https://drive.google.com/open?id=0BzMCqpcgEJgiUWs0ZnU0NlFTam8

License

All data on this page is made available under the Creative Commons Attribution (CC BY) license.
