This repository contains word embeddings trained on medical subreddits. We provide embeddings trained with GloVe (Pennington et al., 2014), ELMo (Peters et al., 2018), and Flair (Akbik et al., 2018).
The embeddings are trained on ~800,000 Reddit posts from over 60 medical-themed communities. We describe the training and evaluation process of the embeddings in Basaldella and Collier, BioReddit: Word Embeddings for User-Generated Biomedical NLP, presented at the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019), co-located with the 2019 Conference on Empirical Methods in Natural Language Processing (EMNLP 2019).
You can download the embeddings from the Releases section of this repository or via the links in the table below:
Embedding | Download Links |
---|---|
ELMo | options, weights |
Flair | forward, backward |
GloVe 50 | txt, bin |
GloVe 100 | txt, bin |
GloVe 200 | txt, bin |
FastText | See COMETA |
BERT | See COMETA |
You can find the code used to download the subreddits here.
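The sketch below shows one way to load the GloVe and Flair files listed above; it is not part of this repository's tooling. It assumes gensim >= 4.0 and the flair package, and the file names are placeholders for wherever you saved the downloads.

```python
# Minimal sketch: loading the downloaded embeddings.
# File names below are placeholders; point them at the files you downloaded.
from gensim.models import KeyedVectors          # gensim >= 4.0
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings

# GloVe: the .txt files are plain GloVe format (no header line),
# which gensim can read directly with no_header=True.
glove = KeyedVectors.load_word2vec_format(
    "bioreddit_glove_200d.txt",                 # placeholder path
    binary=False,
    no_header=True,
)
print(glove.most_similar("ibuprofen", topn=5))

# Flair: the forward/backward checkpoints load as FlairEmbeddings.
flair_forward = FlairEmbeddings("bioreddit-forward.pt")  # placeholder path
sentence = Sentence("the doctor prescribed ibuprofen for the pain")
flair_forward.embed(sentence)
print(sentence[3].embedding.shape)              # contextual vector for "ibuprofen"
```

The ELMo options and weights files should similarly load with AllenNLP's `Elmo` class, which takes the options and weights paths as its first two arguments.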