This repository contains resources for extracting concepts and their context from Dutch biomedical and clinical text using MedSpaCy.
For more information on MedSpaCy and for installation instructions, please visit the MedSpaCy GitHub page.
Extracting concepts from Dutch biomedical or clinical text requires a reference dataset that contains every concept to be extracted, together with its terms. We cannot distribute this reference dataset directly, because it includes the UMLS vocabularies and the Dutch SNOMED CT vocabulary.
Before you can download the UMLS, you need a license from the U.S. National Library of Medicine. Similarly, access to the Dutch SNOMED CT vocabulary requires a license from NICTIZ; follow their instructions to obtain it.
The "QuickUMLS_resources" folder contains a Jupyter notebook that takes the UMLS and Dutch SNOMED CT files as input and builds a concept reference database.
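As a rough illustration of what building such a reference involves, the sketch below collects Dutch terms per UMLS concept (CUI). It assumes two inputs: the UMLS `MRCONSO.RRF` file (pipe-delimited, with the language in the LAT column, where Dutch is `DUT`) and a Dutch SNOMED CT description file in the tab-separated RF2 layout. It further assumes that Dutch SNOMED CT descriptions can be linked to CUIs through the `SNOMEDCT_US` source in MRCONSO, because both use SNOMED CT concept identifiers. The helper `build_reference` is hypothetical; the notebook in this repository may proceed differently.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Set

# Column positions in UMLS MRCONSO.RRF (pipe-delimited, one row per term).
CUI, LAT, SAB, CODE, STR = 0, 1, 11, 13, 14


def build_reference(
    mrconso_lines: Iterable[str], snomed_nl_lines: Iterable[str]
) -> Dict[str, Set[str]]:
    """Map each CUI to its Dutch terms (hypothetical helper, not the repository's notebook)."""
    terms: Dict[str, Set[str]] = defaultdict(set)
    snomed_to_cuis: Dict[str, Set[str]] = defaultdict(set)

    for line in mrconso_lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= STR:
            continue  # skip malformed rows
        if fields[LAT] == "DUT":
            terms[fields[CUI]].add(fields[STR])  # Dutch term already present in the UMLS
        if fields[SAB] == "SNOMEDCT_US":
            # Assumption: SNOMED CT concept ids link the Dutch edition to UMLS CUIs.
            snomed_to_cuis[fields[CODE]].add(fields[CUI])

    # RF2 description file: tab-separated with a header row; terms may contain quotes.
    reader = csv.DictReader(snomed_nl_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row["active"] != "1" or row["languageCode"] != "nl":
            continue  # keep only active Dutch descriptions
        for cui in snomed_to_cuis.get(row["conceptId"], ()):
            terms[cui].add(row["term"])

    return dict(terms)
```

Concepts that exist only in the Dutch SNOMED CT extension have no `SNOMEDCT_US` counterpart and are skipped in this sketch. QuickUMLS builds its index from files in the `MRCONSO.RRF`/`MRSTY.RRF` layout, so merged terms would still have to be written back in that format before indexing.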
The Dutch rules for detecting contextual information about a concept are in the "Concept_resources" folder.
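To illustrate what such context rules do, the sketch below implements a heavily simplified ConText-style check. Each rule has a trigger literal, a category, a direction and a maximum scope in tokens, loosely modelled on the fields of medspaCy's `ConTextRule`. The three Dutch triggers are hypothetical examples chosen for illustration, not entries copied from the repository's rule files, and the matching logic is a stand-in, not medspaCy's implementation.

```python
import re
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ContextRule:
    literal: str        # single-token trigger, e.g. a Dutch negation cue
    category: str       # modifier category assigned to the concept
    direction: str      # "FORWARD": applies to later tokens; "BACKWARD": to earlier ones
    max_scope: int = 5  # maximum distance in tokens between trigger and concept


# Hypothetical example rules, for illustration only.
RULES: List[ContextRule] = [
    ContextRule("geen", "NEGATED_EXISTENCE", "FORWARD"),          # "geen koorts" = no fever
    ContextRule("zonder", "NEGATED_EXISTENCE", "FORWARD"),        # "zonder pijn" = without pain
    ContextRule("uitgesloten", "NEGATED_EXISTENCE", "BACKWARD"),  # "... is uitgesloten" = ruled out
]


def tokenize(text: str) -> List[str]:
    """Lower-case word tokens (Unicode-aware, so letters such as 'ë' are kept)."""
    return re.findall(r"\w+", text.lower())


def modifiers_for(text: str, concept: str, rules: List[ContextRule] = RULES) -> Set[str]:
    """Return the categories of all rules whose scope covers the first mention of `concept`."""
    tokens, target = tokenize(text), tokenize(concept)
    n = len(target)
    starts = [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target]
    if not starts:
        return set()  # concept not mentioned in the text
    start = starts[0]
    end = start + n - 1  # index of the concept's last token

    found: Set[str] = set()
    for i, token in enumerate(tokens):
        for rule in rules:
            if token != rule.literal:
                continue
            if rule.direction == "FORWARD" and 0 < start - i <= rule.max_scope:
                found.add(rule.category)
            elif rule.direction == "BACKWARD" and 0 < i - end <= rule.max_scope:
                found.add(rule.category)
    return found
```

For example, `modifiers_for("Patiënt heeft geen koorts.", "koorts")` returns `{"NEGATED_EXISTENCE"}`. Unlike medspaCy's ConText component, this sketch handles only single-token triggers and ignores termination cues and sentence boundaries, so a trigger can reach a concept in a clause it should not modify.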
The Dutch pipeline can be set up in the same way as the English MedSpaCy pipeline, but with links to the language-specific resources. You can find an example notebook here.