private_nlp

Natural Language Processing using private and secure data. Powered by OpenMined's tools PySyft and SyferText.

Blog post

The contents of this repo were featured in the blog post "Encrypted training on medical text data using SyferText and PyTorch" on OpenMined's blog.

Disclaimer

This is a work in progress. Expect coding errors and typos.

Getting Started

Follow the instructions to install:

  • PySyft==0.2.5. Version 0.2.6 has an incompatibility issue with TensorFlow.
  • SyferText
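Because the PySyft version pin matters, it can help to confirm what is installed before running the notebooks. The following is a minimal sketch, not part of this repo: it assumes Python 3.8+ (for `importlib.metadata`) and that PySyft is distributed under the name `syft`. The helper names `installed_version` and `check_pins` are hypothetical.

```python
from importlib import metadata

# Assumption: PySyft is installed under the distribution name "syft".
REQUIRED = {"syft": "0.2.5"}


def installed_version(dist_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(required=REQUIRED):
    """Return {name: (installed, pinned)} for every pin that is not met."""
    mismatches = {}
    for name, pinned in required.items():
        found = installed_version(name)
        if found != pinned:
            mismatches[name] = (found, pinned)
    return mismatches


if __name__ == "__main__":
    for name, (found, pinned) in check_pins().items():
        print(f"{name}: found {found}, expected {pinned}")
```

An empty result from `check_pins()` means every pinned package matches.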

Data

Dataset compiled for Natural Language Processing using a corpus of medical transcriptions and custom-generated clinical stop words and vocabulary.

  • X.csv. Fully processed dataset obtained from running the Data Modelling notebook.
  • classes.txt. Text file describing the dataset's classes: Surgery, Medical Records, Internal Medicine, and Other.
  • train.csv. Training data subset. Contains 90% of the X.csv processed file.
  • test.csv. Test data subset. Contains 10% of the X.csv processed file.
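The 90/10 split described above could be reproduced along these lines. This is a sketch under stated assumptions, not necessarily how the repo's files were generated: it assumes `X.csv` has a header row, shuffles rows with a fixed seed, and uses the hypothetical helper names `split_rows` and `split_csv`.

```python
import csv
import random


def split_rows(rows, test_fraction=0.1, seed=0):
    """Shuffle rows reproducibly and return (train, test) lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


def split_csv(src, train_path, test_path, test_fraction=0.1, seed=0):
    """Split a CSV file into train and test files, repeating the header."""
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # assumption: the first row is a header
        rows = list(reader)
    train, test = split_rows(rows, test_fraction, seed)
    for path, subset in ((train_path, train), (test_path, test)):
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(subset)
    return len(train), len(test)


if __name__ == "__main__":
    print(split_csv("X.csv", "train.csv", "test.csv"))
```

Fixing the seed keeps the split stable across runs, so results stay comparable between experiments.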

Authors and acknowledgment

Notebooks

Scripts

Holds the script used to download whole datasets from a URL.
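For illustration, a URL download of this kind might look like the following. It is a minimal standard-library sketch, not the script shipped in this repo; the function name `download` and the example URL and path are hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url, dest):
    """Stream the resource at `url` into the file `dest` and return its path."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Copy in chunks so large datasets are never held in memory all at once.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


if __name__ == "__main__":
    # Hypothetical URL and destination, shown only as a usage example.
    download("https://example.com/dataset.csv", "data/dataset.csv")
```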

Contributing

Issues and pull requests are welcome.

License

GNU GENERAL PUBLIC LICENSE VERSION 3