AVoss84/pdf_extract


Text classification based on PDF input data

Package structure

.
├── environment.yml
├── logs
├── main.py
├── README.md
├── requirements.txt
├── src
│   ├── __init__.py
│   ├── notebooks
│   │   ├── fasttext_classifier.ipynb
│   │   └── naivebayes_classifier.ipynb
│   ├── pdf_extract
│   │   ├── config
│   │   ├── data
│   │   ├── resources
│   │   ├── services
│   │   └── utils
│   ├── setup.py
│   └── templates
└── stream_app.py

Package installation

Create a conda virtual environment with the required packages:

conda env create -f environment.yml 
conda activate env_pdf

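Download the spaCy language models and install the package:

python -m spacy download en_core_web_lg       # large English model (includes word vectors)
python -m spacy download de_core_news_lg      # large German model (includes word vectors)
pip install -e src                            # install the package in editable mode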

Start the REST API locally:

uvicorn main:app --reload --port 5000         # check out the Swagger docs: http://127.0.0.1:5000/docs
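
Once the server is up, the available endpoints can also be listed from a script instead of the browser. The following is a minimal sketch, not part of the package: it assumes the app is built with FastAPI (the README does not say so explicitly) and therefore serves its OpenAPI schema at the default /openapi.json path. The function names routes_from_schema and list_routes are hypothetical helpers introduced here for illustration.

```python
import json
from urllib.request import urlopen

# Port taken from the uvicorn command above.
BASE_URL = "http://127.0.0.1:5000"

# Keys of an OpenAPI path item that denote HTTP operations.
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options", "trace"}


def routes_from_schema(schema: dict) -> dict:
    """Map each path in an OpenAPI schema to its sorted list of HTTP methods."""
    return {
        path: sorted(key.upper() for key in item if key in HTTP_METHODS)
        for path, item in schema.get("paths", {}).items()
    }


def list_routes(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """Fetch the schema from a running server and return its routes.

    Assumption: the schema lives at /openapi.json (the FastAPI default).
    """
    with urlopen(f"{base_url}/openapi.json", timeout=timeout) as resp:
        return routes_from_schema(json.load(resp))
```

With the API running locally, calling list_routes() returns a dictionary of paths and methods; the actual endpoint names depend on what main.py defines.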

Start the Streamlit app locally:

streamlit run stream_app.py
