pdfparser

A PDF text-extraction service. It does not use a GPU and is based on YOLO and PaddleOCR.
Container deployment with docker-compose is supported.

The service provides basic content recognition for pdf/picture input and returns the recognized content as an array.
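Since the result comes back as an array, a caller will usually want to turn it into plain text. The following is a minimal sketch, not part of this project: the element shape of the array is not documented here, so the hypothetical helpers `iter_strings` and `to_plain_text` simply collect every string found in the decoded JSON, whatever the nesting.

```python
import json
from typing import Any, Iterator


def iter_strings(node: Any) -> Iterator[str]:
    """Yield every string found in a decoded JSON value, depth-first."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)
    elif isinstance(node, dict):
        # Assumption: if elements are objects, their string values are the
        # recognized text. Non-text string fields would be collected too.
        for value in node.values():
            yield from iter_strings(value)


def to_plain_text(response_body: str) -> str:
    """Join all strings in a JSON response body, one per line."""
    return "\n".join(iter_strings(json.loads(response_body)))
```

Numbers, booleans and nulls are skipped, so fields such as confidence scores would not end up in the text. Once the real response shape is known, replacing the generic walk with explicit field access would be the safer choice.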

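TODO:
1. Table results are returned in a messy format; the layout needs adjusting
2. Layout recovery: convert to docx and return it as a file stream
3. Convert pdf/picture to html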

0x01. QuickStart

docker-compose build

docker-compose up -d

# The service uses FastAPI; the Swagger UI is at
http://127.0.0.1:5555/docs
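FastAPI also serves the machine-readable schema behind the Swagger page, by default at `/openapi.json`, so the available routes can be listed from a script. This is a minimal sketch that assumes this app has not changed that default path; `list_routes` and `fetch_schema` are illustrative names, not part of the project.

```python
import json
from urllib.request import urlopen

# Assumption: the app keeps FastAPI's default schema path (/openapi.json).
BASE_URL = "http://127.0.0.1:5555"

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_routes(schema: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs found in an OpenAPI schema dict."""
    routes = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Path items may also hold non-method keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                routes.append((method.upper(), path))
    return sorted(routes)


def fetch_schema(base_url: str = BASE_URL) -> dict:
    """Download the OpenAPI schema from a running service."""
    with urlopen(f"{base_url}/openapi.json", timeout=10) as response:
        return json.load(response)
```

With the containers up, `list_routes(fetch_schema())` should show which upload endpoints exist and which HTTP methods they accept, which is the same information the Swagger page renders.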

0x02. Debug

python main.py

cd test && python test.py

