docs: Update the pdf file path in the operation demonstration #13575

Merged
merged 1 commit into from
Aug 2, 2024
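
For reference, every hunk in this diff applies the same substitution: the demo PDF path `ppstructure/recovery/UnrealText.pdf` becomes `ppstructure/docs/recovery/UnrealText.pdf`. A minimal sketch of that substitution follows, assuming plain text input; the helper names `update_pdf_path` and `count_stale_paths` are hypothetical and are not part of PaddleOCR or of this PR.

```python
# Hypothetical helpers illustrating the path update made in this PR.
# The two path strings are taken from the diff; everything else is an assumption.
OLD_PATH = "ppstructure/recovery/UnrealText.pdf"
NEW_PATH = "ppstructure/docs/recovery/UnrealText.pdf"


def update_pdf_path(text: str) -> str:
    """Replace every occurrence of the old demo PDF path with the new one."""
    # The new path does not contain the old path as a substring,
    # so applying this function twice gives the same result as applying it once.
    return text.replace(OLD_PATH, NEW_PATH)


def count_stale_paths(text: str) -> int:
    """Count how many references to the old path remain in the text."""
    return text.count(OLD_PATH)


if __name__ == "__main__":
    sample = (
        "paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf "
        "--type=structure --recovery=true --use_pdf2docx_api=true"
    )
    print(update_pdf_path(sample))
```

Running `count_stale_paths` over a documentation file after the update is one way to check that no reference to the old location was missed.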
4 changes: 2 additions & 2 deletions docs/ppstructure/model_train/recovery_to_doc.en.md
@@ -86,14 +86,14 @@ pip3 install pdf2docx-0.0.0-py3-none-any.whl
```bash linenums="1"
# install paddleocr
pip3 install "paddleocr>=2.6"
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```

Command line:

```bash linenums="1"
python3 predict_system.py \
---image_dir=ppstructure/recovery/UnrealText.pdf \
+--image_dir=ppstructure/docs/recovery/UnrealText.pdf \
--recovery=True \
--use_pdf2docx_api=True \
--output=../output/
6 changes: 3 additions & 3 deletions docs/ppstructure/model_train/recovery_to_doc.md
@@ -84,14 +84,14 @@ pip3 install pdf2docx-0.0.0-py3-none-any.whl
```bash linenums="1"
# Install paddleocr; version 2.6 is recommended
pip3 install "paddleocr>=2.6"
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```

Via the command line:

```bash linenums="1"
python3 predict_system.py \
---image_dir=ppstructure/recovery/UnrealText.pdf \
+--image_dir=ppstructure/docs/recovery/UnrealText.pdf \
--recovery=True \
--use_pdf2docx_api=True \
--output=../output/
@@ -117,7 +117,7 @@ paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=t
# English test image
paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=true --lang='en'
# PDF test file
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
```

### 4.1 Download models
6 changes: 3 additions & 3 deletions docs/ppstructure/quick_start.en.md
@@ -76,7 +76,7 @@ Two layout recovery methods are provided, For detailed usage tutorials, please r
Recovery by using PDF parse (only support pdf as input):

```bash linenums="1"
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```

Recovery by using OCR:
@@ -171,7 +171,7 @@ from paddleocr import PPStructure,save_structure_res
ocr_engine = PPStructure(table=False, ocr=True, show_log=True)

save_folder = './output'
-img_path = 'ppstructure/recovery/UnrealText.pdf'
+img_path = 'ppstructure/docs/recovery/UnrealText.pdf'
result = ocr_engine(img_path)
for index, res in enumerate(result):
save_structure_res(res, save_folder, os.path.basename(img_path).split('.')[0], index)
@@ -193,7 +193,7 @@ from PIL import Image
ocr_engine = PPStructure(table=False, ocr=True, show_log=True)

save_folder = './output'
-img_path = 'ppstructure/recovery/UnrealText.pdf'
+img_path = 'ppstructure/docs/recovery/UnrealText.pdf'

fitz = try_import("fitz")
imgs = []
10 changes: 5 additions & 5 deletions docs/ppstructure/quick_start.md
@@ -76,7 +76,7 @@ paddleocr --image_dir=ppstructure/docs/table/table.jpg --type=structure --layout
Recovery via PDF parsing (only PDF input is supported):

```bash linenums="1"
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```

Recovery via OCR:
@@ -89,7 +89,7 @@ paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --rec
Recovery via PDF parsing (only PDF input is supported):

```bash linenums="1"
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```

Recovery via OCR:
@@ -100,7 +100,7 @@ paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=t
# English test image
paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=true --lang='en'
# PDF test file
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
```

### 2.2 Using a Python script
@@ -189,7 +189,7 @@ from paddleocr import PPStructure,save_structure_res
ocr_engine = PPStructure(table=False, ocr=True, show_log=True)

save_folder = './output'
-img_path = 'ppstructure/recovery/UnrealText.pdf'
+img_path = 'ppstructure/docs/recovery/UnrealText.pdf'
result = ocr_engine(img_path)
for index, res in enumerate(result):
save_structure_res(res, save_folder, os.path.basename(img_path).split('.')[0], index)
@@ -211,7 +211,7 @@ from PIL import Image
ocr_engine = PPStructure(table=False, ocr=True, show_log=True)

save_folder = './output'
-img_path = 'ppstructure/recovery/UnrealText.pdf'
+img_path = 'ppstructure/docs/recovery/UnrealText.pdf'

fitz = try_import("fitz")
imgs = []
10 changes: 5 additions & 5 deletions ppstructure/docs/quickstart.md
@@ -99,7 +99,7 @@ paddleocr --image_dir=ppstructure/docs/table/table.jpg --type=structure --layout
Recovery via PDF parsing (only PDF input is supported):

```bash
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```

Recovery via OCR:
@@ -112,7 +112,7 @@ paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --rec
Recovery via PDF parsing (only PDF input is supported):

```bash
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```

Recovery via OCR:
@@ -123,7 +123,7 @@ paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=t
# English test image
paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=true --lang='en'
# PDF test file
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
```

<a name="22"></a>
@@ -217,7 +217,7 @@ from paddleocr import PPStructure,save_structure_res
ocr_engine = PPStructure(table=False, ocr=True, show_log=True)

save_folder = './output'
-img_path = 'ppstructure/recovery/UnrealText.pdf'
+img_path = 'ppstructure/docs/recovery/UnrealText.pdf'
result = ocr_engine(img_path)
for index, res in enumerate(result):
save_structure_res(res, save_folder, os.path.basename(img_path).split('.')[0], index)
@@ -239,7 +239,7 @@ from PIL import Image
ocr_engine = PPStructure(table=False, ocr=True, show_log=True)

save_folder = './output'
-img_path = 'ppstructure/recovery/UnrealText.pdf'
+img_path = 'ppstructure/docs/recovery/UnrealText.pdf'

fitz = try_import("fitz")
imgs = []
6 changes: 3 additions & 3 deletions ppstructure/docs/quickstart_en.md
@@ -101,7 +101,7 @@ Two layout recovery methods are provided, For detailed usage tutorials, please r
Recovery by using PDF parse (only support pdf as input):

```bash
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```

Recovery by using OCR:
@@ -200,7 +200,7 @@ from paddleocr import PPStructure,save_structure_res
ocr_engine = PPStructure(table=False, ocr=True, show_log=True)

save_folder = './output'
-img_path = 'ppstructure/recovery/UnrealText.pdf'
+img_path = 'ppstructure/docs/recovery/UnrealText.pdf'
result = ocr_engine(img_path)
for index, res in enumerate(result):
save_structure_res(res, save_folder, os.path.basename(img_path).split('.')[0], index)
@@ -222,7 +222,7 @@ from PIL import Image
ocr_engine = PPStructure(table=False, ocr=True, show_log=True)

save_folder = './output'
-img_path = 'ppstructure/recovery/UnrealText.pdf'
+img_path = 'ppstructure/docs/recovery/UnrealText.pdf'

fitz = try_import("fitz")
imgs = []
4 changes: 2 additions & 2 deletions ppstructure/recovery/README.md
@@ -110,14 +110,14 @@ pip3 install pdf2docx-0.0.0-py3-none-any.whl
```bash
# install paddleocr
pip3 install "paddleocr>=2.6"
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```

Command line:

```bash
python3 predict_system.py \
---image_dir=ppstructure/recovery/UnrealText.pdf \
+--image_dir=ppstructure/docs/recovery/UnrealText.pdf \
--recovery=True \
--use_pdf2docx_api=True \
--output=../output/
6 changes: 3 additions & 3 deletions ppstructure/recovery/README_ch.md
@@ -106,14 +106,14 @@ pip3 install pdf2docx-0.0.0-py3-none-any.whl
```bash
# Install paddleocr; version 2.6 is recommended
pip3 install "paddleocr>=2.6"
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --use_pdf2docx_api=true
```

Via the command line:

```bash
python3 predict_system.py \
---image_dir=ppstructure/recovery/UnrealText.pdf \
+--image_dir=ppstructure/docs/recovery/UnrealText.pdf \
--recovery=True \
--use_pdf2docx_api=True \
--output=../output/
@@ -142,7 +142,7 @@ paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=t
# English test image
paddleocr --image_dir=ppstructure/docs/table/1.png --type=structure --recovery=true --lang='en'
# PDF test file
-paddleocr --image_dir=ppstructure/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
+paddleocr --image_dir=ppstructure/docs/recovery/UnrealText.pdf --type=structure --recovery=true --lang='en'
```

<a name="4.1"></a>