remove some of the less common dependencies #13461

Merged: 2 commits merged on Jul 24, 2024
5 changes: 5 additions & 0 deletions in doc/doc_ch/algorithm_rec_latex_ocr.md

@@ -33,6 +33,11 @@
## 2. Environment Configuration
Please first refer to ["Environment Preparation"](./environment.md) to configure the PaddleOCR runtime environment, and refer to ["Project Clone"](./clone.md) to clone the project code.

In addition, extra dependencies need to be installed:
```shell
pip install "tokenizers==0.19.1" "imagesize"
```

<a name="3"></a>
## 3. Model Training, Evaluation, and Prediction

4 changes: 4 additions & 0 deletions in doc/doc_en/algorithm_rec_latex_ocr_en.md

@@ -31,6 +31,10 @@
Using LaTeX-OCR printed mathematical expression recognition datasets for training
## 2. Environment
Please refer to ["Environment Preparation"](./environment_en.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone_en.md) to clone the project code.

Furthermore, additional dependencies need to be installed:
```shell
pip install "tokenizers==0.19.1" "imagesize"
```

<a name="3"></a>
## 3. Model Training / Evaluation / Prediction
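
Since `tokenizers` and `imagesize` are dropped from the base requirements (see the requirements.txt change below), a quick preflight check before running the LaTeX-OCR recipe can confirm they are available. This is an illustrative snippet, not part of the PR:

```python
# Illustrative check, not part of this PR: confirm the optional LaTeX-OCR
# dependencies are importable and report their versions.
import importlib

for name in ("tokenizers", "imagesize"):
    try:
        module = importlib.import_module(name)
        print(name, getattr(module, "__version__", "version unknown"))
    except ImportError:
        print(f"{name} is missing; install it with: pip install {name}")
```
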
3 changes: 2 additions & 1 deletion in ppocr/data/imaug/label_ops.py

@@ -26,7 +26,6 @@
import random
from random import sample
from collections import defaultdict
from tokenizers import Tokenizer as TokenizerFast

from ppocr.utils.logging import get_logger
from ppocr.data.imaug.vqa.augment import order_by_tbyx
@@ -1780,6 +1779,8 @@
    def __init__(
        rec_char_dict_path,
        **kwargs,
    ):
        from tokenizers import Tokenizer as TokenizerFast

        self.tokenizer = TokenizerFast.from_file(rec_char_dict_path)
        self.model_input_names = ["input_ids", "token_type_ids", "attention_mask"]
        self.pad_token_id = 0
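
The change above defers the `tokenizers` import from module scope into `__init__`, so `ppocr/data/imaug/label_ops.py` can be imported even when `tokenizers` is not installed; the package is only needed once the LaTeX-OCR label encoder is actually constructed. A minimal sketch of this deferred-import pattern (the wrapper class below is illustrative, not the actual PaddleOCR class):

```python
# Minimal sketch of the deferred-import pattern; the wrapper class is
# illustrative and not the actual PaddleOCR implementation.
class LatexTokenizerWrapper:
    def __init__(self, tokenizer_file):
        # Importing here means a missing `tokenizers` package only fails when
        # this class is instantiated, not when the module is imported.
        from tokenizers import Tokenizer as TokenizerFast

        self.tokenizer = TokenizerFast.from_file(tokenizer_file)

    def encode(self, text):
        # Tokenizer.encode returns an Encoding object; .ids holds the token ids.
        return self.tokenizer.encode(text).ids
```
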
3 changes: 2 additions & 1 deletion in ppocr/postprocess/rec_postprocess.py

@@ -15,7 +15,6 @@
import numpy as np
import paddle
from paddle.nn import functional as F
from tokenizers import Tokenizer as TokenizerFast
import re


@@ -1217,6 +1216,8 @@
class LaTeXOCRDecode(object):
    """Convert between latex-symbol and symbol-index"""

    def __init__(self, rec_char_dict_path, **kwargs):
        from tokenizers import Tokenizer as TokenizerFast

        super(LaTeXOCRDecode, self).__init__()
        self.tokenizer = TokenizerFast.from_file(rec_char_dict_path)
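
`LaTeXOCRDecode` gets the same treatment, so a missing `tokenizers` package now surfaces as an `ImportError` when the post-processor is constructed rather than when `ppocr.postprocess.rec_postprocess` is imported. A hedged usage sketch (the tokenizer file path below is a placeholder, not a file shipped with the repo):

```python
# Illustrative only: shows where the missing-dependency error now appears.
# "latex_ocr_tokenizer.json" is a placeholder path.
from ppocr.postprocess.rec_postprocess import LaTeXOCRDecode  # imports fine without tokenizers

try:
    decoder = LaTeXOCRDecode(rec_char_dict_path="latex_ocr_tokenizer.json")
except ImportError:
    print('Optional dependency missing; run: pip install "tokenizers==0.19.1"')
```
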
4 changes: 2 additions & 2 deletions in ppocr/utils/formula_utils/math_txt2pkl.py

@@ -15,15 +15,15 @@
import pickle
from tqdm import tqdm
import os
import cv2
import imagesize
from paddle.utils import try_import
from collections import defaultdict
import glob
from os.path import join
import argparse


def txt2pickle(images, equations, save_dir):
    imagesize = try_import("imagesize")
    save_p = os.path.join(save_dir, "latexocr_{}.pkl".format(images.split("/")[-1]))
    min_dimensions = (32, 32)
    max_dimensions = (672, 192)
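
Here `imagesize` becomes an on-demand dependency: `paddle.utils.try_import` resolves it only when `txt2pickle` actually runs, so the base install no longer needs the package. The `min_dimensions`/`max_dimensions` values suggest it is used to filter images by size; a sketch of that kind of usage follows (the filter function is an assumption for illustration, not the code in `math_txt2pkl.py`):

```python
# Sketch of lazy `imagesize` usage via paddle.utils.try_import; the filter
# function is an assumption for illustration, not the actual implementation.
from paddle.utils import try_import


def within_size_limits(image_path, min_dimensions=(32, 32), max_dimensions=(672, 192)):
    imagesize = try_import("imagesize")  # raises a descriptive error if the package is absent
    width, height = imagesize.get(image_path)  # reads dimensions without decoding the image
    return (
        min_dimensions[0] <= width <= max_dimensions[0]
        and min_dimensions[1] <= height <= max_dimensions[1]
    )
```
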
2 changes: 0 additions & 2 deletions in requirements.txt

@@ -13,5 +13,3 @@
Pillow
pyyaml
requests
albumentations==1.4.10
tokenizers==0.19.1
imagesize