remove some of the less common dependencies #13461

Merged · 2 commits · Jul 24, 2024

Changes from all commits
5 changes: 5 additions & 0 deletions doc/doc_ch/algorithm_rec_latex_ocr.md
@@ -33,6 +33,11 @@
## 2. Environment Configuration
Please first refer to ["Environment Preparation"](./environment.md) to configure the PaddleOCR runtime environment, and to ["Project Clone"](./clone.md) to clone the project code.

In addition, the following extra dependencies need to be installed:
```shell
pip install "tokenizers==0.19.1" "imagesize"
```

<a name="3"></a>
## 3. 模型训练、评估、预测

4 changes: 4 additions & 0 deletions doc/doc_en/algorithm_rec_latex_ocr_en.md
@@ -31,6 +31,10 @@ Using LaTeX-OCR printed mathematical expression recognition datasets for trainin
## 2. Environment
Please refer to ["Environment Preparation"](./environment_en.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone_en.md) to clone the project code.

Furthermore, additional dependencies need to be installed:
```shell
pip install "tokenizers==0.19.1" "imagesize"
```

<a name="3"></a>
## 3. Model Training / Evaluation / Prediction
3 changes: 2 additions & 1 deletion ppocr/data/imaug/label_ops.py
@@ -26,7 +26,6 @@
import random
from random import sample
from collections import defaultdict
from tokenizers import Tokenizer as TokenizerFast

from ppocr.utils.logging import get_logger
from ppocr.data.imaug.vqa.augment import order_by_tbyx
@@ -1780,6 +1779,8 @@ def __init__(
rec_char_dict_path,
**kwargs,
):
from tokenizers import Tokenizer as TokenizerFast

self.tokenizer = TokenizerFast.from_file(rec_char_dict_path)
self.model_input_names = ["input_ids", "token_type_ids", "attention_mask"]
self.pad_token_id = 0
3 changes: 2 additions & 1 deletion ppocr/postprocess/rec_postprocess.py
@@ -15,7 +15,6 @@
import numpy as np
import paddle
from paddle.nn import functional as F
from tokenizers import Tokenizer as TokenizerFast
import re


@@ -1217,6 +1216,8 @@ class LaTeXOCRDecode(object):
"""Convert between latex-symbol and symbol-index"""

def __init__(self, rec_char_dict_path, **kwargs):
from tokenizers import Tokenizer as TokenizerFast

super(LaTeXOCRDecode, self).__init__()
self.tokenizer = TokenizerFast.from_file(rec_char_dict_path)

4 changes: 2 additions & 2 deletions ppocr/utils/formula_utils/math_txt2pkl.py
@@ -15,15 +15,15 @@
import pickle
from tqdm import tqdm
import os
import cv2
import imagesize
from paddle.utils import try_import
from collections import defaultdict
import glob
from os.path import join
import argparse


def txt2pickle(images, equations, save_dir):
imagesize = try_import("imagesize")
save_p = os.path.join(save_dir, "latexocr_{}.pkl".format(images.split("/")[-1]))
min_dimensions = (32, 32)
max_dimensions = (672, 192)
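For context, `paddle.utils.try_import` defers loading `imagesize` until `txt2pickle` actually runs, so users who never build the LaTeX-OCR dataset pickle do not need the package installed. A minimal sketch of the pattern, assuming the standard `imagesize` API; the `get_image_size` helper and the comments are illustrative, not part of the repository:

```python
from paddle.utils import try_import


def get_image_size(img_path):
    # Lazy import: the dependency is only resolved when this function is
    # called; a missing module surfaces as an ImportError at this point
    # instead of at module import time.
    imagesize = try_import("imagesize")
    # imagesize.get reads only the image header, avoiding a full decode.
    width, height = imagesize.get(img_path)
    return width, height
```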
2 changes: 0 additions & 2 deletions requirements.txt
@@ -13,5 +13,3 @@ Pillow
pyyaml
requests
albumentations==1.4.10
tokenizers==0.19.1
imagesize
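The same idea drives the `tokenizers` changes above: the import moves from module level into the constructors that need it, so importing `label_ops.py` or `rec_postprocess.py` for other recognition algorithms no longer requires `tokenizers` to be installed. A stand-alone sketch of that deferred-import pattern, with an illustrative class name rather than the repository's:

```python
class LatexTokenizerWrapper:
    """Thin wrapper that only needs `tokenizers` when constructed."""

    def __init__(self, rec_char_dict_path):
        # Deferred import: the dependency is resolved only when an
        # instance is created, not when this module is imported.
        from tokenizers import Tokenizer as TokenizerFast

        self.tokenizer = TokenizerFast.from_file(rec_char_dict_path)

    def encode(self, text):
        # Returns the token ids for a LaTeX string.
        return self.tokenizer.encode(text).ids
```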