Training a seal text recognition model with multi-point annotation labels #2718
Comments
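In that case, possibly:
However, this is not recommended for seal text recognition. We still suggest using a curved text detection model combined with text rectification, followed by recognition. Do you have a special requirement that forces you to use regular text detection?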
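To make "text rectification" concrete for text laid out on a circular arc, here is a minimal pure-Python sketch of polar unwrapping. It is an illustration only, not the method used inside PaddleX or PaddleOCR. The function name `unwrap_arc` and all of its parameters are hypothetical, and it assumes the seal center, band radius, and angular extent are already known (for example, from the detection polygon).

```python
import math


def unwrap_arc(image, center, radius, start_deg, end_deg, height, width):
    """Flatten a circular band of a 2-D image into a height x width strip.

    image     : list of rows, each a list of pixel values
    center    : (cx, cy) of the circle, in pixels
    radius    : outer radius of the text band, in pixels
    start_deg : angle where the text starts (0 = +x axis; angles grow
                clockwise on screen because the y axis points down)
    end_deg   : angle where the text ends
    """
    cx, cy = center
    rows, cols = len(image), len(image[0])
    strip = [[0] * width for _ in range(height)]
    for v in range(height):
        # Row 0 of the strip samples the outer edge of the band.
        r = radius - v
        for u in range(width):
            # Map the column index linearly onto the angular range.
            t = u / (width - 1) if width > 1 else 0.0
            theta = math.radians(start_deg + t * (end_deg - start_deg))
            # Nearest-neighbour sampling; pixels outside the image stay 0.
            x = int(round(cx + r * math.cos(theta)))
            y = int(round(cy + r * math.sin(theta)))
            if 0 <= x < cols and 0 <= y < rows:
                strip[v][u] = image[y][x]
    return strip
```

With this convention, text running left to right along the top half of a seal would correspond roughly to `start_deg=180, end_deg=360`. A production pipeline would normally use interpolated sampling rather than nearest-neighbour lookup.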
1. So for curved text, the cropped curved-text images normally do not need to be flattened into a single straight line before training, is that right?
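Hello, there may be a misunderstanding.
python main.py -c paddlex/configs/text_recognition/PP-OCRv4_server_rec.yaml \
    -o Global.mode=train \
    -o Global.dataset_dir=./sealdata
2. The pipeline's results are acceptable, but some characters are misrecognized, which is why I plan to train the seal text recognition model that it uses.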
Checklist:
Describe the problem
https://paddlepaddle.github.io/PaddleOCR/latest/applications/%E5%8D%B0%E7%AB%A0%E5%BC%AF%E6%9B%B2%E6%96%87%E5%AD%97%E8%AF%86%E5%88%AB.html#522
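I annotated and processed the data as described at the link above. The cropped seal text images and their labels after processing are shown in the image above. I used this data and these labels to train the PP-OCRv4 server rec recognition model. After training, I tested on images from the same dataset and the results were worse than before. How should the seal text recognition part be trained?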
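One thing that may be worth ruling out when fine-tuning makes recognition worse is a mismatch between the label text and the character dictionary. The sketch below is a hypothetical helper, not part of PaddleX. It assumes each label line has the form `image_path<TAB>text` and that the dictionary is available as a set of single characters. Both are assumptions about the dataset layout and should be checked against the actual files.

```python
def find_label_problems(label_lines, dict_chars):
    """Return (line_number, message) pairs for suspicious label lines.

    label_lines : iterable of strings, one "image_path<TAB>text" per line
    dict_chars  : set of characters the recognition model can output
    """
    problems = []
    for lineno, line in enumerate(label_lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        path, sep, text = line.partition("\t")
        if not sep or not text:
            problems.append((lineno, "missing tab or empty label"))
            continue
        # Characters absent from the dictionary cannot be learned or predicted.
        unknown = sorted(set(text) - dict_chars)
        if unknown:
            problems.append((lineno, "not in dict: " + "".join(unknown)))
    return problems
```

For real files, the lines could be read with `open(path, encoding="utf-8")` and the dictionary built from a one-character-per-line file. An empty result does not prove the labels are correct; it only rules out these two specific problems.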
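Reproduction
Training command
A self-annotated seal recognition dataset.
Before training, the model could recognize the text:
After training, the model fails to recognize the text:
Environment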
Package Version Editable project location
aiofiles 24.1.0
aiohappyeyeballs 2.4.4
aiohttp 3.11.11
aiosignal 1.3.2
aistudio-sdk 0.2.6
albucore 0.0.13
albumentations 1.4.10
alembic 1.14.0
annotated-types 0.7.0
anyio 4.3.0
astor 0.8.1
asttokens 3.0.0
async-timeout 4.0.3
asyncio-atexit 1.0.1
attrdict3 2.0.2
attrs 24.3.0
babel 2.16.0
backoff 2.2.1
bce-python-sdk 0.9.25
beautifulsoup4 4.12.3
blinker 1.9.0
cachetools 5.5.0
certifi 2019.11.28
cffi 1.17.1
chardet 3.0.4
charset-normalizer 3.4.0
chinese-calendar 1.8.0
click 8.1.8
cloudpickle 3.1.0
colorama 0.4.6
colorlog 6.9.0
ConfigSpace 1.2.1
cryptography 44.0.0
cssselect 1.2.0
cssutils 2.11.1
cycler 0.12.1
Cython 3.0.11
dataclasses-json 0.6.7
datasets 3.2.0
dbus-python 1.2.16
decorator 5.1.1
dill 0.3.4
distro-info 0.23+ubuntu1.1
easydict 1.13
editdistance 0.8.1
einops 0.8.0
emoji 2.14.0
erniebot 0.5.0
erniebot_agent 0.5.0
et_xmlfile 2.0.0
eval_type_backport 0.2.2
exceptiongroup 1.2.1
executing 2.1.0
faiss-cpu 1.8.0.post1
fastapi 0.115.6
filelock 3.16.1
filetype 1.2.0
fire 0.7.0
FLAML 2.3.3
Flask 3.1.0
flask-babel 4.0.0
fonttools 4.55.3
frozenlist 1.5.0
fsspec 2024.9.0
future 1.0.0
gast 0.3.3
GPUtil 1.4.0
greenlet 3.1.1
grpcio 1.51.3
h11 0.14.0
hpbandster 0.7.4
html5lib 1.1
httpcore 1.0.5
httpx 0.27.0
huggingface-hub 0.27.0
idna 2.8
imageio 2.36.1
imagesize 1.4.1
imgaug 0.4.0
ipython 8.31.0
itsdangerous 2.2.0
jedi 0.19.2
jieba 0.42.1
Jinja2 3.1.5
joblib 1.4.2
jsonpatch 1.33
jsonpath-python 1.0.6
jsonpointer 3.0.0
jsonschema 4.23.0
jsonschema-path 0.3.3
jsonschema-specifications 2023.12.1
kiwisolver 1.4.7
langchain 0.1.5
langchain-community 0.0.17
langchain-core 0.1.23
langdetect 1.0.9
langsmith 0.0.87
lapx 0.5.11.post1
lazy_loader 0.4
lazy-object-proxy 1.10.0
llvmlite 0.43.0
lmdb 1.5.1
lxml 5.3.0
Mako 1.3.8
markdown-it-py 3.0.0
MarkupSafe 3.0.2
marshmallow 3.23.2
matplotlib 3.5.2
matplotlib-inline 0.1.7
mdurl 0.1.2
more-itertools 10.5.0
motmetrics 1.4.0
msgpack 1.1.0
multidict 6.1.0
multiprocess 0.70.12.2
mypy-extensions 1.0.0
nest-asyncio 1.6.0
netifaces 0.11.0
networkx 3.3
nltk 3.9.1
numba 0.60.0
numpy 1.24.4
olefile 0.47
onnx 1.17.0
openapi-schema-validator 0.6.2
openapi-spec-validator 0.7.1
opencv-contrib-python 4.10.0.84
opencv-python 4.5.5.64
opencv-python-headless 4.10.0.84
openpyxl 3.1.5
opt-einsum 3.3.0
optuna 4.1.0
packaging 23.2
paddle2onnx 1.3.1
paddleclas 2.6.0
paddledet 0.0.0
paddlefsl 1.1.0
paddlenlp 2.8.0.post0
paddleocr 0.1.0.dev1+g3f32858
paddlepaddle-gpu 3.0.0b2
paddleseg 0.0.0.dev0
paddlets 1.1.0
paddlex 3.0.0b2 /home/work/PaddleX
pandas 1.3.5
Parsley 1.3
parso 0.8.4
pathable 0.4.3
patsy 1.0.1
pexpect 4.9.0
pillow 10.3.0
pip 24.3.1
premailer 3.10.0
prettytable 3.12.0
prompt_toolkit 3.0.48
propcache 0.2.1
protobuf 5.26.1
psutil 5.9.8
ptyprocess 0.7.0
pure_eval 0.2.3
pyarrow 18.1.0
pybind11 2.13.6
pybind11-stubgen 2.5.1
pyclipper 1.3.0.post6
pycocotools 2.0.8
pycparser 2.22
pycryptodome 3.21.0
pydantic 2.9.2
pydantic_core 2.23.4
Pygments 2.18.0
PyGObject 3.36.0
PyMatting 1.1.13
PyMuPDF 1.25.1
pyod 2.0.3
pypandoc 1.14
pyparsing 3.2.0
pypdf 5.1.0
Pyro4 4.82
python-apt 2.0.1+ubuntu0.20.4.1
python-dateutil 2.9.0.post0
python-docx 1.1.2
python-iso639 2024.10.22
python-magic 0.4.27
python-oxmsg 0.0.1
pytz 2024.2
PyWavelets 1.3.0
PyYAML 5.3.1
qianfan 0.0.3
RapidFuzz 3.11.0
rarfile 4.2
ray 2.40.0
referencing 0.35.1
regex 2024.11.6
requests 2.32.3
requests-toolbelt 1.0.0
requests-unixsocket 0.2.0
rfc3339-validator 0.1.4
rich 13.9.4
rpds-py 0.22.3
ruamel.yaml 0.18.6
ruamel.yaml.clib 0.2.12
safetensors 0.4.5
scikit-image 0.25.0
scikit-learn 1.3.2
scipy 1.14.1
seaborn 0.13.2
sentencepiece 0.2.0
seqeval 1.2.2
serpent 1.41
setuptools 68.2.2
shap 0.46.0
shapely 2.0.6
shellingham 1.5.4
six 1.14.0
sklearn 0.0
slicer 0.0.8
sniffio 1.3.1
soupsieve 2.6
SQLAlchemy 2.0.36
stack-data 0.6.3
starlette 0.41.3
statsmodels 0.14.1
tenacity 8.5.0
tensorboardX 2.6.2.2
termcolor 2.5.0
terminaltables 3.1.10
threadpoolctl 3.5.0
tifffile 2024.12.12
tokenizers 0.19.1
tomark 0.1.4
tomli 2.2.1
tool_helpers 0.1.2
tqdm 4.67.1
traitlets 5.14.3
typeguard 4.4.1
typer 0.15.1
typing_extensions 4.12.2
typing-inspect 0.9.0
tzdata 2024.2
ujson 5.10.0
unattended-upgrades 0.1
unstructured 0.16.11
unstructured-client 0.28.1
urllib3 1.25.8
uvicorn 0.34.0
visualdl 2.5.3
wcwidth 0.2.13
webencodings 0.5.1
Werkzeug 3.1.3
wget 3.2
wrapt 1.17.0
xmltodict 0.14.2
xxhash 3.5.0
yacs 0.1.8
yarl 1.18.3
Docker + Ubuntu (ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0b2-gpu-cuda12.3-cudnn9.0-trt8.6)
Python: 3.10.14