Releases: open-mmlab/mmocr
MMOCR Release v1.0.0rc0
We are excited to announce the release of MMOCR 1.0.0rc0!
MMOCR 1.0.0rc0 is the first version of MMOCR 1.x, a part of the OpenMMLab 2.0 projects.
Built upon the new training engine, MMOCR 1.x unifies the interfaces of datasets, models, evaluation, and visualization, with faster training and testing speed.
Highlights
- New engines. MMOCR 1.x is based on MMEngine, which provides a general and powerful runner that allows more flexible customizations and significantly simplifies the entry points of high-level interfaces.
- Unified interfaces. As a part of the OpenMMLab 2.0 projects, MMOCR 1.x unifies and refactors the interfaces and internal logic of training, testing, datasets, models, evaluation, and visualization. All the OpenMMLab 2.0 projects share the same design in these interfaces and logic to allow the emergence of multi-task/modality algorithms.
- Cross-project calling. Benefiting from the unified design, you can use the models implemented in other OpenMMLab projects, such as MMDet. We provide an example of how to use MMDetection's Mask R-CNN through MMDetWrapper. Check our documents for more details. More wrappers will be released in the future.
- Stronger visualization. We provide a series of useful tools, most of which are based on brand-new visualizers. As a result, it is now more convenient for users to explore models and datasets.
- More documentation and tutorials. We have added a large number of documents and tutorials to help users get started more smoothly. Read them here.
Breaking Changes
We briefly list the major breaking changes here.
We also have the migration guide that provides complete details and migration instructions.
Dependencies
- MMOCR 1.x relies on MMEngine to run. MMEngine is a new foundational library for training deep learning models in OpenMMLab 2.0. The dependencies of file IO and training have been migrated from MMCV 1.x to MMEngine.
- MMOCR 1.x relies on MMCV>=2.0.0rc0. Although MMCV no longer maintains the training functionalities since 2.0.0rc0, MMOCR 1.x relies on the data transforms, CUDA operators, and image processing interfaces in MMCV. Note that since MMCV 2.0.0rc0, the package mmcv provides pre-built CUDA operators while mmcv-lite does not, and mmcv-full has been deprecated.
Training and testing
- MMOCR 1.x uses the Runner in MMEngine rather than the one in MMCV. The new Runner implements and unifies the building logic of datasets, models, evaluation, and visualizers. Therefore, MMOCR 1.x no longer maintains the building logic of those modules in mmocr.train.apis and tools/train.py. That code has been migrated into MMEngine. Please refer to the migration guide of Runner in MMEngine for more details.
- The Runner in MMEngine also supports testing and validation. The testing scripts are also simplified and follow logic similar to the training scripts to build the runner.
- The execution points of hooks in the new Runner have been enriched to allow more flexible customization. Please refer to the migration guide of Hook in MMEngine for more details.
- Learning rate and momentum schedules have been migrated from Hook to Parameter Scheduler in MMEngine. Please refer to the migration guide of Parameter Scheduler in MMEngine for more details.
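As a rough illustration of this migration, a step schedule that MMCV 1.x expressed through the hook-driven lr_config field (for example, lr_config = dict(policy='step', step=[8, 11])) might be written as an MMEngine-style param_scheduler list. This is a sketch assuming MMEngine's LinearLR and MultiStepLR schedulers; the warmup length, milestones, and factors are hypothetical values, not settings taken from any MMOCR config.

```python
# Hypothetical MMEngine-style schedule (values are illustrative only):
# a linear warmup over the first 500 iterations, followed by step decay
# at epochs 8 and 11.
param_scheduler = [
    dict(type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
    dict(type='MultiStepLR', by_epoch=True, milestones=[8, 11], gamma=0.1),
]
```

Each scheduler is an independent entry with its own active range, which is what lets warmup and decay be composed instead of being encoded as options of a single hook.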
Configs
- The Runner in MMEngine uses a different config structure to make the components of the runner easier to understand. Users can read the config examples of MMOCR or refer to the migration guide in MMEngine for migration details.
- The file names of configs and models are also refactored to follow the new rules unified across OpenMMLab 2.0 projects. Please refer to the user guide on configs for more details.
Dataset
The Dataset classes implemented in MMOCR 1.x all inherit from BaseDetDataset, which inherits from BaseDataset in MMEngine. There are several changes to datasets in MMOCR 1.x.
- All the datasets support serializing the data list to reduce memory usage when multiple workers are built to accelerate data loading.
- The interfaces are changed accordingly.
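The idea behind serializing the data list can be sketched in a few lines: instead of keeping thousands of small Python dicts alive in every forked worker, the annotations are pickled into one contiguous bytes buffer, and each item is decoded only when it is accessed. This is a simplified, hypothetical sketch of the technique, not MMEngine's implementation; the function names serialize_data_list and get_item are illustrative, and a real implementation would keep the offsets in a compact array rather than a Python list.

```python
import pickle


def serialize_data_list(data_list):
    """Pack a list of annotation dicts into one bytes buffer plus end offsets."""
    chunks = [pickle.dumps(item, protocol=pickle.HIGHEST_PROTOCOL) for item in data_list]
    offsets = []
    total = 0
    for chunk in chunks:
        total += len(chunk)
        offsets.append(total)  # end offset of each serialized item
    return b''.join(chunks), offsets


def get_item(buffer, offsets, idx):
    """Deserialize only the idx-th item from the shared buffer."""
    start = 0 if idx == 0 else offsets[idx - 1]
    return pickle.loads(buffer[start:offsets[idx]])


# Hypothetical annotations for two cropped text images.
data_list = [
    dict(img_path='img1.jpg', text='HELLO'),
    dict(img_path='img2.jpg', text='WORLD'),
]
buffer, offsets = serialize_data_list(data_list)
print(get_item(buffer, offsets, 1))
```

Because the buffer is a single immutable object, reading it in a forked worker does not touch per-item reference counts, which is what otherwise causes memory pages to be copied.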
Data Transforms
Data transforms in MMOCR 1.x all inherit from those in MMCV>=2.0.0rc0, which follow a new convention in OpenMMLab 2.0 projects.
The changes are listed below:
- The interfaces are also changed. Please refer to the API Reference.
- The functionalities of some data transforms (e.g., Resize) are decomposed into several transforms.
- The same data transforms in different OpenMMLab 2.0 libraries share the same augmentation implementation and the same argument semantics; e.g., Resize in MMDet 3.x and MMOCR 1.x will resize the image in exactly the same manner given the same arguments.
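The dict-in, dict-out convention and the idea of splitting one transform into smaller ones can be sketched without any library code. Below, a hypothetical rescale step and a hypothetical pad step each read and update a shared results dict, and a small loop chains them. The class names and the keys other than img_shape are assumptions made for illustration, and the sketch only tracks shapes rather than pixel data.

```python
class RescaleShapeToHeight:
    """Hypothetical transform: rescale to a fixed height, keeping aspect ratio."""

    def __init__(self, height):
        self.height = height

    def transform(self, results):
        h, w = results['img_shape']
        scale = self.height / h
        results['img_shape'] = (self.height, max(1, round(w * scale)))
        results['scale'] = scale
        return results


class PadShapeToWidth:
    """Hypothetical transform: record the shape after right-padding to a width."""

    def __init__(self, width):
        self.width = width

    def transform(self, results):
        h, w = results['img_shape']
        results['pad_shape'] = (h, max(w, self.width))
        return results


def apply_pipeline(transforms, results):
    # Each transform receives the dict produced by the previous one.
    for t in transforms:
        results = t.transform(results)
    return results


pipeline = [RescaleShapeToHeight(32), PadShapeToWidth(100)]
out = apply_pipeline(pipeline, {'img_shape': (64, 120)})
print(out)
```

Keeping each step small means the same rescale logic can be reused with or without padding, which is the motivation for decomposing a monolithic transform.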
Model
The models in MMOCR 1.x all inherit from BaseModel in MMEngine, which defines a new convention of models in OpenMMLab 2.0 projects. Users can refer to the tutorial of model in MMEngine for more details. Accordingly, there are several changes as follows:
- The model interfaces, including the input and output formats, are significantly simplified and unified following the new convention in MMOCR 1.x. Specifically, all the input data in training and testing are packed into inputs and data_samples, where inputs contains model inputs like a list of image tensors, and data_samples contains other information of the current data sample such as ground truths and model predictions. In this way, different tasks in MMOCR 1.x can share the same input arguments, which makes the models more general and suitable for multi-task learning.
- The model has a data preprocessor module, which is used to pre-process the input data of the model. In MMOCR 1.x, the data preprocessor usually performs the necessary steps to form the input images into a batch, such as padding. It can also serve as a place for some special data augmentations or more efficient data transformations like normalization.
- The internal logic of the model has been changed. In MMOCR 0.x, models used forward_train and simple_test to handle the different forward logic. In MMOCR 1.x and OpenMMLab 2.0, the forward function has three modes: loss, predict, and tensor, for training, inference, and tracing or other purposes, respectively. The forward function calls self.loss(), self.predict(), and self._forward() given the modes loss, predict, and tensor, respectively.
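The three-mode dispatch described above can be illustrated with a toy, framework-free class. This is a hypothetical sketch: plain numbers stand in for image tensors and plain dicts stand in for data samples, and the "network" simply doubles its input. It only shows how a single forward entry point routes to _forward, loss, and predict.

```python
class ToyModel:
    """Hypothetical model illustrating the loss/predict/tensor forward modes."""

    def forward(self, inputs, data_samples=None, mode='tensor'):
        if mode == 'loss':
            return self.loss(inputs, data_samples)
        if mode == 'predict':
            return self.predict(inputs, data_samples)
        if mode == 'tensor':
            return self._forward(inputs, data_samples)
        raise RuntimeError(f'Invalid mode "{mode}"')

    def _forward(self, inputs, data_samples=None):
        # Raw network output; here the toy "network" doubles each input.
        return [2 * x for x in inputs]

    def loss(self, inputs, data_samples):
        preds = self._forward(inputs)
        gts = [sample['gt'] for sample in data_samples]
        # Mean absolute error between raw outputs and ground truths.
        return {'loss': sum(abs(p - g) for p, g in zip(preds, gts)) / len(preds)}

    def predict(self, inputs, data_samples):
        preds = self._forward(inputs)
        # Predictions are written back into the data samples.
        for sample, pred in zip(data_samples, preds):
            sample['pred'] = pred
        return data_samples


model = ToyModel()
print(model.forward([1, 2], [{'gt': 2}, {'gt': 5}], mode='loss'))
```

Because loss and predict both reuse _forward, the tensor mode stays a clean path for tracing or exporting the raw network.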
Evaluation
MMOCR 1.x mainly implements corresponding metrics for each task, which are manipulated by Evaluator to complete the evaluation.
In addition, users can build an evaluator in MMOCR 1.x to conduct offline evaluation, i.e., to evaluate predictions that may not have been produced by MMOCR, as long as the predictions follow our dataset conventions. More details can be found in the Evaluation Tutorial in MMEngine.
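The metric pattern behind this design can be sketched as an object that accumulates per-sample results and then reduces them into final scores. The class below is a hypothetical, standalone word-accuracy metric written for illustration; it does not import MMEngine, and its method signatures are simplified assumptions. Since it only consumes dicts with gt_text and pred_text keys, the predictions could come from any source, which is the point of offline evaluation.

```python
class WordAccuracy:
    """Hypothetical metric: accumulate per-sample results, then reduce."""

    def __init__(self):
        self.results = []

    def process(self, data_samples):
        # Store one boolean per sample: exact match between prediction and GT.
        for sample in data_samples:
            self.results.append(sample['pred_text'] == sample['gt_text'])

    def compute_metrics(self):
        if not self.results:
            return {'word_acc': 0.0}
        return {'word_acc': sum(self.results) / len(self.results)}


# Predictions produced elsewhere, e.g. loaded from a file.
metric = WordAccuracy()
metric.process([
    {'pred_text': 'HELLO', 'gt_text': 'HELLO'},
    {'pred_text': 'W0RLD', 'gt_text': 'WORLD'},
])
print(metric.compute_metrics())
```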
Visualization
The visualization functions of MMOCR 0.x have been removed in MMOCR 1.x. Instead, in OpenMMLab 2.0 projects, we use Visualizer to visualize data. MMOCR 1.x implements TextDetLocalVisualizer, TextRecogLocalVisualizer, and KIELocalVisualizer to allow visualization of ground truths, model predictions, feature maps, etc., at any place, for the three tasks supported in MMOCR. It also supports dumping the visualization data to external visualization backends such as TensorBoard and Wandb. Check our Visualization Document for more details.
Improvements
- Most models enjoy a performance improvement from the new framework and the refactored data transforms. For example, in MMOCR 1.x, DBNet-R50 achieves an hmean score of 0.854 on ICDAR 2015, while its counterpart in MMOCR 0.x only gets 0.840.
- Mixed precision training is supported for most models. The remaining models are not supported yet because the operators they use might not be representable in fp16. We will update the documentation and list the results of mixed precision training.
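In MMEngine-based configs, mixed precision is typically switched on through the optimizer wrapper rather than through a hook. The fragment below is a sketch assuming MMEngine's AmpOptimWrapper; the SGD hyperparameters are placeholder values, not the settings of any released MMOCR model.

```python
# Hypothetical optimizer settings: AmpOptimWrapper enables automatic mixed
# precision; the optimizer values below are placeholders, not tuned settings.
optim_wrapper = dict(
    type='AmpOptimWrapper',
    optimizer=dict(type='SGD', lr=0.007, momentum=0.9, weight_decay=0.0001))
```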
Ongoing changes
- Test-time augmentation: supported in MMOCR 0.x, but not implemented yet in this version due to limited time. We will support it in the following releases with a new and simplified design.
- Inference interfaces: unified inference interfaces will be supported in the future to ease the use of released models.
- Interfaces of useful tools that can be used in notebooks: more useful tools implemented in the tools/ directory will get Python interfaces so that they can be used in notebooks and in downstream libraries.
- Documentation: we will add more design docs, tutorials, and migration gui...
MMOCR Release v0.6.1
Highlights
- ArT dataset is available for text detection and recognition!
- Fix several bugs that affect the correctness of the models.
- Thanks to MIM, our installation is much simpler now! The docs have been renewed as well.
New Features & Enhancements
- Add ArT by @xinke-wang in #1006
- add ABINet_Vision api by @Abdelrahman350 in #1041
- add codespell ignore and use mdformat by @Harold-lkk in #1022
- Add mim to extras_requrie to setup.py, update mminstall… by @gaotongxiao in #1062
- Simplify normalized edit distance calculation by @maxbachmann in #1060
- Test mim in CI by @gaotongxiao in #1090
- Remove redundant steps by @gaotongxiao in #1091
- Update links to SDMGR links by @gaotongxiao in #1252
Bug Fixes
- Remove unnecessary requirements by @gaotongxiao in #1000
- Remove confusing img_scales in pipelines by @gaotongxiao in #1007
- inplace operator "+=" will cause RuntimeError when model backward by @garvan2021 in #1018
- Fix a typo problem in MASTER by @Mountchicken in #1031
- Fix config name of MASTER in ocr.py by @Mountchicken in #1044
- Relax OpenCV requirement by @gaotongxiao in #1061
- Restrict the minimum version of OpenCV to avoid potential vulnerability by @gaotongxiao in #1065
- typo by @tpoisonooo in #1024
- Fix a typo in setup.py by @gaotongxiao in #1095
- fix #1067: add torchserve DockerFile and fix bugs by @Hegelim in #1073
- Incorrect filename in labelme_converter.py by @xiefeifeihu in #1103
- Fix dataset configs by @Mountchicken in #1106
- Fix #1098: normalize text recognition scores by @Hegelim in #1119
- Update ST_SA_MJ_train.py by @MingyuLau in #1117
- PSENet metafile by @gaotongxiao in #1121
- Flexible ways of getting file name by @balandongiv in #1107
- Updating edge-embeddings after each GNN layer by @amitbcp in #1134
- links update by @TekayaNidham in #1141
- bug fix: access params by cfg.get by @doem97 in #1145
- Fix a bug in LmdbAnnFileBackend that cause breaking in Synthtext detection training by @Mountchicken in #1159
- Fix typo of --lmdb-map-size default value by @easilylazy in #1147
- Fixed docstring syntax error of line 19 & 21 by @APX103 in #1157
- Update lmdb_converter and ct80 cropped image source in document by @doem97 in #1164
- MMCV compatibility due to outdated MMDet by @gaotongxiao in #1192
- Update maximum version of mmcv by @xinke-wang in #1219
- Update ABINet links for main by @Mountchicken in #1221
- Update owners by @gaotongxiao in #1248
- Add back some missing fields in configs by @gaotongxiao in #1171
Docs
- Fix typos by @xinke-wang in #1001
- Configure Myst-parser to parse anchor tag by @gaotongxiao in #1012
- Fix a error in docs/en/tutorials/dataset_types.md by @Mountchicken in #1034
- Update readme according to the guideline by @gaotongxiao in #1047
- Limit markdown version by @gaotongxiao in #1172
- Limit extension versions by @Mountchicken in #1210
- Update installation guide by @gaotongxiao in #1254
- Update image link by @gaotongxiao in #1255
New Contributors
- @tpoisonooo made their first contribution in #1024
- @Abdelrahman350 made their first contribution in #1041
- @Hegelim made their first contribution in #1073
- @xiefeifeihu made their first contribution in #1103
- @MingyuLau made their first contribution in #1117
- @balandongiv made their first contribution in #1107
- @amitbcp made their first contribution in #1134
- @TekayaNidham made their first contribution in #1141
- @easilylazy made their first contribution in #1147
- @APX103 made their first contribution in #1157
Full Changelog: v0.6.0...v0.6.1
MMOCR Release v0.6.0
Highlights
- A new recognition algorithm, MASTER, has been added to MMOCR; it was the winning solution of the "ICDAR 2021 Competition on Scientific Table Image Recognition to Latex"! The model pre-trained on SynthText and MJSynth is available for testing! Credit to @JiaquanYe
- DBNet++ has been released! It is equipped with a new Adaptive Scale Fusion module for feature enhancement. Benefiting from this, the new model achieves a 2% higher h-mean score than its predecessor on the ICDAR2015 dataset.
- Three more dataset converters are added: LSVT, RCTW and HierText. Check the dataset zoo (Det & Recog) to explore further information.
- To enhance the data storage efficiency, MMOCR now supports loading both images and labels from .lmdb format annotations for the text recognition task. To enable such a feature, the new lmdb_converter.py is ready for use to pack your cropped images and labels into an lmdb file. For a detailed tutorial, please refer to the following sections and the doc.
- Testing models on multiple datasets is a widely used evaluation strategy. MMOCR now supports automatically reporting mean scores when there is more than one dataset to evaluate, which enables a more convenient comparison between checkpoints. Doc
- Evaluation is more flexible and customizable now. For text detection tasks, you can set the score threshold range where the best results might come out. (Doc) If too many results are flooding your text recognition train log, you can trim it by specifying a subset of metrics in evaluation config. Check out the Evaluation section for details.
- MMOCR provides a script to convert the .json labels obtained by the popular annotation toolkit Labelme to MMOCR-supported data format. @Y-M-Y contributed a log analysis tool that helps users gain a better understanding of the entire training process. Read tutorial docs to get started.
Lmdb Dataset
Reading images or labels from individual files can be slow when the data volume is large, e.g., on the scale of millions of samples. Besides, in academia, most scene text recognition datasets, including both images and labels, are stored in lmdb format. To get closer to mainstream practice and enhance data storage efficiency, MMOCR now officially supports loading images and labels from lmdb datasets via a new pipeline, LoadImageFromLMDB.
This section is intended to serve as a quick walkthrough for you to master this update and apply it to facilitate your research.
Specifications
To better align with the academic community, MMOCR now requires the following specifications for lmdb datasets:
- The parameter describing the data volume of the dataset is num-samples instead of total_number (deprecated).
- Images and labels are stored with keys in the form of image-000000001 and label-000000001, respectively.
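The key layout in these specifications can be sketched as follows. A plain dict stands in for an lmdb write transaction here; with the Python lmdb package, each pair would instead be written with txn.put(key, value) inside env.begin(write=True). The helper names lmdb_key and pack_samples are hypothetical, and the sketch assumes 1-based indices with nine-digit zero padding, matching the image-000000001 example above.

```python
def lmdb_key(kind, index):
    """Build a key such as 'image-000000001' (1-based, 9-digit zero padding)."""
    return f'{kind}-{index:09d}'


def pack_samples(samples):
    """Hypothetical: lay out (image_bytes, label) pairs as lmdb-style keys."""
    store = {}
    for i, (img_bytes, label) in enumerate(samples, start=1):
        store[lmdb_key('image', i).encode()] = img_bytes
        store[lmdb_key('label', i).encode()] = label.encode('utf-8')
    # The total number of samples is stored under 'num-samples'.
    store[b'num-samples'] = str(len(samples)).encode()
    return store


store = pack_samples([(b'<jpeg bytes 1>', 'HELLO'), (b'<jpeg bytes 2>', 'WORLD')])
print(sorted(store))
```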
Usage
- Use existing academic lmdb datasets if they meet the specifications, or use the tool provided by MMOCR to pack images and annotations into an lmdb dataset.
- Previously, MMOCR had a function, txt2lmdb (deprecated), that only supported converting labels to lmdb format. However, its output is quite different from academic lmdb datasets, which usually contain both images and labels. Now MMOCR provides a new utility, lmdb_converter, to convert recognition datasets with both images and labels to lmdb format.
- Say that your recognition data in MMOCR's format are organized as follows (see an example in ocr_toy_dataset):
# Directory structure
├── img_path
│   ├── img1.jpg
│   ├── img2.jpg
│   └── ...
└── label.txt (or label.jsonl)
# Annotation format
label.txt:
img1.jpg HELLO
img2.jpg WORLD
...
label.jsonl:
{"filename": "img1.jpg", "text": "HELLO"}
{"filename": "img2.jpg", "text": "WORLD"}
...
- Then pack these files up:
python tools/data/utils/lmdb_converter.py {PATH_TO_LABEL} {OUTPUT_PATH} --i {PATH_TO_IMAGES}
- Check out tools.md for more details.
- The second step is to modify the configuration files. For example, to train CRNN on MJ and ST datasets:
- Set parser as LineJsonParser and file_format as 'lmdb' in the dataset config:
# configs/_base_/recog_datasets/ST_MJ_train.py
train1 = dict(
    type='OCRDataset',
    img_prefix=train_img_prefix1,
    ann_file=train_ann_file1,
    loader=dict(
        type='AnnFileLoader',
        repeat=1,
        file_format='lmdb',
        parser=dict(
            type='LineJsonParser',
            keys=['filename', 'text'],
        )),
    pipeline=None,
    test_mode=False)
- Use LoadImageFromLMDB in the pipeline:
# configs/_base_/recog_pipelines/crnn_pipeline.py
train_pipeline = [
    dict(type='LoadImageFromLMDB', color_type='grayscale'),
    ...
- You are good to go! Start training and MMOCR will load data from your lmdb dataset.
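To see why the label.jsonl format in this walkthrough is more robust than label.txt, compare two tiny line parsers. Both are hypothetical helpers written for illustration, not MMOCR's parser classes: the naive space-separated parser keeps only the first token of a label that contains spaces, while the JSON lines parser reads the text field intact. Note that JSON requires double-quoted strings, so each line must be valid JSON for json.loads to accept it.

```python
import json


def parse_txt_line(line):
    """Naive parser: split on spaces and take the first two fields."""
    parts = line.strip().split(' ')
    return {'filename': parts[0], 'text': parts[1]}


def parse_jsonl_line(line, keys=('filename', 'text')):
    """Parse one JSON object per line and keep only the requested keys."""
    obj = json.loads(line)
    return {k: obj[k] for k in keys}


print(parse_txt_line('img3.jpg HELLO WORLD'))  # text is truncated to 'HELLO'
print(parse_jsonl_line('{"filename": "img3.jpg", "text": "HELLO WORLD"}'))
```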
New Features & Enhancements
- Add analyze_logs in tools and its description in docs by @Y-M-Y in #899
- Add LSVT Data Converter by @xinke-wang in #896
- Add RCTW dataset converter by @xinke-wang in #914
- Support computing mean scores in UniformConcatDataset by @gaotongxiao in #981
- Support loading images and labels from lmdb file by @Mountchicken in #982
- Add recog2lmdb and new toy dataset files by @Mountchicken in #979
- Add labelme converter for textdet and textrecog by @cuhk-hbsun in #972
- Update CircleCI configs by @xinke-wang in #918
- Update Git Action by @xinke-wang in #930
- More customizable fields in dataloaders by @gaotongxiao in #933
- Skip CIs when docs are modified by @gaotongxiao in #941
- Rename Github tests, fix ignored paths by @gaotongxiao in #946
- Support latest MMCV by @gaotongxiao in #959
- Support dynamic threshold range in eval_hmean by @gaotongxiao in #962
- Update the version requirement of mmdet in docker by @Mountchicken in #966
- Replace opencv-python-headless with open-python by @gaotongxiao in #970
- Update Dataset Configs by @xinke-wang in #980
- Add SynthText dataset config by @xinke-wang in #983
- Automatically report mean scores when applicable by @gaotongxiao in #995
- Add DBNet++ by @xinke-wang in #973
- Add MASTER by @JiaquanYe in #807
- Allow choosing metrics to report in text recognition tasks by @gaotongxiao in #989
- Add HierText converter by @Mountchicken in #948
- Fix lint_only in CircleCI by @gaotongxiao in #998
Bug Fixes
- Fix CircleCi Main Branch Accidentally Run PR Stage Test by @xinke-wang in #927
- Fix a deprecate warning about mmdet.datasets.pipelines.formating by @Mountchicken in #944
- Fix a Bug in ResNet plugin by @Mountchicken in #967
- revert a wrong setting in db_r18 cfg by @gaotongxiao in #978
- Fix TotalText Anno version issue by @xinke-wang in #945
- Update installation step of albumentations by @gaotongxiao in #984
- Fix ImgAug transform by @gaotongxiao in #949
- Fix GPG key error in CI and docker by @gaotongxiao in #988
- update label.lmdb by @Mountchicken in #991
- correct meta key by @garvan2021 in #926
- Use new image by @gaotongxiao in #976
- Fix Data Converter Issues by @xinke-wang in https://github.com/open-...
MMOCR Release v0.5.0
Highlights
- MMOCR now supports SPACE recognition! (What a prominent feature!) Users only need to convert the recognition annotations that contain spaces from a plain .txt file to JSON lines format (.jsonl), and then revise a few configurations to enable the LineJsonParser. For more information, please read our step-by-step tutorial.
- Tesseract is now available in MMOCR! While MMOCR is flexible enough to support various downstream tasks, users might sometimes not be satisfied with DL models and would like to turn to effective legacy solutions. Therefore, we offer this option in mmocr.utils.ocr by wrapping Tesseract as a detector and/or recognizer. Users can easily create an MMOCR object by MMOCR(det='Tesseract', recog='Tesseract'). Credit to @garvan2021
- We release data converters for 16 widely used OCR datasets, covering multiple scenarios such as documents, handwriting, and scene text. Now it is more convenient to generate annotation files for these datasets. Check the dataset zoo (Det & Recog) to explore further information.
- Special thanks to @EighteenSprings @BeyondYourself @yangrisheng, who actively participated in documentation translation!
Migration Guide - ResNet
Some refactoring processes are still going on. For text recognition models, we unified the ResNet-like architectures used as backbones. By introducing stage-wise and block-wise plugins, the refactored ResNet is flexible enough to support existing models, like ResNet31 and ResNet45, as well as future ResNet variants.
Plugin
- Plugin is a module category inherited from MMCV's implementation of PLUGIN_LAYERS, which can be inserted between the stages of ResNet or into a BasicBlock. You can find a simple implementation of a plugin at mmocr/models/textrecog/plugins/common.py, shown in the example below.
Plugin Example
import torch.nn as nn
from mmcv.cnn import PLUGIN_LAYERS


@PLUGIN_LAYERS.register_module()
class Maxpool2d(nn.Module):
    """A wrapper around nn.MaxPool2d().

    Args:
        kernel_size (int or tuple(int)): Kernel size for max pooling layer
        stride (int or tuple(int)): Stride for max pooling layer
        padding (int or tuple(int)): Padding for pooling layer
    """

    def __init__(self, kernel_size, stride, padding=0, **kwargs):
        super(Maxpool2d, self).__init__()
        # Extra kwargs (e.g. in_channels, out_channels) are accepted and ignored.
        self.model = nn.MaxPool2d(kernel_size, stride, padding)

    def forward(self, x):
        """
        Args:
            x (Tensor): Input feature map

        Returns:
            Tensor: The tensor after Maxpooling layer.
        """
        return self.model(x)
Stage-wise Plugins
- ResNet is composed of stages, and each stage is composed of blocks. E.g., ResNet18 is composed of 4 stages, and each stage is composed of BasicBlocks. For each stage, we provide two ports to insert stage-wise plugins by passing the plugins parameter to ResNet.
[port1: before stage] ---> [stage] ---> [port2: after stage]
- E.g., take a ResNet with four stages. Suppose we want to insert an additional convolution layer before each stage, and another additional convolution layer after stages 1, 2, and 4. Then you can define the special ResNet18 like this:
resnet18_special = ResNet(
    # for simplicity, some required
    # parameters are omitted
    plugins=[
        dict(
            cfg=dict(
                type='ConvModule',
                kernel_size=3,
                stride=1,
                padding=1,
                norm_cfg=dict(type='BN'),
                act_cfg=dict(type='ReLU')),
            stages=(True, True, True, True),
            position='before_stage'),
        dict(
            cfg=dict(
                type='ConvModule',
                kernel_size=3,
                stride=1,
                padding=1,
                norm_cfg=dict(type='BN'),
                act_cfg=dict(type='ReLU')),
            stages=(True, True, False, True),
            position='after_stage')
    ])
- You can also insert more than one plugin in each port, and those plugins will be executed in order. Let's take the ResNet in MASTER as an example:
Multiple Plugins Example
- The ResNet in MASTER is based on ResNet31, and a module named GCAModule is used after each stage. The GCAModule is inserted before the stage-wise convolution layer of ResNet31. As a result, there are two plugins at the after_stage port at the same time.
resnet_master = ResNet(
    # for simplicity, some required
    # parameters are omitted
    plugins=[
        dict(
            cfg=dict(type='Maxpool2d', kernel_size=2, stride=(2, 2)),
            stages=(True, True, False, False),
            position='before_stage'),
        dict(
            cfg=dict(type='Maxpool2d', kernel_size=(2, 1), stride=(2, 1)),
            stages=(False, False, True, False),
            position='before_stage'),
        dict(
            cfg=dict(type='GCAModule', kernel_size=3, stride=1, padding=1),
            stages=[True, True, True, True],
            position='after_stage'),
        dict(
            cfg=dict(
                type='ConvModule',
                kernel_size=3,
                stride=1,
                padding=1,
                norm_cfg=dict(type='BN'),
                act_cfg=dict(type='ReLU')),
            stages=(True, True, True, True),
            position='after_stage')
    ])
- In each plugin, we will pass two parameters (in_channels, out_channels) to support operations that need information about the current channels.
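The ordering rules for stage-wise plugins can be checked with a small pure-Python sketch. Here each stage and plugin is a function that appends a name to a trace list, so the output shows exactly where each plugin runs. The function run_resnet_stages and the dict keys fn, stages, and position are hypothetical stand-ins mirroring the config fields above; the real ResNet builds nn.Module plugins from cfg and also passes in_channels and out_channels to them.

```python
def run_resnet_stages(x, stages, plugins):
    """Hypothetical sketch of stage-wise plugin ports.

    stages: list of callables, one per stage.
    plugins: list of dicts with 'fn', 'stages' (one bool per stage) and
             'position' ('before_stage' or 'after_stage'); plugins sharing
             a port run in list order.
    """
    for i, stage in enumerate(stages):
        for p in plugins:
            if p['position'] == 'before_stage' and p['stages'][i]:
                x = p['fn'](x)
        x = stage(x)
        for p in plugins:
            if p['position'] == 'after_stage' and p['stages'][i]:
                x = p['fn'](x)
    return x


# Each "layer" appends its name so the execution order is visible.
stages = [lambda t, name=f'stage{i + 1}': t + [name] for i in range(4)]
plugins = [
    dict(fn=lambda t: t + ['pre'], stages=(True, True, False, False),
         position='before_stage'),
    dict(fn=lambda t: t + ['gca'], stages=(True,) * 4, position='after_stage'),
    dict(fn=lambda t: t + ['conv'], stages=(True,) * 4, position='after_stage'),
]
trace = run_resnet_stages([], stages, plugins)
print(trace)
```

The trace mirrors the MASTER-style example: both after_stage plugins fire after every stage, with gca always preceding conv because it comes first in the list.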
Block-wise Plugin (Experimental)
- We also refactored the BasicBlock used in ResNet. Now it can be customized with block-wise plugins. Check here for more details.
- BasicBlock is composed of two convolution layers in the main branch and a shortcut branch. We provide four ports to insert plugins.
[port1: before_conv1] ---> [conv1] ---> [port2: after_conv1] ---> [conv2] ---> [port3: after_conv2] ---> +(shortcut) ---> [port4: after_shortcut]
- In each plugin, we will pass a parameter, in_channels, to support operations that need information about the current channels.
- E.g., build a ResNet with a customized BasicBlock that has an additional convolution layer before conv1:
Block-wise Plugin Example
resnet_31 = ResNet(
    in_channels=3,
    stem_channels=[64, 128],
    block_cfgs=dict(type='BasicBlock'),
    arch_layers=[1, 2, 5, 3],
    arch_channels=[256, 256, 512, 512],
    strides=[1, 1, 1, 1],
    plugins=[
        dict(
            cfg=dict(type='Maxpool2d', kernel_size=2, stride=(2, 2)),
            stages=(True, True, False, False),
            position='before_stage'),
        dict(
            cfg=dict(type='Maxpool2d', kernel_size=(2, 1), stride=(2, 1)),
            stages=(False, False, True, False),
            position='before_stage'),
        dict(
            cfg=dict(
                type='ConvModule',
                kernel_size=3,
                stride=1,
                padding=1,
                norm_cfg=dict(type='BN'),
                act_cfg=dict(type='ReLU')),
            stages=(True, True, True, True),
            position='after_stage')
    ])
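To make the four BasicBlock ports in the diagram above concrete, here is a pure-Python sketch in which numbers stand in for feature maps and simple functions stand in for the convolutions. The function basic_block_forward and its arguments are hypothetical. One assumption deserves flagging: the sketch applies the before_conv1 plugins before the input is split into the main and shortcut branches, and that ordering is a guess rather than something stated in these notes.

```python
def basic_block_forward(x, conv1, conv2, shortcut, plugins):
    """Hypothetical sketch of the four BasicBlock plugin ports.

    plugins maps a port name ('before_conv1', 'after_conv1', 'after_conv2',
    'after_shortcut') to a list of callables executed in order.
    """
    def run(port, value):
        for fn in plugins.get(port, []):
            value = fn(value)
        return value

    out = run('before_conv1', x)
    identity = shortcut(out)  # assumption: shortcut sees the port-1 output
    out = conv1(out)
    out = run('after_conv1', out)
    out = conv2(out)
    out = run('after_conv2', out)
    out = out + identity      # residual addition
    return run('after_shortcut', out)


double = lambda v: v * 2
add_three = lambda v: v + 3
identity_fn = lambda v: v
print(basic_block_forward(1, double, add_three, identity_fn, {}))
print(basic_block_forward(1, double, add_three, identity_fn,
                          {'before_conv1': [lambda v: v * 10]}))
```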
Full Examples
ResNet without plugins
- ResNet45 is used in ASTER and ABINet without any plugins.
resnet45_aster = ResNet(
    in_channels=3,
    stem_channels=[64, 128],
    block_cfgs=dict(type='BasicBlock', use_conv1x1=True),
    arch_layers=[3, 4, 6, 6, 3],
    arch_channels=[32, 64, 128, 256, 512],
    strides=[(2, 2), (2, 2), (2, 1), (2, 1), (2, 1)])

resnet45_abi = ResNet(
    in_channels=3,
    stem_channels=32,
    block_cfgs=dict(type='BasicBlock', use_conv1x1=True),
    arch_layers=[3, 4, 6, 6, 3],
    arch_channels=[32, 64, 128, 256, 512],
    strides=[2, 1, 2, 1, 1])
...
MMOCR Release v0.4.1
Highlights
- Visualizing edge weights in OpenSet KIE is now supported! #677
- Some configurations have been optimized to significantly speed up the training and testing processes! Don't worry - you can still tune these parameters in case these modifications do not work. #757
- Now you can use CPU to train/debug your model! #752
- We have fixed a severe bug that prevented users from calling mmocr.apis.test with our pre-built wheels. #667
New Features & Enhancements
- Show edge score for openset kie by @cuhk-hbsun in #677
- Download flake8 from github as pre-commit hooks by @gaotongxiao in #695
- Deprecate the support for 'python setup.py test' by @Harold-lkk in #722
- Disable multi-processing feature of cv2 to speed up data loading by @gaotongxiao in #721
- Extend ctw1500 converter to support text fields by @Harold-lkk in #729
- Extend totaltext converter to support text fields by @Harold-lkk in #728
- Speed up training by @gaotongxiao in #739
- Add setup multi-processing both in train and test.py by @Harold-lkk in #757
- Support CPU training/testing by @gaotongxiao in #752
- Support specify gpu for testing and training with gpu-id instead of gpu-ids and gpus by @Harold-lkk in #756
- Remove unnecessary custom_import from test.py by @Harold-lkk in #758
Bug Fixes
- Fix satrn onnxruntime test by @AllentDan in #679
- Support both ConcatDataset and UniformConcatDataset by @cuhk-hbsun in #675
- Fix bugs of show_results in single_gpu_test by @cuhk-hbsun in #667
- Fix a bug for sar decoder when bi-rnn is used by @MhLiao in #690
- Fix opencv version to avoid some bugs by @gaotongxiao in #694
- Fix py39 ci error by @Harold-lkk in #707
- Update visualize.py by @TommyZihao in #715
- Fix link of config by @cuhk-hbsun in #726
- Use yaml.safe_load instead of load by @gaotongxiao in #753
- Add necessary keys to test_pipelines to enable test-time visualization by @gaotongxiao in #754
Docs
- Fix recog.md by @gaotongxiao in #674
- Add config tutorial by @gaotongxiao in #683
- Add MMSelfSup/MMRazor/MMDeploy in readme by @cuhk-hbsun in #692
- Add recog & det model summary by @gaotongxiao in #693
- Update docs link by @gaotongxiao in #710
- add pull request template.md by @Harold-lkk in #711
- Add website links to readme by @gaotongxiao in #731
- update readme according to standard by @Harold-lkk in #742
New Contributors
- @MhLiao made their first contribution in #690
- @TommyZihao made their first contribution in #715
Full Changelog: v0.4.0...v0.4.1
MMOCR Release v0.4.0
Highlights
- We release a new text recognition model - ABINet (CVPR 2021, Oral). With dedicated model design and useful data augmentation transforms, ABINet achieves the best performance on irregular text recognition tasks. Check it out!
- We are also working hard to fulfill the requests from our community. OpenSet KIE is one of the achievements, which extends the application of SDMGR from text node classification to node-pair relation extraction. We also provide a demo script to convert WildReceipt to open set domain, though it may not take full advantage of the OpenSet format. For more information, read our tutorial.
- APIs of models can be exposed through TorchServe. Docs
Breaking Changes & Migration Guide
Postprocessor
Some refactoring processes are still going on. For all text detection models, we unified their decode implementations into a new module category, POSTPROCESSOR, which is responsible for decoding different raw outputs into boundary instances. In all text detection configs, the text_repr_type argument in bbox_head is deprecated and will be removed in a future release.
Migration Guide: Find a line like the following in the detection model's config:
text_repr_type=xxx,
And replace it with
postprocessor=dict(type='{MODEL_NAME}Postprocessor', text_repr_type=xxx)),
Take a snippet of PANet's config as an example. Before the change, its config for bbox_head looks like:
bbox_head=dict(
    type='PANHead',
    text_repr_type='poly',
    in_channels=[128, 128, 128, 128],
    out_channels=6,
    loss=dict(type='PANLoss')),
Afterwards:
bbox_head=dict(
    type='PANHead',
    in_channels=[128, 128, 128, 128],
    out_channels=6,
    loss=dict(type='PANLoss'),
    postprocessor=dict(type='PANPostprocessor', text_repr_type='poly')),
There are other postprocessors, and each takes different arguments. Interested users can find their interfaces or implementations in mmocr/models/textdet/postprocess or through our API docs.
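For users with many custom configs, the manual edit shown in the PANet example could be scripted. The helper below is a hypothetical sketch, not a tool shipped with MMOCR: it moves text_repr_type out of a bbox_head dict and into a postprocessor dict, deriving the postprocessor name by replacing a trailing Head with Postprocessor. That naming rule matches the PANHead/PANPostprocessor pair above but may not hold for every model, so the generated name should be checked against mmocr/models/textdet/postprocess.

```python
import copy


def migrate_bbox_head(bbox_head):
    """Move a deprecated text_repr_type into a postprocessor dict.

    Assumes '{MODEL_NAME}Postprocessor' naming, derived from the head type
    by stripping a trailing 'Head'.
    """
    head = copy.deepcopy(bbox_head)  # leave the caller's config untouched
    if 'text_repr_type' not in head:
        return head
    text_repr_type = head.pop('text_repr_type')
    head_type = head['type']
    model_name = head_type[:-len('Head')] if head_type.endswith('Head') else head_type
    head['postprocessor'] = dict(
        type=f'{model_name}Postprocessor', text_repr_type=text_repr_type)
    return head


old_head = dict(
    type='PANHead',
    text_repr_type='poly',
    in_channels=[128, 128, 128, 128],
    out_channels=6,
    loss=dict(type='PANLoss'))
print(migrate_bbox_head(old_head))
```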
New Config Structure
We reorganized the configs/ directory by extracting reusable sections into configs/_base_. Now the directory tree of configs/_base_ is organized as follows:
_base_
├── det_datasets
├── det_models
├── det_pipelines
├── recog_datasets
├── recog_models
├── recog_pipelines
└── schedules
Most model configs now make full use of base configs, which makes the overall structure clearer and facilitates fair comparison across models. Despite the seemingly significant hierarchical difference, these changes do not break backward compatibility, as the names of model configs remain the same.
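Reusing base configs relies on merging a child config on top of its bases. The function below is a simplified sketch of that inheritance behavior, written from scratch rather than taken from MMCV: nested dicts are merged key by key, while any other value, including a list, replaces the inherited one. It does not handle MMCV-specific features such as the _delete_ key, and the config values in the usage example are illustrative only.

```python
import copy


def merge_config(base, override):
    """Recursively merge override into base.

    Nested dicts are merged key by key; any other value (including lists)
    replaces the base value outright.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Illustrative base sections (a model and a schedule) plus a child override.
base_model = dict(model=dict(type='PANet', bbox_head=dict(
    type='PANHead', in_channels=[128, 128, 128, 128], out_channels=6)))
base_schedule = dict(total_epochs=600, lr_config=dict(policy='step', step=[200, 400]))
child = dict(model=dict(bbox_head=dict(out_channels=8)), lr_config=dict(step=[300]))

cfg = merge_config(merge_config(base_model, base_schedule), child)
print(cfg)
```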
New Features
- Support openset kie by @cuhk-hbsun in #498
- Add converter for the Open Images v5 text annotations by Krylov et al. by @baudm in #497
- Support Chinese for kie show result by @cuhk-hbsun in #464
- Add TorchServe support for text detection and recognition by @Harold-lkk in #522
- Save filename in text detection test results by @cuhk-hbsun in #570
- Add codespell pre-commit hook and fix typos by @gaotongxiao in #520
- Avoid duplicate placeholder docs in CN by @gaotongxiao in #582
- Save results to json file for kie. by @cuhk-hbsun in #589
- Add SAR_CN to ocr.py by @gaotongxiao in #579
- mim extension for windows by @gaotongxiao in #641
- Support muitiple pipelines for different datasets by @cuhk-hbsun in #657
- ABINet Framework by @gaotongxiao in #651
Refactoring
- Refactor textrecog config structure by @cuhk-hbsun in #617
- Refactor text detection config by @cuhk-hbsun in #626
- refactor transformer modules by @cuhk-hbsun in #618
- refactor textdet postprocess by @cuhk-hbsun in #640
Docs
- C++ example section by @apiaccess21 in #593
- install.md Chinese section by @A465539338 in #364
- Add Chinese Translation of deployment.md. by @fatfishZhao in #506
- Fix a model link and add the metafile for SATRN by @gaotongxiao in #473
- Improve docs style by @gaotongxiao in #474
- Enhancement & sync Chinese docs by @gaotongxiao in #492
- TorchServe docs by @gaotongxiao in #539
- Update docs menu by @gaotongxiao in #564
- Docs for KIE CloseSet & OpenSet by @gaotongxiao in #573
- Fix broken links by @gaotongxiao in #576
- Docstring for text recognition models by @gaotongxiao in #562
- Add MMFlow & MIM by @gaotongxiao in #597
- Add MMFewShot by @gaotongxiao in #621
- Update model readme by @gaotongxiao in #604
- Add input size check to model_inference by @mpena-vina in #633
- Docstring for textdet models by @gaotongxiao in #561
- Add MMHuman3D in readme by @gaotongxiao in #644
- Use shared menu from theme instead by @gaotongxiao in #655
- Refactor docs structure by @gaotongxiao in #662
- Docs fix by @gaotongxiao in #664
Enhancements
- Use bounding box around polygon instead of within polygon by @alexander-soare in #469
- Add CITATION.cff by @gaotongxiao in #476
- Add py3.9 CI by @gaotongxiao in #475
- Update model-index.yml by @gaotongxiao in #484
- Use container in CI by @gaotongxiao in #502
- CircleCI Setup by @gaotongxiao in #611
- Remove unnecessary custom_import from train.py by @gaotongxiao in #603
- Change the upper version of mmcv to 1.5.0 by @zhouzaida in #628
- Update CircleCI by @gaotongxiao in #631
- Pass custom_hooks to MMCV by @gaotongxiao in #609
- Skip CI when some specific files were changed by @gaotongxiao in #642
- Add markdown linter in pre-commit hook by @gaotongxiao in #643
- Use shape from loaded image by @cuhk-hbsun in #652
- Cancel previous runs that are not completed by @Harold-lkk in #666
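To make the distinction in #469 concrete: a box around a polygon contains every vertex, which for an axis-aligned box simply means taking the minimum and maximum of the coordinates. The following is a minimal sketch assuming polygons are stored as a flat [x1, y1, x2, y2, ...] list; enclosing_bbox is a hypothetical name, not an MMOCR utility.

```python
from typing import Sequence, Tuple


def enclosing_bbox(polygon: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) of the box enclosing a polygon.

    ``polygon`` is assumed to be a flat list [x1, y1, x2, y2, ...].
    """
    if len(polygon) < 6 or len(polygon) % 2:
        raise ValueError('expected at least 3 (x, y) points')
    xs = polygon[0::2]  # even positions hold x coordinates
    ys = polygon[1::2]  # odd positions hold y coordinates
    return min(xs), min(ys), max(xs), max(ys)


# A skewed quadrilateral: every vertex lies on or inside the returned box.
quad = [10, 5, 50, 0, 55, 20, 15, 25]
print(enclosing_bbox(quad))  # (10, 0, 55, 25)
```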
Bug Fixes
- Modify algorithm "sar" weights path in metafile by @ShoupingShan in #581
- Fix Cuda CI by @gaotongxiao in #472
- Fix image export in test.py for KIE models by @gaotongxiao in #486
- Allow invalid polygons in intersection and union by default by @gaotongxiao in #471
- Update checkpoints' links for SATRN by @gaotongxiao in #518
- Fix an ONNX conversion bug caused by renaming the key img_shape to resize_shape by @Harold-lkk in #523
- Fix PyTorch 1.6 incompatible checkpoints by @gaotongxiao in #540
- Fix paper field in metafiles by @gaotongxiao in #550
- Unify recognition task names in metafiles by @gaotongxiao in #548
- Fix py3.9 CI by @gaotongxiao in #563
- Always map location to cpu when loading checkpoint by @gaotongxiao in #567
- Fix wrong model builder in recog_test_imgs by @gaotongxiao in #574
- Improve DBNet R50 by fixing the image std by @gaotongxiao in #578
- Fix resource warning: unclosed file by @cuhk-hbsun in #577
- Fix the bug of using the same start_point for different texts in draw_texts_by_pil by @cuhk-hbsun in #587
- Keep original texts for KIE by @cuhk-hbsun in #588
- Fix random seed by @gaotongxiao in https://github.com/open-mmlab/...
MMOCR Release v0.3.0
Highlights
- We add a new text recognition model, SATRN! Its pretrained checkpoint achieves the best performance among the provided text recognition models. A lighter version of SATRN is also released, which obtains ~98% of the original model's performance at a size of only 45 MB. (@2793145003) #405
- Improve the demo script, ocr.py, which supports applying end-to-end text detection, text recognition and key information extraction models on images with easy-to-use commands. Users can find its full documentation in the demo section. (@samayala22, @manjrekarom) #371, #386, #400, #374, #428
- Our documentation is reorganized into a clearer structure. More useful content is on the way! #409, #454
- The requirement of Polygon3 is removed since that project is no longer maintained or distributed. All of its references are replaced with equivalent substitutes in shapely. #448
Breaking Changes & Migration Guide
- Upgrade version requirement of MMDetection to 2.14.0 to avoid bugs #382
- MMOCR now has its own model and layer registries inherited from MMDetection's or MMCV's counterparts. (#436) The modified hierarchical structure of the model registries is now organized as follows.
mmcv.MODELS -> mmdet.BACKBONES -> BACKBONES
mmcv.MODELS -> mmdet.NECKS -> NECKS
mmcv.MODELS -> mmdet.ROI_EXTRACTORS -> ROI_EXTRACTORS
mmcv.MODELS -> mmdet.HEADS -> HEADS
mmcv.MODELS -> mmdet.LOSSES -> LOSSES
mmcv.MODELS -> mmdet.DETECTORS -> DETECTORS
mmcv.ACTIVATION_LAYERS -> ACTIVATION_LAYERS
mmcv.UPSAMPLE_LAYERS -> UPSAMPLE_LAYERS
To migrate your old implementation to our new backend, you need to change the import path of any registries and their corresponding builder functions (including build_detectors) from mmdet.models.builder to mmocr.models.builder. If you have referred to any model or layer of MMDetection or MMCV in your model config, you need to add the mmdet. or mmcv. prefix to its name to inform the model builder of the right namespace to work on.
Interested users may check out MMCV's tutorial on Registry for in-depth explanations on its mechanism.
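The lookup rule above can be illustrated with a toy registry. This is a minimal, self-contained sketch modeled loosely on the scope-prefix behaviour of MMCV's Registry: an unprefixed name is looked up only in the current registry, while a "scope.Name" key is routed to the registry owning that scope. ToyRegistry, its methods, and the registered classes are hypothetical and do not reproduce MMCV's actual API.

```python
class ToyRegistry:
    """A toy hierarchical registry; a simplified model, not MMCV's Registry."""

    def __init__(self, scope, parent=None):
        self.scope = scope
        self.parent = parent
        self.children = {}
        self.modules = {}
        if parent is not None:
            parent.children[scope] = self

    def register(self, cls):
        """Register a class under its own name (usable as a decorator)."""
        self.modules[cls.__name__] = cls
        return cls

    def get(self, key):
        """Resolve 'Name' locally, or 'scope.Name' through the hierarchy."""
        scope, _, name = key.rpartition('.')
        if not scope or scope == self.scope:
            # Unprefixed keys (or keys with our own scope) stay local.
            return self.modules.get(name)
        if scope in self.children:
            return self.children[scope].get(name)
        if self.parent is None:
            return None  # already at the root and the scope is unknown
        root = self
        while root.parent is not None:
            root = root.parent
        return root.get(key)  # let the root route the prefixed key


# Mirror the chain mmcv.MODELS -> mmdet.BACKBONES -> BACKBONES.
MMCV_MODELS = ToyRegistry('mmcv')
MMDET_BACKBONES = ToyRegistry('mmdet', parent=MMCV_MODELS)
BACKBONES = ToyRegistry('mmocr', parent=MMDET_BACKBONES)


@MMDET_BACKBONES.register
class ResNet:  # stands in for a backbone implemented in MMDetection
    pass


@BACKBONES.register
class MyOCRBackbone:  # hypothetical backbone implemented in MMOCR
    pass


print(BACKBONES.get('MyOCRBackbone'))  # found in the local registry
print(BACKBONES.get('ResNet'))         # None: no prefix, so not searched upstream
print(BACKBONES.get('mmdet.ResNet'))   # resolved through the mmdet scope
```

This is why a config that used to say ResNet must now say mmdet.ResNet: without the prefix, the lookup never leaves MMOCR's own registry.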
New Features
- Automatically replace SyncBN with BN for inference #420, #453
- Support batch inference for CRNN and SegOCR #407
- Support exporting documentation in pdf or epub format #406
- Support the persistent_workers option in data loader #459
Bug Fixes
- Remove deprecated key in kie_test_imgs.py #381
- Fix dimension mismatch in batch testing/inference of DBNet #383
- Fix dice loss staying at 1 when an empty target is given #408
- Fix a wrong link in ocr.py (@naarkhoo) #417
- Fix undesired assignment to "pretrained" in test.py #418
- Fix a problem in polygon generation of DBNet #421, #443
- Skip invalid annotations in totaltext_converter #438
- Add zero division handler in poly utils, remove Polygon3 #448
Improvements
- Replace lanms-proper with lanms-neo to support installation on Windows (with special thanks to @gen-ko who has re-distributed this package!)
- Support MIM #394
- Add tests for PyTorch 1.9 in CI #401
- Enable fullscreen layout in readthedocs #413
- General documentation enhancement #395
- Update version checker #427
- Add copyright info #439
- Update citation information #440
Contributors
We thank @2793145003, @samayala22, @manjrekarom, @naarkhoo, @gen-ko, @duanjiaqi, @gaotongxiao, @cuhk-hbsun, @innerlee, @wdsd641417025 for their contribution to this release!
MMOCR Release v0.2.1
Highlights
- Upgrade to use MMCV-full >= 1.3.8 and MMDetection >= 2.13.0 for latest features
- Add ONNX and TensorRT export tool, supporting the deployment of DBNet, PSENet, PANet and CRNN (experimental) #278, #291, #300, #328
- Unify the parameter initialization method, which now uses init_cfg in config files #365
New Features
- Support TextOCR dataset #293
- Support Total-Text dataset #266, #273, #357
- Support grouping text detection box into lines #290, #304
- Add benchmark_processing script that benchmarks data loading process #261
- Add SynthText preprocessor for text recognition models #351, #361
- Support batch inference during testing #310
- Add user-friendly OCR inference script #366
Bug Fixes
- Fix improper handling of ignored classes in SDMGR loss #221
- Fix potential numerical zero division error in DRRG #224
- Fix installing requirements with pip and mim #242
- Fix dynamic input error of DBNet #269
- Fix space parsing error in LineStrParser #285
- Fix textsnake decode error #264
- Correct isort setup #288
- Fix a bug in SDMGR config #316
- Fix kie_test_img for KIE nonvisual #319
- Fix metafiles #342
- Fix different device problem in FCENet #334
- Ignore improper trailing empty characters in annotation files #358
- Docs fixes #247, #255, #265, #267, #268, #270, #276, #287, #330, #355, #367
- Fix NRTR config #356, #370
Improvements
- Add backend for resizeocr #244
- Skip image processing pipelines in SDMGR novisual #260
- Speedup DBNet #263
- Update mmcv installation method in workflow #323
- Add part of Chinese documentations #353, #362
- Add support for ConcatDataset with two workflows #348
- Add list_from_file and list_to_file utils #226
- Speed up sort_vertex #239
- Support distributed evaluation of KIE #234
- Add pretrained FCENet on IC15 #258
- Support CPU for OCR demo #227
- Avoid extra image pre-processing steps #375
MMOCR Release v0.2.0
Highlights
- Add the NER approach Bert-softmax (NAACL'2019)
- Add the text detection method DRRG (CVPR'2020)
- Add the text detection method FCENet (CVPR'2021)
- Improve ease of use by adding an end-to-end text detection and recognition demo and a Colab online demo.
- Simplify the installation.
New Features
- Add Bert-softmax for NER task #148
- Add DRRG #189
- Add FCENet #133
- Add end-to-end demo #105
- Support batch inference #86 #87 #178
- Add TPS preprocessor for text recognition #117 #135
- Add demo documentation #151 #166 #168 #170 #171
- Add checkpoint for Chinese recognition #156
- Add metafile #175 #176 #177 #182 #183
- Add support for numpy array inference #74
Bug Fixes
- Fix the duplicated point bug due to transform for textsnake #130
- Fix CTC loss NaN #159
- Fix error raised if result is empty in demo #144
- Fix results missing if one image has a large number of boxes #98
- Fix package missing in dockerfile #109
Improvements
MMOCR Release v0.1.0
Main Features
- Support text detection, text recognition and the corresponding downstream tasks such as key information extraction.
- For text detection, support both single-step (PSENet, PANet, DBNet, TextSnake) and two-step (MaskRCNN) methods.
- For text recognition, support the CTC-loss based method CRNN; encoder-decoder (with attention) based methods SAR and Robustscanner; the segmentation based method SegOCR; and the Transformer based method NRTR.
- For key information extraction, support the GCN based method SDMG-R.
- Provide checkpoints and log files for all of the methods above.