[Docs] Add Chinese Guidance on How to Add New Datasets to Dataset Preparer #1506

xinke-wang · 2022-11-02T15:42:32Z

Add guidance on how to add new datasets to the Dataset Preparer (Chinese version only), using the ICDAR 2013 dataset as an example. This PR also adds the IC13 dataset to the dataset preparer.

codecov · 2022-11-02T15:46:56Z

Codecov Report

Base: 88.16% // Head: 85.85% // Decreases project coverage by 2.31% ⚠️

Coverage data is based on head (d11757d) compared to base (ff04034).
Patch coverage: 50.25% of modified lines in pull request are covered.

❗ Current head d11757d differs from pull request most recent head f116350. Consider uploading reports for the commit f116350 to get more accurate results

Additional details and impacted files

@@             Coverage Diff             @@
##           dev-1.x    #1506      +/-   ##
===========================================
- Coverage    88.16%   85.85%   -2.31%     
===========================================
  Files          147      158      +11     
  Lines         9249     9881     +632     
  Branches      1268     1368     +100     
===========================================
+ Hits          8154     8483     +329     
- Misses         863     1156     +293     
- Partials       232      242      +10
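The percentages in the diff above can be reproduced from the raw counts. A minimal sketch, assuming coverage is computed as hits divided by the sum of hits, misses, and partials (which matches the `Lines` row here); the `coverage` helper is a hypothetical name used only for illustration:

```python
def coverage(hits: int, misses: int, partials: int) -> float:
    """Return line coverage in percent, counting partial lines as not covered."""
    total = hits + misses + partials
    return 100.0 * hits / total


# Counts copied from the coverage diff (base ff04034, head d11757d).
base = coverage(hits=8154, misses=863, partials=232)
head = coverage(hits=8483, misses=1156, partials=242)

print(f"{base:.2f}% -> {head:.2f}% ({head - base:+.2f}%)")
# prints "88.16% -> 85.85% (-2.31%)"
```

The drop comes mostly from the newly added preparer files, several of which have low patch coverage.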

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `85.85% <50.25%> (-2.31%)` | ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| mmocr/utils/polygon_utils.py | `98.70% <ø> (ø)` | |
| mmocr/datasets/preparers/data_obtainer.py | `20.73% <20.73%> (ø)` | |
| mmocr/datasets/preparers/data_converter.py | `20.70% <21.46%> (ø)` | |
| mmocr/datasets/preparers/parsers/coco_parser.py | `32.00% <32.00%> (ø)` | |
| mmocr/datasets/preparers/parsers/base.py | `72.00% <72.00%> (ø)` | |
| mmocr/datasets/preparers/data_preparer.py | `72.72% <72.72%> (ø)` | |
| ...ocr/datasets/preparers/parsers/totaltext_parser.py | `82.69% <82.69%> (ø)` | |
| ...ocr/datasets/preparers/parsers/icdar_txt_parser.py | `87.80% <87.80%> (ø)` | |
| mmocr/datasets/preparers/dumpers/dumpers.py | `89.28% <89.28%> (ø)` | |
| mmocr/utils/fileio.py | `95.23% <93.75%> (-4.77%)` | ⬇️ |

... and 9 more

docs/zh_cn/user_guides/data_prepare/dataset_preparer.md

gaotongxiao · 2022-11-14T09:00:51Z

Tested CRNN on IC13 test split generated by the dataset preparer, got 82.65 instead of 87.39 (https://mmocr.readthedocs.io/en/dev-1.x/textrecog_models.html#id5). The same issue has also been found on IC15. I'll investigate the issue behind such a difference.

gaotongxiao · 2022-11-14T09:42:55Z

The dataset preparer currently generates 1095 test images because it applies no post-filtering, which is usually required (see https://arxiv.org/pdf/1904.01906.pdf).

gaotongxiao · 2022-11-14T09:47:20Z

It might be easy to develop a post-filtering script for IC13, but IC15 is filtered manually and may not be generated. Shall we allow users to download existing annotations for these special cases?
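A minimal sketch of what such a post-filtering script for IC13 could look like. It assumes the filter keeps only purely alphanumeric labels, with an optional minimum length; the thread does not confirm the exact rule, so treat the rule, the `filter_alphanumeric` name, and the sample data as illustrative assumptions rather than the preparer's actual behavior:

```python
import re

# Assumption: a label survives only if it consists solely of ASCII letters and digits.
_ALNUM = re.compile(r"^[A-Za-z0-9]+$")


def filter_alphanumeric(samples, min_len=1):
    """Keep (img_path, text) pairs whose text is alphanumeric and long enough."""
    return [
        (img_path, text)
        for img_path, text in samples
        if len(text) >= min_len and _ALNUM.match(text)
    ]


# Hypothetical samples, not taken from the real IC13 annotations.
samples = [
    ("word_1.png", "Tiredness"),
    ("word_2.png", "kills!"),   # dropped: contains punctuation
    ("word_3.png", "A"),
    ("word_4.png", "12"),
]

kept = filter_alphanumeric(samples)
kept_min3 = filter_alphanumeric(samples, min_len=3)
print(len(kept), len(kept_min3))
```

A rule-based filter like this only helps where the subset is defined by a rule; a manually curated subset, as described for IC15, cannot be regenerated this way.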

xinke-wang · 2022-11-14T10:24:58Z

We could comment out the original URL and add the specific version of the annotation file, just like the old converter does.

xinke-wang · 2022-11-14T10:27:18Z

> It might be easy to develop a post-filtering script for IC13, but IC15 is filtered manually and may not be generated. Shall we allow users to download existing annotations for these special cases?

I can raise a new PR to fix this issue.

gaotongxiao · 2022-11-14T11:23:17Z

@xinke-wang Yes, we need to download annotations by default for the IC13 and IC15 textrecog datasets. I can upload the filtered annotations for IC13 and IC15 if needed. I won't merge this PR until you get the new PR ready.

xinke-wang · 2022-11-15T03:44:16Z

> @xinke-wang Yes, we need to download annotations by default for the IC13 and IC15 textrecog datasets. I can upload the filtered annotations for IC13 and IC15 if needed. I won't merge this PR until you get the new PR ready.

I've added the 1015 version for IC13. However, after checking the link provided by the doc https://mmocr.readthedocs.io/en/latest/datasets/recog.html#icdar-2015, it seems the 2077 IC15 was used? Can you check if MMOCR models were tested on IC15 2077 or IC15 1811?

Harold-lkk · 2022-11-15T06:04:03Z

Could you split this PR into several parts so that it does not block other PRs:

* icdar13 prepare
* guide docs
* bugfix: icdarparser
* metafile fix: WildReceipt

Harold-lkk · 2022-11-15T06:05:23Z

> > @xinke-wang Yes, we need to download annotations by default for the IC13 and IC15 textrecog datasets. I can upload the filtered annotations for IC13 and IC15 if needed. I won't merge this PR until you get the new PR ready.
>
> I've added the 1015 version for IC13. However, after checking the link provided by the doc https://mmocr.readthedocs.io/en/latest/datasets/recog.html#icdar-2015, it seems the 2077 IC15 was used? Can you check if MMOCR models were tested on IC15 2077 or IC15 1811?

2077 for the recognition test.

xinke-wang · 2022-11-15T06:17:13Z

> > > @xinke-wang Yes, we need to download annotations by default for the IC13 and IC15 textrecog datasets. I can upload the filtered annotations for IC13 and IC15 if needed. I won't merge this PR until you get the new PR ready.
> >
> > I've added the 1015 version for IC13. However, after checking the link provided by the doc https://mmocr.readthedocs.io/en/latest/datasets/recog.html#icdar-2015, it seems the 2077 IC15 was used? Can you check if MMOCR models were tested on IC15 2077 or IC15 1811?
>
> 2077 for the recognition test.

OK, so we do not need to fix the IC15 preparer, since the current one already produces the 2077 version.
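One quick way to check which variant a generated annotation file corresponds to is to count its text instances and compare against the sizes mentioned in this thread (1015 and 1095 for IC13, 2077 and 1811 for IC15). This is only a sketch: it assumes an annotation dict with a top-level `data_list` whose items each carry an `instances` list, and the helper names are hypothetical:

```python
# Sizes quoted in this thread; the labels are just for display.
KNOWN_SIZES = {
    1015: "IC13 (1015)",
    1095: "IC13 (1095, unfiltered)",
    2077: "IC15 (2077)",
    1811: "IC15 (1811)",
}


def count_text_instances(annotation: dict) -> int:
    """Count text instances, assuming a `data_list` of items with `instances`."""
    return sum(
        len(item.get("instances", []))
        for item in annotation.get("data_list", [])
    )


def identify_variant(annotation: dict) -> str:
    """Map the instance count to a known test-set variant, if any."""
    return KNOWN_SIZES.get(count_text_instances(annotation), "unknown")


# Tiny synthetic annotation, used here only to exercise the helpers.
toy = {
    "data_list": [
        {"img_path": "word_1.png", "instances": [{"text": "Tiredness"}]},
        {"img_path": "word_2.png", "instances": [{"text": "kills"}]},
    ]
}
print(count_text_instances(toy), identify_variant(toy))
```

In practice the dict would come from `json.load` on the annotation file the preparer writes.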

xinke-wang · 2022-11-15T06:44:38Z

> Could you split this PR into several parts so that it does not block other PRs:
>
> * icdar13 prepare
> * guide docs
> * bugfix: icdarparser
> * metafile fix: WildReceipt

Split into bug fix #1529, metafile fix #1528, and IC13 #1531.

add doc for data preparer & add IC13

b1adf5e

mm-assistant bot assigned Harold-lkk Nov 2, 2022

fix bugs

121765d

This was referenced Nov 9, 2022

[Featuare] Add cute80 to dataset preparer #1522

Merged

[Feature] Add SVTP to dataset preparer #1523

Merged

gaotongxiao reviewed Nov 11, 2022

fix comments

7e6f1bc

gaotongxiao added the bug Something isn't working label Nov 14, 2022

gaotongxiao approved these changes Nov 14, 2022

Harold-lkk mentioned this pull request Nov 15, 2022

[Bugs] fix crop without padding and recog metainfo delete unuse info #1526

Merged

fix ic13

fe9f844

Harold-lkk mentioned this pull request Nov 15, 2022

[Feature] iiit5k converter #1530

Merged

split icparser & wildreceipt metafile fix

d11757d

split ic13

f116350

gaotongxiao changed the title ~~[Docs] Add Guidance on How to Add New Datasets to Dataset Preparer~~ [Docs] Add Chinese Guidance on How to Add New Datasets to Dataset Preparer Nov 15, 2022

gaotongxiao merged commit e28fc32 into open-mmlab:dev-1.x Nov 15, 2022

This was referenced Nov 17, 2022

[Attention] 超级视客营 MMOCR #1548

Closed

[Attention] OpenMMLab Codecamp #1549

Closed
