[Docs] Add Chinese Guidance on How to Add New Datasets to Dataset Preparer #1506

Merged: 6 commits, Nov 15, 2022

xinke-wang
Collaborator

This PR adds guidance (Chinese version only) on how to add new datasets to the Dataset Preparer, using the ICDAR 2013 dataset as an example. It also adds the IC13 dataset to the dataset preparer.

@codecov

codecov bot commented Nov 2, 2022

Codecov Report

Base: 88.16% // Head: 85.85% // Decreases project coverage by -2.30% ⚠️

Coverage data is based on head (d11757d) compared to base (ff04034).
Patch coverage: 50.25% of modified lines in pull request are covered.

❗ Current head d11757d differs from pull request most recent head f116350. Consider uploading reports for the commit f116350 to get more accurate results

Additional details and impacted files
@@             Coverage Diff             @@
##           dev-1.x    #1506      +/-   ##
===========================================
- Coverage    88.16%   85.85%   -2.31%     
===========================================
  Files          147      158      +11     
  Lines         9249     9881     +632     
  Branches      1268     1368     +100     
===========================================
+ Hits          8154     8483     +329     
- Misses         863     1156     +293     
- Partials       232      242      +10     
Flag Coverage Δ
unittests 85.85% <50.25%> (-2.31%) ⬇️

Impacted Files Coverage Δ
mmocr/utils/polygon_utils.py 98.70% <ø> (ø)
mmocr/datasets/preparers/data_obtainer.py 20.73% <20.73%> (ø)
mmocr/datasets/preparers/data_converter.py 20.70% <21.46%> (ø)
mmocr/datasets/preparers/parsers/coco_parser.py 32.00% <32.00%> (ø)
mmocr/datasets/preparers/parsers/base.py 72.00% <72.00%> (ø)
mmocr/datasets/preparers/data_preparer.py 72.72% <72.72%> (ø)
...ocr/datasets/preparers/parsers/totaltext_parser.py 82.69% <82.69%> (ø)
...ocr/datasets/preparers/parsers/icdar_txt_parser.py 87.80% <87.80%> (ø)
mmocr/datasets/preparers/dumpers/dumpers.py 89.28% <89.28%> (ø)
mmocr/utils/fileio.py 95.23% <93.75%> (-4.77%) ⬇️
... and 9 more

gaotongxiao added the "bug" label on Nov 14, 2022
@gaotongxiao
Collaborator

gaotongxiao commented Nov 14, 2022

I tested CRNN on the IC13 test split generated by the dataset preparer and got 82.65 instead of the reported 87.39 (https://mmocr.readthedocs.io/en/dev-1.x/textrecog_models.html#id5). The same discrepancy also shows up on IC15. I'll investigate the cause of the difference.

@gaotongxiao
Collaborator

The dataset preparer currently generates all 1095 test images without the post-filtering that is usually applied (https://arxiv.org/pdf/1904.01906.pdf).

@gaotongxiao
Collaborator

gaotongxiao commented Nov 14, 2022

It might be easy to develop a post-filtering script for IC13, but IC15 is filtered manually and may not be generated. Shall we allow users to download existing annotations for these special cases?
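
For illustration, a minimal sketch of what such a post-filtering script for IC13 could look like. It assumes the filtered subset is obtained by dropping words that contain non-alphanumeric characters, which is how the paper linked above describes it as far as I understand. The sample layout (dicts with `img_path` and `text`) and the function names are hypothetical, not MMOCR's API, and the sketch has not been checked against the real annotations, so it may not reproduce the reference subset exactly.

```python
from typing import Dict, List


def is_alphanumeric(text: str) -> bool:
    """Return True if text is non-empty and made only of ASCII letters/digits."""
    return text.isascii() and text.isalnum()


def post_filter(samples: List[Dict[str, str]], min_len: int = 1) -> List[Dict[str, str]]:
    """Keep samples whose transcription is alphanumeric and long enough.

    `samples` is a hypothetical list of {"img_path": ..., "text": ...} dicts;
    adapt the keys to whatever the preparer actually emits.
    """
    return [
        s for s in samples
        if is_alphanumeric(s["text"]) and len(s["text"]) >= min_len
    ]


if __name__ == "__main__":
    demo = [
        {"img_path": "word_1.png", "text": "Tiredness"},
        {"img_path": "word_2.png", "text": "kills!"},
        {"img_path": "word_3.png", "text": "A4"},
    ]
    for sample in post_filter(demo):
        print(sample["img_path"], sample["text"])
```

A rule like this only covers IC13. It does not help with IC15, where the filtering was done by hand.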

@xinke-wang
Collaborator Author

We could comment out the original URL and add the specific annotation version instead, just like the old converter did.

@xinke-wang
Collaborator Author

> It might be easy to develop a post-filtering script for IC13, but IC15 is filtered manually and may not be generated. Shall we allow users to download existing annotations for these special cases?

I can raise a new PR to fix this issue.

@gaotongxiao
Collaborator

gaotongxiao commented Nov 14, 2022

@xinke-wang Yes, we need to download annotations by default for the IC13 & IC15 textrecog datasets. I can upload the filtered annotations for IC13 and IC15 if needed. I won't merge this PR until you get the new PR ready.

@xinke-wang
Collaborator Author

> @xinke-wang Yes, we need to download annotations by default for the IC13 & IC15 textrecog datasets. I can upload the filtered annotations for IC13 and IC15 if needed. I won't merge this PR until you get the new PR ready.

I've added the 1015 version for IC13. However, after checking the link provided by the doc (https://mmocr.readthedocs.io/en/latest/datasets/recog.html#icdar-2015), it seems the 2077 version of IC15 was used. Can you check whether MMOCR models were tested on IC15 2077 or IC15 1811?

@Harold-lkk
Collaborator

We can split this PR into several parts so that it doesn't block other PRs:

  • icdar13 prepare
  • guide docs
  • bugfix: icdarparser
  • metafile fix: WildReceipt

@Harold-lkk
Collaborator

> > @xinke-wang Yes, we need to download annotations by default for the IC13 & IC15 textrecog datasets. I can upload the filtered annotations for IC13 and IC15 if needed. I won't merge this PR until you get the new PR ready.
>
> I've added the 1015 version for IC13. However, after checking the link provided by the doc (https://mmocr.readthedocs.io/en/latest/datasets/recog.html#icdar-2015), it seems the 2077 version of IC15 was used. Can you check whether MMOCR models were tested on IC15 2077 or IC15 1811?

2077 for the recognition test.

@xinke-wang
Collaborator Author

xinke-wang commented Nov 15, 2022

> > > @xinke-wang Yes, we need to download annotations by default for the IC13 & IC15 textrecog datasets. I can upload the filtered annotations for IC13 and IC15 if needed. I won't merge this PR until you get the new PR ready.
> >
> > I've added the 1015 version for IC13. However, after checking the link provided by the doc (https://mmocr.readthedocs.io/en/latest/datasets/recog.html#icdar-2015), it seems the 2077 version of IC15 was used. Can you check whether MMOCR models were tested on IC15 2077 or IC15 1811?
>
> 2077 for the recognition test.

OK, so we do not need to fix the IC15 preparer, since the current one already uses the 2077 version.

@xinke-wang
Collaborator Author

xinke-wang commented Nov 15, 2022

> We can split this PR into several parts so that it doesn't block other PRs:
>
> * icdar13 prepare
> * guide docs
> * bugfix: icdarparser
> * metafile fix: WildReceipt

Split into bug fix #1529, metafile fix #1528, and IC13 #1531.

gaotongxiao changed the title from "[Docs] Add Guidance on How to Add New Datasets to Dataset Preparer" to "[Docs] Add Chinese Guidance on How to Add New Datasets to Dataset Preparer" on Nov 15, 2022
gaotongxiao merged commit e28fc32 into open-mmlab:dev-1.x on Nov 15, 2022