
[Feature] Support computing mean scores in UniformConcatDataset #981

Merged: 5 commits merged into open-mmlab:main from mean_res on Apr 29, 2022

Conversation

gaotongxiao
Collaborator

@gaotongxiao gaotongxiao commented Apr 27, 2022

Motivation

Since text recognition models are usually evaluated on multiple datasets, it is hard to compare a model's performance across epochs without a unified indicator such as the mean score. This PR supports get_mean in UniformConcatDataset. When it is enabled, the mean score of each {metric_name} over the concatenated datasets is added to the evaluation results as mean_{metric_name}.
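
For illustration only (the metric name and values below are invented, not output from this PR), the evaluation results for two concatenated recognition datasets might then look like this:

# Hypothetical evaluation results with mean scores enabled. Per-dataset entries
# keep the "<index>_" prefix used for concatenated datasets; the "mean_" entry
# is the new unified indicator described above. Values are made up.
eval_results = {
    '0_word_acc': 0.88,      # score on the first concatenated dataset
    '1_word_acc': 0.92,      # score on the second concatenated dataset
    'mean_word_acc': 0.90,   # mean_{metric_name}: average over all datasets
}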

Modification

Modified mmdet.datasets.ConcatDataset.evaluate() to compute the mean scores when both self.separate_eval and self.get_mean are True. Also disabled evaluating the concatenated datasets as a whole, since that code path has not been verified on MMOCR and we don't want to make excessive changes in this PR.
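
A minimal sketch of this evaluation flow, with simplified bookkeeping; evaluate_concat and its arguments are placeholders for illustration, not the actual UniformConcatDataset code:

# Simplified sketch (not the real implementation): evaluate each sub-dataset
# separately, prefix its metrics with the dataset index, and append
# mean_{metric_name} entries when mean scores are requested.
def evaluate_concat(datasets, cumulative_sizes, results, show_mean_scores=True, **kwargs):
    total_results = {}
    sums = {}
    start = 0
    for idx, (dataset, end) in enumerate(zip(datasets, cumulative_sizes)):
        # Slice out this sub-dataset's results and evaluate them on their own.
        eval_results = dataset.evaluate(results[start:end], **kwargs)
        start = end
        for name, value in eval_results.items():
            total_results[f'{idx}_{name}'] = value
            sums[name] = sums.get(name, 0.0) + value
    if show_mean_scores:
        # Average each metric over all concatenated datasets.
        for name, total in sums.items():
            total_results[f'mean_{name}'] = total / len(datasets)
    return total_results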

Use case

To show the average results, set show_mean_scores=True in data.val and data.test of the model's config, e.g.:

val=dict(
    type='UniformConcatDataset',
    datasets=test_list,
    show_mean_scores=True,
    pipeline=test_pipeline),
test=dict(
    type='UniformConcatDataset',
    datasets=test_list,
    show_mean_scores=True,
    pipeline=test_pipeline))

BC-breaking (Optional)

No

@gaotongxiao gaotongxiao changed the title from "[Feature] Support computing mean results in UniformConcatDataset" to "[Feature] Support computing mean scores in UniformConcatDataset" on Apr 27, 2022
@gaotongxiao gaotongxiao requested a review from xinke-wang April 27, 2022 07:08
@gaotongxiao
Collaborator Author

Any other comments? If not, I'll start to update the configs.

@xinke-wang
Collaborator

> Any other comments? If not, I'll start to update the configs.

Wait a moment. I am still testing this code.

@xinke-wang
Collaborator

xinke-wang commented Apr 28, 2022

I have some minor concerns about the current evaluation process:

  • The current recognition log is not clear, especially when the number of test datasets increases; for example, the 0_, 1_, 2_ ... index prefixes make it difficult to find a specific subset. (Not sure if it is necessary to fix this in this PR.)

[Screenshot: recognition evaluation log, where metrics are prefixed with dataset indices 0_, 1_, 2_ ...]

In comparison, the detection log prints the details of each dataset more clearly, so users can easily check the performance of a specific dataset.

[Screenshot: detection evaluation log, which prints each dataset's results separately]

  • Using different dataset types in the detection task does not trigger the expected NotImplementedError; instead, it raises AttributeError: UniformConcatDataset: 'IcdarDataset' object has no attribute 'flag'. (An illustrative check that would surface this earlier is sketched after this list.)

The following toy data config reproduces the issue:

root = 'tests/data/toy_dataset'

# dataset with type='TextDetDataset'
train1 = dict(
    type='TextDetDataset',
    img_prefix=f'{root}/imgs',
    ann_file=f'{root}/instances_test.txt',
    loader=dict(
        type='HardDiskLoader',
        repeat=4,
        parser=dict(
            type='LineJsonParser',
            keys=['file_name', 'height', 'width', 'annotations'])),
    pipeline=None,
    test_mode=False)

# dataset with type='IcdarDataset'
train2 = dict(
    type='IcdarDataset',
    ann_file=f'{root}/instances_test.json',
    img_prefix=f'{root}/imgs',
    pipeline=None)

test = dict(
    type='TextDetDataset',
    img_prefix=f'{root}/imgs',
    ann_file=f'{root}/instances_test.txt',
    loader=dict(
        type='HardDiskLoader',
        repeat=1,
        parser=dict(
            type='LineJsonParser',
            keys=['file_name', 'height', 'width', 'annotations'])),
    pipeline=None,
    test_mode=True)

train_list = [train1, train2]

test_list = [test, train2]

  • Other things look good to me. A suggestion: how about using print_mean_scores, show_average_performance, or get_mean_scores? Using only get_mean is a bit confusing, and users may have to check the docs to understand what this parameter refers to.
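
For illustration only (not part of this PR and not MMOCR's actual code), a type-consistency check along the following lines in the dataset wrapper would surface the unsupported mixed-type case as an explicit NotImplementedError rather than the AttributeError above; the class name is hypothetical:

# Illustrative sketch: reject mixed dataset types up front instead of failing
# later on a missing attribute such as `flag`.
from torch.utils.data import ConcatDataset


class UniformConcatDatasetSketch(ConcatDataset):

    def __init__(self, datasets):
        dataset_types = {type(ds).__name__ for ds in datasets}
        if len(dataset_types) > 1:
            raise NotImplementedError(
                'Evaluating a mix of dataset types is not supported; got '
                f'{sorted(dataset_types)}.')
        super().__init__(datasets)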

@gaotongxiao
Collaborator Author

gaotongxiao commented Apr 28, 2022

@xinke-wang Unfortunately, even though we can print the results for each dataset, there is no way to get rid of the summary at the end of the evaluation, which can lead to duplicate outputs. To streamline the evaluation report, I can make another PR, after which users will be able to choose which evaluation metric(s) to report by customizing the config.

@gaotongxiao gaotongxiao merged commit 064a2b8 into open-mmlab:main Apr 29, 2022
@gaotongxiao gaotongxiao deleted the mean_res branch April 29, 2022 06:48
gaotongxiao added a commit to gaotongxiao/mmocr that referenced this pull request Jul 15, 2022
[Feature] Support computing mean scores in UniformConcatDataset (open-mmlab#981)

* Get avg results in UniformConcatDataset

* add docstr

* Fix

* fix test

* fix typo