
[Feature] Support computing mean scores in UniformConcatDataset #981

Merged: 5 commits merged into open-mmlab:main from mean_res on Apr 29, 2022

Conversation

gaotongxiao
Collaborator

@gaotongxiao gaotongxiao commented Apr 27, 2022

Motivation

Since text recognition models are usually evaluated on multiple datasets, it is hard to compare a model's performance across epochs without a unified indicator such as the mean score. This PR supports get_mean in UniformConcatDataset. When it is enabled, the mean score of each {metric_name} over the concatenated datasets is added to the evaluation results as mean_{metric_name}.
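
For illustration only (the metric name and values below are invented, not output from this PR), the evaluation results for two concatenated recognition datasets might then look like this:

# Hypothetical evaluation results with mean scores enabled. Per-dataset entries
# keep the "<index>_" prefix used for concatenated datasets; the "mean_" entry
# is the new unified indicator described above. Values are made up.
eval_results = {
    '0_word_acc': 0.88,      # score on the first concatenated dataset
    '1_word_acc': 0.92,      # score on the second concatenated dataset
    'mean_word_acc': 0.90,   # mean_{metric_name}: average over all datasets
}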

Modification

Modified mmdet.datasets.ConcatDataset.evaluate() to compute the mean scores when both self.separate_eval and self.get_mean are True. Also disabled evaluating the concatenated datasets as a whole, since that code path has not been verified on MMOCR and we don't want to make excessive changes in this PR.
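
A minimal sketch of this evaluation flow, with simplified bookkeeping; evaluate_concat and its arguments are placeholders for illustration, not the actual UniformConcatDataset code:

# Simplified sketch (not the real implementation): evaluate each sub-dataset
# separately, prefix its metrics with the dataset index, and append
# mean_{metric_name} entries when mean scores are requested.
def evaluate_concat(datasets, cumulative_sizes, results, show_mean_scores=True, **kwargs):
    total_results = {}
    sums = {}
    start = 0
    for idx, (dataset, end) in enumerate(zip(datasets, cumulative_sizes)):
        # Slice out this sub-dataset's results and evaluate them on their own.
        eval_results = dataset.evaluate(results[start:end], **kwargs)
        start = end
        for name, value in eval_results.items():
            total_results[f'{idx}_{name}'] = value
            sums[name] = sums.get(name, 0.0) + value
    if show_mean_scores:
        # Average each metric over all concatenated datasets.
        for name, total in sums.items():
            total_results[f'mean_{name}'] = total / len(datasets)
    return total_results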

Use case

To show the average results, set show_mean_scores=True in data.val and data.test of the model's config, e.g.:

val=dict(
    type='UniformConcatDataset',
    datasets=test_list,
    show_mean_scores=True,
    pipeline=test_pipeline),
test=dict(
    type='UniformConcatDataset',
    datasets=test_list,
    show_mean_scores=True,
    pipeline=test_pipeline))

BC-breaking (Optional)

No

@gaotongxiao gaotongxiao changed the title from "[Feature] Support computing mean results in UniformConcatDataset" to "[Feature] Support computing mean scores in UniformConcatDataset" on Apr 27, 2022
@gaotongxiao gaotongxiao requested a review from xinke-wang April 27, 2022 07:08
@gaotongxiao
Collaborator Author

Any other comments? If not, I'll start to update the configs.

@xinke-wang
Collaborator

> Any other comments? If not, I'll start to update the configs.

Wait a moment. I am still testing this code.

@xinke-wang
Collaborator

xinke-wang commented Apr 28, 2022

I have some minor concerns about the current evaluation process:

  • The current recognition log is not clear, especially when the number of test datasets increases; for example, the 0_, 1_, 2_ ... index prefixes make it difficult to find a specific subset. (Not sure if it is necessary to fix this in this PR.)

[Screenshot: recognition evaluation log, where metrics are prefixed with dataset indices 0_, 1_, 2_ ...]

In comparison, the detection log prints the details of each dataset more clearly, so users can easily check the performance of a specific dataset.

[Screenshot: detection evaluation log, which prints each dataset's results separately]

  • Using different dataset types in the detection task does not trigger the expected NotImplementedError; instead, it raises AttributeError: UniformConcatDataset: 'IcdarDataset' object has no attribute 'flag'. (An illustrative check that would surface this earlier is sketched after this list.)

The following toy data config reproduces the issue:

root = 'tests/data/toy_dataset'

# dataset with type='TextDetDataset'
train1 = dict(
    type='TextDetDataset',
    img_prefix=f'{root}/imgs',
    ann_file=f'{root}/instances_test.txt',
    loader=dict(
        type='HardDiskLoader',
        repeat=4,
        parser=dict(
            type='LineJsonParser',
            keys=['file_name', 'height', 'width', 'annotations'])),
    pipeline=None,
    test_mode=False)

# dataset with type='IcdarDataset'
train2 = dict(
    type='IcdarDataset',
    ann_file=f'{root}/instances_test.json',
    img_prefix=f'{root}/imgs',
    pipeline=None)

test = dict(
    type='TextDetDataset',
    img_prefix=f'{root}/imgs',
    ann_file=f'{root}/instances_test.txt',
    loader=dict(
        type='HardDiskLoader',
        repeat=1,
        parser=dict(
            type='LineJsonParser',
            keys=['file_name', 'height', 'width', 'annotations'])),
    pipeline=None,
    test_mode=True)

train_list = [train1, train2]

test_list = [test, train2]

  • Other things look good to me. A suggestion: how about using print_mean_scores, show_average_performance, or get_mean_scores? Using only get_mean is a bit confusing, and users may have to check the docs to understand what this parameter refers to.
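
For illustration only (not part of this PR and not MMOCR's actual code), a type-consistency check along the following lines in the dataset wrapper would surface the unsupported mixed-type case as an explicit NotImplementedError rather than the AttributeError above; the class name is hypothetical:

# Illustrative sketch: reject mixed dataset types up front instead of failing
# later on a missing attribute such as `flag`.
from torch.utils.data import ConcatDataset


class UniformConcatDatasetSketch(ConcatDataset):

    def __init__(self, datasets):
        dataset_types = {type(ds).__name__ for ds in datasets}
        if len(dataset_types) > 1:
            raise NotImplementedError(
                'Evaluating a mix of dataset types is not supported; got '
                f'{sorted(dataset_types)}.')
        super().__init__(datasets)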

@gaotongxiao
Collaborator Author

gaotongxiao commented Apr 28, 2022

@xinke-wang Unfortunately, even though we can print the results for each dataset, there is no way to get rid of the summary at the end of the evaluation, which can lead to duplicate outputs. To streamline the evaluation report, I can make another PR, after which users will be able to choose which evaluation metric(s) to report by customizing the config.

@gaotongxiao gaotongxiao merged commit 064a2b8 into open-mmlab:main Apr 29, 2022
@gaotongxiao gaotongxiao deleted the mean_res branch April 29, 2022 06:48
gaotongxiao added a commit to gaotongxiao/mmocr that referenced this pull request Jul 15, 2022
[Feature] Support computing mean scores in UniformConcatDataset (open-mmlab#981)

* Get avg results in UniformConcatDataset

* add docstr

* Fix

* fix test

* fix typo