A small bug in tools/analyse_logs.py caused by wrong plot_iter in some cases. #1426

Y-M-Y · 2022-03-28T16:28:36Z

Hi, thank you for building such a fabulous toolbox. I ran into a problem when using tools/analyse_logs.py. Specifically, there seems to be a bug with the variable plot_iters when plotting iter-based curves.

Describe the bug
When using tools/analyse_logs.py to plot curves of iter-based variables such as lr and loss, the curve comes out wrong, as in the following figures. The lr schedule is poly, so the curve should be a line, but it is wrong from iter 200 to iter 1344. The same goes for the loss curve.

When the iter value logged for validation (e.g. 1334) is larger than the evaluation interval (e.g. 200), the curve will be wrong. The code in analyse_logs generates the wrong plot_iters for the x axis.
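For reference, the straight-line expectation for lr can be checked with a minimal sketch of polynomial decay. It assumes the common form lr = (base_lr - min_lr) * (1 - iter / max_iters) ** power + min_lr; the function name poly_lr and all parameter values are illustrative and not taken from the attached log. With power = 1.0 the decay is exactly linear, while other powers give a smooth curve.

```python
def poly_lr(cur_iter, max_iters, base_lr=0.01, min_lr=0.0, power=1.0):
    """Polynomial learning-rate decay (hypothetical helper, illustrative values)."""
    coeff = (1 - cur_iter / max_iters) ** power
    return (base_lr - min_lr) * coeff + min_lr


max_iters = 1000
# Sample the schedule at evenly spaced iterations; with power=1.0 the
# differences between consecutive values are constant.
lrs = [poly_lr(it, max_iters) for it in range(0, max_iters + 1, 250)]
print(lrs)
```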

Reproduction

1. What command or script did you run?
python tools/analyze_logs.py xxx.log.json --keys lr

2. Here are my json logs.
logfile.zip

Environment

MMSegmentation v0.22.1

Bug fix

I have already analyzed the bug: it appears to come from the 'iter' value in the first 'mode': 'val' line of the .json log, which corrupts plot_iters. The if ... continue check (lines 46-47) filters too many useful iters out of the list plot_iters. Maybe we can fix it by ignoring the iter of val-mode entries.

  for idx in range(len(epoch_logs[metric])):
      if pre_iter > epoch_logs['iter'][idx]:
          continue
      pre_iter = epoch_logs['iter'][idx]
      plot_iters.append(epoch_logs['iter'][idx])
      plot_values.append(epoch_logs[metric][idx])

However, due to the limitation of my ability, I can't fix it. Sorry!


MengzhangLI · 2022-03-28T18:18:11Z

Hi, thanks for your kind reminder. We will fix it as soon as possible!

Best,

MengzhangLI · 2022-03-28T18:22:06Z

Note: this script requires the seaborn package, while our default conda environment does not.

MengzhangLI · 2022-03-28T18:26:28Z

Try to visualize https://download.openmmlab.com/mmsegmentation/v0.5/convnext/upernet_convnext_tiny_fp16_512x512_160k_ade20k/upernet_convnext_tiny_fp16_512x512_160k_ade20k_20220227_124553.log.json

Y-M-Y · 2022-03-28T18:41:10Z

Try to visualize https://download.openmmlab.com/mmsegmentation/v0.5/convnext/upernet_convnext_tiny_fp16_512x512_160k_ade20k/upernet_convnext_tiny_fp16_512x512_160k_ade20k_20220227_124553.log.json

Thanks, this log also plots fine in my environment. In this log, the "iter" of the "mode": "val" line is 250, which is smaller than the evaluation interval of 16000, so it won't cause the error.

MengzhangLI · 2022-03-28T18:53:55Z

I find that lines 46-47 are designed to skip the val line.

In this case pre_iter is 32000, which is greater than epoch_logs['iter'][idx] = 250.

When I scanned your log.json, I could not figure out why your val mode has iter: 1334. It seems you trained your model on a single GPU with 1,334 images in validation. Am I right?

So should we sum this phenomenon up as too many images (1334) in validation plus too small a val interval (interval=200)? In the first several epochs, 1334 is larger than its neighbours in epoch_logs['iter'].
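The effect can be reproduced with a minimal sketch. The epoch_logs dict below is a synthetic stand-in for one parsed log entry (all values are made up, and the helper name collect_with_pre_iter is hypothetical); only the loop body follows the snippet quoted earlier in this thread.

```python
# Synthetic stand-in for one entry of the parsed .json log; values are made up.
epoch_logs = {
    'mode': ['train', 'train', 'train', 'train', 'val', 'train', 'train'],
    'iter': [50, 100, 150, 200, 1334, 250, 300],
    'lr': [0.0099, 0.0098, 0.0097, 0.0096, 0.0096, 0.0095, 0.0094],
}


def collect_with_pre_iter(epoch_logs, metric):
    """Mimic the `pre_iter` check quoted in this thread (hypothetical helper)."""
    plot_iters, plot_values = [], []
    pre_iter = -1
    for idx in range(len(epoch_logs[metric])):
        if pre_iter > epoch_logs['iter'][idx]:
            continue  # skipped: iter is smaller than the last kept iter
        pre_iter = epoch_logs['iter'][idx]
        plot_iters.append(epoch_logs['iter'][idx])
        plot_values.append(epoch_logs[metric][idx])
    return plot_iters, plot_values


plot_iters, plot_values = collect_with_pre_iter(epoch_logs, 'lr')
print(plot_iters)  # [50, 100, 150, 200, 1334]
```

Once the val entry sets pre_iter to 1334, every later train entry with a smaller iter is dropped, which matches the gap seen in the plotted curve.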

MengzhangLI · 2022-03-28T18:55:28Z

By the way, I suggest using the resnet50v1c pretrained model rather than a checkpoint trained on cityscapes, because I think ImageNet1K as an upstream dataset may help your task, and the cityscapes dataset may be unrelated to your dataset.

Y-M-Y · 2022-03-28T19:11:12Z

The summary is right! I just started learning about segmentation and I'm not familiar with some habits or parameters. Thank you very much for your patience.

Y-M-Y · 2022-03-28T19:16:30Z

By the way, I suggest using the resnet50v1c pretrained model rather than a checkpoint trained on cityscapes, because I think ImageNet1K as an upstream dataset may help your task, and the cityscapes dataset may be unrelated to your dataset.

Thanks again, my dataset is from kaggle steel-defect-seg. The dataset is really imbalanced and I'm trying to improve the mIoU. Your suggestion means a lot to me!

MengzhangLI · 2022-03-28T20:19:37Z

I have made a PR to fix this bug: #1428.

Just use

  if epoch_logs['mode'][idx] == 'train':
      plot_iters.append(epoch_logs['iter'][idx])
      plot_values.append(epoch_logs[metric][idx])

to replace

  if pre_iter > epoch_logs['iter'][idx]:
      continue
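A minimal sketch of the mode-based filter, run on synthetic data, shows what it keeps. The epoch_logs values are made up, the helper name collect_train_only is hypothetical, and the sketch assumes every log line carries the plotted metric so that the lists stay aligned.

```python
# Synthetic stand-in for one entry of the parsed .json log; values are made up.
epoch_logs = {
    'mode': ['train', 'train', 'train', 'train', 'val', 'train', 'train'],
    'iter': [50, 100, 150, 200, 1334, 250, 300],
    'lr': [0.0099, 0.0098, 0.0097, 0.0096, 0.0096, 0.0095, 0.0094],
}


def collect_train_only(epoch_logs, metric):
    """Keep only entries logged in train mode (hypothetical helper)."""
    plot_iters, plot_values = [], []
    for idx in range(len(epoch_logs[metric])):
        if epoch_logs['mode'][idx] == 'train':
            plot_iters.append(epoch_logs['iter'][idx])
            plot_values.append(epoch_logs[metric][idx])
    return plot_iters, plot_values


train_iters, train_values = collect_train_only(epoch_logs, 'lr')
print(train_iters)  # [50, 100, 150, 200, 250, 300]
```

Only the val entry with iter 1334 is dropped; the train entries after it are kept.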

Best,

Y-M-Y · 2022-03-29T03:05:54Z

Oh! I had tried that, but I wrote it the wrong way! Such a pity!
I used

  if epoch_logs['mode'][idx] is 'train':
      plot_iters.append(epoch_logs['iter'][idx])
      plot_values.append(epoch_logs[metric][idx])

In reality, I confused is and ==!!!

Thanks a lot! I will learn more about Python basics.
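For anyone who hits the same thing: == compares values, while is compares object identity. A string parsed from a JSON log is usually a different object from a literal written in the source, so an is check can fail even when the text matches. The sketch below uses a made-up JSON record; the identity result is left without an expected value because it depends on the interpreter. Recent Python versions (3.8 and later) also emit a SyntaxWarning for is with a literal.

```python
import json

# Made-up record resembling one line of a .json training log.
record = json.loads('{"mode": "train", "iter": 50}')
mode = record['mode']
expected = 'train'

print(mode == expected)  # True: the two strings hold the same characters
print(mode is expected)  # identity check; the result is implementation-dependent
```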


MengzhangLI self-assigned this Mar 28, 2022

MengzhangLI mentioned this issue Mar 28, 2022

[Fix] Fix bug in tools/analyse_logs.py caused by wrong plot_iter in some cases. #1428

Merged

Y-M-Y closed this as completed Mar 30, 2022
