A small bug in tools/analyse_logs.py caused by wrong plot_iter in some cases. #1426

Closed
Y-M-Y opened this issue Mar 28, 2022 · 10 comments · Fixed by #1428

Y-M-Y commented Mar 28, 2022

Hi, thank you for building such a fabulous toolbox. I ran into a problem when using tools/analyze_logs.py. Specifically, there seems to be a bug in how plot_iters is built when plotting iter-based curves.

Describe the bug
When using tools/analyze_logs.py to plot curves of iter-based variables such as lr and loss, the curve comes out wrong, as in the following figures. The lr schedule is poly, so the curve should be a line, but it is wrong from iter 200 to iter 1344. The same is true of the loss curve.
image
image

When the iter value logged by validation (e.g. 1334) is larger than the evaluation interval (e.g. 200), the curve is wrong: the code in analyze_logs.py builds an incorrect plot_iters list, which is used as the x axis.

Reproduction

1. What command or script did you run?
python tools/analyze_logs.py xxx.log.json --keys lr

2. Here are my JSON logs.
logfile.zip

Environment

MMSegmentation v0.22.1

Bug fix

I have already analyzed the bug. It must be an error in plot_iters, triggered by the 'iter' value in the first 'mode': 'val' line of the .json log. The if...continue check (lines 46-47) filters too many useful iters out of the plot_iters list. Maybe we can fix it by ignoring the iter of val-mode entries.

  for idx in range(len(epoch_logs[metric])):
      if pre_iter > epoch_logs['iter'][idx]:
          continue
      pre_iter = epoch_logs['iter'][idx]
      plot_iters.append(epoch_logs['iter'][idx])
      plot_values.append(epoch_logs[metric][idx])

However, due to the limitation of my ability, I can't fix it. Sorry!
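
A minimal sketch of why the check above drops points, assuming pre_iter starts at -1 and using a made-up log in which training is logged every 50 iters and every 200th iter is followed by a val entry whose 'iter' is 1334. The function names and the synthetic data are illustrative only and are not taken from tools/analyze_logs.py.

```python
def filter_with_pre_iter(iters, values):
    """Mimic the quoted loop: skip any entry whose iter is below the
    largest iter kept so far (pre_iter = -1 at the start is an assumption)."""
    plot_iters, plot_values = [], []
    pre_iter = -1
    for it, val in zip(iters, values):
        if pre_iter > it:
            continue
        pre_iter = it
        plot_iters.append(it)
        plot_values.append(val)
    return plot_iters, plot_values


def build_log(max_iter=1500, log_interval=50, eval_interval=200, val_iter=1334):
    """Hypothetical log: a train entry every log_interval iters, followed by
    a val entry (with 'iter' == val_iter) after every eval_interval iters."""
    iters, modes = [], []
    for it in range(log_interval, max_iter + 1, log_interval):
        iters.append(it)
        modes.append('train')
        if it % eval_interval == 0:
            iters.append(val_iter)
            modes.append('val')
    return iters, modes


iters, modes = build_log()
# Use the mode as the "value" so we can see which entries survive.
plot_iters, kept_modes = filter_with_pre_iter(iters, modes)
kept_train = [it for it, m in zip(plot_iters, kept_modes) if m == 'train']
print(kept_train)  # train iters 250..1300 are missing
```

In this synthetic case the first val entry sets pre_iter to 1334, so every later train entry below 1334 is skipped, which matches the gap described in the report.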

@MengzhangLI
Contributor

Hi, thanks for the kind reminder. We will fix it as soon as possible!

Best,

MengzhangLI self-assigned this Mar 28, 2022
@MengzhangLI
Contributor

Note: this script needs the seaborn package, which our default conda environment does not include.

image

@Y-M-Y
Author

Y-M-Y commented Mar 28, 2022

Try to visualize https://download.openmmlab.com/mmsegmentation/v0.5/convnext/upernet_convnext_tiny_fp16_512x512_160k_ade20k/upernet_convnext_tiny_fp16_512x512_160k_ade20k_20220227_124553.log.json

image

image

Thanks, this log also works fine in my environment. In this log, the "iter" of the "mode": "val" entry is 250, which is smaller than the evaluation interval of 16000, so it does not trigger the error.

@MengzhangLI
Contributor

MengzhangLI commented Mar 28, 2022

I found that lines 46-47 are designed to skip val lines.

image

In this case pre_iter is 32000, which is greater than epoch_logs['iter'][idx] = 250.

When I scanned your log.json, I could not figure out why your val entries have iter: 1334. It seems you trained your model on a single GPU with 1,334 images in the validation set. Am I right?

image

So can we sum this phenomenon up as too many images in validation (1334) combined with too small a val interval (interval=200)? In the first several epochs, 1334 is larger than its neighbours in epoch_logs['iter'].
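
The arithmetic behind this summary can be sketched as follows. This is only an illustration under stated assumptions: that the 'iter' of a val entry equals the number of validation batches per GPU, that train entries are logged every 50 iters, and that the evaluation interval is a multiple of that logging interval. Both helper names are hypothetical.

```python
import math


def val_iter_logged(num_val_images, samples_per_gpu=1, num_gpus=1):
    """Assumed 'iter' of a val entry: validation batches seen by each GPU."""
    return math.ceil(num_val_images / (samples_per_gpu * num_gpus))


def dropped_train_iters(eval_interval, val_iter, log_interval=50):
    """Train iters logged after the first evaluation that are still below
    val_iter, and would therefore be skipped by `pre_iter > iter`."""
    return list(range(eval_interval + log_interval, val_iter, log_interval))


# Case from this issue: 1334 val images on one GPU, evaluation every 200 iters.
val_iter = val_iter_logged(1334)
dropped = dropped_train_iters(eval_interval=200, val_iter=val_iter)
print(val_iter, dropped[0], dropped[-1], len(dropped))

# Case from the ConvNeXt log: val iter 250, evaluation every 16000 iters.
print(dropped_train_iters(eval_interval=16000, val_iter=250))
```

Under these assumptions the first case loses the train points from 250 through 1300, while the second case loses nothing because the val iter is already below every later train iter.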

@MengzhangLI
Contributor

By the way, I suggest using the resnet50v1c pretrained model rather than a checkpoint trained on Cityscapes. I think ImageNet-1K as an upstream dataset may help your task, while the Cityscapes dataset may be unrelated to your dataset.

@Y-M-Y
Author

Y-M-Y commented Mar 28, 2022

The summary is right! I just started learning about segmentation and I'm not familiar with some conventions or parameters. Thank you very much for your patience.

@Y-M-Y
Author

Y-M-Y commented Mar 28, 2022

> By the way, I suggest using the resnet50v1c pretrained model rather than a checkpoint trained on Cityscapes. I think ImageNet-1K as an upstream dataset may help your task, while the Cityscapes dataset may be unrelated to your dataset.

Thanks again. My dataset is from Kaggle steel-defect-seg. The dataset is really imbalanced and I'm trying to improve the mIoU. Your suggestion means a lot to me!

@MengzhangLI
Contributor

MengzhangLI commented Mar 28, 2022

I have made a PR to fix this bug: #1428.

Just use

  if epoch_logs['mode'][idx] == 'train':
      plot_iters.append(epoch_logs['iter'][idx])
      plot_values.append(epoch_logs[metric][idx])

to replace

  if pre_iter > epoch_logs['iter'][idx]:
      continue
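
Put together, the mode-based filter might look like the sketch below. It assumes that epoch_logs['mode'], epoch_logs['iter'] and epoch_logs[metric] are lists of equal length with one item per log entry; the function name and the sample values are hypothetical and this is not the exact code merged in the PR.

```python
def collect_train_points(epoch_logs, metric):
    """Keep only the entries logged in train mode, regardless of their iter."""
    plot_iters, plot_values = [], []
    for idx in range(len(epoch_logs[metric])):
        if epoch_logs['mode'][idx] == 'train':
            plot_iters.append(epoch_logs['iter'][idx])
            plot_values.append(epoch_logs[metric][idx])
    return plot_iters, plot_values


# Made-up entries: the val entry carries 'iter': 1334 between iters 200 and 250.
epoch_logs = {
    'mode': ['train', 'train', 'val', 'train'],
    'iter': [150, 200, 1334, 250],
    'lr': [0.0099, 0.0098, 0.0098, 0.0097],
}
plot_iters, plot_values = collect_train_points(epoch_logs, 'lr')
print(plot_iters, plot_values)
```

Because the decision no longer depends on comparing iter values, a large val iter cannot hide the train entries that follow it.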

Best,

@Y-M-Y
Author

Y-M-Y commented Mar 29, 2022

Oh! I had tried that, but I wrote it the wrong way! Such a pity!
I used

  if epoch_logs['mode'][idx] is 'train':
      plot_iters.append(epoch_logs['iter'][idx])
      plot_values.append(epoch_logs[metric][idx])

The reality is that I confused is and ==!

Thanks a lot! I will learn more about Python basics.
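
For readers who hit the same mistake, here is a small sketch of the difference. It assumes the mode string comes from parsing a JSON log line; the sample line is made up. == compares values, while is compares object identity, and whether two equal strings happen to be the same object is an implementation detail that should not be relied on.

```python
import json

# Hypothetical log line, parsed the way a .log.json entry would be.
entry = json.loads('{"mode": "train", "iter": 50, "lr": 0.01}')

expected = 'train'
mode = entry['mode']

same_value = mode == expected   # value comparison: what the fix needs
same_object = mode is expected  # identity comparison: may be False even
                                # though the two strings are equal

print(same_value)
print(same_object)
```

When same_object is False, a condition written with is never matches, so no points are collected at all.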

Y-M-Y closed this as completed Mar 30, 2022
aravind-h-v pushed a commit to aravind-h-v/mmsegmentation that referenced this issue Mar 27, 2023
…b#1334 (open-mmlab#1426)
