This repository was archived by the owner on Oct 31, 2023. It is now read-only.

Validation during training (version 2) #828

Merged
merged 4 commits on Sep 29, 2019

Conversation


@osanwe osanwe commented May 27, 2019

After the discussion in another issue thread (#785) and some additional study of the code, I decided to simplify the approach to tracking validation during training, to enable early stopping.

Instead of an additional dataset, the parameter SOLVER.TEST_PERIOD (analogous to SOLVER.CHECKPOINT_PERIOD) is added to specify the number of iterations between validation runs. The same datasets are used for the intermediate and final evaluations.

In this version both the losses and AP (via the inference method) are computed.
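For context, the periodic check that a parameter like this drives can be sketched as follows. This is a minimal illustration, not the PR's exact code; the function name `should_validate` and the 30000/90000 numbers are mine.

```python
# Minimal sketch (not the PR's exact code) of how a SOLVER.TEST_PERIOD-style
# setting can gate validation inside a training loop.
def should_validate(iteration, test_period):
    """True every `test_period` iterations, skipping iteration 0."""
    return test_period > 0 and iteration > 0 and iteration % test_period == 0

# With MAX_ITER 90000 and TEST_PERIOD 30000, validation would run three times:
triggered = [i for i in range(1, 90001) if should_validate(i, 30000)]
# triggered == [30000, 60000, 90000]
```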

@facebook-github-bot facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label May 27, 2019
meters_val.update(loss=losses_reduced, **loss_dict_reduced)
synchronize()
logger.info(
    meters.delimiter.join(


Should meters be meters_val?

Author

It does not matter here because meters and meters_val have the same delimiter, but yes, ideally meters_val should be here. I'll fix this.
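For illustration, here is why the mix-up is harmless in practice. The `Meters` class below is a stand-in sketch I wrote for this explanation, not maskrcnn-benchmark's actual MetricLogger:

```python
# Sketch only: a stand-in for the meter objects discussed above, not the
# library's real MetricLogger class.
class Meters:
    def __init__(self, delimiter="  "):
        self.delimiter = delimiter
        self.values = {}

    def update(self, **kwargs):
        self.values.update(kwargs)

    def format(self):
        return self.delimiter.join(
            f"{k}: {v}" for k, v in sorted(self.values.items())
        )

meters = Meters()      # training meters
meters_val = Meters()  # validation meters
meters_val.update(loss=0.5)
# Both objects are constructed with the same delimiter, so joining with
# meters.delimiter instead of meters_val.delimiter yields an identical
# string -- the output never differed, but meters_val is the right object.
```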

SHARE_BOX_FEATURE_EXTRACTOR: False
MASK_ON: True
DATASETS:
  TRAIN: ("coco_2014_train", "coco_2014_valminusminival")
@qihao-huang qihao-huang commented May 30, 2019

I don't understand why we have to add two datasets in TRAIN: ("coco_2014_train", "coco_2014_valminusminival").

Only one dataset is returned by the build_dataset function in maskrcnn_benchmark/data/build.py:

# for training, concatenate all datasets into a single one
dataset = datasets[0]
if len(datasets) > 1:
    dataset = D.ConcatDataset(datasets)
return [dataset]

datasets is a list, so dataset is coco_2014_train, right?

And, Question 2:
Why did you delete VAL? From my perspective, TEST is TEST and VAL is VAL; they are datasets from different distributions, right?

Thank you so much for your work.

Author

Regarding Question 1:
In the highlighted code snippet, the datasets are concatenated whenever there is more than one dataset in the TRAIN field.

Regarding Question 2:
As discussed in #785 (proposed by @fmassa), in this case a separate validation dataset is rarely needed, because you do not change hyperparameters while the training script is running. After tuning the network you can take the best model variant (evaluated on the validation dataset, which is TEST here) and run tools/test_net.py with another dataset.
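To make the Question 1 answer concrete, here is a self-contained sketch of the concatenation behavior. The `ConcatDataset` below mimics torch.utils.data.ConcatDataset rather than importing it, and plain lists stand in for the actual COCO datasets:

```python
# Illustration only: a minimal ConcatDataset mimicking the behavior used by
# build_dataset; plain lists stand in for the actual COCO datasets.
class ConcatDataset:
    def __init__(self, datasets):
        self.datasets = datasets

    def __len__(self):
        return sum(len(d) for d in self.datasets)

    def __getitem__(self, idx):
        # Walk the datasets in order until idx falls inside one of them.
        for d in self.datasets:
            if idx < len(d):
                return d[idx]
            idx -= len(d)
        raise IndexError(idx)

train = ["t0", "t1", "t2"]   # stands in for coco_2014_train
extra = ["v0", "v1"]         # stands in for coco_2014_valminusminival
dataset = ConcatDataset([train, extra])
# len(dataset) == 5, and indices 3-4 come from the second dataset, so both
# TRAIN entries really are used during training.
```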


Thanks for your patient and kind reply :)

@xiaohai12

Sorry, has the validation part been merged into the code yet?


osanwe commented Jun 14, 2019

@xiaohai12,
Unfortunately not. I think it is waiting for @fmassa's review.

@botcs botcs merged commit 0ce8f6f into facebookresearch:master Sep 29, 2019
Contributor

botcs commented Sep 29, 2019

Hi @osanwe,

Thanks for this awesome PR, this one's extremely useful!🎉

Contributor

chenjoya commented Oct 8, 2019

Thanks for your implementation. But after evaluation, the training stops, which is strange.
Case:
When I train maskrcnn_R_50_FPN_1x with the period set to 30000, at iteration 30000 the program evaluates AP on COCO minival, but then the training seems to stop.
I hope you can take a look.

Contributor

botcs commented Oct 8, 2019

@chenjoya does it stop without any errors?

@elepherai

@chenjoya Maybe it's a CUDA out-of-memory problem.

WEIGHT_DECAY: 0.0001
STEPS: (60000, 80000)
MAX_ITER: 90000
TEST_PERIOD: 2500


I hope this isn't a silly question, but can you explain why you decided to change BASE_LR: 0.02 (default: 0.001), WEIGHT_DECAY: 0.0001 (default: 0.0005), and STEPS: (60000, 80000)? If this has been answered in a previous issue, I wouldn't mind being pointed to that discussion. Thank you for your time!

@chenjoya
Contributor

Sorry, it may be that my CPU resources were not sufficient. It works well now. Thanks! ^∀^
