
Significant Differences in Evaluation Results on the Validation Set Between train.py During Training and test.py in YOLOv5 5.0 #13485

Open
1 task done
3210448723 opened this issue Jan 9, 2025 · 2 comments
Labels
detect (Object Detection issues, PR's), question (Further information is requested)

Comments

@3210448723

Search before asking

Question

In YOLOv5 5.0, the evaluation results on the validation set during train.py training differ significantly from the evaluation results of test.py on the same validation set.

Why is this happening? The results from test.py are much higher than the in-training validation results for the corresponding epoch, and this occurs for most epochs. The test.py results are unrealistically ideal and do not match the model's actual performance. A portion of the output logs is shown below.

Evaluation Output of train.py on the Validation Set at Epoch 115

2024-12-12 14:37:44,182 - INFO - YOLOv5 🚀 5211d5c torch 2.4.1+cu124 CUDA:0 (NVIDIA GeForce RTX 3090, 24154.375MB)
                                   CUDA:1 (NVIDIA GeForce RTX 3090, 24154.375MB)
                                   CUDA:2 (NVIDIA GeForce RTX 3090, 24154.375MB)
                                   CUDA:3 (NVIDIA GeForce RTX 3090, 24154.375MB)

2024-12-12 14:37:44,192 - INFO - Namespace(adam=False, artifact_alias='latest', batch_size=32, bbox_interval=-1, bucket='', cache_images=True, cfg='', data='data/fankou/EnhancedDataset.yaml', device='0,1,2,3', entity=None, epochs=300, evolve=False, exist_ok=False, global_rank=-1, hyp='data/fankou/hyp.yaml', image_weights=False, img_size=[640, 640], label_smoothing=0.0, linear_lr=False, local_rank=-1, multi_scale=False, name='exp', noautoanchor=False, nosave=False, notest=False, offline=True, project='runs/train', quad=False, rect=False, resume=True, save_dir='runs/train/exp', save_period=1, single_cls=False, sync_bn=False, total_batch_size=32, upload_dataset=False, weights='./runs/train/exp/weights/last.pt', workers=8, world_size=1)
2024-12-12 14:37:44,193 - INFO - tensorboard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
2024-12-12 14:37:44,194 - INFO - hyperparameters: lr0=0.01, lrf=0.2, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.0375, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, label_smoothing=0.0
2024-12-12 14:37:47,177 - INFO - 
                 from  n    params  module                                  arguments                     
2024-12-12 14:37:47,181 - INFO -   0                -1  1      7040  models.common.Focus                     [3, 64, 3]                    
2024-12-12 14:37:47,182 - INFO -   1                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
2024-12-12 14:37:47,185 - INFO -   2                -1  1    156928  models.common.C3                        [128, 128, 3]                 
2024-12-12 14:37:47,187 - INFO -   3                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
2024-12-12 14:37:47,199 - INFO -   4                -1  1   1611264  models.common.C3                        [256, 256, 9]                 
2024-12-12 14:37:47,205 - INFO -   5                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
2024-12-12 14:37:47,248 - INFO -   6                -1  1   6433792  models.common.C3                        [512, 512, 9]                 
2024-12-12 14:37:47,277 - INFO -   7                -1  1   4720640  models.common.Conv                      [512, 1024, 3, 2]             
2024-12-12 14:37:47,296 - INFO -   8                -1  1   2624512  models.common.SPP                       [1024, 1024, [5, 9, 13]]      
2024-12-12 14:37:47,359 - INFO -   9                -1  1   9971712  models.common.C3                        [1024, 1024, 3, False]        
2024-12-12 14:37:47,363 - INFO -  10                -1  1    525312  models.common.Conv                      [1024, 512, 1, 1]             
2024-12-12 14:37:47,363 - INFO -  11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
2024-12-12 14:37:47,363 - INFO -  12           [-1, 6]  1         0  models.common.Concat                    [1]                           
2024-12-12 14:37:47,382 - INFO -  13                -1  1   2757632  models.common.C3                        [1024, 512, 3, False]         
2024-12-12 14:37:47,383 - INFO -  14                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
2024-12-12 14:37:47,383 - INFO -  15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
2024-12-12 14:37:47,383 - INFO -  16           [-1, 4]  1         0  models.common.Concat                    [1]                           
2024-12-12 14:37:47,390 - INFO -  17                -1  1    690688  models.common.C3                        [512, 256, 3, False]          
2024-12-12 14:37:47,394 - INFO -  18                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
2024-12-12 14:37:47,394 - INFO -  19          [-1, 14]  1         0  models.common.Concat                    [1]                           
2024-12-12 14:37:47,411 - INFO -  20                -1  1   2495488  models.common.C3                        [512, 512, 3, False]          
2024-12-12 14:37:47,426 - INFO -  21                -1  1   2360320  models.common.Conv                      [512, 512, 3, 2]              
2024-12-12 14:37:47,426 - INFO -  22          [-1, 10]  1         0  models.common.Concat                    [1]                           
2024-12-12 14:37:47,490 - INFO -  23                -1  1   9971712  models.common.C3                        [1024, 1024, 3, False]        
2024-12-12 14:37:47,491 - INFO -  24      [17, 20, 23]  1     59235  models.yolo.Detect                      [6, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [256, 512, 1024]]
2024-12-12 14:37:47,781 - INFO - Model Summary: 499 layers, 46658275 parameters, 46658275 gradients, 114.6 GFLOPS
2024-12-12 14:37:47,781 - INFO - 
2024-12-12 14:37:47,890 - INFO - Transferred 650/650 items from ./runs/train/exp/weights/last.pt
2024-12-12 14:37:47,966 - INFO - Scaled weight_decay = 0.0005
2024-12-12 14:37:47,970 - INFO - Optimizer groups: 110 .bias, 110 conv.weight, 107 other
2024-12-12 14:39:57,377 - INFO - Image sizes 640 train, 640 test
Using 8 dataloader workers
Logging results to runs/train/exp
Starting training for 300 epochs...

2024-12-13 00:16:49,333 - INFO - 
test: data: {'train': ['/home/user/yuanjinmin/数据集/模型训练/train', '/home/user/yuanjinmin/dataset/obj_train_data/train_pro'], 'val': ['/home/user/yuanjinmin/数据集/模型训练/val', '/home/user/yuanjinmin/dataset/obj_train_data/val_pro'], 'nc': 6, 'names': ['unhelmet', 'helmet', 'cigarette', 'fire', 'smoke', 'safebelt']}, weight: None, batch_size: 64, imgsz: 640, conf_thres: 0.001, iou_thres: 0.6, save_json: False, single_cls: False, augment: False, verbose: True, dataloader: <utils.datasets.InfiniteDataLoader object at 0x784bc87e5100>, save_dir: runs/train/exp, save_txt: False, save_hybrid: False, save_conf: False, plots: False, wandb_logger: <utils.wandb_logging.wandb_utils.WandbLogger object at 0x784bd86e1610>, compute_loss: <utils.loss.ComputeLoss object at 0x784bd5c4e790>, half_precision: True, is_coco: False
2024-12-13 00:17:12,163 - INFO -                Class      Images      Labels           P           R      mAP@.5  mAP@.5:.95
2024-12-13 00:17:12,163 - INFO -                  all        3925       22419       0.745       0.731       0.722       0.406
2024-12-13 00:17:12,164 - INFO -             unhelmet        3925        9246       0.903       0.939       0.938       0.538
2024-12-13 00:17:12,164 - INFO -               helmet        3925       10645        0.86       0.927       0.943       0.754
2024-12-13 00:17:12,164 - INFO -            cigarette        3925         761       0.631       0.618       0.594       0.229
2024-12-13 00:17:12,164 - INFO -                 fire        3925         808        0.57        0.64         0.6       0.325
2024-12-13 00:17:12,164 - INFO -                smoke        3925         717       0.602       0.351        0.35       0.135
2024-12-13 00:17:12,164 - INFO -             safebelt        3925         242       0.904       0.913        0.91       0.457

Evaluation Output of test.py on the Validation Set for the Model at Epoch 115

2025-01-09 20:15:37,890 - INFO - Namespace(augment=False, batch_size=64, conf_thres=0.001, data='data/fankou/EnhancedDataset.yaml', device='0,1,2,3', exist_ok=False, img_size=640, iou_thres=0.6, name='exp', project='runs/test', save_conf=True, save_hybrid=True, save_json=False, save_txt=True, single_cls=False, task='val', verbose=True, weights='runs/train/exp/weights/epoch_115.pt')
2025-01-09 20:15:39,100 - INFO - Fusing layers...
2025-01-09 20:15:40,267 - INFO - Model Summary: 392 layers, 46627491 parameters, 0 gradients, 114.0 GFLOPS
2025-01-09 20:17:21,449 - INFO -                Class      Images      Labels           P           R      mAP@.5  mAP@.5:.95
2025-01-09 20:17:21,449 - INFO -                  all        3925       22419           1           1       0.995       0.995
2025-01-09 20:17:21,449 - INFO -             unhelmet        3925        9246           1           1       0.996       0.996
2025-01-09 20:17:21,449 - INFO -               helmet        3925       10645           1       0.999       0.996       0.996
2025-01-09 20:17:21,449 - INFO -            cigarette        3925         761           1           1       0.995       0.995
2025-01-09 20:17:21,450 - INFO -                 fire        3925         808           1           1       0.995       0.995
2025-01-09 20:17:21,450 - INFO -                smoke        3925         717           1           1       0.995       0.995
2025-01-09 20:17:21,450 - INFO -             safebelt        3925         242           1           1       0.995       0.995
2025-01-09 20:17:21,450 - INFO - Speed: 3.2/1.8/4.9 ms inference/NMS/total per 640x640 image at batch-size 64
2025-01-09 20:17:22,088 - INFO - Results saved to runs/test/exp
3925 labels saved to runs/test/exp/labels

Additional

No response

@3210448723 3210448723 added the question Further information is requested label Jan 9, 2025
@UltralyticsAssistant UltralyticsAssistant added the detect Object Detection issues, PR's label Jan 9, 2025
@UltralyticsAssistant
Member

👋 Hello @3210448723, thank you for bringing this to our attention and for your interest in YOLOv5 🚀!

It seems you're encountering differences in evaluation metrics between train.py and test.py. This discrepancy might arise due to differences in how the evaluation is performed during training versus testing. To assist you better, could you please share a minimal reproducible example (MRE)? This should include:

  • The exact commands you used for both train.py and test.py.
  • Relevant portions of your dataset or configuration files.
  • Specific details about your training and testing pipelines (e.g., augmentations, hyperparameters, evaluation settings).
  • Versions of YOLOv5, Python, and PyTorch being used.

Additionally, ensure that your environment satisfies these minimum requirements:

  • Python>=3.8.0
  • All dependencies installed as per the requirements.txt included in the repository
  • PyTorch>=1.8 and correctly set up CUDA (if using GPU); a quick version-check snippet is sketched below
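
To report the versions requested above, a quick check along these lines should be enough (a minimal sketch; it assumes it is run from inside your YOLOv5 repository clone so the commit hash can be read with git):

```python
# Minimal sketch: print the environment details requested above.
# Assumes this is run from inside the YOLOv5 repository clone.
import subprocess
import sys

import torch

print("Python:", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

# YOLOv5 commit hash (the training log above reports commit 5211d5c)
commit = subprocess.run(
    ["git", "rev-parse", "--short", "HEAD"],
    capture_output=True, text=True, check=False,
).stdout.strip()
print("YOLOv5 commit:", commit)
```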

If applicable, confirm whether you are running YOLOv5 in a local environment or in a cloud-based environment (such as Colab, Paperspace, etc.).

This is an automated response to help guide resolution, and an Ultralytics engineer will assist you further soon. Let us know if you need additional clarification! 😊

@pderrenger
Member

@3210448723 the significant differences in evaluation results between train.py and test.py likely stem from differences in evaluation configurations, such as augmentation settings, confidence thresholds, or IoU thresholds. During training, train.py typically uses validation with partial augmentations and real-time adjustments, while test.py evaluates the model in a purely inference-focused environment without training-specific nuances.

To investigate further:

  1. Ensure both scripts use consistent configurations for evaluation (e.g., --augment, imgsz, conf_thres, iou_thres); a sketch of such a comparison, using the settings already logged above, follows this list.
  2. Check if the dataset and preprocessing steps are identical for both scripts.
  3. Confirm the test.py command is evaluating the same checkpoint as the one saved during training.
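
As a concrete starting point for item 1, the evaluation settings already logged in this issue can be compared directly. The values below are copied from the two logs (the in-training test() call at epoch 115 and the standalone test.py Namespace); half_precision for the standalone run is an assumption based on test.py's default behaviour on GPU, since it is not printed in that Namespace:

```python
# Minimal sketch: diff the evaluation settings logged by the in-training
# validation call against the standalone test.py run (values copied from
# the logs above; img_size in the Namespace corresponds to imgsz here).

train_time_eval = {   # logged by test() inside train.py, 2024-12-13 00:16:49
    "batch_size": 64,
    "imgsz": 640,
    "conf_thres": 0.001,
    "iou_thres": 0.6,
    "augment": False,
    "single_cls": False,
    "save_txt": False,
    "save_hybrid": False,
    "save_conf": False,
    "half_precision": True,
}

standalone_test = {   # logged by test.py, 2025-01-09 20:15:37
    "batch_size": 64,
    "imgsz": 640,
    "conf_thres": 0.001,
    "iou_thres": 0.6,
    "augment": False,
    "single_cls": False,
    "save_txt": True,
    "save_hybrid": True,
    "save_conf": True,
    "half_precision": True,  # assumed: test.py defaults to half precision on CUDA
}

# Print every setting that differs between the two runs.
for key in sorted(train_time_eval):
    if train_time_eval[key] != standalone_test.get(key):
        print(f"{key}: train-time eval = {train_time_eval[key]!r}, "
              f"standalone test.py = {standalone_test.get(key)!r}")
```

Based on those logs, the standalone run enabled --save-txt, --save-conf and, notably, --save-hybrid, while the in-training evaluation did not. In YOLOv5, --save-hybrid writes hybrid labels that mix ground-truth boxes into the results, and evaluating with it enabled is known to push P/R/mAP toward 1.0, which would be consistent with the near-perfect 0.995 figures above. Re-running test.py without --save-hybrid is a reasonable first check before digging further.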

For additional details on validation differences, consult the YOLOv5 validation documentation. Let me know if you need further clarification!

@3210448723 3210448723 changed the title Significant Differences in Evaluation Results on the Validation Set Between train.py During Training and test.py in [YOLOv5 5.0](https://github.com/ultralytics/yolov5/releases/tag/v5.0) Significant Differences in Evaluation Results on the Validation Set Between train.py During Training and test.py in YOLOv5 5.0 Jan 10, 2025