Thank you for your paper and project. Before training, I qualitatively tested a batch of data with pre-trained YOLO-series models (v5 through v11) and with models such as DINO, Co-DETR, Florence-2 (both pre-trained weights and weights fine-tuned on my data), YOLO-World, etc. The statistics showed that Co-DETR and DINO performed best, roughly matching the YOLOv7 model (37,622,682 parameters) I had trained separately on my own data.
I then tried to reproduce the published results on COCO and found they were close to yours. Afterwards, I converted my training set to COCO format (a minimal sketch of the layout I target is attached after the list below). The data come from millions of images collected by urban surveillance cameras in real environments and cover four categories: faces, human bodies, vehicles, and non-motorized vehicles; after cleaning and analysis I selected 1.1 million images as the training set. I then trained DINO with the ResNet-50 and Swin-L-384-22k backbones. During training, and when comparing against the YOLO series, I observed the following:
1. On my own dataset, DINO's results were not as good as YOLOv10-X (31,662,584 parameters); this held for both the validation and test sets.
2. DINO with the larger Swin-L backbone (217,163,332 parameters) is more accurate than DINO-ResNet-50 (46,604,048 parameters) on the human-body and non-motorized-vehicle categories, but lower on the other metrics. In my experiments with the same data on the YOLO series and some anchor-free models, a much larger parameter count (especially a gap of this size) should improve the metrics, so I am not sure what causes this. Due to equipment limitations, the two runs were trained on different hardware, which naturally gave different initial random states. In addition, DINO uses non-deterministic operators, so the results could not be fixed exactly (the seeding setup I use to limit this is sketched after the list); in a reproduction run of dn-detr-resnet50 for 12 epochs I did get a higher result, but unfortunately only the images were saved and I forgot to change the storage path during later testing. My training set also uses a special definition of the human body for riders of non-motorized vehicles (the annotation involves both the human body and the non-motorized vehicle), and I suspect this may have contributed to the drop in those metrics.
3. Since training cannot be fully reproduced, is your reported result the average of multiple training runs? In some earlier projects I found that randomness had a significant impact on results for simple datasets (few categories but many scenarios) and for large datasets; there the statistics could vary by about ±2% around the baseline.
4. During training the run is unstable (batch size = 2 per card) and is often interrupted with the error `torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: -9) local_rank: 3 (pid: 1064263) of binary: /opt/conda/bin/python`. My follow-up investigation suggests it may be related to the CPU/host side (the data-loader settings I am trying as a workaround are sketched after the list). Have you encountered similar problems?
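
For reference, this is roughly the COCO-format layout I convert my annotations into (mentioned above). `samples` and the box-tuple layout are placeholders for my own annotation pipeline, not anything from your repository; only the COCO field structure itself is the point.

```python
import json

# Category ids for my four classes (ids are my own convention).
CATEGORIES = [
    {"id": 1, "name": "face"},
    {"id": 2, "name": "human_body"},
    {"id": 3, "name": "vehicle"},
    {"id": 4, "name": "non_motorized_vehicle"},
]

def to_coco(samples, out_path):
    """samples: iterable of dicts with file_name, width, height and boxes
    as (x, y, w, h, category_id) tuples -- placeholder for my own loader."""
    images, annotations = [], []
    ann_id = 1
    for img_id, sample in enumerate(samples, start=1):
        images.append({
            "id": img_id,
            "file_name": sample["file_name"],
            "width": sample["width"],
            "height": sample["height"],
        })
        for x, y, w, h, cat in sample["boxes"]:
            annotations.append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": cat,
                "bbox": [x, y, w, h],  # COCO uses [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    with open(out_path, "w") as f:
        json.dump({"images": images, "annotations": annotations,
                   "categories": CATEGORIES}, f)
```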
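
On point 2: this is the best-effort seeding setup I use to reduce run-to-run variance. The seed value and flags are just my assumptions about what matters; since some ops in the DETR-style pipeline have no deterministic CUDA kernel, it narrows but does not remove the randomness.

```python
import os
import random

import numpy as np
import torch

def set_deterministic(seed: int = 42):
    """Best-effort determinism; ops without a deterministic CUDA kernel
    are only warned about (warn_only=True), not replaced."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Required by some cuBLAS ops when deterministic algorithms are enforced.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    torch.use_deterministic_algorithms(True, warn_only=True)
```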
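
On point 4: exitcode -9 is a SIGKILL, which in my experience usually points to the host OOM killer rather than a CUDA error, so I am currently trying to reduce host-memory pressure from the data pipeline roughly as below; `train_dataset` and `collate_fn` stand in for the ones in my config.

```python
from torch.utils.data import DataLoader

# Assumed workaround for the exitcode -9 crashes: keep the data pipeline's
# host-memory footprint small. train_dataset / collate_fn are placeholders.
train_loader = DataLoader(
    train_dataset,
    batch_size=2,
    shuffle=True,
    num_workers=2,            # fewer workers -> fewer decoded images held in RAM
    pin_memory=False,         # pinned buffers also count against host memory
    persistent_workers=False,
    prefetch_factor=2,        # only valid when num_workers > 0
    collate_fn=collate_fn,
)
```

After a crash I also check `dmesg -T | grep -i "out of memory"` on the host to confirm whether the kernel OOM killer fired.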
Looking forward to your reply. (Sorry for asking so many questions, and thank you.)