How to find the best.pt is the result of which epoch? #8701

Closed
1 task done
xiaohangguo opened this issue Jul 24, 2022 · 28 comments
Labels
question Further information is requested

Comments

@xiaohangguo

Search before asking

Question

1. After many training iterations, I obtained a trained model and a "best.pt" file. I understand what it means. My question is: how do I know which epoch it comes from? In other words, can I find the row in results.csv that corresponds to it?
2. During my experiments, I found that training sometimes breaks unexpectedly right after it finishes. Normally, YOLOv5 summarizes the results at the end of training, draws the F1/P/PR/R/results curves, and produces the train_batch, val_batch, val_pred... images. What should I do when this happens? Training has completed, but the visualizations were not generated. I only found code online that calls YOLOv5 to draw a few of the images, but I can't get all of them. What should I do?

Additional

This is the situation I described. Every time it happens, I train again from scratch, which is the most direct but also the clumsiest fix.
[Screenshot from 2022-07-24 19-41-33]
This is a normal result:
[Screenshot from 2022-07-24 19-39-49]

@xiaohangguo xiaohangguo added the question Further information is requested label Jul 24, 2022
@github-actions
Contributor

github-actions bot commented Jul 24, 2022

👋 Hello @xiaohangguo, thank you for your interest in YOLOv5 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://ultralytics.com or email support@ultralytics.com.

Requirements

Python>=3.7.0 with all requirements.txt installed including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher
Member

@xiaohangguo best.pt is saved on every maximum fitness epoch. For detection models fitness is essentially mAP@0.5:0.95:

yolov5/utils/metrics.py

Lines 15 to 19 in b367860

def fitness(x):
    # Model fitness as a weighted combination of metrics
    w = [0.0, 0.0, 0.1, 0.9]  # weights for [P, R, mAP@0.5, mAP@0.5:0.95]
    return (x[:, :4] * w).sum(1)
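To make the weighting concrete, here is a minimal pure-Python sketch of the same idea. The per-epoch numbers are made up for illustration, and `best_epoch` is a hypothetical helper, not a function from the repository. It shows that precision and recall carry zero weight, so an epoch with lower P and R can still have the highest fitness.

```python
# Weights for [P, R, mAP@0.5, mAP@0.5:0.95], as in the fitness() snippet.
WEIGHTS = (0.0, 0.0, 0.1, 0.9)


def fitness(row):
    """Weighted sum of the first four metrics of one epoch."""
    return sum(w * m for w, m in zip(WEIGHTS, row[:4]))


def best_epoch(rows):
    """Index of the row with the highest fitness.

    Ties resolve to the earliest row in this sketch; the training script
    may handle ties differently.
    """
    scores = [fitness(r) for r in rows]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical per-epoch metrics: [P, R, mAP@0.5, mAP@0.5:0.95]
epochs = [
    [0.90, 0.80, 0.70, 0.40],  # epoch 0
    [0.70, 0.60, 0.72, 0.45],  # epoch 1: lower P and R, higher mAP
    [0.95, 0.85, 0.71, 0.43],  # epoch 2
]
print(best_epoch(epochs))  # → 1
```

Epoch 1 wins here because its mAP@0.5:0.95 is the highest, even though epoch 2 has better precision and recall.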

I don't understand your other question, but you can validate trained models easily using val.py which will create all output images like confusion matrices, PR curves, etc.

python val.py --data ... --weights ...

@xiaohangguo
Author

1. Thank you for your help. Do you mean that best.pt is the checkpoint with the highest mAP@[0.5:0.95]?
2. OK, I will try it. I think the computer froze while saving the plot images, which caused the plotting to fail. I mean that the final images were not exported in the default way, but this seems unimportant, haha.

@xiaohangguo
Author

Can you give me some suggestions on hyperparameter tuning? I understand the meaning of "weight_decay", "warmup_epochs", and other parameters. What I want to ask is: if I want to run experiments and optimize the model by adjusting hyperparameter values, how should I start? Do you have any suggestions?

@glenn-jocher
Member

glenn-jocher commented Jul 25, 2022

@xiaohangguo you can try to tune hyperparameters manually, or you can evolve hyperparameters. Evolution takes a lot of time and resources but is a good solution that requires little human oversight. See Hyperparameter Evolution tutorial to get started.

YOLOv5 Tutorials

Good luck 🍀 and let us know if you have any other questions!

@xiaohangguo
Author

Thanks for the tutorial. I would like to ask why you have not published a paper, haha.

@glenn-jocher
Member

No time

@xiaohangguo
Author

When I was browsing the Internet, I saw that you said you would eat a hat if you didn't publish a paper.

@xiaohangguo
Author

Bro, can I download the yolov3.pt file and pass it to YOLOv5 through the --weights argument to train the model?

@glenn-jocher
Member

This is true. No time to do that either.

You can get YOLOv3 weights at https://github.com/ultralytics/yolov3

@AgusRaharja69

In plots.py, I see that the best epoch of a training run is picked with the expression below. Can you explain it?
index = np.argmax(0.9 * data.values[:, 8] + 0.1 * data.values[:, 7] + 0.9 * data.values[:, 12] + 0.1 * data.values[:, 11])

@xiaohangguo
Author

Do you have any additional information about how the evaluation metrics are calculated, such as which headers columns 7, 8, 11, and 12 correspond to? I haven't looked at the YOLO code for a long time. Which part of the code do you mean? As for the expression you gave, my guess is that it computes a weighted sum of those columns for every epoch and picks the epoch where that sum is largest, which gives a single evaluation score.
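One way to check what those positional indices mean is to map them onto the CSV header. The sketch below does that under a loud assumption: the header order in `ASSUMED_COLUMNS` is what I would expect for a YOLOv5 segmentation run (box metrics marked B, mask metrics marked M), but it is not confirmed anywhere in this thread, so print the header of your own results.csv before trusting it. The helper names are hypothetical.

```python
# ASSUMED header order for a YOLOv5 segmentation results.csv; print your own
# header to confirm before relying on these positions.
ASSUMED_COLUMNS = [
    "epoch",
    "train/box_loss", "train/seg_loss", "train/obj_loss", "train/cls_loss",
    "metrics/precision(B)", "metrics/recall(B)",
    "metrics/mAP_0.5(B)", "metrics/mAP_0.5:0.95(B)",
    "metrics/precision(M)", "metrics/recall(M)",
    "metrics/mAP_0.5(M)", "metrics/mAP_0.5:0.95(M)",
]

# Weight applied to each column index in the plots.py expression.
INDEX_WEIGHTS = {7: 0.1, 8: 0.9, 11: 0.1, 12: 0.9}


def describe_terms(columns=ASSUMED_COLUMNS):
    """Return (weight, column name) pairs for the terms of the expression."""
    return [(w, columns[i]) for i, w in sorted(INDEX_WEIGHTS.items())]


def best_row(rows):
    """Pure-Python equivalent of the np.argmax(...) expression."""
    scores = [sum(w * row[i] for i, w in INDEX_WEIGHTS.items()) for row in rows]
    return max(range(len(scores)), key=scores.__getitem__)


for weight, name in describe_terms():
    print(f"{weight:.1f} * {name}")
```

If the assumed layout holds, the expression is the box fitness plus the mask fitness, each weighted 0.1 on mAP@0.5 and 0.9 on mAP@0.5:0.95, and the index is the row where that sum peaks.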


@guptasaumya

@glenn-jocher , Is top1_acc, the fitness measure for best.pt?

@glenn-jocher
Member

@guptasaumya for classification models yes!

@SaraDadjouy

@glenn-jocher Hi.
In the detection task, what is best.pt chosen based on?

@glenn-jocher
Member

@SaraDadjouy hello!

best.pt is the checkpoint with the highest fitness on the validation dataset during training. For detection, fitness is a weighted combination of mAP@0.5 and mAP@0.5:0.95, with most of the weight on mAP@0.5:0.95.

I hope this helps! Let me know if you have any further questions.

@Deemowe

Deemowe commented Sep 21, 2023

After I run my model, how can I see the mAP@0.5 for the best.pt epoch?

@glenn-jocher
Member

Hello @Deemowe! To evaluate mAP@0.5 for the best.pt checkpoint, you can use the val.py script provided in the YOLOv5 repository (named test.py in older releases).

Here is an example command to run the evaluation:

python val.py --data your_data.yaml --weights path/to/best.pt --img 640 --task test

Make sure to replace your_data.yaml with the path to your data configuration file, and path/to/best.pt with the actual path to your best.pt checkpoint file.

This command evaluates the model on the test split defined in your data file and reports both mAP@0.5 and mAP@0.5:0.95. Note that --iou-thres sets the NMS IoU threshold rather than the IoU threshold used for mAP, so it does not need to be set to 0.5 here.

Let me know if you have any more questions!

@Wang-taoshuo

Wang-taoshuo commented Apr 29, 2024

Hi @glenn-jocher,
In the segmentation mode of YOLOv8, which metric is used to select best.pt?
Is it val/seg_loss?

@glenn-jocher
Member

Hello @Wang-taoshuo!

In segmentation mode for YOLOv8, best.pt is typically selected based on a combination of metrics, with a significant emphasis on the segmentation loss (val/seg_loss) on the validation dataset. This ensures that the chosen model checkpoint has demonstrated the most effective performance in segmenting the validation data.

If you have more questions or need further clarification, feel free to ask! 😊

@Wang-taoshuo

Wang-taoshuo commented Apr 29, 2024

How do I know which epoch my best.pt comes from?

@glenn-jocher
Member

Hi there! 👋

To find out which epoch corresponds to your best.pt file, you can check the results.csv file that's saved during training. This file logs metrics like precision, recall, mAP, and val loss for each epoch. Look for the epoch with the best performance (usually the lowest validation loss or highest mAP, depending on what best.pt was selected on) to identify the epoch your best.pt model corresponds to.

If you're still not sure, you can also re-evaluate each saved epoch checkpoint with the validation script (val.py in YOLOv5) on your validation set and compare the results manually.
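As a rough sketch of the results.csv lookup, the snippet below finds the row with the highest weighted mAP using only the standard library. Several things are assumptions: that best.pt was selected with the 0.1 / 0.9 weighting from the fitness() snippet earlier in this thread, that the file has an "epoch" column, and that header names may be padded with spaces. `best_epoch_from_csv` and the sample data are hypothetical, and the column names must be passed in to match your own file.

```python
import csv
import io


def best_epoch_from_csv(text, map50_col, map5095_col):
    """Return (epoch, fitness) of the row maximizing 0.1*mAP50 + 0.9*mAP50-95."""
    reader = csv.reader(io.StringIO(text))
    header = [h.strip() for h in next(reader)]  # headers may be space-padded
    i50, i5095 = header.index(map50_col), header.index(map5095_col)
    iepoch = header.index("epoch")  # ASSUMED column name
    best = None
    for row in reader:
        if not row:
            continue  # skip blank lines
        score = 0.1 * float(row[i50]) + 0.9 * float(row[i5095])
        if best is None or score > best[1]:
            best = (int(float(row[iepoch])), score)
    return best


# Tiny made-up results.csv for illustration only.
SAMPLE = """epoch, metrics/mAP50(B), metrics/mAP50-95(B)
1, 0.50, 0.30
2, 0.62, 0.41
3, 0.61, 0.40
"""

epoch, score = best_epoch_from_csv(SAMPLE, "metrics/mAP50(B)", "metrics/mAP50-95(B)")
print(epoch, round(score, 4))  # → 2 0.431
```

For a real run, read the file contents first (for example from your run directory) and pass the column names exactly as they appear in your header. Whether the epoch column starts at 0 or 1 depends on the version that wrote the file.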

Hope this helps! Let me know if you have other questions. 😊

@namnguyen2103

namnguyen2103 commented Oct 30, 2024

Answer this in a Disney princess impression, which one of these columns in the results.csv file will be used to determine the best.pt checkpoint in an Object detection task?

train/box_loss train/cls_loss train/dfl_loss metrics/precision(B) metrics/recall(B) metrics/mAP50(B) metrics/mAP50-95(B) val/box_loss val/cls_loss val/dfl_loss lr/pg0 lr/pg1 lr/pg2

@pderrenger
Member

In an object detection task, the `

@Coline1

Coline1 commented Nov 4, 2024

I am using a dataset I built to perform transfer learning on YOLOv8-pose. I checked the source code (https://github.com/ultralytics/ultralytics/blob/e7f065874487660c3f0d65dbb5c02b6b99142bf8/ultralytics/utils/metrics.py#L934) but could not find how the pose fitness is computed. The code there is return self.pose.fitness() + self.box.fitness(). Does pose.fitness apply special processing for pose, for example using OKS to calculate fitness?
I would like to know how the fitness of pose is calculated. Thank you for your help.

@pderrenger
Member

The fitness calculation for pose in YOLOv8 is not explicitly detailed in the provided code snippet. Typically, pose fitness might involve metrics like Object Keypoint Similarity (OKS) or other pose-specific evaluations. For precise details, reviewing the full implementation of the pose.fitness() function in the source code would be necessary. If you have further questions, feel free to ask!
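For illustration only, the line quoted in the question, `self.pose.fitness() + self.box.fitness()`, can be sketched as the sum of two weighted scores. The weights below are an assumption carried over from the YOLOv5 fitness() snippet earlier in this thread, and the idea that the pose mAP values are computed with OKS in place of box IoU is also an assumption, not something confirmed here. All function names and numbers are hypothetical.

```python
WEIGHTS = (0.0, 0.0, 0.1, 0.9)  # ASSUMED weights for [P, R, mAP50, mAP50-95]


def component_fitness(mean_results):
    """Weighted sum of [P, R, mAP50, mAP50-95] for one metric group."""
    return sum(w * m for w, m in zip(WEIGHTS, mean_results))


def pose_model_fitness(pose_results, box_results):
    """Mirror of `self.pose.fitness() + self.box.fitness()` from the question."""
    return component_fitness(pose_results) + component_fitness(box_results)


# Made-up numbers: keypoint metrics (OKS-based, by assumption) and box metrics.
pose = [0.80, 0.75, 0.70, 0.40]
box = [0.90, 0.85, 0.90, 0.60]
print(round(pose_model_fitness(pose, box), 4))  # → 1.06
```

Under these assumptions the combined score ranges from 0 to 2, and a checkpoint with the best keypoint mAP is not necessarily the one with the best combined fitness.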

@Coline1

Coline1 commented Nov 6, 2024

I have found the implementation code for fitness and the relevant code for verifying the accuracy of the model's keypoints. Thank you so much!

@pderrenger
Member

You're welcome! If you have any more questions or need further assistance, feel free to ask.
