Joint dataset training question #6904
@glenn-jocher Thank you for referring me to that link. I would like to ask whether it would be fine to use the pretrained m6, s6, and n6 models as starting weights to train on COCO+VisDrone. From what I understand, the pretrained models are trained on COCO. Would doing so cause any unwanted behavior down the road? Further, would using a pretrained model to train on another dataset bloat the final model with extra parameters from its pretraining dataset? If that is the case, is it possible to train from scratch?
@HeChengHui yes, you can use YOLOv5 pretrained models to start training on any dataset or combination of datasets.
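Combining two datasets also means agreeing on one class list: COCO has 80 classes, VisDrone2019-DET has 10, and YOLO-format label files store a class index at the start of each line. The following is a minimal sketch, assuming YOLO-format label lines (`class x y w h`, normalized). The helpers `merge_names` and `remap_label_line` are hypothetical, and the class lists are shortened placeholders. Whether identically named classes (for example `car` in both datasets) should share one index is a labeling decision you make, not something YOLOv5 decides for you.

```python
from typing import Dict, List, Sequence, Tuple


def merge_names(primary: Sequence[str], secondary: Sequence[str]) -> Tuple[List[str], Dict[int, int]]:
    """Append classes from `secondary` that are not already in `primary`.

    Returns the merged name list and a map from each old secondary index to its new index.
    Identically named classes share one index (an assumption, see the lead-in).
    """
    merged = list(primary)
    index_of = {name: i for i, name in enumerate(merged)}
    remap: Dict[int, int] = {}
    for old_idx, name in enumerate(secondary):
        if name not in index_of:
            index_of[name] = len(merged)
            merged.append(name)
        remap[old_idx] = index_of[name]
    return merged, remap


def remap_label_line(line: str, remap: Dict[int, int]) -> str:
    """Rewrite the class index of one YOLO-format label line; box coordinates are untouched."""
    cls, *coords = line.split()
    return " ".join([str(remap[int(cls)]), *coords])


if __name__ == "__main__":
    coco = ["person", "bicycle", "car"]                    # placeholder: real COCO list has 80 names
    visdrone = ["pedestrian", "people", "bicycle", "car"]  # placeholder: real VisDrone list has 10
    names, remap = merge_names(coco, visdrone)
    print(names)
    print(remap_label_line("3 0.5 0.5 0.1 0.2", remap))
```

The merged `names` list would then go into a single data.yaml whose `train` and `val` entries list the image directories of both datasets. YOLOv5 data files accept a list of paths there. Only the VisDrone label files need rewriting in this setup, because COCO indices are kept as-is.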
@glenn-jocher
Could this be an error in my settings, or is AutoBatch just not working here?
@HeChengHui your data.yaml
Based on your partially completed AutoBatch results, it seems your card can support perhaps --batch 4 or --batch 8. Experiment to see what works.
@glenn-jocher After setting up the environment again, AutoBatch seems to work when I tried
@glenn-jocher Even at epoch 34, my metrics (mAP, recall, and precision logged to wandb) are all 0. Could this be due to hyperparameter evolution?
@HeChengHui I don't know what you mean by zero mAP at epoch 34 of 300, as --evolve does not compute mAP until the final epoch. Also note that --batch 1 is extremely small and not recommended.
I am referring to the metrics shown in wandb. Are they only evaluated after 300 epochs instead of every epoch?
Does
Oh yes! I didn't notice the -1. A batch size of -1 invokes AutoBatch to automatically find the best batch size. But yes, during evolution mAP is only evaluated on the final epoch, so there's no way to know its value until a generation is finished.
@glenn-jocher
@HeChengHui it sounds like you should just train normally rather than using --evolve. --evolve is intended to run for several weeks with significant resources, and it does not return a model; it only returns hyperparameters evolved for your base scenario, which you can then use to train a model. If you just want to train a model, don't use --evolve.
@glenn-jocher
@HeChengHui you're not understanding evolution. One training run is one generation. Evolution relies on many (hundreds of) generations to evolve optimal hyperparameters. See the hyperparameter evolution tutorial for details: YOLOv5 Tutorials
Good luck 🍀 and let us know if you have any other questions!
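The "one training run is one generation" point can be illustrated with a toy mutation loop. This is a simplified sketch loosely modeled on the mutation idea behind --evolve, not YOLOv5's actual code. Each hyperparameter is scaled by a random factor near 1, clipped to its bounds, and kept only if fitness improves. For simplicity the sketch always mutates the single best result so far. `toy_fitness` is a hypothetical stand-in for a full training run that would return a mAP-based score, and the bounds, mutation probability, and sigma are illustrative values.

```python
import random
from typing import Callable, Dict, Tuple

Bounds = Dict[str, Tuple[float, float]]


def mutate(parent: Dict[str, float], bounds: Bounds, rng: random.Random,
           prob: float = 0.8, sigma: float = 0.2) -> Dict[str, float]:
    """Scale each hyperparameter by a random factor near 1 (with probability `prob`), then clip to bounds."""
    child = {}
    for key, value in parent.items():
        factor = 1.0
        if rng.random() < prob:
            factor = min(max(1.0 + rng.gauss(0.0, sigma), 0.3), 3.0)
        lo, hi = bounds[key]
        child[key] = min(max(value * factor, lo), hi)
    return child


def evolve(start: Dict[str, float], bounds: Bounds, fitness: Callable[[Dict[str, float]], float],
           generations: int, seed: int = 0) -> Tuple[Dict[str, float], float]:
    """Each generation costs one fitness evaluation; the best individual so far is the parent."""
    rng = random.Random(seed)
    best, best_fit = dict(start), fitness(start)
    for _ in range(generations):
        child = mutate(best, bounds, rng)
        child_fit = fitness(child)  # with --evolve this would be an entire training run
        if child_fit > best_fit:
            best, best_fit = child, child_fit
    return best, best_fit


def toy_fitness(h: Dict[str, float]) -> float:
    """Hypothetical objective peaking at lr0=0.01, momentum=0.9 (stands in for mAP)."""
    return -((h["lr0"] - 0.01) / 0.01) ** 2 - ((h["momentum"] - 0.9) / 0.1) ** 2


if __name__ == "__main__":
    bounds = {"lr0": (1e-5, 0.1), "momentum": (0.6, 0.98)}
    start = {"lr0": 0.05, "momentum": 0.7}
    best, fit = evolve(start, bounds, toy_fitness, generations=300)
    print(best, fit)
```

With a real training run in place of `toy_fitness`, 300 generations means 300 full trainings. That is why evolution takes weeks and why a single run tells you little.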
While looking through the different model configurations under
I would like to clarify the purpose of
@HeChengHui sure, you can delete the larger output blocks if you don't need them. Results will naturally vary with your dataset and training settings such as --img-size. Another option for small-object detection would be to simply train and detect at a larger --img-size with the normal P5 models.
I see.
Would that affect performance? The purpose is to reduce model size and increase speed.
@HeChengHui you can delete anything you want, but if you delete intermediate layers you need to correctly reconnect the remaining layers, i.e. the
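Reconnecting after a deletion is mostly a matter of fixing each layer's input references. The following is a minimal sketch, assuming the model-yaml layout of one `[from, number, module, args]` entry per layer, with backbone and head indexed as one continuous list. In that layout `from` is an int or a list of ints, -1 means the previous layer, other negative values are relative offsets, and non-negative values are absolute layer indices. `delete_layers` is a hypothetical helper, not part of YOLOv5. It re-points every reference at the same surviving layer and raises if a remaining layer (for example a Concat) still takes input from a deleted one, which is the kind of mismatch that stops a model from building. It does not adjust channel counts or the Detect layer's `args`.

```python
from typing import Dict, List, Sequence, Set


def delete_layers(layers: Sequence[list], drop: Set[int]) -> List[list]:
    """Delete layers by index and re-point every `from` reference at the same surviving layer."""
    # old absolute index -> new absolute index for every surviving layer
    new_index: Dict[int, int] = {}
    for old in range(len(layers)):
        if old not in drop:
            new_index[old] = len(new_index)

    out = []
    for old, (frm, n, module, args) in enumerate(layers):
        if old in drop:
            continue
        refs = frm if isinstance(frm, list) else [frm]
        new_refs = []
        for ref in refs:
            if ref == -1:
                # "previous layer" (the image input for layer 0) is still valid only if it survives
                if old - 1 in drop:
                    raise ValueError(f"layer {old} ({module}) takes input from deleted layer {old - 1}")
                new_refs.append(-1)
                continue
            target = old + ref if ref < 0 else ref  # absolute index of the referenced layer
            if not 0 <= target < old:
                raise ValueError(f"layer {old} ({module}) has invalid reference {ref}")
            if target in drop:
                raise ValueError(f"layer {old} ({module}) takes input from deleted layer {target}")
            new_refs.append(new_index[target])  # other relative refs are written back as absolute
        out.append([new_refs if isinstance(frm, list) else new_refs[0], n, module, list(args)])
    return out


if __name__ == "__main__":
    layers = [
        [-1, 1, "Conv", [64, 3, 2]],   # 0
        [0, 1, "Conv", [32, 1, 1]],    # 1  side branch we want to delete
        [0, 1, "Conv", [128, 3, 2]],   # 2
        [-1, 1, "Conv", [128, 1, 1]],  # 3
        [[-1, 2], 1, "Concat", [1]],   # 4  concatenates layers 3 and 2
    ]
    for row in delete_layers(layers, {1}):
        print(row)
```

After deleting layer 1, the Concat's reference to layer 2 becomes 1, because everything after the deleted layer shifts down by one. Running the edited yaml through yolo.py afterwards remains the real check.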
@glenn-jocher
Would this be a valid configuration?
@HeChengHui you can run any model yaml through yolo.py to verify that it works, profile it, etc.
Oh, thank you.
It seems like something went wrong with the Concat layer. Any advice on how to debug this?
@HeChengHui we don't provide support for model customizations, sorry. Perhaps a community member can assist.
@glenn-jocher
I am training a model using
After training for 60 epochs, it suddenly failed with a CUDA OOM error. From what I found, I might need to lower the batch size. However, is there a way to lower the batch size while resuming training, or must I restart from scratch?
@HeChengHui hi, sorry to hear that! That's very strange. Is there any other GPU memory usage on the instance? AutoBatch seeks to set a batch size for 90% CUDA memory utilization, but perhaps we should reduce the default to 85%. You cannot modify any parameters on resume, but you can go into train.py and customize the code to force a different batch size, i.e. lines 70 to 73 of train.py at commit 7a2a118.
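The 90% target can be pictured as fitting a straight line through measured GPU memory versus batch size, then solving for the batch size that uses that fraction of free memory. This is a simplified, stdlib-only sketch of that idea with made-up measurements. It is not YOLOv5's autobatch.py, which profiles real forward/backward passes on the GPU. `pick_batch_size` and `linear_fit` are hypothetical names, and `fraction` is the knob that would move from 0.90 to 0.85 under the suggestion above.

```python
from typing import Sequence, Tuple


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pick_batch_size(batch_sizes: Sequence[int], mem_gb: Sequence[float],
                    free_gb: float, fraction: float = 0.9) -> int:
    """Largest batch whose predicted memory is <= fraction * free_gb, clamped to at least 1."""
    slope, intercept = linear_fit(batch_sizes, mem_gb)
    return max(1, int((free_gb * fraction - intercept) / slope))


if __name__ == "__main__":
    sizes = [1, 2, 4, 8, 16]
    mem = [0.5 + 0.3 * b for b in sizes]  # made-up profile: 0.5 GB fixed + 0.3 GB per image
    print(pick_batch_size(sizes, mem, free_gb=10.0))
    print(pick_batch_size(sizes, mem, free_gb=10.0, fraction=0.85))
```

Lowering the fraction leaves headroom for things the short profiling run does not see. Examples include other processes on the same GPU or memory growth later in training, which could explain an OOM at epoch 60.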
@glenn-jocher
Alright, thank you!
@glenn-jocher
It seems to exit okay, but I am not sure whether the error is going to cause any problems.
@HeChengHui it appears you may have environment problems. Please ensure you meet all dependency requirements if you are attempting to run YOLOv5 locally. If in doubt, create a new virtual Python 3.9 environment, clone the latest repo (code changes daily), and
💡 ProTip! Try one of our verified environments below if you are having trouble with your local environment.
Requirements
Python>=3.7.0 with all requirements.txt installed, including PyTorch>=1.7. To get started:
git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install
Models and datasets download automatically from the latest YOLOv5 release when first requested.
Environments
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
Status
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), validation (val.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.
@glenn-jocher
If you trained successfully in another environment, then the independent variable that has changed is your environment, not YOLOv5. Logically, you should start examining your environment for issues, or use a working one, such as one of the verified environments listed above.
Sorry, I meant that I have also managed to train two models without errors in the same environment.
@HeChengHui your environment is up to you. If you have a reproducible error specific to YOLOv5, then please submit a bug report with code to reproduce it.
How to create a Minimal, Reproducible Example
When asking a question, people will be better able to provide help if you provide code that they can easily understand and use to reproduce the problem. This is referred to by community members as creating a minimal reproducible example. Your code that reproduces the problem should be:
For Ultralytics to provide assistance, your code should also be:
If you believe your problem meets all the above criteria, please close this issue and raise a new one using the 🐛 Bug Report template with a minimal reproducible example to help us better understand and diagnose your problem. Thank you! 😃
@glenn-jocher
@HeChengHui the default activation function for YOLOv5 is SiLU: line 44 at commit 0ca85ed
Sorry, I was asking more specifically: after changing the activation function, do I need to train the model from scratch using
@HeChengHui I don't understand your question. Nothing is changeable about a trained model. Any changes you make to the modules it uses will result in errors or worse results.
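Why swapping the activation degrades a trained model can be seen on a single toy unit. The learned weight and bias stay the same, but the function applied to them changes, so every downstream layer receives different inputs from the ones it was trained on. SiLU is x·sigmoid(x). LeakyReLU with slope 0.1 is used here only as an example replacement, and the weight, bias, and inputs are made-up values rather than anything taken from a YOLOv5 checkpoint.

```python
import math
from typing import Callable


def silu(x: float) -> float:
    """SiLU (swish): x * sigmoid(x), using the overflow-safe identity sigmoid(x) = (1 + tanh(x/2)) / 2."""
    return x * 0.5 * (1.0 + math.tanh(0.5 * x))


def leaky_relu(x: float, slope: float = 0.1) -> float:
    """Example replacement activation."""
    return x if x >= 0 else slope * x


def unit(x: float, w: float, b: float, act: Callable[[float], float]) -> float:
    """One 'trained' unit: identical weight and bias, only the activation differs."""
    return act(w * x + b)


if __name__ == "__main__":
    w, b = 1.5, -2.0  # made-up "trained" parameters
    for x in (-1.0, 0.5, 2.0):
        print(x, round(unit(x, w, b, silu), 4), round(unit(x, w, b, leaky_relu), 4))
```

Because the outputs disagree for the same weights, a model whose activation has been changed needs further training afterwards. It can start either from pretrained weights or from scratch.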
@glenn-jocher
@HeChengHui oh, you can do both, depending on whether you want to start from pretrained weights or not. See the Train Custom Data tutorial for details: YOLOv5 Tutorials
Good luck 🍀 and let us know if you have any other questions!
@glenn-jocher
Hi, I would like to clarify the purpose of the test split during training. My understanding is that validation is done on the validation split. Does the test split contribute in any way?
The test split is not used during training.
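A YOLOv5 data file can declare `train`, `val` and, optionally, `test`. As the reply says, training reads only the first two: it fits on `train` and reports metrics on `val`. The test split is touched only when you evaluate it explicitly, for example with `python val.py --task test`. The small helper below is hypothetical and makes that split-per-stage rule explicit. The dict stands in for a loaded data.yaml, and the paths are placeholders.

```python
from typing import Dict, List

# Which data-file keys each stage reads (an assumption based on the reply above, not YOLOv5 code).
STAGE_SPLITS: Dict[str, List[str]] = {
    "train": ["train", "val"],  # train.py: fit on train, report metrics on val
    "val": ["val"],             # val.py with its default task
    "test": ["test"],           # val.py --task test
}


def splits_for_stage(data: Dict[str, str], stage: str) -> List[str]:
    """Return the dataset paths a given stage would read from a data.yaml-like dict."""
    missing = [key for key in STAGE_SPLITS[stage] if key not in data]
    if missing:
        raise KeyError(f"data file has no {missing} entry for stage '{stage}'")
    return [data[key] for key in STAGE_SPLITS[stage]]


if __name__ == "__main__":
    data = {"train": "images/train", "val": "images/val", "test": "images/test"}  # placeholder paths
    print(splits_for_stage(data, "train"))
    print(splits_for_stage(data, "test"))
```

Keeping the test split out of training and validation means its score remains an unbiased final check, since no training decision was made by looking at it.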
Search before asking
Question
I would like to try out joint dataset training as seen here with COCO + VisDrone2019-DET. However, I am not sure whether I should start from pretrained weights (v5m6, s6, n6) or train from scratch (if that is possible).
Additional
No response