Train hanging at the end of Epoch 1/10 #37

Open
henryzhao321 opened this issue Apr 15, 2021 · 3 comments

@henryzhao321

I followed point 1 of the Train section in README.md:

make train MODEL=yolo_mobilev1 DEPTHMUL=0.75 MAXEP=10 ILR=0.001 DATASET=voc CLSNUM=20 IAA=False BATCH=16

Epoch 1/10 starts and runs for about 2 hours. The ETA counts down towards 0s, but the run hangs at step 981/982 with the ETA showing 6s.
Ctrl-C has no effect, and 'top' shows no CPU load.
The log directory contains only args.txt and a train directory.

Example output:
979/982 [============================>.] - ETA: 20s - loss: 39.1206 - l1_loss: 11.0472 - l2_loss: 27.5336 - l1_p: 0.1742 - l1_r: 0.0855 - l2_p: 0.0486 - l2_r: 0
980/982 [============================>.] - ETA: 13s - loss: 39.1038 - l1_loss: 11.0427 - l2_loss: 27.5213 - l1_p: 0.1744 - l1_r: 0.0855 - l2_p: 0.0487 - l2_r: 0
981/982 [============================>.] - ETA: 6s - loss: 39.0847 - l1_loss: 11.0408 - l2_loss: 27.5041 - l1_p: 0.1747 - l1_r: 0.0856 - l2_p: 0.0487 - l2_r: 0.0118

With MAXEP=1, training completes after about 2 hours and produces yolo_model.h5. Running "make inference" with that model did not seem to detect anything, whereas the pre-built yolo_model.h5 in the asset directory works well. The instructions say to use MAXEP=10, so perhaps that is why my model doesn't work? And why does training hang at the end of Epoch 1/10?

@rogerkuo1981

I ran into the same problem. Have you managed to solve it?

@henryzhao321
Author

Yes, I think so. I was running it in an Ubuntu virtual machine, and giving the VM 2 or more CPU cores seemed to fix it.
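
Since the hang went away with two or more cores, it may be worth checking how many cores the process can actually use before starting a 2-hour run. Below is a minimal sketch, assuming a Linux guest. `usable_cpu_count`, `check_cores` and `MIN_CORES` are hypothetical names and not part of this project, and the single-core cause is only what this thread suggests, not a confirmed diagnosis.

```python
import os

# Assumption: threshold taken from the workaround reported in this thread.
MIN_CORES = 2


def usable_cpu_count():
    """Return the number of CPU cores this process is allowed to run on."""
    # os.sched_getaffinity exists on Linux and honours CPU affinity/pinning;
    # fall back to os.cpu_count() on platforms that lack it.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def check_cores(min_cores=MIN_CORES):
    """Return (ok, cores), where ok is True if enough cores are usable."""
    cores = usable_cpu_count()
    return cores >= min_cores, cores


if __name__ == "__main__":
    ok, cores = check_cores()
    if not ok:
        print(f"Warning: only {cores} core(s) usable; "
              f"this thread reports hangs with fewer than {MIN_CORES}.")
```

Running this inside the VM before `make train` shows whether the guest really sees the cores assigned to it in the hypervisor settings.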

@BackMountainDevil

Please paste the full output. What is posted so far is not enough to tell what happened.
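
When a run hangs and ignores Ctrl-C, one way to get more output to paste is Python's standard `faulthandler` module, which can write every thread's stack to a file on a timer or on a signal. Below is a minimal sketch, assuming you can add a call near the top of the training entry point (the thread does not show which script that is). `enable_hang_dumps` and the log file name are hypothetical.

```python
import faulthandler
import signal


def enable_hang_dumps(log_path="hang_traceback.log", interval_s=600):
    """Write all thread stacks to log_path every interval_s seconds."""
    # Keep the file open for the life of the process: faulthandler writes
    # straight to its file descriptor.
    log_file = open(log_path, "w")
    # A watchdog thread dumps the traceback of every Python thread
    # each time the interval elapses.
    faulthandler.dump_traceback_later(interval_s, repeat=True, file=log_file)
    # On Unix, `kill -USR1 <pid>` from another shell forces an immediate dump.
    if hasattr(signal, "SIGUSR1"):
        faulthandler.register(signal.SIGUSR1, file=log_file, all_threads=True)
    return log_file
```

Calling `enable_hang_dumps()` once at startup leaves a file showing where each Python thread was at each interval, which can be attached to the issue. Whether it reveals the cause of this particular hang is not known.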
