Python stops working when executing 'self.solver.step(1)' in the 'train.py' file. #175

Closed
cobbwho opened this issue Apr 23, 2018 · 2 comments


cobbwho commented Apr 23, 2018

I compiled Caffe in CPU-only mode on Windows without CUDA, because I have neither Linux nor CUDA. When I train on the voc_2007 dataset via train_net.py, the program fails after reaching the point shown in the log below:
...
I0423 net.cpp:774] Copying source layer fc7
I0423 net.cpp:774] Copying source layer relu7
I0423 net.cpp:774] Copying source layer drop7
I0423 net.cpp:771] Ignoring source layer fc8
I0423 net.cpp:771] Ignoring source layer loss
Solving...
Then a dialog box reports that python.exe has stopped working, with no error message. I don't know what is wrong; the program simply crashes.
I tried to locate the failure in train.py and found that the program stops at the call self.solver.step(1), at line 92, column 12. It looks like the error occurs in the underlying C++ code, perhaps in a check related to the GPU, but I could not find it.
Does anyone have any ideas? Any help would be appreciated.
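
For anyone hitting the same silent crash: when the interpreter dies inside native code, the standard-library faulthandler module can sometimes print the Python call stack at the moment of the crash. The sketch below assumes Python 3 (where faulthandler ships with the interpreter); the helper name enable_crash_traces and the log path are hypothetical, and whether a traceback actually appears depends on how the native code fails.

```python
import faulthandler
import sys


def enable_crash_traces(log_path=None):
    """Turn on faulthandler and return the stream it writes to.

    log_path is a hypothetical parameter: if given, tracebacks go to that
    file; otherwise they go to stderr. The stream must stay open for as
    long as faulthandler is enabled.
    """
    stream = open(log_path, "w") if log_path else sys.stderr
    # Dump the Python traceback of all threads on fatal signals
    # such as SIGSEGV or SIGABRT.
    faulthandler.enable(file=stream, all_threads=True)
    return stream
```

Calling enable_crash_traces("crash.log") at the top of the training script, before the solver is created, would at least confirm from Python's side that the crash happens inside self.solver.step(1).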

Author

cobbwho commented Apr 24, 2018

I made some progress after modifying solver.prototxt and setting the field 'debug_info' to true. Some new messages appeared.
...like before
Solving...
........Omitted
I0424 net.cpp:630] [Forward] Layer cls_score, param blob 0 data: 0.00795975
I0424 net.cpp:630] [Forward] Layer cls_score, param blob 1 data: 0
I0424 net.cpp:618] [Forward] Layer bbox_pred, top blob bbox_pred data: 0.0848037
I0424 net.cpp:630] [Forward] Layer bbox_pred, param blob 0 data: 0.000797856
I0424 net.cpp:630] [Forward] Layer bbox_pred, param blob 1 data: 0
I0424 net.cpp:618] [Forward] Layer loss_cls, top blob loss_cls data: 4.00548
After that, python.exe still stops working.
Fortunately, I have already modified smooth_L1_loss_layer.cpp to support CPU mode for the forward and backward passes. Before I modified the file, it reported a "not implemented yet" error. So the current crash may be caused by smooth_L1_loss_layer.cpp.
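
As a way to sanity-check a hand-written CPU forward and backward pass, here is a minimal pure-Python reference for the elementwise smooth L1 function as defined in the Fast R-CNN paper (0.5·x² for |x| < 1, otherwise |x| − 0.5). It is only a sketch: the function names are hypothetical, and the layer in the repository may additionally apply weights, a sigma parameter, and normalization, all of which are omitted here.

```python
def smooth_l1(x):
    """Elementwise smooth L1: quadratic near zero, linear elsewhere."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def smooth_l1_grad(x):
    """Derivative of smooth_l1 with respect to x."""
    if abs(x) < 1.0:
        return x
    return 1.0 if x > 0 else -1.0


def smooth_l1_loss(pred, target):
    """Return (summed loss, gradient w.r.t. pred) for two equal-length lists."""
    diffs = [p - t for p, t in zip(pred, target)]
    loss = sum(smooth_l1(d) for d in diffs)
    grad = [smooth_l1_grad(d) for d in diffs]
    return loss, grad
```

Feeding the same small inputs to the modified C++ layer and to this reference, then comparing the loss and gradients, could help tell whether the crash comes from the new CPU code or from somewhere else.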

Author

cobbwho commented Apr 25, 2018

Unfortunately, the author rbgirshick has already said that they do not plan to support CPU-only training (see "Cpu only" #39). Although I have modified 'smooth_L1_loss_layer.cpp' and 'roi_pooling_layer.cpp' and added forward and backward methods, which works, I don't have the time or the ability to modify the underlying code to make it support CPU-only mode.

cobbwho closed this as completed Apr 25, 2018