-
Notifications
You must be signed in to change notification settings - Fork 18.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
turned fatal errors into exceptions, allowing error handling outside of ... #1683
Conversation
This is interesting... I've thought about doing something like this in the past, since glog crashes don't play nicely with the Python module. Some notes:
Re: warnings, shouldn't redefinition be okay if you |
Agreed. I certainly do not want to troll about exceptions :) However, in the present case of the widespread use of CHECK_XX macros throughout Caffe's code and many PRs, my current understanding is that there's not much choice. If only because using error codes would require so many changes of an already fast evolving codebase. I might be missing something, but exceptions seemed to be the only hope here... Now, if this can stir up a discussion on an error handling mode for Caffe, I believe this is for the best. I don't expect this little PR to be more than a first micro step.
I thought of this. My understanding is that a CHECK macro does in fact prevent a (real) fatal error. In that sense, I was sort of angry at glog for not allowing non-fatal checks... In more details, fatal errors at large can happen in a small set of operations, roughly: net creation, net processing of the inputs, and net processing (inner forward and/or backward pass). Net creation failure should be recoverable by destroying the net's embryo. Net input processing should be recoverable as well since it only affects an isolated area. The few fatal errors I did experience in net processing at large are either the missing GPU or the overflow of GPU memory. My understanding is that these two are not pre-checked or captured yet, but that they could be (please let me know how to help on this front). Of course, you may know much more than I do about this, but I don't see why most if not all of today's fatal errors should not be recoverable in some manner.
Yes, understood. What is needed is the ability to clean up object's embryos in which the error took place. A clean abort before trying to continue execution should be a good start! Despite libcaffe using a singleton Caffe object in common.hpp (1), my understanding is that it is possible to utterly destroy a Solver and or Net objects, correct ? If yes, what minimal object to destroy should probably be handled by catching the exception at different levels, typically, Net, Solver and Caffe object ?
Good point! If only FTR, I'll update the PR soon. (1) what is the core reason why we're not allowed to instantiate multiple, isolated, Caffe objects ? |
…ing compiler's warnings
The problems with this is that most of the code in caffe is not written with exception safety in mind. Turning them into exceptions could easily leak memory or other resources, and result in data structure corruption. Although it definitely can be fixed, I don't know if the maintainers are willing to accept such drastic change to their codebase. Besides, one of the most important reasons that Google bans exceptions is that their legacy code is not exception-safe. It may be the case here as well. |
Again, not making the case for or against exceptions here, just I needed to capture these fatal errors for my own usage, and shared the minimal result. What I would find interesting discussing however is the roadmap for transitioning from CHECK macros and fatal errors, even if not immediately applicable. |
This could turn pycaffe into a real, grown-up python module! 👍
Is that true? |
Closing since the dev branch is deprecated. Please send PRs to master. However this is still worth a conversation so I've posted issue #2976. |
Hi brewers!
This PR allows to transform all fatal glog's fatal CHECK from Caffe code into exceptions. In general, my understanding is that it is regarded as bad behavior for a library to trigger fatal errors. For my own purpose (code to be released soon), I needed to handle Caffe errors outside of the libcaffe.
This PR has two drawbacks that I know of:
However, the exception captures the file and line of the error, which in practice allows to trace back the problem to the check without difficulty.
I could not find how to do simpler and better, but in case the problem of fatal errors within libcaffe should be handled differently, please let me know as I might well (re)take the token.