turned fatal errors into exceptions, allowing error handling outside of ... #1683

beniz · 2015-01-06T16:08:01Z

Hi brewers!

This PR allows to transform all fatal glog's fatal CHECK from Caffe code into exceptions. In general, my understanding is that it is regarded as bad behavior for a library to trigger fatal errors. For my own purpose (code to be released soon), I needed to handle Caffe errors outside of the libcaffe.

This PR has two drawbacks that I know of:

triggers a lot of warning at compile time since it does redefine a couple of macros from libglog
the LOG_IF(ERROR,false) part of the code is useless and in-code custom error messages are not passed to the exception.

However, the exception captures the file and line of the error, which in practice allows to trace back the problem to the check without difficulty.

I could not find how to do simpler and better, but in case the problem of fatal errors within libcaffe should be handled differently, please let me know as I might well (re)take the token.

…of libcaffe

longjon · 2015-01-06T22:26:56Z

This is interesting... I've thought about doing something like this in the past, since glog crashes don't play nicely with the Python module.

Some notes:

Our non-use of exceptions probably derives from http://google-styleguide.googlecode.com/svn/trunk/cppguide.html#Exceptions, which is actually sort of bullish on exceptions, although they are disallowed in Google code. So we might decide to go our own way here, or perhaps have an option to switch between fatal glog errors and exceptions, much like this PR.
Are there instances of fatal errors that don't make sense as exceptions, i.e., that cannot be recovered from? If so, there is nothing in the code right now distinguishing those from recoverable errors.
There is no guarantee that our current code behaves nicely or even correctly if allowed to continue executing after (what is currently) a fatal error; we would need to read carefully and think about that.

Re: warnings, shouldn't redefinition be okay if you #undef first?

beniz · 2015-01-07T08:28:01Z

Our non-use of exceptions probably derives from http://google-styleguide.googlecode.com/svn/trunk/cppguide.html#Exceptions, which is actually sort of bullish on exceptions, although they are disallowed in Google code. So we might decide to go our own way here, or perhaps have an option to switch between fatal glog errors and exceptions, much like this PR.

Agreed. I certainly do not want to troll about exceptions :) However, in the present case of the widespread use of CHECK_XX macros throughout Caffe's code and many PRs, my current understanding is that there's not much choice. If only because using error codes would require so many changes of an already fast evolving codebase. I might be missing something, but exceptions seemed to be the only hope here...

Now, if this can stir up a discussion on an error handling mode for Caffe, I believe this is for the best. I don't expect this little PR to be more than a first micro step.

Are there instances of fatal errors that don't make sense as exceptions, i.e., that cannot be recovered from? If so, there is nothing in the code right now distinguishing those from recoverable errors.

I thought of this. My understanding is that a CHECK macro does in fact prevent a (real) fatal error. In that sense, I was sort of angry at glog for not allowing non-fatal checks...

In more details, fatal errors at large can happen in a small set of operations, roughly: net creation, net processing of the inputs, and net processing (inner forward and/or backward pass). Net creation failure should be recoverable by destroying the net's embryo. Net input processing should be recoverable as well since it only affects an isolated area. The few fatal errors I did experience in net processing at large are either the missing GPU or the overflow of GPU memory. My understanding is that these two are not pre-checked or captured yet, but that they could be (please let me know how to help on this front).

Of course, you may know much more than I do about this, but I don't see why most if not all of today's fatal errors should not be recoverable in some manner.

There is no guarantee that our current code behaves nicely or even correctly if allowed to continue executing after (what is currently) a fatal error; we would need to read carefully and think about that.

Yes, understood. What is needed is the ability to clean up object's embryos in which the error took place. A clean abort before trying to continue execution should be a good start! Despite libcaffe using a singleton Caffe object in common.hpp (1), my understanding is that it is possible to utterly destroy a Solver and or Net objects, correct ? If yes, what minimal object to destroy should probably be handled by catching the exception at different levels, typically, Net, Solver and Caffe object ?

Re: warnings, shouldn't redefinition be okay if you #undef first?

Good point! If only FTR, I'll update the PR soon.

(1) what is the core reason why we're not allowed to instantiate multiple, isolated, Caffe objects ?

…ing compiler's warnings

netheril96 · 2015-01-09T08:19:28Z

The problems with this is that most of the code in caffe is not written with exception safety in mind. Turning them into exceptions could easily leak memory or other resources, and result in data structure corruption. Although it definitely can be fixed, I don't know if the maintainers are willing to accept such drastic change to their codebase.

Besides, one of the most important reasons that Google bans exceptions is that their legacy code is not exception-safe. It may be the case here as well.

beniz · 2015-01-09T09:06:14Z

Again, not making the case for or against exceptions here, just I needed to capture these fatal errors for my own usage, and shared the minimal result. What I would find interesting discussing however is the roadmap for transitioning from CHECK macros and fatal errors, even if not immediately applicable.

lukeyeager · 2015-08-20T18:36:41Z

This could turn pycaffe into a real, grown-up python module! 👍

Although it definitely can be fixed, I don't know if the maintainers are willing to accept such drastic change to their codebase.

Is that true?

shelhamer · 2015-08-26T00:22:54Z

Closing since the dev branch is deprecated. Please send PRs to master.

However this is still worth a conversation so I've posted issue #2976.

beniz added 3 commits January 6, 2015 16:49

turned fatal errors into exceptions, allowing error handling outside …

71e09ca

…of libcaffe

removing uneeded inclusion if glog/logging.h

deb41c8

removed sstream double inclusion

1af22c2

undef the CHECK and CHECK_OP_LOG macros before overriding them, avoid…

dfd90cd

…ing compiler's warnings

seanbell mentioned this pull request Mar 2, 2015

[Python] PythonLayer takes parameters by string and better exception handling #2001

Closed

shelhamer added the JL label Mar 10, 2015

beniz mentioned this pull request May 27, 2015

Document steps for using a custom version of Caffe along with DeepDetect jolibrain/deepdetect#2

Closed

2 tasks

longjon added the Python label Aug 5, 2015

jeffdonahue mentioned this pull request Aug 20, 2015

Expose LayerFactory::LayerTypeList in pycaffe #2930

Merged

shelhamer mentioned this pull request Aug 26, 2015

CHECKs vs. Exceptions #2976

Open

shelhamer closed this Aug 26, 2015

seanbell mentioned this pull request Apr 14, 2016

Matcaffe crashes on any CHECK(...), or on failing to parse prototxt #3986

Open

lukeyeager mentioned this pull request May 16, 2016

Logging system and throwing of exceptions instead of quitting. #4158

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

turned fatal errors into exceptions, allowing error handling outside of ... #1683

turned fatal errors into exceptions, allowing error handling outside of ... #1683

beniz commented Jan 6, 2015

longjon commented Jan 6, 2015

beniz commented Jan 7, 2015

netheril96 commented Jan 9, 2015

beniz commented Jan 9, 2015

lukeyeager commented Aug 20, 2015

shelhamer commented Aug 26, 2015

turned fatal errors into exceptions, allowing error handling outside of ... #1683

turned fatal errors into exceptions, allowing error handling outside of ... #1683

Conversation

beniz commented Jan 6, 2015

longjon commented Jan 6, 2015

beniz commented Jan 7, 2015

netheril96 commented Jan 9, 2015

beniz commented Jan 9, 2015

lukeyeager commented Aug 20, 2015

shelhamer commented Aug 26, 2015