Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

turned fatal errors into exceptions, allowing error handling outside of ... #1683

Closed
wants to merge 4 commits into from

Conversation

beniz
Copy link

@beniz beniz commented Jan 6, 2015

Hi brewers!

This PR allows to transform all fatal glog's fatal CHECK from Caffe code into exceptions. In general, my understanding is that it is regarded as bad behavior for a library to trigger fatal errors. For my own purpose (code to be released soon), I needed to handle Caffe errors outside of the libcaffe.

This PR has two drawbacks that I know of:

  • triggers a lot of warning at compile time since it does redefine a couple of macros from libglog
  • the LOG_IF(ERROR,false) part of the code is useless and in-code custom error messages are not passed to the exception.

However, the exception captures the file and line of the error, which in practice allows to trace back the problem to the check without difficulty.

I could not find how to do simpler and better, but in case the problem of fatal errors within libcaffe should be handled differently, please let me know as I might well (re)take the token.

@longjon
Copy link
Contributor

longjon commented Jan 6, 2015

This is interesting... I've thought about doing something like this in the past, since glog crashes don't play nicely with the Python module.

Some notes:

  • Our non-use of exceptions probably derives from http://google-styleguide.googlecode.com/svn/trunk/cppguide.html#Exceptions, which is actually sort of bullish on exceptions, although they are disallowed in Google code. So we might decide to go our own way here, or perhaps have an option to switch between fatal glog errors and exceptions, much like this PR.
  • Are there instances of fatal errors that don't make sense as exceptions, i.e., that cannot be recovered from? If so, there is nothing in the code right now distinguishing those from recoverable errors.
  • There is no guarantee that our current code behaves nicely or even correctly if allowed to continue executing after (what is currently) a fatal error; we would need to read carefully and think about that.

Re: warnings, shouldn't redefinition be okay if you #undef first?

@beniz
Copy link
Author

beniz commented Jan 7, 2015

Our non-use of exceptions probably derives from http://google-styleguide.googlecode.com/svn/trunk/cppguide.html#Exceptions, which is actually sort of bullish on exceptions, although they are disallowed in Google code. So we might decide to go our own way here, or perhaps have an option to switch between fatal glog errors and exceptions, much like this PR.

Agreed. I certainly do not want to troll about exceptions :) However, in the present case of the widespread use of CHECK_XX macros throughout Caffe's code and many PRs, my current understanding is that there's not much choice. If only because using error codes would require so many changes of an already fast evolving codebase. I might be missing something, but exceptions seemed to be the only hope here...

Now, if this can stir up a discussion on an error handling mode for Caffe, I believe this is for the best. I don't expect this little PR to be more than a first micro step.

Are there instances of fatal errors that don't make sense as exceptions, i.e., that cannot be recovered from? If so, there is nothing in the code right now distinguishing those from recoverable errors.

I thought of this. My understanding is that a CHECK macro does in fact prevent a (real) fatal error. In that sense, I was sort of angry at glog for not allowing non-fatal checks...

In more details, fatal errors at large can happen in a small set of operations, roughly: net creation, net processing of the inputs, and net processing (inner forward and/or backward pass). Net creation failure should be recoverable by destroying the net's embryo. Net input processing should be recoverable as well since it only affects an isolated area. The few fatal errors I did experience in net processing at large are either the missing GPU or the overflow of GPU memory. My understanding is that these two are not pre-checked or captured yet, but that they could be (please let me know how to help on this front).

Of course, you may know much more than I do about this, but I don't see why most if not all of today's fatal errors should not be recoverable in some manner.

There is no guarantee that our current code behaves nicely or even correctly if allowed to continue executing after (what is currently) a fatal error; we would need to read carefully and think about that.

Yes, understood. What is needed is the ability to clean up object's embryos in which the error took place. A clean abort before trying to continue execution should be a good start! Despite libcaffe using a singleton Caffe object in common.hpp (1), my understanding is that it is possible to utterly destroy a Solver and or Net objects, correct ? If yes, what minimal object to destroy should probably be handled by catching the exception at different levels, typically, Net, Solver and Caffe object ?

Re: warnings, shouldn't redefinition be okay if you #undef first?

Good point! If only FTR, I'll update the PR soon.

(1) what is the core reason why we're not allowed to instantiate multiple, isolated, Caffe objects ?

@netheril96
Copy link
Contributor

The problems with this is that most of the code in caffe is not written with exception safety in mind. Turning them into exceptions could easily leak memory or other resources, and result in data structure corruption. Although it definitely can be fixed, I don't know if the maintainers are willing to accept such drastic change to their codebase.

Besides, one of the most important reasons that Google bans exceptions is that their legacy code is not exception-safe. It may be the case here as well.

@beniz
Copy link
Author

beniz commented Jan 9, 2015

Again, not making the case for or against exceptions here, just I needed to capture these fatal errors for my own usage, and shared the minimal result. What I would find interesting discussing however is the roadmap for transitioning from CHECK macros and fatal errors, even if not immediately applicable.

@lukeyeager
Copy link
Contributor

This could turn pycaffe into a real, grown-up python module! 👍

Although it definitely can be fixed, I don't know if the maintainers are willing to accept such drastic change to their codebase.

Is that true?

@shelhamer
Copy link
Member

Closing since the dev branch is deprecated. Please send PRs to master.

However this is still worth a conversation so I've posted issue #2976.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants