Ignore EPIPE and ECONNRESET errors from stdin #35
Conversation
Thx for reporting this. It seems like this is a Linux problem because it does not occur on OSX. My assumption is that PhantomJS receives the SIGINT signal and handles it by itself, so it is not necessary for phridge to also try to shut down the process. However, I don't like to just silently ignore errors, because that is usually just a quickfix and will haunt you when you try to debug something without getting an error :) I think I'll rewrite your PR so that phridge only ignores these errors when a SIGINT is received. Why do you also ignore ECONNRESET? This error does not occur in my test setup...
As discussed in #34, Linux shells tend to group child processes into 'process groups'. When the shell gets a ctrl-c, it doesn't just send it to the current foreground process; it sends it to the entire process group, which means the children receive it as well. If you send the signal with `kill` to a single PID instead, only that one process receives it.

I agree entirely about not silently ignoring errors :). However, in this case I believe that this may quite likely be the correct behaviour. EPIPE/ECONNRESET are fired because the process has died; it just hasn't been cleaned up yet. This can only be solved in one of the following ways:
Ignoring the errors only once a SIGINT is received doesn't handle the case where just the phantomjs process is killed. I didn't notice ECONNRESET until I started writing the tests, where I ended up randomly getting EPIPE or ECONNRESET about 50:50. I am not sure whether I have also seen this outside of the mocha test cases or not. That one probably does need more investigation. Thanks.
I should say that if you wish to reproduce the wider test case: start a node process which uses phridge to spawn a phantomjs instance, then kill (SIGTERM, SIGINT, SIGKILL, whatever) the phantomjs process, and then allow node to run phantom.dispose(). It causes node to die horribly with EPIPE errors from nowhere.
Yeah, I guess you're right.
When does that happen? I think this should never be the case under normal circumstances...
Probably not under normal circumstances. However, it does happen if the phantomjs process is killed, as described above.

If a phantomjs process dies, it makes sense for any tasks it was running to return errors. But the errors should be specific to those requests; they shouldn't blow up the whole server. Possibly phridge or the phantomjs object should also emit some kind of error, but it needs to be a clear, documented, catchable error so that a server-type process can catch it, print a warning, spawn a new instance and carry on.
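The request-scoped failure mode argued for here could look roughly like this. All names are mine and the forwarding to phantomjs is elided; it is a sketch of the idea, not phridge's real internals:

```javascript
// Track the in-flight requests so they can be failed individually.
const pending = new Set();

function runInPhantom(task) {
  return new Promise((resolve, reject) => {
    pending.add({ resolve, reject, task });
    // ... forward the task to the phantomjs process here ...
  });
}

// When the phantomjs child dies, reject only its pending requests with a
// specific error instead of blowing up the whole server process.
function onPhantomDeath(cause) {
  for (const entry of pending) {
    entry.reject(new Error("phantomjs exited unexpectedly: " + cause));
  }
  pending.clear();
}
```

Each rejected promise lands in the error handler of the request that issued it, which is exactly the "catch it, print a warning, spawn a new instance and carry on" behaviour described above.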
👍 I get ECONNRESET if the phantom process is killed while evaluating a page; this would fix it.
Could you guys check out the current master? I've improved the error handling. Does this commit in particular fix your problems? I don't like to discard errors. That's why I've written an error handler which waits 300ms for the process to shut down. If it doesn't, the error is emitted on the Phantom instance where the developer is able to catch it. If the error is not handled, node throws it.
Glad to see the improvements. However, I think the timeout is fragile and confusing and just introduces race conditions.
I really think you should just remove this. To my mind, there are only 3 places where we care about errors:
Thanks.
Thx for reading through the source code and for providing feedback. That's why I didn't just publish a new version 😀 I have to admit, using a timeout seems a bit hacky, but I don't think that it's too fragile. It basically just says: "Ok, I'll wait 300ms for the process to exit (e.g. because a SIGTERM was received and the process is busy shutting down). In that case we don't want to report anything because the error was kind of expected. After that delay we emit the error and pass responsibility back to the user, because we have to assume that it was not expected." To answer your questions:
Then you have to handle the error event in order to prevent the process from crashing. It's just an assumption that most applications are able to exit within 300ms.
Again, then you have to handle the error event.
I don't think that this is true. There are error conditions without explicit calls, like receiving SIGTERM from "the outside". Using an error event for these situations is the usual node.js pattern.
I don't think that this is a good behavior. The error would be completely out of context. Imho it's best practice to report errors as soon as possible – and within the correct context.
Agreed. The current solution does that already.
A SIGTERM/SIGINT/SIGWHATEVER is an event, not an error. If a process is told to exit by the sysadmin, operating system, etc, I don't consider that to be an error. We could be shutting down. It's only an error if you then try to use this process.
This is my issue with the timeout approach. It feels like the error is out of context, since it's not being thrown immediately. (I realise my "storing the error and throwing it to the next caller" approach has the same issue, but at least it was given to someone who cares about it, and likely within the context of a request which will almost certainly have its own error handler rather than crashing the whole process.) If you are going to throw errors EventEmitter-style, this will need to be very clearly documented as something the user has to think about.

To me it seems that throwing a timeout into the mix here just makes things more complicated. The purpose of the timeout is to decrease the number of things the user has to think about, so that the user doesn't need to care about random errors occurring for the simple case of SIGINT/SIGTERM sent to process groups. But we haven't helped the situation for anyone. All we've done is made the error unreliable; we've introduced a race condition. Everyone has to read the documentation and handle the possible error because it could happen in production even if it works fine in development/testing. If the machine is slow or busy, you might get an unexpected crash. What we've done is taken a possible uncaught error that everyone has to deal with and made it more complicated. Everyone still has to handle it because it still might happen, except now it's 300ms late and unreliable.

I agree that it would be nice to handle this process-group issue for SIGINT/SIGTERM so that most users don't have to think about it, but I don't think there is any way to distinguish this case. This leaves only two options:
I agree with all of your arguments. Adding a timeout only makes it easier for the group of users who are lucky enough to have a process which exits within 300ms 😉. What about a mixture of both options? We could emit an `unexpectedExit` event instead of an `error` event, so that nothing happens by default if the user doesn't listen for it.

Of course, this needs to be documented in the README.
An opt-in error handler sounds like a good idea to me. So by default it won't crash the process. Any pending promises will be rejected, and any new calls will throw the "already exited" error. If the process is exiting when the user didn't expect it for some reason, they can investigate by listening to the unexpectedExit event. Am I right in assuming that with this approach you would be removing the timeout?

When rejecting promises or throwing the "already exited" error, it would be nice to either throw the error that caused it, or possibly better, attach the original error as a property on the error that gets thrown. Thanks very much :)
Error events on
Yes, I'll do that, but I'm afraid they won't be very helpful either, like |
- Remove setTimeout for error handling - Change event name "error" to "unexpectedExit" See discussion at #35
Just published this with |
Fixes #34.
Thanks.