prevent infinite recursions in echoandcheck and SyWriteandcheck #3102
While I agree with the direction of this, I don't think this is the way to fix it. There are two main problems:
|
I also wonder what the best thing to do in this case is. If you can't print error messages, terrible things are happening. While I understand a segfault is bad, I'm not sure what's better. I wouldn't want the error message just being thrown away, for example. |
For example, if we committed this, we could end up silently dropping writes to files, without any feedback. |
I had the same concerns, but this patch has been applied, in some form or other, in Sage's GAP package for quite a long time without any concerns. That was before multi-threaded GAP, though. Perhaps it should instead use a thread-local variable for this in one of the state slots: something specifically for detecting recursion into I/O functions while trying to handle error I/O...
That's an exaggeration. Perfectly normal things may be happening and I gave such an example: Where stderr is replaced with a pipe and one end of the pipe is closed. This should not cause a segfault.
In most cases where this might happen it would be during process shutdown anyways. This might even happen if GAP would have otherwise exited with a normal exit code. Instead it results in the GAP process hanging for a long time in a loop until it segfaults. I think it would be better in this case to just exit silently. |
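For reference, the thread-local recursion guard floated above could look roughly like the following. This is only a sketch under assumptions, not code from the patch: `checked_write`, `report_write_error`, `in_write_error`, and `report_calls` are hypothetical names, and a plain C11 `_Thread_local` variable stands in for one of GAP's state slots.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical per-thread flag: nonzero while this thread is already
 * reporting a failed write. Stands in for a GAP state slot. */
_Thread_local int in_write_error = 0;

/* Counts how often the (hypothetical) reporter ran, for illustration. */
int report_calls = 0;

int checked_write(int fd, const void *buf, size_t len);

/* Hypothetical error reporter. It writes to the same descriptor that
 * just failed, which is exactly what would recurse without a guard. */
void report_write_error(int fd)
{
    static const char msg[] = "write failed\n";
    report_calls++;
    checked_write(fd, msg, sizeof msg - 1);
}

/* Returns 0 on success and -1 on failure. A failure is reported at most
 * once per thread at a time; nested failures are dropped. */
int checked_write(int fd, const void *buf, size_t len)
{
    if (write(fd, buf, len) >= 0)
        return 0;
    if (in_write_error)
        return -1; /* already handling a write error: do not recurse */
    in_write_error = 1;
    report_write_error(fd);
    in_write_error = 0;
    return -1;
}
```

Calling `checked_write(-1, "x", 1)` fails, runs the reporter once, and the reporter's own failed write returns without recursing. Note that the nested failure is dropped silently, which is the behavior objected to elsewhere in this thread.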
Out of curiosity I looked into how Python handles this sort of case in the roughly equivalent parts of its code, and it mostly just ignores such write errors and doesn't make a big deal out of them. It will still work as best it can. Even if you run the Python REPL like Typing input still echoes characters if readline is enabled, since it will still enable terminal echo if at least stdin is a TTY. Same on GAP with this patch--at least enough to type |
Also,
That's not really a problem, since that only occurs if there is an error writing to the error stream; you would basically never recover in this case and it would stop trying to go into In other words, it's try once, and once an error has occurred trying to output to some stream it won't try to raise an error about it again. Other approaches would be to either not raise an error in this case at all, at least if |
I wouldn't worry if we exited cleanly, instead of segfaulting, but I worry about hiding errors. What would Sage like? We could exit cleanly, or refuse to do any more work, but I'm worried about continuing to compute and produce answers, with error messages never shown to the user. |
I pressed a bad button. Ignore. |
Codecov Report
@@ Coverage Diff @@
## master #3102 +/- ##
==========================================
- Coverage 83.66% 83.66% -0.01%
==========================================
Files 687 687
Lines 336685 336689 +4
==========================================
Hits 281696 281696
- Misses 54989 54993 +4
|
I wouldn't worry too much about that. Again, we're talking about situations that are not likely to occur for normal users of GAP in an interactive context. There's no reason any process needs to be able to output anything to standard I/O, such as a process that is only writing to some other files. If, for some reason, you can't write to stderr then you just can't write to stderr and that's that. |
I feel there is a difference in philosophy here -- my personal preference is, in a system where people want to trust the results, if we reach a state where we can't tell the user information (like errors), then the best option is to quit loudly and quickly (we could try sending a useful message to C's stdout/stderr and then call exit/abort rather than a stack overflow of course, or try and recover and redirect errors to stderr if the user has redirected them elsewhere). What exactly is the need for this in libgap? I realise there is the bigger picture (someone closes stderr), but that doesn't (to me) seem like a super-urgent problem, so I assume this is coming up more often in Sage for some reason? |
Also, the best fix for GAP (when someone has redirected output) would probably be to force error output back to stderr, and if that disappears then just abort(), but that's probably not what Sage/libgap would want. |
As @embray said, this is mostly meant to support a clean shutdown of a process which is already going to shut down anyway. In detail: in Sage, we run GAP inside a pseudo-tty (pty). When we shut down that pty, the stdin/stdout/stderr streams are all closed. Whenever that happens, GAP will exit anyway because stdin is closed. Now if GAP tries to write anything to stdout/stderr before actually shutting down, we end up in this infinite error loop. |
This PR is unrelated to libGAP. |
Some history: this bug was first noticed in 2012 (long before anything like libGAP existed!) when upgrading Sage to GAP-4.5.6. There is a long discussion thread in the Sage tracker starting at https://trac.sagemath.org/ticket/13211#comment:58 explaining why this patch was needed. Supposedly, this bug was already reported as http://tracker.gap-system.org/issues/125 (but that link no longer seems to work). |
I haven't gotten around to it yet but I planned to today: I have a simpler version of this patch that will simply not go into the default error-handling code if an error occurs in writing to stdout or stderr since that will obviously result in an unwanted recursion. Instead I believe it should just fail silently, though there is no "right" answer there. But it would help to get out of the mindset of GAP being a command-line REPL first, but rather have low-level functionality not assume interactivity, but build interactivity on top of lower-level functionality. The problem here is that low-level I/O code can jump directly into an interactive error-handling break loop. For the sake of backwards-compatibility we should keep that behavior for now until we've had time to do deeper refactoring. But an explicit case to prevent infinite recursion is still needed here. Cases where stdout/stderr can't be written to were likely never interactive in the first place. |
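To make the idea concrete, the decision described above (skip the default error handling when the failing stream is stdout or stderr) can be sketched as a small pure function. The names `write_failure_action` and `on_write_failure` are hypothetical and are not taken from the GAP sources; how the caller then exits or continues is left open here.

```c
#include <unistd.h>

/* Hypothetical result type: what to do after write() failed on fd. */
typedef enum {
    FAIL_QUIETLY,   /* do not enter the error handler */
    FAIL_WITH_ERROR /* safe to raise the usual error */
} write_failure_action;

/* Raising an error about a failed write to stdout or stderr would try
 * to print to the very stream that is broken, so those two descriptors
 * are special-cased. Any other descriptor still gets a normal error. */
write_failure_action on_write_failure(int fd)
{
    if (fd == STDOUT_FILENO || fd == STDERR_FILENO)
        return FAIL_QUIETLY;
    return FAIL_WITH_ERROR;
}
```

A failed write to an ordinary file is still reported under this sketch, so only errors about the standard streams themselves are suppressed.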
…where the default stdio output streams cannot be written to
9042d29 to 56157cb
Here's a simplified alternate fix to this, which does not explicitly rely on any recursion detection, and just focuses narrowly on the affected use cases where standard I/O cannot be successfully written to. With this fix the example from the issue description works like:
The fact that it causes a "panic" is fine for the time being, since the point is to cause the program to exit cleanly. If stderr is also redirected to a broken file, then the Panic message won't be displayed either, but that's sort of a given:
|
I think this is fine for merging; the x is just due to coveralls confusion. |
@ChrisJefferson Can you please add backport-4.10 to this as well? |
Backported to stable-4.10 in aef2214 |
I'm not convinced by this patch because there are still plenty of ways in which error handling could run into an infinite loop (given that |
@jdemeyer I have no idea what the original version of this PR looked like, and I don't think there is a way to let GitHub show it to me. So it'd be best if you opened a fresh PR (with a reference to this one) for further discussion. Or, if that's non-trivial work, at least open a new issue (this PR is closed and most people won't see a discussion here), with a link to some version of the patch you are talking about. |
@jdemeyer: I don't mind looking at alternatives to this, but I am fundamentally opposed to any patch which would discard errors, if that is the PR you want to bring back (of course, I don't have solo veto power, but I really don't like it). |
@jdemeyer It is well known that there are lots of ways you can get infinite loops in GAP's error handling. It's also been discussed on other issues that what is really needed here is a more thorough overhaul of GAP's error handling (for which I have some ideas, as I'm sure do you). The reason we didn't use the original version of this patch is precisely because even it was misleading in the extent to which it purported to fix infinite loops in GAP's error handling (it doesn't). It also wasn't thread safe in any way. |
This can occur in cases where the stdio streams (specifically stderr, but it can also happen with just stdout) cannot be written to for some reason, resulting in EIO errors on write() calls to their file descriptors.
The recursions occur because if an error occurs on writing to the stream, ErrorQuit is called, which may in turn attempt to write to the same stream. This is similar to, but a somewhat different manifestation of, #3028.
I'm not really happy with this fix since it's more of a band-aid; what is really needed is broader refactoring of error handling, as discussed partly in #2487. But that's a longer-term effort. In the meantime, this is needed for libgap, preferably in 4.10.1.
To demonstrate the problem this fixes, one easy way is to just run:
With the patch, the same example will result in a few error messages as well, but then wait at the prompt (which just isn't displayed).
This is just one example. A more realistic use case is one where GAP is started as a subprocess with stdout and stderr redirected to pipes, and then one of those pipes is closed.
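The closed-pipe situation is easy to reproduce outside GAP. The sketch below is not from the patch, and the helper name `errno_after_write_to_closed_pipe` is made up for illustration. It assumes a POSIX system, where writing to a pipe whose read end is closed fails with EPIPE once SIGPIPE is ignored; the EIO case mentioned above arises with other kinds of broken streams.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Writes one byte to a pipe whose read end has been closed.
 * Returns the errno of the failed write, 0 if the write unexpectedly
 * succeeded, or -1 if the pipe could not be created. */
int errno_after_write_to_closed_pipe(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    /* Process-wide change: with the default disposition, SIGPIPE would
     * terminate the process before write() could return an error. */
    signal(SIGPIPE, SIG_IGN);

    close(fds[0]); /* drop the reader, as when the parent goes away */

    ssize_t n = write(fds[1], "x", 1);
    int err = (n < 0) ? errno : 0; /* save errno before close() can change it */

    close(fds[1]);
    return err;
}
```

Any code path that reacts to this failure by printing to the same descriptor will hit the same failure again, which is the recursion this PR addresses.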