prevent infinite recursions in echoandcheck and SyWriteandcheck #3102

embray · 2018-12-11T12:17:08Z

This can occur in cases where the stdio streams (specifically stderr, but it can also happen with just stdout) cannot be written to for some reason, resulting in EIO errors on write() calls to their file descriptors.

The recursions occur because if an error occurs on writing to the stream, ErrorQuit is called which may in turn attempt to write to the same stream. This is similar to, but a somewhat different manifestation of #3028.

I'm not really happy with this fix since it's more of a band-aid; what is really needed is broader refactoring to error handling, as discussed partly in #2487. But that's a longer-term effort. In the meantime, this is needed for libgap, preferably in 4.10.1.

To demonstrate the problem this fixes, one easy way is to just run:

$ ./gap > /dev/full
$ ./gap > /dev/full
Error, Cannot write to file descriptor 1, see 'LastSystemError();'
 in
  Print( " ", btop, "   ", gap, " ", GAPInfo.BuildVersion, " of ", sysdate, "\n", " ", vert, "  GAP  ", vert,
 "   https://www.gap-system.org\n", " ", bbot, "   Architecture: ", GAPInfo.Architecture, "\n"
 ); at /home/embray/src/gap-system/gap/lib/init.g:510 called from
ShowKernelInformation(  ); at /home/embray/src/gap-system/gap/lib/init.g:565 called from
func(  ); at /home/embray/src/gap-system/gap/lib/system.g:215 called from
<function "CallAndInstallPostRestore">( <arguments> )
 called from read-eval loop at /home/embray/src/gap-system/gap/lib/init.g:567
Error, Cannot write to file descriptor 1, see 'LastSystemError();'

< ... some more errors ... >

<function "SESSION">( <arguments> )
 called from read-eval loop at *stdin*:1
Syntax error: ; expected

Segmentation fault (core dumped)

With the patch, the same example will result in a few error messages as well, but then wait at the prompt (which just isn't displayed).

This is just one example. A more realistic use case is one where GAP is started as a subprocess with stdout and stderr redirected to pipes, and then one of those pipes is closed.

ChrisJefferson · 2018-12-11T21:07:25Z

While I agree with the direction of this, I don't think this is the way to fix it. There are two main problems:

this isn't thread safe for hpcgap.
I worry we could longjmp out of this function without decreasing depth.

ChrisJefferson · 2018-12-11T21:17:15Z

I also wonder what the best thing to do in this case is.. if you can't print error messages, terrible things are happening. While I understand a segfault is bad, I'm not sure what's better. I wouldn't want the error message just being thrown away for example.

ChrisJefferson · 2018-12-12T08:38:53Z

For example, I'd we committed this, we could end up silently dropping writes to files, without any feedback.

embray · 2018-12-12T09:33:24Z

I had the same concerns, but this patch has been applied, in some form or other, in Sage's GAP package for quite a long time without any concerns. That was before multi-threaded GAP though: Perhaps it should instead it should use a thread-local variable for this in one of the state slots: Something specifically for detecting recursion into I/O functions while trying to handle error I/O...

if you can't print error messages, terrible things are happening.

That's an exaggeration. Perfectly normal things may be happening and I gave such an example: Where stderr is replaced with a pipe and one end of the pipe is closed. This should not cause a segfault.

While I understand a segfault is bad, I'm not sure what's better. I wouldn't want the error message just being thrown away for example.

In most cases where this might happen it would be during process shutdown anyways. This might even happen if GAP would have otherwise exited with a normal exit code. Instead it results in the GAP process hanging for a long time in a loop until it segfaults. I think it would be better in this case to just exit silently.

embray · 2018-12-12T11:01:01Z

Out of curiosity I looked into how Python handles this sort of case in the roughly equivalent parts of its code, and it mostly just ignores such write errors and doesn't make a big deal out of them. It will still work as best it can. Even if you run the Python REPL like python > /dev/full it will mostly work.

Typing input still echos characters if readline is enabled, since it will still enable terminal echo if at least stdin is a TTY. Same on GAP with this patch--at least enough to type quit;.

embray · 2018-12-12T11:23:42Z

Also,

I worry we could longjmp out of this function without decreasing depth.

That's not really a problem, since that only occurs if there is an error writing to the error stream; you would basically never recover in this case and it would stop trying to go into ErrorQuit where it can't display errors.

In other words, it's try once, and once an error has occurred trying to output to some stream it won't try to raise an error about it again.

Other approaches would be to either not raise an error in this case at all, at least if fid == 3 or maybe if fid == 1, or to somehow handle this from within the error-handling code, but that seems fragile.

ChrisJefferson · 2018-12-12T12:09:04Z

I wouldn't worry if we exited cleanly, instead of segfaulting, but I worry about hiding errors. What would Sage like? We could exit cleanly, or refuse to do any more work, but I'm worried about continuing to compute and produce answers, with error messages never shown to the user.

ChrisJefferson · 2018-12-12T12:09:21Z

I pressed a bad button. Ignore.

codecov · 2018-12-12T12:09:41Z

Codecov Report

Merging #3102 into master will decrease coverage by <.01%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master    #3102      +/-   ##
==========================================
- Coverage   83.66%   83.66%   -0.01%     
==========================================
  Files         687      687              
  Lines      336685   336689       +4     
==========================================
  Hits       281696   281696              
- Misses      54989    54993       +4

Impacted Files	Coverage Δ
src/sysfiles.c	`41.03% <0%> (-0.12%)`	⬇️
hpcgap/lib/hpc/stdtasks.g	`71.77% <0%> (-0.41%)`	⬇️
src/stats.c	`95.3% <0%> (+0.2%)`	⬆️
src/objset.c	`85.35% <0%> (+0.22%)`	⬆️

embray · 2018-12-12T14:48:23Z

I'm worried about continuing to compute and produce answers, with error messages never shown to the user

I wouldn't worry too much about that. Again, we're talking about situations that are not likely to occur for normal users of GAP in an interactive context. There's no reason any process needs to be able to output anything to standard I/O, such as a process that is only writing to some other files. If, for some reason, you can't write to stderr then you just can't write to stderr and that's that.

ChrisJefferson · 2018-12-12T15:51:40Z

I feel there is a difference in philosophy here -- my personal preference is, in a system where people want to trust the results, if we reach a state where we can't tell the user information (like errors), then the best option is to quit loudly and quickly (we could try sending a useful message to C's stdout/stderr and then call exit/abort rather than a stack overflow of course, or try and recover and redirect errors to stderr if the user has redirected them elsewhere).

What exactly is the need for this in libgap? I realise there is the bigger picture (someone closes stderr), but that doesn't (to me) seem like a super-urgent problem, so I assume this is coming up more often in Sage for some reason?

ChrisJefferson · 2018-12-12T16:20:18Z

Also, the best fix for GAP (when someone has redirected output) would probably be to force error output back to stderr, and if that disappears then just abort(), but that's probably not what Sage/libgap would want.

jdemeyer · 2018-12-18T12:43:11Z

As @embray said, this is mostly meant to support a clean shutdown of a process which is already going to shut down anyway.

In detail: in Sage, we run GAP inside a pseudo-tty (pty). When we shut down that pty, the stdin/stdout/stderr streams are all closed. Whenever that happens, GAP will exit anyway because stdin is closed. Now if GAP tries to write anything to stdout/stderr before actually shutting down, we end up in this infinite error loop.

jdemeyer · 2018-12-18T12:49:26Z

What exactly is the need for this in libgap?

This PR is unrelated to libGAP.

jdemeyer · 2018-12-18T12:58:42Z

Some history: this bug was first noticed in 2012 (long before anything like libGAP existed!) when upgrading Sage to GAP-4.5.6

There is a long discussion thread in the Sage tracker starting at https://trac.sagemath.org/ticket/13211#comment:58 for why this patch was needed.

Supposedly, this bug was already reported as http://tracker.gap-system.org/issues/125 (but that link no longer seems to work)

embray · 2018-12-19T10:22:34Z

I haven't gotten around to it yet but I planned to today: I have a simpler version of this patch that will simply not go into the default error-handling code if an error occurs in writing to stdout or stderr since that will obviously result in an unwanted recursion.

Instead I believe it should just fail silently, though there is no "right" answer there. But it would help to get out of the mindset of GAP being a command-line REPL first, but rather have low-level functionality not assume interactivity, but build interactivity on top of lower-level functionality. The problem here is that low-level I/O code can jump directly into an interactive error-handling break loop.

For the sake of backwards-compatibility we should keep that behavior for now until we've had time to do deeper refactoring. But an explicit case to prevent infinite recursion is still needed here. Cases where stdout/stderr can't be written to were likely never interactive in the first place.

…where the default stdio output streams cannot be written to

embray · 2018-12-28T16:50:49Z

Here's a simplified alternate fix to this, which does not explicitly rely on any recursion detection, and just focuses narrowly on the affected use cases where standard I/O cannot be successfully written to.

With this fix the example from the issue description works like:

$ ./gap > /dev/full
Panic in src/sysfiles.c:1558: Could not write to stdout/stderr.
$ echo $?
1

The fact that it causes a "panic" is fine for the time being, since the point is to cause the program to exit cleanly. Panic() as currently implemented is a problem from the perspective of the libgap API since it forces the program to exit, but right now the use-case this issue is concerned about is just spawning gap as a subprocess.

If stderr is also redirected to a broken file, then the Panic message won't be displayed either, but that's sort of a given:

$ ./gap > /dev/full 2>&1
$ echo $?
1

coveralls · 2018-12-28T17:31:05Z

Coverage decreased (-0.3%) to 82.827% when pulling 56157cb on embray:io/echoandcheck-recursion into 68f012b on gap-system:master.

ChrisJefferson · 2019-01-08T12:03:51Z

I think this is fine for merging, the x is just to coveralls confusion.

embray · 2019-01-08T13:33:10Z

@ChrisJefferson Can you please add backport-4.10 to this as well?

fingolfin · 2019-02-13T14:42:36Z

Backported to stable-4.10 in aef2214

jdemeyer · 2019-02-14T22:31:20Z

I'm not convinced by this patch because there are still plenty of ways how error handling could run into an infinite loop (given that ErrorInner is actually a GAP function which is customizable and can generate an error in plenty of ways). I prefer the original version with the explicit check for recursion.

fingolfin · 2019-02-14T22:42:26Z

@jdemeyer I have no idea what the original version of this PR looked like, and I don't think there is a way to let GitHub show it to me. So it'd be best if you opened a fresh PR (with a references to this one) for further discussion. Or, if that's non-trivial work, at least open a new issue (this PR is closed and most people won't see a discussion here), with a link to some version of the patch you are talking about.

ChrisJefferson · 2019-02-14T23:02:43Z

@jdemeyer : I don't mind looking at different ones to this, but I am fundementally opposed to any patch which would discard errors, if that is the PR you want to bring back (of course, I don't have solo veto power, but I really don't like it).

embray · 2019-02-21T11:05:14Z

@jdemeyer It is well known that there are lots of ways you can get infinite loops in GAP's error handling. It's also been discussed on other issues that what is really needed here is a more thorough overhaul of GAP's error handling (for which I have some ideas, as I'm sure do you).

The reason we didn't use the original version of this patch is precisely because even it was misleading in the extent to which it purposed to fix infinite loops in GAP's error handling (it doesn't). It also wasn't thread safe in any way.

ChrisJefferson closed this Dec 12, 2018

ChrisJefferson reopened this Dec 12, 2018

prevent infinite recursions in echoandcheck SyWriteandcheck in cases …

56157cb

…where the default stdio output streams cannot be written to

embray force-pushed the io/echoandcheck-recursion branch from 9042d29 to 56157cb Compare December 28, 2018 16:46

ChrisJefferson approved these changes Jan 6, 2019

View reviewed changes

ChrisJefferson added the backport-to-4.10 label Jan 8, 2019

ChrisJefferson merged commit 48b871b into gap-system:master Jan 8, 2019

embray deleted the io/echoandcheck-recursion branch January 10, 2019 13:47

fingolfin added the backport-to-4.10-DONE label Feb 13, 2019

fingolfin removed the backport-to-4.10 label Feb 13, 2019

gap-system deleted a comment from codecov bot Feb 14, 2019

olexandr-konovalov added this to the GAP 4.10.1 milestone Feb 23, 2019

olexandr-konovalov changed the title ~~prevent infinite recursions in echoandcheck SyWriteandcheck~~ prevent infinite recursions in echoandcheck and SyWriteandcheck Feb 23, 2019

olexandr-konovalov added release notes: added PRs introducing changes that have since been mentioned in the release notes topic: libgap things related to libgap labels Feb 23, 2019

nthiery mentioned this pull request Jan 13, 2023

Upgrade to GAP 4.10 sagemath/sage#22626

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

prevent infinite recursions in echoandcheck and SyWriteandcheck #3102

prevent infinite recursions in echoandcheck and SyWriteandcheck #3102

embray commented Dec 11, 2018 •

edited

Loading

ChrisJefferson commented Dec 11, 2018

ChrisJefferson commented Dec 11, 2018

ChrisJefferson commented Dec 12, 2018

embray commented Dec 12, 2018

embray commented Dec 12, 2018

embray commented Dec 12, 2018

ChrisJefferson commented Dec 12, 2018 •

edited

Loading

ChrisJefferson commented Dec 12, 2018

codecov bot commented Dec 12, 2018 •

edited

Loading

embray commented Dec 12, 2018

ChrisJefferson commented Dec 12, 2018

ChrisJefferson commented Dec 12, 2018

jdemeyer commented Dec 18, 2018

jdemeyer commented Dec 18, 2018

jdemeyer commented Dec 18, 2018

embray commented Dec 19, 2018 •

edited

Loading

embray commented Dec 28, 2018

coveralls commented Dec 28, 2018

ChrisJefferson commented Jan 8, 2019

embray commented Jan 8, 2019

fingolfin commented Feb 13, 2019

jdemeyer commented Feb 14, 2019

fingolfin commented Feb 14, 2019

ChrisJefferson commented Feb 14, 2019

embray commented Feb 21, 2019

prevent infinite recursions in echoandcheck and SyWriteandcheck #3102

prevent infinite recursions in echoandcheck and SyWriteandcheck #3102

Conversation

embray commented Dec 11, 2018 • edited Loading

ChrisJefferson commented Dec 11, 2018

ChrisJefferson commented Dec 11, 2018

ChrisJefferson commented Dec 12, 2018

embray commented Dec 12, 2018

embray commented Dec 12, 2018

embray commented Dec 12, 2018

ChrisJefferson commented Dec 12, 2018 • edited Loading

ChrisJefferson commented Dec 12, 2018

codecov bot commented Dec 12, 2018 • edited Loading

Codecov Report

embray commented Dec 12, 2018

ChrisJefferson commented Dec 12, 2018

ChrisJefferson commented Dec 12, 2018

jdemeyer commented Dec 18, 2018

jdemeyer commented Dec 18, 2018

jdemeyer commented Dec 18, 2018

embray commented Dec 19, 2018 • edited Loading

embray commented Dec 28, 2018

coveralls commented Dec 28, 2018

ChrisJefferson commented Jan 8, 2019

embray commented Jan 8, 2019

fingolfin commented Feb 13, 2019

jdemeyer commented Feb 14, 2019

fingolfin commented Feb 14, 2019

ChrisJefferson commented Feb 14, 2019

embray commented Feb 21, 2019

embray commented Dec 11, 2018 •

edited

Loading

ChrisJefferson commented Dec 12, 2018 •

edited

Loading

codecov bot commented Dec 12, 2018 •

edited

Loading

embray commented Dec 19, 2018 •

edited

Loading