Allow disabling the GAP kernel's signal handling when using the API #3072
Oops, doesn't even compile. Weird, I must have accidentally hit 'x' in vim or something. |
7417a58
to
8a9207f
Compare
Need to fix libgap-api test. Alternatively, if we don't want to modify the signature for Either way some version of this fix would be nice to have backported to 4.10.x (cc @alex-konovalov) |
I also just discovered the Has this been handled in any way by any of the other GAP interfaces? |
The IO package handles this. The appropriate function is |
I don't know what you mean by "The IO package handles this". The problem is that merely initializing the GAP library should not, at least optionally (as in this PR), register any signal handlers in the first place. For cases like calling |
What I mean is, what would be the appropriate context in which to call I wonder if it would make sense if, rather than unconditionally installing this |
CheckChildStatusChanged should be called if you install your own SIGCHLD handler, to pass information about any SIGCHLD events you receive to GAP. It could be added to the libgap API. We could also add some kind of global switch to stop the signal handler being installed. Unfortunately, avoiding the "collect up any other zombie children" loop, or removing the SIGCHLD handler once we are done, is a major change, because GAP at the moment doesn't track children it no longer cares about, but just throws all information about them away. Note that this won't solve problems with the IO package, which installs its own signal handler. It overwrites GAP's, and then uses CheckChildStatusChanged to pass signals to GAP (we do it that way around, which might seem strange, because the IO package does keep track of the children it cares about, so it can check all children it cares about before forwarding any remaining children on to GAP). However, this is only installed if you use the functions in the IO package which create children, like IO_fork, which I suspect will create other horrible issues in Sage anyway, as forking GAP in Sage would be, I guess, a bad idea? |
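To make the forwarding arrangement described above concrete, here is a minimal sketch of a host-side SIGCHLD handler that keeps the children it owns and passes every other child on. It is not GAP or IO package code: `forward_to_gap_stub` is a hypothetical stand-in for a call such as CheckChildStatusChanged (whose real signature is not shown in this thread), and all other names are invented for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* State written by the handler; sig_atomic_t keeps the accesses signal-safe. */
static volatile sig_atomic_t host_child = 0;      /* PID the host owns, 0 if none */
static volatile sig_atomic_t host_status = -1;    /* wait status of the host's child */
static volatile sig_atomic_t host_child_done = 0;
static volatile sig_atomic_t forwarded_pid = 0;   /* what the stub last received */
static volatile sig_atomic_t forwarded_status = -1;

/* HYPOTHETICAL stand-in for handing a child's status to GAP. */
static void forward_to_gap_stub(pid_t pid, int status)
{
    forwarded_pid = (sig_atomic_t)pid;
    forwarded_status = status;
}

/* Host handler: reap every finished child, keep the ones the host owns,
   and forward all others. */
static void host_sigchld_handler(int sig)
{
    int saved_errno = errno, status;
    pid_t pid;
    assert(sig == SIGCHLD);
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        if (host_child != 0 && pid == (pid_t)host_child) {
            host_status = status;
            host_child_done = 1;
        } else {
            forward_to_gap_stub(pid, status);
        }
    }
    errno = saved_errno;
}

/* Forks one host-owned child (exit code 5) and one foreign child (exit code 3).
   Returns 1 if each status reached the right owner, 0 if not, -1 on error. */
int demo_forwarding(void)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask, wait_mask;
    pid_t mine, foreign;
    int ok;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = host_sigchld_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, &old_sa) == -1)
        return -1;

    /* Keep SIGCHLD blocked so the handler only runs inside sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old_mask);
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);

    host_child_done = 0;
    forwarded_pid = 0;
    mine = fork();
    if (mine == 0)
        _exit(5);
    host_child = (sig_atomic_t)mine;
    foreign = fork();
    if (foreign == 0)
        _exit(3);

    if (mine == -1 || foreign == -1) {
        ok = -1;
    } else {
        while (!host_child_done || forwarded_pid == 0)
            sigsuspend(&wait_mask);
        ok = forwarded_pid == (sig_atomic_t)foreign &&
             WIFEXITED(forwarded_status) && WEXITSTATUS(forwarded_status) == 3 &&
             WIFEXITED(host_status) && WEXITSTATUS(host_status) == 5;
    }
    host_child = 0;
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);
    return ok;
}
```

The sketch dispatches each reaped PID inside a single `waitpid(-1, ...)` loop rather than polling the host's children first and sweeping afterwards, so a host child that exits between the two steps cannot be forwarded by mistake.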
I'm not too worried about the IO package right now since it's not part of the standard GAP install in Sage. But it is important to deal with sooner or later. I don't think "forking GAP" is such a problem. When GAP is just being used as a library you're not "forking GAP", you're just forking the whole process, which happens all the time for various reasons and generally works fine. |
This has to be rebased. |
8a9207f
to
99fe35c
Compare
Codecov Report
@@ Coverage Diff @@
## master #3072 +/- ##
==========================================
+ Coverage 85.16% 85.16% +<.01%
==========================================
Files 696 696
Lines 344234 344235 +1
==========================================
+ Hits 293159 293160 +1
Misses 51075 51075
|
Rebased. I still think |
What's the status of this? This has to be rebased (again)... |
Mostly just waiting on approval... My question in #3072 (comment) could use some response if anyone has any ideas. I don't know what the right way is--if any--to pass flags set via But resolution of that issue can still come separately, as this already partially fixes the issue. |
@embray it can not be approved without rebasing and then analysing up-to-date CI tests. |
99fe35c
to
74ff0a5
Compare
Rebased, anyways. |
Now test fails:
|
Yes. That's weird. I thought I fixed those. |
I see. I have a local copy of this branch on a different computer where I did fix that. |
…in other applications
74ff0a5
to
1ba3127
Compare
Backported to 4.10 in 9b73f71 |
There is still the |
@jdemeyer Yep, I never got a satisfactory answer on what to do about that. |
Note that the problem with GAP's |
Can you explain the problem with calling waitpid on all processes? |
Unfortunately, a very long-standing GAP issue (which is, I believe, unfixable at this point, as too much code relies on the current behaviour) is that GAP spawns child processes which it simply "forgets" about. Obviously, when those processes end, they become zombies. Therefore at some point those processes have to be cleaned up with waitpid, or else we will run out of process slots. Does Sage require that nothing ever calls waitpid(-1,...)? I'm afraid if that's the case, we may have a very difficult problem to fix. |
Absolutely. It is quite reasonable to |
Is that really so difficult to fix? Are there that many places where GAP spawns child processes? |
Note: This is not just about "sage" but any software that integrates GAP and that happens to spawn processes. E.g. a generic python-gap interface would have severe problems when trying to use Python's multiprocessing module, because of this. This will be a problem for gap-julia as well. |
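The conflict described in the last few comments can be reproduced outside GAP. The sketch below is a stand-alone illustration and not GAP code: `reap_all_children` is a hypothetical stand-in for a library handler that sweeps up every zombie with `waitpid(-1, ...)`, and `host_loses_child_status` plays the role of an embedding application that later tries to wait for its own child.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t handler_ran = 0;

/* Stand-in for a library handler that collects every zombie child and
   throws the exit status away. */
static void reap_all_children(int sig)
{
    int saved_errno = errno;
    assert(sig == SIGCHLD);
    while (waitpid(-1, NULL, WNOHANG) > 0) {
        /* status discarded */
    }
    handler_ran = 1;
    errno = saved_errno;
}

/* Returns 1 if the host could no longer collect its own child (ECHILD),
   0 if it still could, and -1 on a setup error. */
int host_loses_child_status(void)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask, wait_mask;
    pid_t pid;
    int status, result;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = reap_all_children;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, &old_sa) == -1)
        return -1;

    /* Block SIGCHLD so that the handler runs only inside sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old_mask);
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);

    handler_ran = 0;
    pid = fork();
    if (pid == 0)
        _exit(7); /* the child the host wants to wait for */

    if (pid == -1) {
        result = -1;
    } else {
        /* Let the "library" handler see the child's exit first. */
        while (!handler_ran)
            sigsuspend(&wait_mask);
        /* The host's own wait now fails: the status is already gone. */
        result = (waitpid(pid, &status, 0) == -1 && errno == ECHILD) ? 1 : 0;
    }

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);
    return result;
}
```

Once the handler has run, the host's `waitpid(pid, ...)` fails with `ECHILD`, so the exit code 7 is never delivered to the code that spawned the child. This is the failure mode a process pool in the embedding application would run into.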
One could in principle find all the places where the kernel (and kernel extensions) stop keeping track of processes, add all those PIDs to a list, then go through that list calling waitpid on every element. However, that would obviously be a lot more work than using waitpid(-1) when GAP is being used as normal. |
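As a rough sketch of the alternative just described, the code below keeps a table of "forgotten" PIDs and reaps only those, leaving every other child of the process alone. The table size and all function names are hypothetical; nothing here is taken from the GAP kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define MAX_TRACKED 64

static pid_t tracked[MAX_TRACKED]; /* 0 marks a free slot */

/* Remember a child that the library started but will not wait for itself. */
int track_child(pid_t pid)
{
    assert(pid > 0);
    for (int i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i] == 0) {
            tracked[i] = pid;
            return 0;
        }
    }
    return -1; /* table full */
}

/* Reap only the tracked children that have exited; returns how many. */
int reap_tracked_children(void)
{
    int reaped = 0;
    for (int i = 0; i < MAX_TRACKED; i++) {
        int status;
        pid_t r;
        if (tracked[i] == 0)
            continue;
        r = waitpid(tracked[i], &status, WNOHANG);
        if (r == tracked[i]) {
            tracked[i] = 0;
            reaped++;
        } else if (r == -1 && errno == ECHILD) {
            tracked[i] = 0; /* somebody else already collected it */
        }
    }
    return reaped;
}

/* Forks one tracked child and one host-owned child (exit code 42), reaps the
   tracked one, and returns the exit code the host can still collect. */
int demo_untracked_child_survives(void)
{
    struct timespec pause = { 0, 1000000 }; /* 1 ms */
    pid_t forgotten, host_child;
    int status;

    forgotten = fork();
    if (forgotten == 0)
        _exit(0);
    if (forgotten == -1 || track_child(forgotten) == -1)
        return -1;

    host_child = fork();
    if (host_child == 0)
        _exit(42);
    if (host_child == -1)
        return -1;

    /* Poll until the tracked child has been reaped (bounded wait). */
    for (int tries = 0; tries < 5000 && reap_tracked_children() == 0; tries++)
        nanosleep(&pause, NULL);

    if (waitpid(host_child, &status, 0) != host_child || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The cost mentioned in the comment is visible here: every place that spawns a child has to call something like `track_child`, whereas `waitpid(-1, ...)` needs no bookkeeping at all.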
Where are those places where a child process is created and then forgotten? I tried to look for that, but couldn't find anything in the GAP sources (excluding HPC-GAP). There is only one place where

    static void ChildStatusChanged(int whichsig)
    {
        UInt i;
        int status;
        int retcode;
        assert(whichsig == SIGCHLD);
        HashLock(PtyIOStreams);
        for (i = 0; i < MAX_PTYS; i++) {
            if (PtyIOStreams[i].inuse) {
                retcode = waitpid(PtyIOStreams[i].childPID, &status,
                                  WNOHANG | WUNTRACED);
                if (retcode != -1 && retcode != 0 &&
                    (WIFEXITED(status) || WIFSIGNALED(status))) {
                    PtyIOStreams[i].changed = 1;
                    PtyIOStreams[i].status = status;
                    PtyIOStreams[i].blocked = 0;
                }
            }
        }
        HashUnlock(PtyIOStreams);
    #if !defined(HPCGAP)
        /* Collect up any other zombie children */
        do {
            retcode = waitpid(-1, &status, WNOHANG);
            if (retcode == -1 && errno != ECHILD)
                Pr("#E Unexpected waitpid error %d\n", errno, 0);
        } while (retcode != 0 && retcode != -1);
        signal(SIGCHLD, ChildStatusChanged);
    #endif
    }

There is also one

    /* turn off the SIGCHLD handling, so that we can be sure to collect this child
       After that, we call the old signal handler, in case any other children have died in the
       meantime. This resets the handler */
    func2 = signal( SIGCHLD, SIG_DFL );

I don't find any calls of |
There have been moves in the last few versions of GAP to peck away at these problems. Having a look, we might actually have cleaned up most of them. We do use popen in the profiling code, but I think it cleans up after itself. The big remaining problem is IO_fork in the IO package. The IO package calls waitpid in a loop, and makes a list of the PIDs it finds. GAP needs to "clean up" the PIDs the IO package creates if its signal handler is uninstalled as well. |
Folks, discussing these things on a PR that was closed almost two months ago is a great way to make sure few people will see it or participate... |
Not being able to do this easily is a problem for embedding in other applications that have their own signal handling. This can be worked around by saving/restoring the signal handling state before/after GAP_Initialize, but it would be better if we could just disable GAP from setting its own SIGINT handler completely.
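The save/restore workaround mentioned in the description might look roughly like the sketch below. It does not call GAP: `fake_library_initialize` is a hypothetical stand-in for an initialization routine that installs its own handlers, and the choice of SIGINT and SIGCHLD as the signals to guard is an assumption based on the handlers discussed in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

typedef void (*handler_fn)(int);

/* Signals whose dispositions the host wants to keep (assumed set). */
static const int guarded_signals[] = { SIGINT, SIGCHLD };
#define NUM_GUARDED (sizeof guarded_signals / sizeof guarded_signals[0])

static void library_handler(int sig) { (void)sig; }
static void host_sigint_handler(int sig) { (void)sig; }

/* HYPOTHETICAL stand-in for a library init that overwrites signal handlers. */
void fake_library_initialize(void)
{
    signal(SIGINT, library_handler);
    signal(SIGCHLD, library_handler);
}

/* Run init() but put the host's signal dispositions back afterwards. */
int initialize_preserving_handlers(void (*init)(void))
{
    struct sigaction saved[NUM_GUARDED];
    assert(init != NULL);
    for (size_t i = 0; i < NUM_GUARDED; i++)
        if (sigaction(guarded_signals[i], NULL, &saved[i]) == -1)
            return -1;
    init();
    for (size_t i = 0; i < NUM_GUARDED; i++)
        if (sigaction(guarded_signals[i], &saved[i], NULL) == -1)
            return -1;
    return 0;
}

/* Report the handler currently installed for signum. */
handler_fn current_handler(int signum)
{
    struct sigaction sa;
    if (sigaction(signum, NULL, &sa) == -1)
        return SIG_ERR;
    return sa.sa_handler;
}

/* Returns 1 if the host's SIGINT handler is still in place after init. */
int demo_handlers_survive_init(void)
{
    struct sigaction host, before;
    int ok;

    memset(&host, 0, sizeof host);
    host.sa_handler = host_sigint_handler;
    sigemptyset(&host.sa_mask);
    if (sigaction(SIGINT, &host, &before) == -1)
        return -1;

    ok = initialize_preserving_handlers(fake_library_initialize) == 0 &&
         current_handler(SIGINT) == host_sigint_handler;

    sigaction(SIGINT, &before, NULL); /* leave the process as we found it */
    return ok ? 1 : 0;
}
```

Between `init()` and the restore loop the library's handlers are briefly active, so a signal arriving in that window is still handled by the library. That gap is one reason an option to skip installing the handlers altogether is the cleaner fix.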