-
Notifications
You must be signed in to change notification settings - Fork 161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prototype for GAP_Enter/Leave macros #3096
Conversation
To add: I developed this prototype against the GAP interface for Sage (which I am still planning to rework as a generic Python interface, so there is nothing particularly specific to Sage here; this should solve similar problems for other interfaces that embed GAP). I mention this though because the Sage/GAP interface tests this out quite extensively, and for that purpose at least this prototype works very well, and solves many problems I was having: In particular segfaults and other errors due to local variables being garbage-collected, and other problems related to error handling. It feels quite stable. If it helps I can post some additional sample code and/or tests that demonstrate the kinds of problems this solves. As it is, there is certainly also a need for more documentation in this code. |
Codecov Report
@@ Coverage Diff @@
## master #3096 +/- ##
==========================================
- Coverage 84.76% 84.76% -0.01%
==========================================
Files 687 688 +1
Lines 335864 335887 +23
==========================================
+ Hits 284690 284705 +15
- Misses 51174 51182 +8
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes requested as outlined in the previous review.
Sorry I haven't had time to address the review comments: I will look at them now. |
Cleaned up most of the minor nitpicks, but now doing some larger refactoring surrounding management of the |
I think this works, but I want to do more testing later when I have more time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, this seems to be shaping up nicely!
Finally finished testing this version of the patch against Sage and it looks good there too. |
The Julia build failed again, but in this case due to a failed
|
The julia issue is resolved on master. It might be a good idea to rebase this PR on the latest master anyway. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your continued work on this, the code seems to be shaping up quite well, the patch is small and clean. Still, I have a bunch of comments.
} | ||
|
||
|
||
static volatile sig_atomic_t EnterStackCount = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need to be sig_atomic_t
? Which signal handler is calling code modifying this variable? If none, revert to int
; otherwise, please add a comment explaining it (can be brief)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None specifically, but my concern here is that one could: E.g. a signal handler installed by GAP which runs and results in an error, which can in turn result in running code that does modify this variable. I could try to concoct a real example, but something like that seems plausible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GAP never installs such a signal handler, and indeed, calling any kind of GAP code from a signal handler would be very dangerous and is unlikely to work.
The reason I am asking is not that I want to pain the bikeshed; but rather it is that using sig_atomic_t
IMHO adds considerable to the complexity of this code. If I or somebody else need to edit that code, I may wonder "but why is it using sig_atomic_t?", and not knowing the answer means that I don't understand the code, and hence better not touch it; or touch it but with a bad feeling that I might be breaking something.
This is why I ask whether it is really necessary, and if so, that a comment be added explaining it. Even if that comment says something like "// It is unclear whether we need sig_atomic_t here, but just out of paranoia, we use it anyway".
That said, I'd still prefer if it was not used; I honestly can't think of any example where it might be used that is not horribly unsafe anyway. So if you can think of any, I'd be most interested to hear about it, even if it is just a rough sketch.
The argument that one "could" need it seems week to me: with that argument, perhaps we should also add a mutex protecting access to EnterStackCount
(or require C11 atomics, or something like that)? After all, multiple threads might want to use the GAP API. Sure, it's unlikely that this can work, but one could try it anyway.... ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure on what basis you can assert that it "adds considerable...complexity". Exactly what complexity is it adding? sig_atomic_t
is standard POSIX, and all it means is that it's an instruction to the compiler to only manipulate this variable with single instructions. Furthermore, according to the GCC docs, "In practice, you can assume that int
is atomic...these assumptions are true on all of the machines that the GNU C Library supports and on all POSIX systems we know of."
It's really just a guarantee that if this code happens to be interrupted by a signal (with a signal handler that returns) it will leave things in a sane state. For a signal handler that doesn't return--i.e. exits the process or longjumps to somewhere arbitrary) you can't make any further guarantees, but at the very least you can safely increment and decrement it even if a signal occurs. That's all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at the very least you can safely increment and decrement it even if a signal occurs. That's all.
Just to clarify this: the type sig_atomic_t
means that it can be read from atomically and written to atomically. Any operation more complicated than that (such as an increment/decrement) may not be atomic. For example, the operations EnterStackCount = -EnterStackCount;
and EnterStackCount++;
may not be atomic.
So I actually agree with @fingolfin here: better not make any guarantees about atomicity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True--in that case it's probably best avoided. I know in Cysignals itself sig_block()/unblock()
just set a flag that determines what to do in the actual signal handler should a signal occur. But in the case of libgap we want to leave most details of signal handling up to the embedding application, so it would be tricky to implement something like this in a way that can be sensibly embedded in arbitrary applications. So perhaps it's best to just keep it simple.
Regarding your idea, I'm not saying it can't be done. I just don't understand exactly what you're proposing. Keep in mind these macros should be able to work properly recursively. Though the exact value of EnterStackCount is less important than just tracking whether or not we're in a recursive call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obvious question: do we really need to support nesting for these macros? It would certainly simplify things if we could say that nesting is not allowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory one could write code such that they never have to be nested. For example there are cases where we could refactor a function some_func_that_uses_gap()
into some_func_that_uses_gap_outer()
and `some_func_that_uses_gap_inner(), where the latter can be called directly in other code that is already bracketed by the Enter/Leave macros.
So technically it's possible. But it does put a higher strain on the user in exactly how they structure their code, and it's ugly (imagine if sig_on and sig_off could not be nested). You would also have to actively prevent the user from nesting them somehow if it did not work (otherwise this would lead to hard to trace bugs). I think it's better to allow nesting. Any complication that arises from that seems better and less probable than not allowing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, I propose a simple and concrete solution:
- In GAP, don't try to make these macros signal-safe. Do you know about other applications using libGAP which also need to handle signals?
- In SageMath, wrap
GAP_Enter
/GAP_Leave
insig_block
/sig_unblock
calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which is essentially the status quo (we don't have the Sage part yet but that's easy, and I agree with it). So I'm fine with that for lack of a better solution.
Perhaps at most this should be documented, and left to individual users to worry about. E.g. for the GAP-Julia interface something might want to be done, but it would be just as Julia-specific as Cysignals is Python-specific. Other applications may not care as much about handling signals at all.
if (buffer) | ||
printf("%s\n", buffer); | ||
ok = GAP_Enter(); | ||
if (ok) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... I wonder if the generic fallback code for GAP_EnterStack
will work in this scenario? After all, we only reference the location of that particular stack variable. But we don't bother walking back to the start of the stack of the current function. So e.g. res
could be outside the stack range considered by the GASMAN stack scanner, couldn't it? In which case there'd be a risk of a crash again.
To test this, I'd disable the GNU C version for it, and then change the loop below to perform some allocations and trigger garbage collections explicitly (we may need to expose a GAP_GC(int full)
function for that; it would also be useful for some other uses, I believe).
Anyway, the "fix" in calling code would be to move all local variables storing GAP object references to inside the block between GAP_Enter
and GAP_Leave
(and to explicitly document this, too).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you might be right. Do you even know off the top of your head (I'm not sure) if there is any guarantee whatsoever that there is any relation between the order some local variables are declared in C code, and what order they are placed on the stack (of course it could use registers for some of them too in which case it's irrelevant)? AFAIK the C standard doesn't even know anything about a "stack" as that's a platform-specific implementation detail. But realistically speaking this discussion is only relevant where there is one.
So I think there's only so much we can do about this--it's just a best guess but not fool-proof.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we already talked about exposing CollectBags
in #3030.
It's not exactly clear to me what you want me to do here though. Just manually test this case, or actually update the test to test this case in general somehow (e.g. have a copy of these tests that are compiled in such a way that forces the fallback)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The C standard definately puts no requirements on the order of variables on the stack, and will mix them up however it likes. If you are capturing a reference to a local variable in your fallback, the old remotely safe option would be to call GAP_enter, and then call a non-inlined function to actually do the work.
I'd probably just not put in a fallback which captures a reference to a local variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, I'm 100% positive that fallback is going to lead to hard-to-track and horrible crashes, if there are any GAP variables in the current function scope, or in any inlined functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, and of course we don't expect you, @embray , to provide fallbacks fo "every imaginable architecture". My stance on that is that it's better to wait for somebody who actually uses those architectures anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that in practice most compilers do seem to allocate locals in order of definition on the stack
I have noted this as well, at least with gcc. That's why I thought it was an "acceptable" fallback in the first place.
but taking an address of it and passing it to a non-inline function should avoid that
It does, again at least in the case of gcc (and actually I can't think of any other reasonable thing for any compiler to do in that case: how are you supposed to pass a pointer to a value in a register?) And of course, for locals that the compiler puts in registers this is a moot point (GASMAN handles that case separately with its setjmp hack).
So indeed, it seems best to remove the generic fallback, and wait for people to complain that their favorite compiler does not work anymore with it.
+1 I don't like it, but it does seem like the safest option for now, alas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW Boehm GC does use a similar fallback (they also declare the dummy variable volatile, which I intended to do as well but forgot): https://github.com/ivmai/bdwgc/blob/c50de12ab045b57953e152545836c23f76e7bc24/mark_rts.c#L502
It also implements a GC_get_stack_base()
for quite a number of different platforms, that I think is also supposed to work properly on per-thread stcks.
they would also break GAP by itself. If this ever happens, we'll have to go back to the drawing board anyway
I think that may be the case anyways. Is there really an advantage to continuing to maintain a separate GC when something more heavily used and robust like Boehm can be used? Or at least, maybe, a customized fork thereof if there's really some fine-tuning necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gasman is much faster than Boehm, and various people's attempt to tune it haven't closed the gap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth at least borrowing bits from, such as those linked above, where applicable (and assuming license compatibility). It would be great, longer-term, to get rid of this EnterStack/LeaveStack stuff entirely.
b0325e6
to
6005543
Compare
51a4ad8
to
4bb3063
Compare
The one failed Travis build appears to be a fluke:
Not sure if a coverage decrease of -0.9% is real or not. |
I'm sorry to ask, but you could rebase on top of master? We've had some churn of tests and coverage, and it would be nice to have a clean check? If you aren't happy with that, I can do it for you (just because we've messed things around for a while!) |
(I'd also consider squashing it down into a single commit; but that's not a requirement, we can perform that squash when merging it, too) |
Okay, I'll rebase and squash. |
4bb3063
to
548ff1a
Compare
Rebased and squashed; also updated the commit message a bit to more accurately reflect the final product. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So from my POV, this could be merged now (ideally, I'd like to rename MarkStackBottomBags
, but I can do that after this PR is merged, too).
Would be nice to hear if @ChrisJefferson and @markuspf are OK with this, too. But if I hear no objections from anybody soon, I'll just merge this :)
I'm happy for this to be merged, we can always cycle more later |
Why don't I just go ahead and rename |
…bjects in code which embeds libgap There are two parts to this: First, the outer-most GAP_Enter() must set the StackBottom variable for GASMAN, without which objects tracked by GASMAN that are allocated on the stack are properly tracked (see gap-system#3089). Second, the outer-most GAP_Enter() call should set a jump point for longjmps to STATE(ReadJmpError). Code within the GAP kernel may reset this, but we should set it here in case any unexpected errors occur within the GAP kernel that are not already handled appropriately by a TRY_IF_NO_ERROR. Or, more precisely, since TRY_IF_NO_ERROR does not currently *clear* the jump buffer it sets even when leaving the block without any errors found, a subsequent error that is not within a TRY_IF_NO_ERROR block can still result in a jump to the last TRY_IF_NO_ERROR block which was entered, which can be quite arbitrary (e.g. inside IntializeGap). For the first issue, we add GAP_EnterStack() and GAP_LeaveStack() macros which implement *just* the StackBottom handling without any other error handling. We also add a function to gasman.c called SetStackBottomBags which just updates the StackBottom variable. Then the GAP_EnterStack() macro can be used within a function to set StackBottom to somewhere at or near the beginning of that function's stack frame. This uses GCC's __builtin_frame_address, but supported is probably needed for other platforms that don't have this. The global variable EnterStackCount in libgap-api.c is used to track recursive calls into GAP_EnterStack(). We only want to set StackBottom on the outer-most call. The count is decremented on GAP_LeaveStack(). For setting the STATE(ReadJmpError) jump buffer we provide a macro called GAP_Error_Setjmp() which is fairly straightforward, except that it needs to be written in such a way that it can be used in concert correctly with GAP_EnterStack(). In particular, if returning to the site of a GAP_Error_Setjmp() call we do not want to accidentally re-increment the EnterStackCount. Finally, the higher-level GAP_Enter() and GAP_Leave() macros are provided: The latter is just an alias for GAP_LeaveStack(), but the former carefully combines *both* GAP_Error_Setjmp() and GAP_EnterStack(). Although called like a function, the GAP_Enter() macro expands to a compound statement (necessary for both GAP_EnterStack() and GAP_Error_Setjmp() to work properly). The order of expansion is also deliberate so that this can be used like: jmp_retval = GAP_Enter(); so that the return value of the setjmp call can be stored in a variable while using GAP_Enter(), and can be checked to see whether an error occurred. However, this requires some care to ensure that the following GAP_EnterStack() doesn't increment the EnterStackCount following a return to this point via a longjmp. Also updates the libgap API tests to demonstrate usage of GAP_Enter/Leave(). Note: The whole EnterStackCount manipulation is really really only in service to GASMAN and can be disabled entirely when compiled for a different GC, but it still has some use as a sanity check on GAP_Enter/Leave() so for not it is mostly but not entirely a no-op on GAP built with other GCs.
548ff1a
to
fef88b4
Compare
Thank you! 🥂 |
Thank you |
Backported via PR #3232 |
Prototype for GAP_Enter/Leave macros to bracket use of libgap and stack local GAP objects in code which embeds libgap. See #3089 for discussion.
There are two parts to this:
First, the outer-most
GAP_Enter()
must set theStackBottom
variable for GASMAN, without which objects tracked by GASMAN that are allocated on the stack are properly tracked (see #3089).Second, the outer-most
GAP_Enter()
call should set a jump point for longjmps toSTATE(ReadJmpError)
. Code within the GAP kernel may reset this, but we shouldset it here in case any unexpected errors occur within the GAP kernel that are
not already handled appropriately by a
TRY_IF_NO_ERROR
.For the first issue, we add
GAP_EnterStack()
andGAP_LeaveStack()
macros which implement just theStackBottom
handling without any other error handling. We also add a function to gasman.c called_MarkStackBottomBags
which just updates the StackBottom variable (this could be implemented for other GC's as well, either as a no-op or whatever else is appropriate). Then the macro calledMarkStackBottomBags
(same name without underscore) can be used within a function to setStackBottom
to somewhere at or near the beginning of that function's stack frame. This usesGCC's
__builtin_frame_address
, but support is probably needed for other platforms that don't have this.The state variable
STATE(EnterStackCount)
is used to track recursive calls intoGAP_EnterStack()
. We only want to setStackBottom
on the outer-most call. The count is decremented onGAP_LeaveStack()
. Some functions are provided for manipulating the counter from the API without directly exposing the GAP state, but I'm not sure if this is necessary or desirable, especially since it meansEnterStackCount
isn't updated atomically. My hope was to avoid exposing too many GAP internals, but it may be necessary in order to implement these as macros.For setting the
STATE(ReadJmpError)
jump buffer we provide a macro calledGAP_Error_Setjmp()
which is fairly straightforward, except that it needs to be written in such a way that it can be used in concert correctly withGAP_EnterStack()
. In particular, if returning to the site of aGAP_Error_Setjmp()
call we do not want to accidentally re-increment theEnterStackCount
.Finally, the higher-level
GAP_Enter()
andGAP_Leave()
macros are provided: The latter is just an alias forGAP_LeaveStack()
, but the former carefully combines bothGAP_Error_Setjmp()
andGAP_EnterStack()
. Although called like a function, theGAP_Enter()
macro expands to a compound statement (necessary for bothGAP_EnterStack()
andGAP_Error_Setjmp()
to work properly). The order of expansion is also deliberate so that this can be used like:so that the return value of the setjmp call can be stored in a variable while using
GAP_Enter()
, and can be checked to see whether an error occurred. However, this requires some care to ensure that the followingGAP_EnterStack()
doesn't increment theEnterStackCount
following a return to this point via a longjmp.