-
Notifications
You must be signed in to change notification settings - Fork 162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory canary #2293
Memory canary #2293
Conversation
1832e2d
to
fb085fa
Compare
8088880
to
4d0c41d
Compare
Codecov Report
@@ Coverage Diff @@
## master #2293 +/- ##
==========================================
- Coverage 72.33% 72.3% -0.03%
==========================================
Files 479 479
Lines 250960 251956 +996
==========================================
+ Hits 181538 182185 +647
- Misses 69422 69771 +349
|
The first commit makes perfect sense to me, and I'd merge it instantly if it was a separate PR -- in case the second commit takes longer to review, it might be worthwhile to get the first one in ASAP, to avoid bitrot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks sane and very useful to me. I have not yet actually tried it yet, though. Only have minor code nitpicks.
[AS_HELP_STRING([--enable-memory-checking], [enable memory checking])], | ||
[AC_DEFINE([GAP_MEM_CHECK], [1], [define if building with memory checking])], | ||
[enable_memory_checking=no] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This option is not available in HPC-GAP, is it? Both because it is GASMAN specific, and because it does not perform any thread locking.
It would be kind of nice if this gave an error here if HPCGAP is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And perhaps also add in "WARNING: this makes GAP super slow" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing: you already put in some kind of valgrind based memory checking in the past. Looking at this configure option, it is not clear which of the various memory debugging techniques this refers to...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope to tidy up and document the valgrind stuff at some point. I have added a message on GAP startup saying it takes a long time to start up.
lib/init.g
Outdated
@@ -560,6 +560,9 @@ BindGlobal( "ShowKernelInformation", function() | |||
if GAPInfo.KernelInfo.KernelDebug then | |||
Add( config, "KernelDebug" ); | |||
fi; | |||
if GAPInfo.KernelInfo.MemCheck then | |||
Add(config, "MemCheck (EXPERIMENTAL)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the "EXPERIMENTAL"? When would it be removed, if ever?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Experimental might be a bad word. If we later changed GCs to boehm it would get removed, and I could imagine there could be later changes which significantly change the behaviour. Also I don't really want people to start using this as their standard GAP, because it is slower and more limited in a number of ways (doesn't give memory back to the OS for example).
src/gap.c
Outdated
Obj FuncGASMAN_MEM_CHECK(Obj self, Obj newval) | ||
{ | ||
EnableMemCheck = INT_INTOBJ(newval); | ||
return True; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this return true
, instead of 0?
src/gasman.c
Outdated
* a reference to the internals of a GAP object are kept across | ||
* a garbage collection (which moves GAP objects around). | ||
* | ||
* MultiHeap provides a method of detecting such problems, at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the first time the word "MultiHeap" appears, which I find confusing. Either get rid of this; or else "define" it before use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will change this. I used to call this MultiHeap.
src/gasman.c
Outdated
* | ||
* MultiHeap provides a method of detecting such problems, at | ||
* the cost of GREATLY decreased performance (Starting GAP in | ||
* MultiHeap mode takes hours, rather than seconds). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the PR, you say startup takes "days", here "hours" -- which is it? On what kind of system (clearly I can find old systems where it's much slower, so here really some info on the machine would be useful to gauge the severity of this slowdown)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depends on the machine. about 18 hours on Linux, 4 days on the linux subsystem on windows, haven't timed it on a mac yet. I'll just say 'days'.
src/gasman.c
Outdated
return; | ||
|
||
Int newBase = oldBase + 1; | ||
// 0 is special, skip it when cycling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This explanation seems to start in the middle of a thought, i.e. the reader has to already understand what's going on generally to make sense of it. Perhaps this is clearer?
// cycle to next memory area, wrapping around at end, and skipping area 0 (which is special)
Int newBase = oldBase + 1;
if (newBase >= GetMembufCount())
newBase = 1;
As you can see, I also use >=
instead of ==
for paranoia -- the interface does not seem to guarantee that the return value of GetMembufCount()
is constant. This could be "fixed" by replacing GetMembufCount()
by a global constant (enum).
src/gasman.c
Outdated
|
||
MoveBagMemory(GetMembuf(oldBase), GetMembuf(newBase)); | ||
|
||
// Block access to old memory, provide access to new memory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consistency: "Block" -> "block", remove trailing full stop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, perhaps "provide" -> "enable" ?
* | ||
* The following code is used by GAP_MEM_CHECK support, which is | ||
* documented in gasman.c | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps move this comment before the #if/#error
, as it is already relevant to those?
src/system.c
Outdated
} | ||
} | ||
|
||
// Make the first membufs the earlier in memory. This is where the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that valid english? In any case, I find it confusing, how about this alternative instead? "Sort the memory buffers ascending by their address, so that the first buffer comes first in memory. This is the one where the master pointers will live".
src/system.c
Outdated
{ 0 , "quitonbreak", toggle, &SyQuitOnBreak, 0}, /* Quit GAP if we enter the break loop */ | ||
{ 0, "", 0, 0, 0}}; | ||
{ 0, "", 0, 0, 0 } | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that running git clang-format
is quite convenient, but here it just makes reviewing this part of the PR hard (or impossible). I also find the formatted output quite ugly in parts, du to the way it wraps. So perhaps first have a commit which reformats this struct (ideally with some manual movement of comments, to make it less ugly, but that's optional); and then modify it in a commit after that?
485d291
to
1c33c19
Compare
The only missing part from the PR now (I believe) is that it does not forbid building with HPC-GAP, but the better fix there is I believe to block use with BOEHM_GC, which is related to another open patch. |
1c33c19
to
928e4ab
Compare
This needs to be rebased now. Also, that other PR which introduced |
src/gasman.c
Outdated
|
||
UInt GetMembufCount(void); | ||
void * GetMembuf(UInt i); | ||
UInt GetMembufSize(void); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit wary about declaring external function prototypes outside a .h file -- this way, it is very easy to change the function implementation, while forgetting to adjust the prototype here, and there will be no warning. So I'd feel better if these were declared in e.g. system.h
, inside an #ifdef GAP_MEM_CHECK / #endif
block.
This allows for a seperation between the end of the master pointers, and the beginning of the bag data area
928e4ab
to
fa302e7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Nice work!
This adds a new configure option to GAP which enables an extremely slow debug mode, which can find problems where references to memory are incorrectly kept across garbage collections.
fa302e7
to
9550010
Compare
This is an old piece of code I had, simplified and resurrected.
This PR is split into two pieces:
The first, bigger but simpler commit, introduces to GASMAN the concept of
MptrEndBags
, which denotes the end of the master pointers. Previously, the start of bag memory (calledOldBags
) was also the end of the master pointers. This lets the two move independantly. This is useful for part 2, but also might be useful for other things (I know I, and others, have considered changing GASMAN to rearrange objects for locality as it GCs. That would be much easier after this PR).The second, crazier but more self contained part, adds a configure-time switch (
--enable-memory-checking
), which adds to GAP the ability to find places where pointers into bag data are incorrectly accessed after a call to GASMAN.The easiest way to see this in action is using the 3rd commit (which I will remove before merging!) which adds a simple but extremely hard to track down bug (if you didn't know where it was) in GASMAN_LIMITS(), where we try to write to a bag location after making a new plist (note the code also catches invalid reads as well as writes).
Do:
If you run GAP in a debugger, it will drop you onto the bad line.
You can start checks at startup by running GAP with
--enableMemCheck
. Note that GAP will take a LONG time to start (I'm talking a couple of days).While this code has some slightly crazy memory things, I've tried to keep them tightly defined within
#ifdef GAP_MEM_CHECK
. I've also tried to keep the code short (there are a number of things one could do more cleverly, but the code is already quite scary).