-
-
Notifications
You must be signed in to change notification settings - Fork 421
Rework of -profile=gc calculations #1846
Rework of -profile=gc calculations #1846
Conversation
e429767
to
b1b02d8
Compare
(going to abuse auto-tester a bit) |
68324f6
to
91b7fb9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides the small comments and the non-in-depth review of the code generating bits. LGTM. It is much more sensible than the previous implementation.
src/gc/impl/conservative/gc.d
Outdated
@@ -529,7 +529,7 @@ class ConservativeGC : GC | |||
} | |||
|
|||
gcx.log_malloc(p, size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In some distant future it might be good to gcx.log_malloc
alloc_size
too...
src/rt/tracegc.d
Outdated
extern (C) void* _d_arrayliteralTX(const TypeInfo ti, size_t length); | ||
extern (C) void* _d_assocarrayliteralTX(const TypeInfo_AssociativeArray ti, void[] keys, void[] vals); | ||
mixin(generateTraceWrappers()); | ||
//pragma(msg, generateTraceWrappers()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess commented pragmas should be gone.
src/rt/tracegc.d
Outdated
accumulate(file, line, funcname, "closure", sz); | ||
return _d_allocmemory(sz); | ||
void foo(int x, double y) { } | ||
static assert (Arguments!foo == "x, y, "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I got completely lost with all the code generation. I will let someone else more used to black magic to review it.
Anyone interested in having a look at this? Maybe @MartinNowak ? |
FYI: current state is "works like a charm for me but I haven't found time to rework how test suite runs so that it can pass on all platforms with more extensive tests added" |
I think this is quite important, the current profiling information given by the GC is pretty much garbage, and giving wrong information could be more dangerous than not giving any information at all. So any help here is very appreciated. Also, besides the currently not ideal testing situation and failure on some platforms, it would be nice to know if the general solution looks good for other druntime maintainers. ping @dlang/team-druntime |
High level design seems to reasonable to me. |
I just had a look at the results for FreeBSD 32 and it's just an ordering issue. I guess something is wrong (or incomplete) with b64ac1f. According to the auto-tester, the BSD 32 results are:
And the one in the test case:
Neither seem to be sorted by "name" for the 16 "allocated" values. |
Just for the records, the correct order should be:
|
Having a quick look, But then I saw a pattern, so I think we have 2 problems here:
|
OK, checking:
|
Instead of relying on the rather inaccurate callbacks by the compiler (e.g. anything allocated in the precompiled runtime will not be detected), I recently used a proxy GC that just forwards the interface calls to the actual GC. It records allocation information in malloc/calloc/qalloc calls (including the call stack which can then be used to determine the call location). This also avoids modifing the GC interface and implementation. |
But can you really know exactly how much you allocated at that level of abstraction? How would you do that, just querying the GC to know what's the real capacity of the reserved block? |
I guess you are referring to the difference between requested allocation size, returned size and actually used size, e.g. in array code in rt.lifetime. With intercepting the GC interface, you know the requested and allocated memory precisely (using qalloc), but can only estimate how much of it is in actual use. The array interface functions yield some other information, but without knowing the exact implementation in rt.lifetime, calculating memory usage is quite some guesswork. For example, how do you interpret setlength? Will it allocate new memory, extend existing memory without allocation, or will it do nothing? How much memory will a new allocation actually use? |
For our purpose it does not matter how much exactly is used - only impact it makes on total memory usage by GC. With current implementation Uncoincidentally, |
I kind of like the idea of using proxy GC instead of compiler support but don't want to lose type information current approach provides. Having to investigate backtrace for each logged allocation is a lot of work. |
The TypeInfo parameter is supposed to give you that information, though it might not always be passed correctly now, e.g. when extending a block. The precise GC fixes that. |
7047357
to
9ec72d9
Compare
Huh, green |
9ec72d9
to
46aedf2
Compare
Ping. Any review/requests for this one? It is good to go from my side in its basic form. |
Intermediate step for generating tracegc handlers automatically intended to simplify reviewing by reducing size of each individual diff.
Replaces imprecise/wrong calculation based exclusively on checking TypeInfo - now the data is retrieved directly from the GC and represents real amount of bytes allocated. Fix issue https://issues.dlang.org/show_bug.cgi?id=17294 Fix issue https://issues.dlang.org/show_bug.cgi?id=16280 Fix issue https://issues.dlang.org/show_bug.cgi?id=15481
Ordering by name as last measure means order of lines in the generate trace will be reproducible, allowing for easy test cases.
Moves code snippet which refers to various language features that allocate to actual test to ensure better functionality coverage.
f439ea0
to
e1a6984
Compare
Causes closure allocation despite being marked as scope
Right now biggest performance problem comes from how crazy expensive calling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM despite the few comments.
@@ -41,12 +42,11 @@ __gshared | |||
|
|||
extern (C) void profilegc_setlogfilename(string name) | |||
{ | |||
logfilename = name; | |||
logfilename = name ~ "\0"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks unrelated to this commit.
@@ -143,8 +152,7 @@ shared static ~this() | |||
if (counts.length) | |||
{ | |||
qsort(counts.ptr, counts.length, Result.sizeof, &Result.qsort_cmp); | |||
|
|||
FILE* fp = logfilename.length == 0 ? stdout : fopen((logfilename ~ '\0').ptr, "w"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, right, is because of this. I would maybe make the commit message a bit more verbose that this commit is about avoid using language features that indirectly use the GC, not about removing some explicit recursive usage.
src/rt/profilegc.d
Outdated
@@ -48,6 +48,9 @@ extern (C) void profilegc_setlogfilename(string name) | |||
public void accumulate(string file, uint line, string funcname, string type, | |||
size_t sz) @nogc | |||
{ | |||
if (sz == 0) | |||
return; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just an optimization, right? Also some more verbose commit message could help.
Maybe do that in a separate PR to avoid this one to get stalled? Specially because that change has to be done very carefully, as one mistake might trigger wrong results and it's very hard to track/find out (speaking from experience). |
ping @DmitryOlshansky @MartinNowak @andralex @rainers @wilzbach Can you merge or give us some feedback if anything needs to be changed? |
I would like to expand this PR much further to get a lot of other stats that are already kept by the GG (at least when profiling is enabled). Since keeping some stats might be a bit expensive, in particular timing stats (we need to overcome not having Conceptually, I think New
(and this PR introduces Ideally we should be able to enable, disable and reset individual stats (and reset to a specific value rather than zero might be useful in some cases), so this can be done in a "only pay for what you need" style. But if it's too complicated, I guess we can also have a couple of categories (like "allocation" or "collection" or "counters" or "timing") or levels of detail (like "shallow" to only get the very basic stuff, like total collection time, and "deep" to get more fine-grained stats like "markTime" and other intermediate timings). It might be useful to discuss this briefly at DConf to reduce the back and forth. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@leandro-lucarella-sociomantic LMK when this is ready - we can pull it now and expand it later.
A simple way to enable/disable collection of specific stats is to write -1
or some invalid value to the values that are not of interest, i.e. if numReallocs
is set to -1
the reallocations won't be counted. This is of course of interest mostly for the more costly stats.
@@ -122,6 +122,9 @@ __gshared long extendTime; | |||
__gshared long otherTime; | |||
__gshared long lockTime; | |||
|
|||
// thread-local counter | |||
size_t bytesAllocated; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ulong
is better here; size_t
is for sizes of objects in memory, whereas bytesAllocated
is a tally. It could go over 4 GB on a 32-bit system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just copy&pasted what the GC is already currently gathering (when some profiling options are included). But yeah, we could review how these are being stored too (there is also the inconsistency of some timing being long
and some Duration
).
This has been superseded by #2607 |
Another attempt for #1806, now using
thread-local counter inside GC to circumvent sync issues.