Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leaks (bbMemAlloc & bbMemFree) #310

Open
thareh opened this issue Feb 22, 2024 · 10 comments
Open

Memory leaks (bbMemAlloc & bbMemFree) #310

thareh opened this issue Feb 22, 2024 · 10 comments

Comments

@thareh
Copy link
Contributor

thareh commented Feb 22, 2024

Good day,

Seems there is an issue with bbMemAlloc & bbMemFree.

Works:

Framework BRL.Blitz
Import BRL.StandardIO

Repeat
    Local str1:String = "foofoofoofoofoofoofoofoo"
    
    Local a:Byte Ptr = str1.ToUTF8String()
    
    MemFree(a)
    
    For Local i:Int = 0 Until 10
        GCCollect()
    Next
    
    Print GCMemAlloced()
Forever

Leaks:

Framework BRL.Blitz
Import BRL.StandardIO

Repeat
    Local str1:String = "foofoofoofoofoofoofoofoo"
    Local str2:String = "barbarbarbarbarbarbarbar"
    
    Local a:Byte Ptr = str1.ToUTF8String()
    Local b:Byte Ptr = str2.ToUTF8String()
    
    MemFree(a)
    MemFree(b)
    
    For Local i:Int = 0 Until 10
        GCCollect()
    Next
    
    Print GCMemAlloced()
Forever

When using debug it works as expected.

Cheers.

@thareh
Copy link
Contributor Author

thareh commented Feb 22, 2024

Framework BRL.Blitz
Import BRL.StandardIO

Repeat
    Local str1:String = "foofoofoofoofoofoofoofoo"
    Local str2:String = "barbarbarbarbarbarbarbar"
    
    Local a:Byte Ptr = str1.ToUTF8String()
    MemFree(a)
    
    Local b:Byte Ptr = str2.ToUTF8String()
    MemFree(b)
    
    For Local i:Int = 0 Until 10
        GCCollect()
    Next
    
    Print GCMemAlloced()
Forever

Works as expected as well, seems making 2 pointers in a row confuses the GC or something.

Cheers.

@GWRon
Copy link
Contributor

GWRon commented Feb 22, 2024

I checked for the generated C code ... and replaced the content of it with the fundamentals being done "inside".

Framework BRL.Blitz
Import BRL.StandardIO

Local runs:int
Repeat
'direct C-code to ease "experimenting"
'! void *p1 = bbMemAlloc(1000);
'! void *p2 = bbMemAlloc(1000);
'! bbMemFree(p1);
'! bbMemFree(p2);

	For Local i:Int = 0 Until 10
		GCCollect()
	Next
    
	If runs mod 100 = 0
		Print GCMemAlloced()
	EndIf
	runs :+ 1
Until runs = 1000

output:

Building untitled1
[  9%] Processing:untitled1.bmx
[ 91%] Compiling:untitled1.bmx.console.release.linux.x64.c
[100%] Linking:untitled1
Executing:untitled1
45056
249856
454656
659456
864256
1069056
1273856
1478656
1683456
1888256

Process complete

If you remove the second memory bbmemalloc and bbmemfree - and the printed value stays constant.

also using one "non-gc-malloc" makes it constant:

'! void *p1 = bbMemAlloc(1000);
'! void *p2 = malloc(1000);
'! bbMemFree(p1);
'! free(p2);

using other GC-malloc variants also result in constant values:

'! void *p1 = GC_malloc_atomic(1000);
'! void *p2 = GC_malloc_atomic(1000);
'! bbMemFree(p1);
'! bbMemFree(p2);

(but I did not dig about the differences yet).

'! void *p1 = GC_malloc_uncollectable(1000);
'! void *p2 = GC_malloc_uncollectable(1000);

results similar to the atomic-version to increased values.

@GWRon
Copy link
Contributor

GWRon commented Feb 22, 2024

I think I found something ...

Checking aboves GC_malloc_uncollectable I found this source:
https://github.com/ivmai/bdwgc/blob/master/malloc.c

there it defines that gc_malloc_uncollectable is this:

/* Allocate lb bytes of pointerful, traced, but not collectible data.   */
GC_API GC_ATTR_MALLOC void * GC_CALL GC_malloc_uncollectable(size_t lb)
{
  return GC_generic_malloc_uncollectable(lb, UNCOLLECTABLE);
}

So it at the end calls GC_generic_malloc_uncollectable - which is:

GC_API GC_ATTR_MALLOC void * GC_CALL GC_generic_malloc_uncollectable(
                                                        size_t lb, int k)
{
    void *op;
    size_t lb_orig = lb;

    GC_ASSERT(k < MAXOBJKINDS);
    if (EXTRA_BYTES != 0 && EXPECT(lb != 0, TRUE)) lb--;
                /* We do not need the extra byte, since this will   */
                /* not be collected anyway.                         */

    if (SMALL_OBJ(lb)) {
        void **opp;

[...]
    } else {
      op = GC_generic_malloc_aligned(lb, k, 0 /* flags */, 0 /* align_m1 */);
      if (op /* != NULL */) { /* CPPCHECK */
        hdr * hhdr = HDR(op);

Wait ... what... SMALL_OBJ ?

So ... I simply increased the amount of to allocate memory: from 1000 to 10000

'! void *p1 = GC_malloc_uncollectable(100000);
'! void *p2 = GC_malloc_uncollectable(100000);
'! bbMemFree(p1);
'! bbMemFree(p2);

et voila.. printed values stay low (in this case alternating):

40960
45056
40960
45056
40960
45056
40960
45056
40960
45056

so for small objects the GC behaviour might either be bugged - or it somehow decides to keep them for almost forever.

PS: SMALL_OBJ is:

/*  Max size objects supported by freelist (larger objects are  */
/*  allocated directly with allchblk(), by rounding to the next */
/*  multiple of HBLKSIZE).                                      */
#define CPP_MAXOBJBYTES (CPP_HBLKSIZE/2)
#define MAXOBJBYTES ((size_t)CPP_MAXOBJBYTES)

Edit: regarding "keeping them for almost forever".
If you replace the "Until runs = 1000" with a bigger value (eg 100000) it will become slower and slower for each addition 100 runs. This means the objects seem to not be freed actually hogging the GC.

@HurryStarfish
Copy link
Member

This seems to leak as well:

SuperStrict
Framework BRL.Blitz
Import BRL.StandardIO

Global Arr:Size_T[300]
For Local i:Int = 0 Until Arr.length
	F
	Function F()
		For Local j:Int = 1 To 10
			G()
			Function G()
				Local a:Byte Ptr = MemAlloc(100)
				MemFree(a)
				
				Local b:Byte Ptr = MemAlloc(100)
				MemFree(b)
				
				GCCollect()
				GCCollect()
			End Function
		Next
	End Function
	
	Arr[i] = GCMemAlloced()
Next

For Local i:Int = 0 Until Arr.length
	Print Arr[i]
Next

So it isn't related to having Print in the loop, it isn't anything to do with strings, it isn't confined to the current function call and it isn't just because of the two allocations "being in a row". Having two or more GCCollect() in the loop seems to be necessary to trigger it. With only one or none, the value stays constant. Also an additional GC allocation like Local c:Int[]=New Int[123] somewhere in the loop changes stops the memory from increasing every iteration of the loop and instead causes it to grow in larger chunks. If it's not a bug, my best guess is that it's either a bug or some sort of usage heuristic in boehmgc, but it's strange.

@GWRon
Copy link
Contributor

GWRon commented Feb 22, 2024

If my findings are correct then it looks like it is either an incorrect usage or inappropriate choice of bwdgc functions in NG ...or a bug in bwdgc itself.

Saying this based on the direct use of their functions as written in my previous replies

@thareh
Copy link
Contributor Author

thareh commented Feb 23, 2024

Ah I see, so it only happens when using GCCollect multiple times... I was having high memory use so I found somewhere on the internet that you can use the collect function of BDWGC multiple times to clear as much memory as possible. But still... it shouldn't bug out like that I guess.

@GWRon
Copy link
Contributor

GWRon commented Feb 23, 2024

regarding calling GCCollect multiple times ...

Why does it then not increase if you do:

    Local a:Byte Ptr = str1.ToUTF8String()
    MemFree(a)
    
    Local b:Byte Ptr = str2.ToUTF8String()
    MemFree(b)

instead of

    Local a:Byte Ptr = str1.ToUTF8String()
    Local b:Byte Ptr = str2.ToUTF8String()

    MemFree(a)
    MemFree(b)

... if it is connected to collecting memory then this indicates a memfree() optionally resulting in a "gccollect" (indirectly) too.


This aside.
If you replace GCCollect() with GCCollectALittle() then it also results in constant memory usage.

Framework BRL.Blitz
Import BRL.StandardIO

Local runs:int
Repeat
'direct C-code to ease "experimenting"
'! void *p1 = bbMemAlloc(1000);
'! void *p2 = bbMemAlloc(1000);
'! bbMemFree(p1);
'! bbMemFree(p2);

	For Local i:Int = 0 Until 10
		GCCollectALittle()
	Next
    
	If runs mod 100 = 0
		Print GCMemAlloced()
	EndIf
	runs :+ 1
Until runs = 1000

I assume this happens because GCCollectALittle() can simply do nothing regarding collection if the GC thinks it is not approbriate to do so / not needed for now:

	For Local i:Int = 0 Until 10
		If Not GCCollectALittle() then print "runs="+runs+"  i="+i+": no need to GCCollect a little"
	Next

Results in a lot of prints ... I tried the opposite (printing when it has done something ) - and this line is not printed at all, so during the loops GCCollectALittle() always returns 0 / seems to decide to not do anything.

Maybe the GC thinks it is not worth to "clean up" albeit it should do somewhen?


So ... I increased the "Until runs =" part to 10000 - with GCCollectALittle() this is still blazing fast (while the GCCollect() starts to ... stutter and becomes slower and slower).
The interesting part for me is now that the GCMemAlloced() value changes after some iterations - while the "print" never happens - it is automatically / periodically running GCCollect() behind the scenes as we are running in "auto mode".

We should call GCSetMode(2) at the begin of the code to avoid auto-GC-ing interferring with our experiments:

Framework BRL.Blitz
Import BRL.StandardIO

'disable auto-gc-ing!
GCSetMode(2)

Local runs:int
Repeat
'direct C-code to ease "experimenting"
'! void *p1 = bbMemAlloc(1000);
'! void *p2 = bbMemAlloc(1000);
'! bbMemFree(p1);
'! bbMemFree(p2);

	For Local i:Int = 0 Until 10
		If GCCollectALittle() then print "runs="+runs+"  i="+i+": need to GCCollect a little"
	Next
    
	If runs mod 100 = 0
		Print GCMemAlloced()
	EndIf
	runs :+ 1
Until runs = 10000

(this never spits out that print-line - but commenting out the GCSetMode() line would result in the GCMemAlloced getting lower after some runs - because "auto gc" kicked in)

@GWRon
Copy link
Contributor

GWRon commented Feb 23, 2024

Regarding GCCollectALittle ...

blitz.mod/bdwgc/tests/gctest.c contains both calls - GC_gcollect() (which is GCCollect() for us) and GC_collect_a_little() (no "g" before collect) which is our GCCollectALittle().

The code there is for example this:

    /* Garbage collect repeatedly so that all inaccessible objects      */
    /* can be finalized.                                                */
      while (GC_collect_a_little()) { }
      for (i = 0; i < 16; i++) {
        GC_gcollect();
#       ifndef GC_NO_FINALIZATION
#         ifdef FINALIZE_ON_DEMAND
            late_finalize_count +=
#         endif
                GC_invoke_finalizers();
#       endif

So they mix it in this specific test - in other tests they simply call GC_gcollect() which means the users are ought to use GC_gcollect() in most situations - as we do.

@GWRon
Copy link
Contributor

GWRon commented Feb 23, 2024

Just in case someone is tinkering with it ... you can receive some GC stats (pay attention that "print blabla" is allocating memory too... - so take care when printing out things during your loop iterations)

local gcstats:SGCStats 'defined in brl.mod/blitz.mod/blitz.bmx
GCGetStats(gcstats)

print "GC Stats:"
print "  heapsize: " + gcstats.heapsize 'Heap size in bytes (including the area unmapped to OS)
print "  freeBytes: " + gcstats.freeBytes 'Total bytes contained in free and unmapped blocks.
print "  unmappedByes: " + gcstats.unmappedBytes 'Amount of memory unmapped to OS.
print "  bytesAllocedSinceGC: " + gcstats.bytesAllocedSinceGC 'Number of bytes allocated since the recent collection. 
print "  allocedBytesBeforeGC: " + gcstats.allocedBytesBeforeGC 'Number of bytes allocated before the recent garbage collection.
print "  nonGCBytes: " + gcstats.nonGCBytes 'Number of bytes not considered candidates for garbage collection.
print "  GCCycleNo: " + gcstats.GCCycleNo 'Garbage collection cycle number.
print "  markersM1: " + gcstats.markersM1 'Number of marker threads (excluding the initiating one) (0 if single-threaded)
print "  bytesReclaimedSinceGC: " + gcstats.bytesReclaimedSinceGC 'Approximate number of reclaimed bytes after recent GC.
print "  reclaimedBytesBeforeGC: " + gcstats.reclaimedBytesBeforeGC 'Approximate number of bytes reclaimed before the recent garbage collection.
print "  freedBytesSinceGC: " + gcstats.freedBytesSinceGC 'Number of bytes freed explicitly since the recent GC.
print "  obtainedFromOSBytes: " + gcstats.obtainedFromOSBytes 'Total amount of memory obtained from OS, in bytes.

I printed stats before running the "repeat" loop and another print afterwards:

[100%] Linking:test_gc
Executing:test_gc
GC Stats:
  heapsize: 262144
  freeBytes: 221184
  unmappedByes: 0
  bytesAllocedSinceGC: 5504
  allocedBytesBeforeGC: 0
  nonGCBytes: 1888
  GCCycleNo: 1
  markersM1: 11
  bytesReclaimedSinceGC: 0
  reclaimedBytesBeforeGC: 0
  freedBytesSinceGC: 160
  obtainedFromOSBytes: 1638400
Initial: 262144
45056
249856
454656
659456
864256
1069056
1273856
1478656
1683456
1888256
Final: 2088960
GC Stats:
  heapsize: 2502656
  freeBytes: 409600
  unmappedByes: 0
  bytesAllocedSinceGC: 112
  allocedBytesBeforeGC: 2055408
  nonGCBytes: 18446744073708529504
  GCCycleNo: 10001
  markersM1: 11
  bytesReclaimedSinceGC: 4032
  reclaimedBytesBeforeGC: 1024000
  freedBytesSinceGC: 0
  obtainedFromOSBytes: 4141056

seems nonGCBytes: 18446744073708529504 looks a bit ... seems I have around 16,777,216 TB of memory :p

Edit: that odd value is not an "wrong struct definition"-bug within NG - the GC itself spits it out too:

'!printf("%lu", GC_get_non_gc_bytes());fflush(stdout);

as the result should be of type "unsigned long" (on my linux box) according to this:

#ifdef _WIN64
# if defined(__int64) && !defined(CPPCHECK)
    typedef unsigned __int64 GC_word;
    typedef __int64 GC_signed_word;
# else
    typedef unsigned long long GC_word;
    typedef long long GC_signed_word;
# endif
#else
  typedef unsigned long GC_word;
  typedef long GC_signed_word;
#endif

@GWRon
Copy link
Contributor

GWRon commented Feb 23, 2024

Just to rule out an already fixed bug in the bdwgc lib itself I updated blitz.mod/bdwgc to the latest libatomic_ops (libatomic should be provided by the compiler now) and latest bdwgc from their git repo.

Results stay the same - so either a config thing at our side or a still lurking bug in the bdwgc repo.

Did someone of you already have a .c-file prepared to test it out more easily (config of the gc etc) ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants