Strings subsystem generates (hard to detect) memory-leakages. Garbage Collector update request #774

Open
Informate opened this issue Jul 6, 2024 · 0 comments

Comments

@Informate
Version

v22.4.0 and previous

Platform

Linux 6.5.0-41-generic

Subsystem

Strings, Garbage Collector

What steps will reproduce the bug?

When the process runs out of memory (OOM), in some cases only 1%-2% of the heap is actually in use; a waste of 5%-10% may be typical.
The following edge-case example should reach a memory waste of 99.9984%.

$ node --max-old-space-size=6 urls-7.js
// urls-7.js

function getRandomBuffer(){
 const bufferSize = 1024 * 1024; // 1 MB
 const randomBuffer = Buffer.alloc(bufferSize);
 for (let i = 0; i < bufferSize; i++) {
  randomBuffer[i] = Math.floor(Math.random() * 256);
 }
 return randomBuffer;
}

// Keep only a 16-character slice of each ~1 MB string.
const slices = [];
while (true) {
  const string = getRandomBuffer().toString('utf-8');
  const slice = string.slice(50000, 50016);
  slices.push(slice);
  __heaplog();
}

// Helper function that logs heap and RSS usage (in MB) to show the memory leak
function __heaplog(){
  const m = process.memoryUsage();
  console.log('\nHeap: '+((m.heapUsed/2**20)|0)+'/'+(m.heapTotal/2**20|0)+' MB - RSS: '+(m.rss/2**20|0)+' MB');
}

How often does it reproduce? Is there a required condition?

At OoM.

<--- Last few GCs --->

[5918:0x70ef000]     1805 ms: Mark-Compact 6.0 (8.1) -> 5.1 (9.9) MB, pooled: 0 MB, 3.53 / 0.00 ms  (average mu = 0.980, current mu = 0.994) task; scavenge might not succeed
[5918:0x70ef000]     1965 ms: Mark-Compact 5.9 (10.2) -> 5.4 (10.2) MB, pooled: 0 MB, 10.56 / 0.00 ms  (average mu = 0.968, current mu = 0.934) background allocation failure; GC in old space requested


<--- JS stacktrace --->

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----

 1: 0xe21092 node::OOMErrorHandler(char const*, v8::OOMDetails const&) [nodejs]
 2: 0x12224f0 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [nodejs]
 3: 0x12227c7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [nodejs]
 4: 0x1452305  [nodejs]
 5: 0x146bb79 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [nodejs]
 6: 0x13bca68 v8::internal::StackGuard::HandleInterrupts(v8::internal::StackGuard::InterruptLevel) [nodejs]
 7: 0x187ab97 v8::internal::Runtime_StackGuardWithGap(int, unsigned long*, v8::internal::Isolate*) [nodejs]
 8: 0x1f31576  [nodejs]

What is the expected behavior? Why is that the expected behavior?

Node.js should recover from the error, or handle the condition before it occurs.
Java and JavaScript are memory-managed languages in which the programmer should not need to keep track of memory. In this situation, however, the programmer has to keep track of memory and handle the problem manually.

What do you see instead?

Keeping a string that results from RegExp operations generates a memory leak.
The RegExp extracts short parts from an HTTP document retrieved over HTTPS.
The bug was already present some years ago (and, I suppose, was never solved).
Cleaning the string solves the memory leak (although this leaves JSON-related leaks):

clean_string = JSON.parse(JSON.stringify( memory_leaking_string ));

or better (without other leaks):

clean_string = Buffer.from(memory_leaking_string, 'utf-8').toString('utf-8');

The problem is not solved by other simple operations such as .slice(0) or .toString().
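
The Buffer round-trip above can be wrapped in a small helper. This is a minimal sketch, not a Node.js API: the name `detachString` is hypothetical, and the idea that the short slice keeps its large parent string alive is an assumption about the cause, not something confirmed in this report. The round-trip is lossless only for well-formed strings; a lone surrogate would come back as U+FFFD.

```javascript
// Hypothetical helper (not a Node.js API): returns a copy of `str`
// decoded from freshly encoded bytes, so the result is a new string
// rather than a view into the string that `str` was sliced from.
function detachString(str) {
  return Buffer.from(str, 'utf-8').toString('utf-8');
}

const big = 'x'.repeat(1024 * 1024);     // stands in for a large document
const leaking = big.slice(50000, 50016); // 16-character slice of it
const clean = detachString(leaking);

console.log(clean === leaking); // true: same contents
console.log(clean.length);      // 16
```

In the reproduction script, pushing `detachString(slice)` instead of `slice` would apply the same workaround.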

Additional information

The Strings subsystem wastes memory, and the garbage collector is not able to recover from the heap crash.

To solve the problem and recover, the garbage collector could require memory maps.

  • Bit Checking Memory Maps for memory holes of 256 bytes (256/2 [Nyquist] * 8 [bits per byte] = 1024) would require 1 MB for each GB of heap. But a progressive memory scan could be performed with just a few KB.
  • Full/Free Length Queue Memory Maps could be constructed with an arbitrary memory size but could need a progressive scan.
  • Random Sampling Free Memory Map Boundaries could be started with just 1 KB of free memory and then exploit the holes found, in progressive scans.
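
The size estimate for the bit-checking map in the first bullet can be checked with a few lines. This is only a sketch of the arithmetic, assuming one bit per granule and a granule of half the smallest hole size; `bitmapBytes` is a hypothetical name, not part of V8 or Node.js.

```javascript
// Hypothetical helper: size of a bitmap with one bit per granule,
// where a granule is half of the smallest hole to detect
// (the "256/2 [Nyquist]" factor in the estimate above).
function bitmapBytes(heapBytes, minHoleBytes = 256) {
  const granuleBytes = minHoleBytes / 2;             // 128 bytes
  const bits = Math.ceil(heapBytes / granuleBytes);  // one bit per granule
  return Math.ceil(bits / 8);                        // 8 bits per byte
}

console.log(bitmapBytes(2 ** 30)); // 1048576 bytes, i.e. 1 MB per GB of heap
```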

Random Sampling operations could be possible within the standard GC scheduling, and their frequency could be increased on a match.
Random Sampling with just 1 sample at a time would introduce very limited overhead on the GC. The sampling test would require a full GC cycle on the old space.
Exploiting a fractal partitioning algorithm (1st: middle of the space, 2nd: middle of the first half, 3rd: middle of the second half, 4th: middle of the first quarter, ...) could reduce the random sampling overhead.
When the sampling test fails, successive iterations up to the completion of the GC cycle could be used to propose a new candidate (continuing the partial test on the next candidate).
With a 10% memory waste, 1-sample random sampling should match in approximately 11 full GC cycles.
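
The fractal partitioning order described above (middle of the space, then the middles of each half, then of each quarter, and so on) can be sketched as a generator of sampling offsets. This illustrates only the visiting order, under the assumption that the space is addressed as offsets in `[0, size)`; the name `fractalOffsets` is hypothetical and nothing here touches the real V8 heap.

```javascript
// Hypothetical sketch: yields offsets in [0, size) in bisection order:
// 1/2, then 1/4 and 3/4, then 1/8, 3/8, 5/8, 7/8, and so on.
function* fractalOffsets(size) {
  for (let denom = 2; denom <= size; denom *= 2) {
    // Odd numerators only: even ones were visited at a coarser level.
    for (let num = 1; num < denom; num += 2) {
      yield Math.floor((size * num) / denom);
    }
  }
}

const first = [];
for (const offset of fractalOffsets(1024)) {
  first.push(offset);
  if (first.length === 7) break;
}
console.log(first); // [ 512, 256, 768, 128, 384, 640, 896 ]
```

Each level halves the spacing between candidates, so early samples are spread over the whole space instead of clustering.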

Running the new GC task in a separate low-priority GC thread on the old heap space would require a limited amount of resources.
For example, using 32 samples from Random Sampling or Fractal Partitioning with free-space bounds checks in the task would require [ 32 * sizeof(memory_address) * 3 bytes = 768 ], less than 1 KB, plus a stack as deep as the string hierarchy tree structure multiplied by a constant that could be as small as 2, 3, or just 1 (inlining the next strings to check in an array).
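
The 768-byte figure follows if `sizeof(memory_address)` is 8 bytes (a 64-bit platform) and each sample stores three addresses. Both points are assumptions made here to reproduce the arithmetic, for instance the sampled address plus the two bounds of the free span; `sampleBudgetBytes` is a hypothetical name.

```javascript
// Hypothetical helper: memory needed to track `samples` candidates.
// Assumptions: 8-byte addresses and three addresses per sample
// (e.g. the sampled address plus the lower and upper free-space bounds).
function sampleBudgetBytes(samples = 32, addressBytes = 8, addressesPerSample = 3) {
  return samples * addressBytes * addressesPerSample;
}

console.log(sampleBudgetBytes());      // 768 (64-bit addresses)
console.log(sampleBudgetBytes(32, 4)); // 384 (32-bit addresses)
```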

@Informate changed the title from "Strings subsystem generate (hard to detect) memory-leakages. Garbage Collector update request" to "Strings subsystem generates (hard to detect) memory-leakages. Garbage Collector update request" on Jul 6, 2024