Add pooling and request limiting to prevent memory leaks #110

Closed
Commits (58)
6c12e78
Add pooling and request limiting to prevent memory leaks
seanmorris Sep 14, 2023
e968403
Adding Pool class.
seanmorris Sep 14, 2023
b515f13
Reverting change.
seanmorris Sep 14, 2023
5c7b52d
Removing console.log
seanmorris Sep 14, 2023
a33b0c5
Max request options.
seanmorris Sep 14, 2023
f8326cd
Tweaks.
seanmorris Sep 14, 2023
cb383fe
Tweaks.
seanmorris Sep 14, 2023
ac4ea1b
Initialize all instances
seanmorris Sep 14, 2023
0b9044f
Tweak.
seanmorris Sep 14, 2023
fc0f995
Tweak.
seanmorris Sep 14, 2023
e18b1fd
Consolidating FINALLY block.
seanmorris Sep 15, 2023
0ffaad7
Cleaner pooling algorithm.
seanmorris Sep 15, 2023
3d26067
Comments
seanmorris Sep 15, 2023
909d08a
Comments.
seanmorris Sep 15, 2023
1c91f34
Comments.
seanmorris Sep 15, 2023
a86c4c9
Catch spawn failures
seanmorris Sep 15, 2023
4b634bb
Making private methods privater.
seanmorris Sep 15, 2023
3f7b796
Tweaks.
seanmorris Sep 15, 2023
db3c5d6
Tweaks.
seanmorris Sep 15, 2023
68f6290
Correcting CLI switches, killing instances when done.
seanmorris Sep 16, 2023
9167e77
Detect error in test.
seanmorris Sep 16, 2023
70c3a22
Correcting multi-await
seanmorris Sep 19, 2023
aebc3d5
Correcting multi-await
seanmorris Sep 19, 2023
96a2e82
Scaling back default maxRequests
seanmorris Sep 27, 2023
0cb98b7
Pull request tweaks.
seanmorris Oct 3, 2023
a73874f
Doc Comments.
seanmorris Oct 5, 2023
326b410
Doc Comments.
seanmorris Oct 5, 2023
966a9df
Testing tweaks.
seanmorris Oct 5, 2023
488ad3c
Testing tweaks.
seanmorris Oct 5, 2023
a7708af
Testing tweaks.
seanmorris Oct 5, 2023
e01d6d4
All tests pass.
seanmorris Oct 5, 2023
eca5995
Passing linter
seanmorris Oct 5, 2023
3d8a1c2
Passing linter
seanmorris Oct 5, 2023
271c935
Passing linter
seanmorris Oct 5, 2023
a7581dc
Temporarily skip typecheck
seanmorris Oct 5, 2023
8d968d7
Revert temp change.
seanmorris Oct 5, 2023
04a3e5e
Send 500 error to browser instead of re-queueing request.
seanmorris Oct 6, 2023
65c28e2
Correcting default fatal handler
seanmorris Oct 6, 2023
a1b809e
Abstracting node pools
seanmorris Oct 6, 2023
102adbc
Tweaks.
seanmorris Oct 6, 2023
954d33f
Tweaks
seanmorris Oct 6, 2023
6bc105d
Tweaks
seanmorris Oct 6, 2023
b3c6a08
Tweaks
seanmorris Oct 6, 2023
be0c0cb
Merge branch 'trunk' into sm-preventing-memory-leaks
seanmorris Nov 22, 2023
fabcafa
Bumping versions.
seanmorris Dec 4, 2023
56484b5
Non-working copy operation
seanmorris Dec 4, 2023
c59bea8
Revering extra change.
seanmorris Dec 6, 2023
c133a1e
PR comments.
seanmorris Dec 7, 2023
a657040
PR comments.
seanmorris Dec 8, 2023
75bd3eb
Separating debug flags into its own PR
seanmorris Dec 8, 2023
c3b2269
Incrementing verion numbers
seanmorris Dec 21, 2023
1f9f696
Formatting.
seanmorris Dec 21, 2023
f352bb8
Lint & Tests.
seanmorris Dec 21, 2023
0e6c8ba
Merge conflict.
seanmorris Dec 27, 2023
0ad0b9d
PR comment tweaks.
seanmorris Dec 27, 2023
5d18b81
Restoring codemirror dependency
seanmorris Dec 27, 2023
b09b13b
Revering extra change
seanmorris Dec 27, 2023
0bd1222
PR comment tweaks.
seanmorris Dec 27, 2023
@@ -1,8 +1,5 @@
import type { LoadingStatus } from '../../types';
import {
consumeAPI,
spawnPHPWorkerThread,
} from '@php-wasm/web';
import { consumeAPI, spawnPHPWorkerThread } from '@php-wasm/web';

import type { PHPClient } from '../../php-worker';

@@ -24,9 +21,7 @@ export class PHPLoader extends EventTarget {
/** @ts-ignore */
'../../php-worker.ts?url&worker'
);
const worker = await spawnPHPWorkerThread(
workerScriptUrl
);
const worker = await spawnPHPWorkerThread(workerScriptUrl);
const php = consumeAPI<PHPClient>(worker);
php?.onDownloadProgress((e) => {
const { loaded, total } = e.detail;
@@ -7,6 +7,7 @@ const dirname = __dirname;

export default async function runExecutor(options: BuiltScriptExecutorSchema) {
const args = [
...(options.debug ? ['--inspect-brk'] : []),
'--loader',
join(dirname, 'loader.mjs'),
options.scriptPath,
@@ -1,4 +1,5 @@
export interface BuiltScriptExecutorSchema {
scriptPath: string;
debug: boolean;
__unparsed__: string;
} // eslint-disable-line
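In practice, setting `debug: true` on the executor prepends Node's `--inspect-brk` flag (as in the executor change above), so the script pauses on its first line until an inspector client such as Chrome DevTools attaches.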
5 changes: 5 additions & 0 deletions packages/nx-extensions/src/executors/built-script/schema.json
@@ -10,6 +10,11 @@
"description": "Path of the script to run.",
"x-prompt": "What script would you like to run?"
},
"debug": {
"type": "boolean",
"description": "Use devtools as a debugger.",
"x-prompt": "Would you like to use devtools?"
},
"__unparsed__": {
"hidden": true,
"type": "array",
4 changes: 1 addition & 3 deletions packages/wp-now/public/with-node-version.js
@@ -6,9 +6,7 @@ import fs from 'fs';
// Set the minimum required/supported version of node here.

// Check if `--blueprint=` is passed in proccess.argv
const hasBlueprint = process.argv.some((arg) =>
arg.startsWith('--blueprint=')
);
const hasBlueprint = process.argv.some((arg) => arg.startsWith('--blueprint='));

const minimum = {
// `--blueprint=` requires node v20
4 changes: 4 additions & 0 deletions packages/wp-now/src/config.ts
@@ -20,6 +20,7 @@ export interface CliOptions {
port?: number;
blueprint?: string;
reset?: boolean;
maxRequests?: number;
}

export const enum WPNowMode {
@@ -43,6 +44,7 @@ export interface WPNowOptions {
wpContentPath?: string;
wordPressVersion?: string;
numberOfPhpInstances?: number;
maxRequests?: number;
blueprintObject?: Blueprint;
reset?: boolean;
}
@@ -54,6 +56,7 @@ export const DEFAULT_OPTIONS: WPNowOptions = {
projectPath: process.cwd(),
mode: WPNowMode.AUTO,
numberOfPhpInstances: 1,
maxRequests: 512,
reset: false,
};

@@ -111,6 +114,7 @@ export default async function getWpNowConfig(
phpVersion: args.php as SupportedPHPVersion,
projectPath: args.path as string,
wordPressVersion: args.wp as string,
maxRequests: args.maxRequests as number,
port,
reset: args.reset as boolean,
};
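As a rough sketch of how the new option flows through the resolved config (the field names mirror the diff above; the exact call shape is illustrative):

```ts
import getWpNowConfig from './config';

// maxRequests falls back to DEFAULT_OPTIONS.maxRequests (512) when omitted.
const options = await getWpNowConfig({
	path: process.cwd(),
	php: '8.0',
	maxRequests: 256, // recycle each PHP instance after 256 requests
});
```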
261 changes: 261 additions & 0 deletions packages/wp-now/src/pool.ts
@@ -0,0 +1,261 @@
import { NodePHP } from '@php-wasm/node';

const Fatal = Symbol('Fatal');
const Spawn = Symbol('Spawn');
const Reap = Symbol('Reap');

let childCount = 0;

export class PoolInfo {
id = childCount++;
requests = 0;
started = Date.now();
active = false;
}

/**
* Maintains and refreshes a list of php instances
* such that each one will only be fed X number of requests
* before being discarded and replaced.
Comment on lines +109 to +111

@adamziel (Collaborator), Oct 2, 2023:
Do you have any insights about the root cause of the leak? I don't think php-fpm needs to do the same kind of pooling&discarding so it must be something on our side. Emscripten issue perhaps?

@seanmorris (Contributor, Author), Oct 2, 2023:

https://www.php.net/manual/en/install.fpm.configuration.php#pm.max-requests

This pathological behavior does exist in native PHP, but won't cause problems nearly as quickly since it's not restricted to a single gigabyte of memory.

Emscripten definitely introduces unique challenges. Since we're dealing with a linear, "physical" memory array, as opposed to a virtual memory system afforded by most modern OSes, we're prone to things like memory fragmentation. In that situation, we could have the entire gigabyte empty except for a few sparse allocations. If no contiguous region of memory exists for the length requested, memory allocations will fail. This tends to happen when a new request attempts to initialize a heap structure but cannot find a contiguous 2mb chunk of memory.

We can go as far as debugging PHP itself, and contributing the fix upstream. But even in this case we cannot guarantee that a third party extension will not introduce a leak sometime in the future.

Therefore, we should have a solution robust to memory leaks that come from upstream code. I think that following the native strategy is the best way.

Collaborator:
Oh of course, third party extensions! Sean, that's such a good answer, thank you. I'd love to see your GitHub comment as a part of this PR in an actual comment block – it clearly explains the rationale.

> We can go as far as debugging PHP itself, and contributing the fix upstream. But even in this case we cannot guarantee that a third party extension will not introduce a leak sometime in the future.

Agreed, let's just stick with discarding PHP instances here.

@seanmorris (Contributor, Author), Oct 5, 2023:
/**
* Maintains and refreshes a list of php instances
* such that each one will only be fed X number of requests
* before being discarded and replaced.
*
* Since we're dealing with a linear, "physical" memory array, as opposed to a
* virtual memory system afforded by most modern OSes, we're prone to things
* like memory fragmentation. In that situation, we could have the entire
* gigabyte empty except for a few sparse allocations. If no contiguous region
* of memory exists for the length requested, memory allocations will fail.
* This tends to happen when a new request attempts to initialize a heap
* structure but cannot find a contiguous 2mb chunk of memory.
*
* We can go as far as debugging PHP itself, and contributing the fix upstream.
* But even in this case we cannot guarantee that a third party extension will
* not introduce a leak sometime in the future. Therefore, we should have a
* solution robust to memory leaks that come from upstream code. I think that
* following the native strategy is the best way.
*
* https://www.php.net/manual/en/install.fpm.configuration.php#pm.max-requests
*
*/

*/
export class Pool {
instanceInfo = new Map(); // php => PoolInfo

spawner: () => Promise<any>; // Callback to create new instances.
maxRequests: number; // Max requests to feed each instance
maxJobs: number; // Max number of instances to maintain at once.

notifiers = new Map(); // Inverted promises to notify async code of backlogged item processed.
running = new Set(); // Set of busy PHP instances.
backlog = []; // Set of request callbacks waiting to be run.
Member:
same note here on the comments. these are nice to have, but we're so close to JSDoc and if we swap then the comments will follow the variables around the code in an IDE.


constructor({
spawner = async (): Promise<any> => {},
maxRequests = 2000,
maxJobs = 5,
} = {}) {
this.spawner = spawner;
this.maxRequests = maxRequests;
this.maxJobs = maxJobs;
this[Reap]();
this[Spawn]();
Member:
what's the purpose of using these symbols instead of simply using reap, spawn, and fatal as names?

Contributor (Author):
That's just a pattern I use for private methods.
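A minimal sketch of the symbol-keyed private-method pattern being described, mirroring the `Reap` symbol and `Pool` class from the diff:

```ts
// The symbol only exists inside this module, so external code cannot call
// pool[Reap]() by name; it would have to dig the key out with
// Object.getOwnPropertySymbols() first.
const Reap = Symbol('Reap');

export class Pool {
	[Reap]() {
		// ...discard exhausted instances...
	}
}

// The native alternative implied by the question would be a `#reap()` method,
// whose privacy is enforced by the language itself.
```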

@seanmorris (Contributor, Author), Nov 27, 2023:
@dmsnell This has been refactored:

const spawn = async (pool: Pool) => {
const newInstances = new Set();
if (pool.maxJobs <= 0) return newInstances;
while (pool.instanceInfo.size < pool.maxJobs) {
const info = new PoolInfo();
const instance = await pool.spawn();
pool.instanceInfo.set(instance, info);
info.active = true;
newInstances.add(instance);
}
return newInstances;
};
/**
* Reaps children if they've passed the maxRequest count.
* @param pool the pool object to work on
* @private
*/
const reap = (pool: Pool) => {
for (const [instance, info] of pool.instanceInfo) {
if (pool.maxRequests > 0 && info.requests >= pool.maxRequests) {
info.active = false;
pool.instanceInfo.delete(instance);
pool.reap(instance);
continue;
}
}
};
/**
* Handle fatal errors gracefully.
* @param pool the pool object to work on
* @param instance the php instance to clean up
* @param error the actual error that got us here
* @private
*/
const fatal = (pool: Pool, instance: PhpInstance, error: Error) => {
console.error(error);
if (instance && pool.instanceInfo.has(instance)) {
const info = pool.instanceInfo.get(instance);
info.active = false;
pool.instanceInfo.delete(instance);
}
return pool.fatal(instance, error);
};

}

/**
* Find the next available idle instance.
*/
getIdleInstance() {
const sorted = [...this.instanceInfo].sort(
(a, b) => a[1].requests - b[1].requests
);

for (const [instance, info] of sorted) {
if (this.running.has(instance)) {
continue;
}

if (!info.active) {
continue;
}
return instance;
}

return false;
}

/**
* Queue up a callback that will make a request when an
* instance becomes idle.
*/
async enqueue(item: (php: NodePHP) => Promise<any>) {
const idleInstance = this.getIdleInstance();

if (!idleInstance) {
// Defer the callback if we don't have an idle instance available.
this.backlog.push(item);

// Split a promise open so it can be accepted or
// rejected later when the item is processed.
const notifier = new Promise((accept, reject) =>
Collaborator:
Since it's resolvers now, it seems natural to rename accept to resolve.

@adamziel (Collaborator), Oct 3, 2023:
Oh, I see, it's also used in a few other places around this class. It would be lovely to use the same words to refer to the same concepts, e.g. notifier here is a Promise, but in line 205 a notifier is an {accept, reject} object. Since it comes from resolvers, the name in line 205 could be resolver.

I know these wording changes may seem minor, but naming consistency goes a long way for a new person or when debugging a tricky issue. For example, this Gutenberg PR saved me a ton of hours down the road.
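For reference, a standalone sketch of the "split a promise open" / resolver pattern under discussion, independent of the Pool class (newer runtimes expose the same shape natively as Promise.withResolvers()):

```ts
// Create a promise and capture its resolve/reject functions so other code
// can settle it later, e.g. once a backlogged request has been processed.
function createResolver<T>() {
	let resolve!: (value: T | PromiseLike<T>) => void;
	let reject!: (reason?: unknown) => void;
	const promise = new Promise<T>((res, rej) => {
		resolve = res;
		reject = rej;
	});
	return { promise, resolve, reject };
}

const { promise, resolve } = createResolver<string>();
// Somewhere else, once the queued work completes:
resolve('processed');
```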

this.notifiers.set(item, {accept, reject})
);

// Return the notifier so async calling code
// can still respond correctly when the item
// is finally processed.
return notifier;
} else {
// If we've got an instance available, run the provided callback.

// Given an instance, create a new callback that will clean up
// after the instance processes a request, and optionally
// will also kick off the next request.
const onCompleted = (instance) => async () => {
this.running.delete(instance);

this[Reap]();
const newInstances = this[Spawn]();

// Break out here if the backlog is empty.
if (!this.backlog.length) {
return;
}

// This is the instance that completed
// so we can re-use it...
let nextInstance = instance;

// ... but, if we've just spanwed a fresh
Member:
typo: spanwed -> spawned

Member:
why are we preferring a newly-spawned instance over the one that just finished? is there any advantage to this inside our WASM world?

I would have guessed that passing the next request to the existing worker would exhaust the worker faster and prevent needing to create additional workers, which would lead to a lower overall memory use.

this also leads me to wonder about the pre-allocation of workers and if that's necessary. if we have maxJobs (or any more suitable name) and we're going to eagerly pay the costs of allocating them, why not defer allocation and initialization until those workers are required: that is, until we have a request come in when there are no non-active workers.

would you see any problem in reaping workers as soon as they complete a request and are over the max-requests threshold, and then only spawning new workers once there are no available non-active workers in the pool?

// instance, use that one instead.
if (newInstances.size)
for (const instance of newInstances) {
nextInstance = instance;
break;
}

const next = this.backlog.shift();
const info = this.instanceInfo.get(nextInstance);

this.running.add(nextInstance);
info.requests++;

let request;

try {
// Don't ACTUALLY do anything until the
// instance is done spawning.
request = next(await nextInstance);
}
catch(error) {
@seanmorris (Contributor, Author), Sep 15, 2023:
@dmsnell Let me know what you think of this logic. The alternative to re-queuing after a failed spawn is to return a 502 bad gateway.

Interested to know your thoughts.

Member:
I'm a bit confused on what happens in these cases and probably need to think about it more. One thing that worries me is that if there is something intrinsic in the request that led to the failure, and we unshift it to the beginning of the queue, could we get into situations that lead to infinite and infinitely-fast re-death of newly spawned processes. I'm not sure if that worry is sound though, as I don't fully understand what's happening with these workers. It's not like they are OS processes or have the same characteristics of overhead and startup time.

In general, a 500 seems like an acceptable response. Why specifically a 502? The lines here are blurry.

Contributor (Author):

If I recall correctly an HTTP daemon will serve a 502 if it can't properly instantiate PHP to handle a request, but a 500 seems valid as well.

Contributor (Author):

@dmsnell @adamziel Let's revisit this, I don't want to forget about it.

@seanmorris (Contributor, Author), Nov 27, 2023:
@adamziel @dmsnell Refactored to return the error to calling code:

// Grab the reject handler from the notfier
// promise and run it if the request rejects.
request.catch((error) => {
const resolver = this.resolvers.get(next);
this.resolvers.delete(next);
resolver.reject(fatal(this, nextInstance, error));
});

// Re-queue the request if the instance
// failed initialization.
this.backlog.unshift(next);

// Catch any errors and log to the console.
// Deactivate the instance.
this[Fatal](nextInstance, error);

// Return the notifier so async calling code
// can still respond correctly when the item
// is finally processed.
return this.notifiers.get(next);
}
Member:

given how we're using this, why not return an array of newly-spawned processes, or return the most-recently spawned process? that way we wouldn't have to do the jumping jacks here to get the last one. it looks like this is the only place we're relying on the value returned by spawn


const completed = onCompleted(nextInstance);

// Make sure onComplete & running.delete run
// no matter how the request resolves.
request.finally(() => {
this.running.delete(nextInstance);
completed();
});

// Grab the accept handler from the notfier
// promise and run it if the request resolves.
request.then((ret) => {
const notifier = this.notifiers.get(next);
this.notifiers.delete(next);
notifier.accept(ret);
});

// Grab the reject handler from the notfier
Member:
typo: notfier -> notifier

// promise and run it if the request rejects.
request.catch((error) => {
const notifier = this.notifiers.get(next);
this.notifiers.delete(next);
notifier.reject(error);
// Catch any errors and log to the console.
// Deactivate the instance.
this[Fatal](nextInstance, error);
});
};

const info = this.instanceInfo.get(idleInstance);

this.running.add(idleInstance);
info.requests++;

let request;

try {
// Don't ACTUALLY do anything until the
// instance is done spawning.
request = item(await idleInstance);
}
catch(error) {
// Re-queue the request if the instance
// failed initialization.
this.backlog.unshift(item);

// Catch any errors and log to the console.
// Deactivate the instance.
this[Fatal](idleInstance, error);

// Split a promise open so it can be accepted or
// rejected later when the item is processed.
const notifier = new Promise((accept, reject) =>
this.notifiers.set(item, {accept, reject})
);

// Return the notifier so async calling code
// can still respond correctly when the item
// is finally processed.
return notifier;
}

// Make sure onComplete runs no matter how the request resolves.
request.finally(onCompleted(idleInstance));

// Catch any errors and log to the console.
// Deactivate the instance.
request.catch((error) => this[Fatal](idleInstance, error));

return request;
}
}

/**
* PRIVATE
* Spawns new instances if the pool is not full.
* Returns a list of new instances.
*/
[Spawn]() {
const newInstances = new Set();
while (this.maxJobs > 0 && this.instanceInfo.size < this.maxJobs) {
const info = new PoolInfo();
const instance = this.spawner();
this.instanceInfo.set(instance, info);
info.active = true;
newInstances.add(instance);
}
return newInstances;
}

/**
* PRIVATE
* Reaps children if they've passed the maxRequest count.
*/
[Reap]() {
for (const [instance, info] of this.instanceInfo) {
if (this.maxRequests > 0 && info.requests >= this.maxRequests) {
info.active = false;
this.instanceInfo.delete(instance);
// instance.then(unwrapped => unwrapped.destroy());
continue;
}
}
}

/**
* PRIVATE
* Handle fatal errors gracefully.
*/
[Fatal](instance, error) {
console.error(error);
if (this.instanceInfo.has(instance)) {
const info = this.instanceInfo.get(instance);
info.active = false;
this.instanceInfo.delete(instance);
}
}
}
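Taken together, a hedged sketch of how the Pool above might be driven; the spawner and the `php.run()` call assume the NodePHP API from @php-wasm/node, and wp-now's real spawner configures the instance further:

```ts
import { NodePHP } from '@php-wasm/node';
import { Pool } from './pool';

const pool = new Pool({
	// Assumption: NodePHP.load() creates a fresh instance for the given PHP version.
	spawner: async () => await NodePHP.load('8.0'),
	maxRequests: 512, // recycle an instance after it has served 512 requests
	maxJobs: 5, // keep at most 5 instances alive at once
});

// enqueue() resolves with the callback's return value once an idle
// (or freshly spawned) instance has picked the request up.
const result = await pool.enqueue(async (php: NodePHP) => {
	return await php.run({ code: '<?php echo "hello from the pool";' });
});
```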
6 changes: 6 additions & 0 deletions packages/wp-now/src/run-cli.ts
@@ -76,6 +76,11 @@ export async function runCli() {
'Create a new project environment, destroying the old project environment.',
type: 'boolean',
});
yargs.option('maxRequests', {
describe:
'Max number of requests before refreshing PHP instance.',
type: 'number',
});
},
async (argv) => {
const spinner = startSpinner('Starting the server...');
@@ -87,6 +92,7 @@
port: argv.port as number,
blueprint: argv.blueprint as string,
reset: argv.reset as boolean,
maxRequests: argv.maxRequests as number,
});
portFinder.setPort(options.port as number);
const { url } = await startServer(options);
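With the option wired through yargs above, the per-instance request cap can be set from the command line, e.g. `wp-now start --maxRequests=256` (invocation shown for illustration; the flag name comes from the `maxRequests` option registered above).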