Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add pooling and request limiting to prevent memory leaks #110
Add pooling and request limiting to prevent memory leaks #110
Changes from 16 commits
6c12e78
e968403
b515f13
5c7b52d
a33b0c5
f8326cd
cb383fe
ac4ea1b
0b9044f
fc0f995
e18b1fd
0ffaad7
3d26067
909d08a
1c91f34
a86c4c9
4b634bb
3f7b796
db3c5d6
68f6290
9167e77
70c3a22
aebc3d5
96a2e82
0cb98b7
a73874f
326b410
966a9df
488ad3c
a7708af
e01d6d4
eca5995
3d8a1c2
271c935
a7581dc
8d968d7
04a3e5e
65c28e2
a1b809e
102adbc
954d33f
6bc105d
b3c6a08
be0c0cb
fabcafa
56484b5
c59bea8
c133a1e
a657040
75bd3eb
c3b2269
1f9f696
f352bb8
0e6c8ba
0ad0b9d
5d18b81
b09b13b
0bd1222
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any insights about the root cause of the leak? I don't think
php-fpm
needs to do the same kind of pooling&discarding so it must be something on our side. Emscripten issue perhaps?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://www.php.net/manual/en/install.fpm.configuration.php#pm.max-requests
This pathological behavior does exist in native PHP, but won't cause problems nearly as quickly since its not restricted to a single gigabyte of memory.
Emscripten definitely introduces unique challenges. Since we're dealing with a linear, "physical" memory array, as opposed to a virtual memory system afforded by most modern OSes, we're prone to things like memory fragmentation. In that situation, we could have the entire gigabyte empty except for a few sparse allocations. If no contiguous region of memory exists for the length requested, memory allocations will fail. This tends to happen when a new request attempts to initialize a heap structure but cannot find a contiguous 2mb chunk of memory.
We can go as far as debugging PHP itself, and contributing the fix upstream. But even in this case we cannot guarantee that a third party extension will not introduce a leak sometime in the future.
Therefore, we should have a solution robust to memory leaks that come from upstream code. I think that following the native strategy is the best way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh of course, third party extensions! Sean, that's such a good answer, thank you. I'd love to see your GitHub comment as a part of this PR in an actual comment block – it clearly explains the rationale.
Agreed, let's just stick with discarding PHP instances here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adamziel https://github.com/WordPress/playground-tools/pull/110/files#diff-95251f05603f485be0377cd57ce90fe740821ca840c6053a115cb48c8725341bR103
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
playground-tools/packages/wp-now/src/pool.ts
Lines 110 to 131 in f352bb8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same note here on the comments. these are nice to have, but we're so close to JSDoc and if we swap then the comments will follow the variables around the code in an IDE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the purpose of using these symbols instead of simply using
reap
,spawn
, andfatal
as names?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's just a pattern I use for private methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmsnell This has been refactored:
playground-tools/packages/wp-now/src/pool.ts
Lines 36 to 85 in 75bd3eb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it's
resolvers
now, it seems natural to rename accept to resolve.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see, it's also used in a few other places around this class. It would be lovely to use the same words to refer to the same concepts, e.g.
notifier
here is a Promise, but in line 205 anotifier
is an{accept, reject}
object. Since it comes fromresolvers
, the name in like 205 could beresolver
.I know these wording changes may seem minor, but naming consistency goes a long way for a new person or when debugging a tricky issue. For example, this Gutenberg PR saved me a ton of hours down the road.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adamziel https://github.com/WordPress/playground-tools/pull/110/files#diff-95251f05603f485be0377cd57ce90fe740821ca840c6053a115cb48c8725341bR137
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo:
spanwed
->spawned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we preferring a newly-spawned instance over the one that just finished? is there any advantage to this inside our WASM world?
I would have guessed that passing the next request to the existing worker would exhaust the worker faster and prevent needing to create additional workers, which would lead to a lower overall memory use.
this also leads me to wonder about the pre-allocation of workers and if that's necessary. if we have
maxJobs
(or any more suitable name) and we're going to eagerly pay the costs of allocating them, why not defer allocation and initialization until those workers are required: that is, until we have a request come in when there are no non-active workers.would you see any problem in reaping workers as soon as they complete a request and are over the max-requests threshold, and then only spawning new workers once there are no available non-active workers in the pool?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmsnell Let me know what you think of this logic. The alternative to re-queuing after a failed spawn is to return a 502 bad gateway.
Interested to know your thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused on what happens in these cases and probably need to think about it more. One thing that worries me is that if there is something intrinsic in the request that led to the failure, and we
unshift
it to the beginning of the queue, could we get into situations that lead to infinite and infinitely-fast re-death of newly spawned processes. I'm not sure if that worry is sound though, as I don't fully understand what's happening with these workers. It's not like they are OS processor or have the same characteristics of overhead and startup time.In general, a
500
seems like an acceptable response. Why specifically a 502? The lines here are blurry.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I recall correctly an HTTP daemon will serve a 502 if it can't properly instantiate PHP to handle a request, but a 500 seems valid as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmsnell @adamziel Lets revisit this, I don't want to forget about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adamziel @dmsnell Refactored to return the error to calling code:
playground-tools/packages/wp-now/src/pool.ts
Lines 267 to 273 in 75bd3eb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given how we're using this, why not return an array of newly-spawned processes, or return the most-recently spawned process? that way we wouldn't have to do the jumping jacks here to get the last one. it looks like this is the only place we're relying on the value returned by
spawn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo:
notfier
->notifier