Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http: do not leak error listeners #43587

Closed
wants to merge 1 commit into from

Conversation

ShogunPanda
Copy link
Contributor

This PR prevents addition of a listener if not absolutely necessary, preventing a memory leak.

Fixes: #43548

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/http
  • @nodejs/net

@nodejs-github-bot nodejs-github-bot added http Issues or PRs related to the http subsystem. needs-ci PRs that need a full CI run. labels Jun 27, 2022
Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mscdex
Copy link
Contributor

mscdex commented Jun 27, 2022

Maybe the commit message should instead be something like http: do not leak error listeners since we're not actually removing anything?

@ShogunPanda ShogunPanda changed the title http: remove all internal listeners http: do not leak error listeners Jun 27, 2022
lib/_http_server.js Outdated Show resolved Hide resolved
test/parallel/test-http-socket-listeners.js Outdated Show resolved Hide resolved
Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

lib/_http_server.js Outdated Show resolved Hide resolved
@Slayer95
Copy link

Slayer95 commented Jun 29, 2022

Is this pattern considered reliable error-handling? I have used it in some application/script code, but it shouldn't be resilient enough for internal Node.js code.

With this patch applied, check, for instance, the following test case

const http = require('http');

function customErrorHandler(err) {
	console.log(`Connection error handled: ${err.stack}`);
}

const app = (req, res) => {
  res.socket.once('error', customErrorHandler);
  res.end(JSON.stringify({
    message: 'Hello World!'
  }));

  res.socket.emit('error', new Error(`something happened`)); // once() handler consumed
  res.socket.emit('error', new Error(`something happened again :(`)); // Unhandled 'error' event
};

const httpServer = http.createServer(app);

httpServer.listen(80, () => {
    console.log('HTTP Server running on port 80');
});

Hence, if the user applies some dynamic error handling, the app may eventually throw on an unhandled error event.

@ShogunPanda
Copy link
Contributor Author

Well, the thing is that in case of error events on the socket you should destroy it because the communication is not reliable.
So, in the real world, two error events should not happen.

That said I anyway believe that the app should crash rather than leak memory as it will have an healthier runtime environment and will make error detection faster when monitoring it.

@nodejs-github-bot

This comment was marked as outdated.

@nodejs-github-bot

This comment was marked as outdated.

@nodejs-github-bot

This comment was marked as outdated.

@nodejs-github-bot

This comment was marked as outdated.

@nodejs-github-bot

This comment was marked as outdated.

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

targos pushed a commit that referenced this pull request Sep 5, 2022
PR-URL: #43587
Fixes: #43548
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Ricky Zhou <0x19951125@gmail.com>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
@BethGriggs
Copy link
Member

This got lost in my notifications. But, the PR was released in Node.js v18.8.0.

I just reran CITGM on Koa, and it's still failing on all platforms with the same error shown in #43587 (comment):

FAIL __tests__/application/index.js
    app  should handle socket errors
     Unhandled error. (Error: boom
       12 |     app.use((ctx, next) => {
       13 |       // triggers ctx.socket.writable == false
     > 14 |       ctx.socket.emit('error', new Error('boom'));
          |                                ^
       15 |     });
       16 |
       17 |     app.on('error', err => {
       at __tests__/application/index.js:14:32
       at dispatch (node_modules/koa-compose/index.js:42:32)
       at fnMiddleware (node_modules/koa-compose/index.js:34:12)
       at Application.handleRequest (lib/application.js:168:12)
       at Server.handleRequest (lib/application.js:150:19)
       at HTTPParser.parserOnHeadersComplete (node:_http_common:117:17) {
         headerSent: true
       })

I'm guessing it's possible that Koa may just need their test updating as Express did (refs: #43587 (comment)) to account for this PR.

@dougwilson
Copy link
Member

dougwilson commented Oct 7, 2022

I was just recently looking in to the Express.js tests and I think there is an actual issue here apart from updating the tests. There seems to be a bug introduced with this change were errors that were previously suppressed with the noop added are no longer suppressed if something added a once listener. This is because the logic here doesn't take in to account that after the error is emitted, other error listeners may then be removed by Node.js since they were added with .once, so then the error ends up getting thrown as unhandled instead of suppressed with the noop. If nothing added any error listener then it works fine still suppressing the errors, as the logic here adds the noop.

The change I mentioned in Express.js was just updating the error count to expect that the Node.js noop may not always be there, but it would seem weird that this change is now also causing unhandled exceptions to bubble up due to the lack of that noop handler.

I asked above that why would it be desired to change the behavior to sometimes have the noop there and sometimes not (and leaving it with zero error listeners), but no one answered that question, so oh well. We had to add some error listeners in production apps due to new crashes for Node.js 18.9 due to this change, so again, not sure if that is intentional or not. We didn't have time to argue it more here since I already asked above and it wasn't answered and then landed anyway. I am only bring it up again due to the new post there with the same error.

lpinca added a commit to lpinca/node that referenced this pull request Oct 8, 2022
This reverts commit 736a7d8.

The patch attempted to fix an issue, signaled by a warning, caused by
an invalid API usage. However it introduced a behavior change that is
breaking userland modules.

It is a user error, so restore the original behavior and have the user
investigate and fix the issue in their code.

Refs: nodejs#43587 (comment)
Refs: nodejs#43587 (comment)
juanarbol pushed a commit that referenced this pull request Oct 10, 2022
PR-URL: #43587
Fixes: #43548
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Ricky Zhou <0x19951125@gmail.com>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
@juanarbol juanarbol mentioned this pull request Oct 11, 2022
@lpinca
Copy link
Member

lpinca commented Oct 11, 2022

@dougwilson I've opened a PR to revert this, but I fail to think of a scenario where this leads to an unhandled error, unless socket.emit('error') is called manually which is invalid usage or the user does something weird.

Even if an 'error' listener is added with emitter.once() and the noop listener is not added, an 'error' event can only be emitted once.

The process can crash if the user adds a one time listener for the 'error' event, then a parsing error occurs, then the user removes the added listener from a listener of the 'clientError' event and then they destroy the socket with an error, but in this case the crash is expected.

@dougwilson
Copy link
Member

dougwilson commented Oct 11, 2022

I don't know the cause, but can add it. This is a production web app and nothing that I am aware of is manually emitting errors on the socket using socket.emit('error').

But if you follow the code flow, an error emitted on the stocket will trigger the error listener added by Node.js and at the end, it calls .destroy(e) with the error which then emits the error again on the socket object. Because the app had a .once on it, the noop was not added and then when .destroy(e) reemitted the error the second time, the .once was auto removed by node.js already and no listeners were left.

That is how far I debugged the issue, and our solution was to add a req.socket.on('error', noop) to every request as a quick fix to iur dos issue. Once that fixed the fire, I'm not sure how much time I have to dig in more on it myself.

You're welcome to not make any changes to Node.js, as we already implemented a work around to the problem. It was just a surprising change in behavior all of a sudden. I'm not trying to say how you should treat errors in node.js and what is or is not invalid api usage. I was just trying to point out a sudden behavior change in like a patch version that lead to a dos in our prod app and maybe help someone else out who has the same issue when upgrading.

RafaelGSS pushed a commit to RafaelGSS/node that referenced this pull request Oct 11, 2022
PR-URL: nodejs#43587
Fixes: nodejs#43548
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Ricky Zhou <0x19951125@gmail.com>
Reviewed-By: Mohammed Keyvanzadeh <mohammadkeyvanzade94@gmail.com>
@lpinca
Copy link
Member

lpinca commented Oct 11, 2022

But if you follow the code flow, an error emitted on the stocket will trigger the error listener added by Node.js and at the end, it calls .destroy(e) with the error which then emits the error again on the socket object. Because the app had a .once on it, the noop was not added and then when .destroy(e) reemitted the error the second time, the .once was auto removed by node.js already and no listeners were left.

Yes, but AFAIK a call to .destroy(e), after an 'error' event is emitted, does not emit another 'error' event.

@dougwilson
Copy link
Member

Yes, but AFAIK a call to .destroy(e), after an 'error' event is emitted, does not emit another 'error' event.

Then it sounds like maybe this uncovered a bug in that logic, as that is exactly what is happening.

@lpinca
Copy link
Member

lpinca commented Oct 11, 2022

Maybe there is a reentrancy bug. It would be nice to have a test case. I'll try to write one when I have some time.

nodejs-github-bot pushed a commit that referenced this pull request Oct 12, 2022
This reverts commit 736a7d8.

The patch attempted to fix an issue, signaled by a warning, caused by
an invalid API usage. However it introduced a behavior change that is
breaking userland modules.

It is a user error, so restore the original behavior and have the user
investigate and fix the issue in their code.

Refs: #43587 (comment)
Refs: #43587 (comment)
PR-URL: #44921
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Robert Nagy <ronagy@icloud.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Beth Griggs <bethanyngriggs@gmail.com>
Reviewed-By: Tierney Cyren <hello@bnb.im>
danielleadams pushed a commit that referenced this pull request Dec 30, 2022
This reverts commit 736a7d8.

The patch attempted to fix an issue, signaled by a warning, caused by
an invalid API usage. However it introduced a behavior change that is
breaking userland modules.

It is a user error, so restore the original behavior and have the user
investigate and fix the issue in their code.

Refs: #43587 (comment)
Refs: #43587 (comment)
PR-URL: #44921
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Robert Nagy <ronagy@icloud.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Beth Griggs <bethanyngriggs@gmail.com>
Reviewed-By: Tierney Cyren <hello@bnb.im>
danielleadams pushed a commit that referenced this pull request Jan 3, 2023
This reverts commit 736a7d8.

The patch attempted to fix an issue, signaled by a warning, caused by
an invalid API usage. However it introduced a behavior change that is
breaking userland modules.

It is a user error, so restore the original behavior and have the user
investigate and fix the issue in their code.

Refs: #43587 (comment)
Refs: #43587 (comment)
PR-URL: #44921
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Robert Nagy <ronagy@icloud.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Beth Griggs <bethanyngriggs@gmail.com>
Reviewed-By: Tierney Cyren <hello@bnb.im>
@ShogunPanda
Copy link
Contributor Author

Hello!
I'm digging into this problem (again, sob) and I can confirm what @dougwilson said. The problem is in _http_server.js, where .destroy(e) is called.

@lpinca Apparently it is a reentrancy bug on destroy. I'm currently investigating it.

@ShogunPanda
Copy link
Contributor Author

ShogunPanda commented Jan 19, 2023

I found the problem. It's pretty unique and it was hidden so far from the memleak-leading behavior I fixed in this PR.

A similar way to reproduce is by running this (even on current Node version)

const { createConnection, createServer } = require('net')

const server = createServer()

server.listen(8000, () => {
  const client = createConnection({ host: '127.0.0.1', port: 8000 })

  function fn(err) {
    console.log('HANDLED', err)
    client.removeListener('error', fn)
    client.destroy(err)
  }

  client.on('error', fn)
  client.emit('error', new Error('ERROR'))
})

The problem with reentrancy when using express is that ee-first (used by on-finished) uses the same function (https://github.com/jonathanong/ee-first/blob/master/index.js#L45) to handle both end and finish on the HTTP message and error and close on the socket. When any of these events is triggered, it removes all the listeners using the same function for both the HTTP message and the socket (https://github.com/jshttp/on-finished/blob/master/index.js#L96).

In our case, by following the flow it seems that when erroring end is before the emitCloseErrorNT. At that point, when the error is finally emitted, there are no handlers and thus the process crashes.

As a solution to this, since I suspect there might be other edge cases likes this, I propose to keep the current noop handler (which mitigates the issue) but remove the memory leak by creating a new api in EventEmitter called ensureListener (and similar for onceListener) that makes sure a listener is installed only if it's not already existing.

@mcollina @mhdawson WDYT?

@lpinca
Copy link
Member

lpinca commented Jan 19, 2023

A similar way to reproduce is by running this (even on current Node version)

That is not a reentrancy issue. The crash in that example is expected as per #43587 (comment).

@ShogunPanda
Copy link
Contributor Author

I agree on that.
But since I guess we don't want to break express AND we want to fix the memleak, I think we have to limit the problem.

How do you see the solution I proposed? (Which I think might be useful for EE anyway)

@lpinca
Copy link
Member

lpinca commented Jan 19, 2023

From my understanding the "leak" is a user error caused by people that keep the socket alive after the 'clientError' event is emitted. I think we should improve the documentation and clarify that the socket must be closed when the event is emitted.

I'm not very happy to add an ensureListener() method because what it does can already be done with the existing API, no?

@mcollina
Copy link
Member

@ShogunPanda the snippet in #43587 (comment) is correct, and it should not be a problem.

@ShogunPanda
Copy link
Contributor Author

@mcollina @lpinca I think my example above was misleading.
Once I have my laptop back (long story, don't ask), I'll try to produce a better example.
The problem in ee-first is that the end event is emitted from the http response. And that handler is also used for error on the socket.
The usage is weird. But as it is used within express we had to revert my fix for the memleak.

Now, I agree that we can improve the docs. But how do we both fix the memleak and NOT break express?

@lpinca
Copy link
Member

lpinca commented Jan 24, 2023

My thoughts:

  1. Fix the issue in ee-first / express. I think a workaround already exists as per http: do not leak error listeners #43587 (comment)
  2. Improve the documentation clarifying that the socket must be closed when the 'clientError' event is emitted.
  3. There is no leak if the socket is closed.
  4. Fixing the "leak" actually hides the usage error when the socket is left open.

@ShogunPanda
Copy link
Contributor Author

ShogunPanda commented Feb 6, 2023

@lpinca and others.

I was finally able to write a test that reproduced the error like in ee-first/on-finished.
Now, I agree that usage is weird but I coded a small solution that will fix both issues.

See: #46523

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
http Issues or PRs related to the http subsystem. needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flooding requests results in memory leak