Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dns: report out of memory properly #20317

Closed
wants to merge 5 commits into from

Conversation

BridgeAR
Copy link
Member

This addresses a part of a TODO by properly reporting an out of
memory error to the user instead of reporting ENOTFOUND.

I only changed this single entry as this is super rare and should be safe to change.
Since it is a OOM, we also have no test for it.

The other two are likely used a lot and a lot of userland code will rely on that.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@BridgeAR BridgeAR added the semver-major PRs that contain breaking changes and should be released in the next major version. label Apr 26, 2018
@nodejs-github-bot nodejs-github-bot added the errors Issues and PRs related to JavaScript errors originated in Node.js core. label Apr 26, 2018
@BridgeAR BridgeAR requested a review from a team April 26, 2018 03:05
@BridgeAR
Copy link
Member Author

@Trott
Copy link
Member

Trott commented Apr 26, 2018

Since it is a OOM, we also have no test for it.

I added a test to your branch. It's in its own commit so you can remove that commit if you're 👎 on the test, but I think it's probably useful.

@@ -639,8 +638,7 @@ function dnsException(code, syscall, hostname) {
if (typeof code === 'number') {
// FIXME(bnoordhuis) Remove this backwards compatibility nonsense and pass
// the true error to the user. ENOTFOUND is not even a proper POSIX error!
if (code === UV_EAI_MEMORY ||
code === UV_EAI_NODATA ||
if (code === UV_EAI_NODATA ||
code === UV_EAI_NONAME) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fits on one line now.

BridgeAR and others added 2 commits April 26, 2018 12:48
This addresses a part of a TODO by properly reporting an out of
memory error to the user instead of reporting `ENOTFOUND`.
Add test that confirms that when libuv reports a memory error on a DNS
query, that the memory error is passed through and not replaced with
ENOTFOUND.
@BridgeAR
Copy link
Member Author

Nit addressed. New CI: https://ci.nodejs.org/job/node-test-pull-request/14520/

@Trott I guess it is best to land this as separate commits and not squash?

@BridgeAR
Copy link
Member Author

Ah, I fixed the commit message @Trott (typo in memory).

@BridgeAR BridgeAR added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Apr 26, 2018
@Trott
Copy link
Member

Trott commented Apr 26, 2018

@Trott I guess it is best to land this as separate commits and not squash?

@BridgeAR That's what I'd do, but either way is fine by me.

@Trott
Copy link
Member

Trott commented Apr 26, 2018

Hmmm...puzzling. A relevant test fails but only SmartOS.

@Trott
Copy link
Member

Trott commented Apr 26, 2018

Re-running SmartOS just in case it was some kind of weird quirk: https://ci.nodejs.org/job/node-test-commit-smartos/17213/

@Trott
Copy link
Member

Trott commented Apr 26, 2018

So, test-http-dns-error (which is not the test added in this PR) fails on SmartOS with this code change because an error that used to be ENOTFOUND is now showing up as EAI_MEMORY on SmartOS only. Paging @nodejs/platform-smartos for advice/debugging/something.

@BridgeAR BridgeAR removed the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Apr 27, 2018
@BridgeAR
Copy link
Member Author

@bnoordhuis could you have a look at the test failure? You added it originally.

I personally can not see anything special or wrong in the test, so it seems to be a low level platform specific issue.

/cc @nodejs/libuv

@bnoordhuis
Copy link
Member

Not much to look at. The system resolver reports EAI_MEMORY for the invalid hostname. I guess the logic here is that '*'.repeat(256) is longer than what DNS allows because that's only 255 bytes.

@BridgeAR
Copy link
Member Author

@bnoordhuis thanks for clariying. But is UV_EAI_MEMORY really the best error in such a case? And do you know why it only fails on SmartOS?

@BridgeAR
Copy link
Member Author

@BridgeAR
Copy link
Member Author

@bnoordhuis reducing the hostname to a byteLength of 250 still returns the OOM error.

08:45:42     AssertionError [ERR_ASSERTION]: Input A expected to strictly equal input B:
08:45:42     + expected - actual
08:45:42     
08:45:42     - 'EAI_MEMORY'
08:45:42     + 'ENOTFOUND'

@@ -29,7 +29,7 @@ const assert = require('assert');
const http = require('http');
const https = require('https');

const host = '*'.repeat(256);
const host = '*'.repeat(250);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you were doing this just to see if the OOM error went away on SmartOS, but I think 256 was chosen specifically because 255 is the maximum length for a hostname. I believe the idea here is to trigger a hostname-too-long error specifically. So this change should not land, I suspect.

@Trott
Copy link
Member

Trott commented Apr 27, 2018

Here's a simple test case:

const cares = process.binding('cares_wrap');
const {
  UV_EAI_MEMORY,
  UV_EAI_NODATA,
  UV_EAI_NONAME
} = process.binding('uv');

var req = new cares.GetAddrInfoReqWrap();
req.family = 0;
req.hostname = '*'.repeat(256);
req.oncomplete = (e) => { 
 console.log(`error code was ${e}`);
 console.log(`UV_EAI_MEMORY: ${UV_EAI_MEMORY}`);
 console.log(`UV_EAI_NODATA: ${UV_EAI_NODATA}`);
 console.log(`UV_EAI_NONAME: ${UV_EAI_NONAME}`);
}

var err = cares.getaddrinfo(req, req.hostname, req.family, 1024, false);

This straightforward test case does not result in UV_EAI_MEMORY on the SmartOS hosts in CI:

[root@0410827e-87c7-449e-b16c-d626f9743aa6 /home/iojs/build/workspace/node-test-commit-smartos/nodes/smartos16-64]# /var/tmp/node /var/tmp/test.js 
error code was -3002
UV_EAI_MEMORY: -3006
UV_EAI_NODATA: -3007
UV_EAI_NONAME: -3008
[root@0410827e-87c7-449e-b16c-d626f9743aa6 /home/iojs/build/workspace/node-test-commit-smartos/nodes/smartos16-64]#

So something more....subtle...is going on with the SmartOS hosts in CI.

Trott
Trott previously requested changes Apr 27, 2018
Copy link
Member

@Trott Trott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defensively giving this a red X while the 256 is down to 250 because I do think that may invalidate the intended test case.

@Trott
Copy link
Member

Trott commented Apr 27, 2018

Oh, wait, -3002 is UV_EAI_BADFLAGS so there's something wrong with my "simple test case" code above...

@Trott
Copy link
Member

Trott commented Apr 27, 2018

OK, here's code to replicate the problem:

const cares = process.binding('cares_wrap');
const {
  UV_EAI_MEMORY,
  UV_EAI_NODATA,
  UV_EAI_NONAME
} = process.binding('uv');

var req = new cares.GetAddrInfoReqWrap();
req.family = 0;
req.hostname = '*'.repeat(256);
req.oncomplete = (e) => { 
 console.log(`error code was ${e}`);
 console.log(`UV_EAI_MEMORY: ${UV_EAI_MEMORY}`);
 console.log(`UV_EAI_NODATA: ${UV_EAI_NODATA}`);
 console.log(`UV_EAI_NONAME: ${UV_EAI_NONAME}`);
}

cares.getaddrinfo(req, req.hostname, req.family, 0, false);

Running this on SmartOS:

error code was -3006
UV_EAI_MEMORY: -3006
UV_EAI_NODATA: -3007
UV_EAI_NONAME: -3008

UV_EAI_MEMORY? That's no good.

For comparison, running it locally on my Macbook results -3008 which is UV_EAI_NONAME which is what is expected.

Back to SmartOS:

  • Changing '*'.repeat(256) to 'a'.repeat(256) does not change the result.
  • Changing '*'.repeat(256) to '*'.repeat(255) does change the result to UV_EAI_NONAME.

So there's something specific about length of 256 that results in an OOM error but I have to imagine that OOM error is spurious. (And I don't know why @BridgeAR got different results when changing the hostname length in the test to 250.)

@BridgeAR
Copy link
Member Author

@Trott I did not plan on landing it with the test change. I am not really sure what to do with your findings though due to the difference with my change...

@Trott
Copy link
Member

Trott commented Apr 28, 2018

@BridgeAR It seems like this is probably either a SmartOS quirk or else a libuv bug on SmartOS. I'm not really sure how to debug. The intersection of SmartOS and libuv would seem to be @cjihrig so maybe we can try to get their input... But while we're at it: @nodejs/platform-smartos @nodejs/libuv

@Trott
Copy link
Member

Trott commented Apr 28, 2018

Oh, for newly-looped-in people who want the TL;DR, I think the crux of the problem is described in #20317 (comment).

@Trott
Copy link
Member

Trott commented Apr 28, 2018

(I suppose it could also be a cares bug?)

@bnoordhuis
Copy link
Member

Libuv calls getaddrinfo(), the system resolver. It's a smartos quirk / bug.

@Trott
Copy link
Member

Trott commented Apr 28, 2018

Libuv calls getaddrinfo(), the system resolver. It's a smartos quirk / bug.

I bet this SmartOS behavior is probably why EAI_MEMORY was mapped to ENOTFOUND in the first place!

@Trott
Copy link
Member

Trott commented Apr 28, 2018

Sure enough this C program confirms it's specific to SmartOS (or at least our SmartOS hosts in CI):

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
struct addrinfo hints, *infoptr; // So no need to use memset global variables

int main() {
  hints.ai_family = AF_INET; // AF_INET means IPv4 only addresses

  int result = getaddrinfo("****************************************************************************************************************************************************************************************************************************************************************", NULL, &hints, &infoptr);
  if (result) {
    fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(result));
    exit(1);
  }

  struct addrinfo *p;
  char host[256];

  for(p = infoptr; p != NULL; p = p->ai_next) {

    getnameinfo(p->ai_addr, p->ai_addrlen, host, sizeof(host), NULL, 0,
NI_NUMERICHOST);
    puts(host);
  }

  freeaddrinfo(infoptr);
  return 0;
}

(This is trivially modified from sample code at https://github.com/angrave/SystemProgramming/wiki/Networking,-Part-2:-Using-getaddrinfo.)

On SmartOS, I put it in test.c and compiled it with cc -lnsl -lsocket -lresolv test.c.

[root@527b0138-1d1d-6bcc-88a3-9968feaab1eb /var/tmp]# ./a.out
getaddrinfo: memory allocation failure
[root@527b0138-1d1d-6bcc-88a3-9968feaab1eb /var/tmp]#

For comparison, when running on macOS:

$ ./a.out
getaddrinfo: nodename nor servname provided, or not known
$ 

@Trott
Copy link
Member

Trott commented Apr 28, 2018

Looking through git history, I'm unconvinced of the correctness of my recollection that 256 length is significant. 64 is long enough to guarantee the hostname is invalid as long as there's no . in it. And at that length, we don't get a memory error on SmartOS. Want to try 64, @BridgeAR?

@Trott
Copy link
Member

Trott commented Apr 28, 2018

(If you change it to 64 and the test starts passing, feel free to dismiss my request for changes.)

@BridgeAR
Copy link
Member Author

@BridgeAR
Copy link
Member Author

BridgeAR commented Apr 29, 2018

I changed the wrong test...
I missed one test that has to be fixed as well.

@BridgeAR
Copy link
Member Author

@Trott Trott dismissed their stale review April 30, 2018 03:34

tests are green

BridgeAR added a commit to BridgeAR/node that referenced this pull request Apr 30, 2018
This addresses a part of a TODO by properly reporting an out of
memory error to the user instead of reporting `ENOTFOUND`.

PR-URL: nodejs#20317
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
BridgeAR pushed a commit to BridgeAR/node that referenced this pull request Apr 30, 2018
Add test that confirms that when libuv reports a memory error on a DNS
query, that the memory error is passed through and not replaced with
ENOTFOUND.

PR-URL: nodejs#20317
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
@BridgeAR
Copy link
Member Author

Landed in a158d41 and 0deb27b

@BridgeAR BridgeAR closed this Apr 30, 2018
@BridgeAR BridgeAR deleted the dns-properly-report-oom branch January 20, 2020 11:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
errors Issues and PRs related to JavaScript errors originated in Node.js core. semver-major PRs that contain breaking changes and should be released in the next major version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants