Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GetReport takes very long times after http requests #55576

Closed
Tofandel opened this issue Oct 28, 2024 · 23 comments · Fixed by #55602
Closed

GetReport takes very long times after http requests #55576

Tofandel opened this issue Oct 28, 2024 · 23 comments · Fixed by #55602
Labels
confirmed-bug Issues with confirmed bugs. libuv Issues and PRs related to the libuv dependency or the uv binding. report Issues and PRs related to process.report.

Comments

@Tofandel
Copy link
Contributor

Tofandel commented Oct 28, 2024

Version

22.10.0

Platform

Linux Tofandel 5.15.146.1-microsoft-standard-WSL2 #1 SMP Thu Jan 11 04:09:03 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

What steps will reproduce the bug?

console.time("report");
process.report.getReport();
console.timeEnd("report");
await fetch('https://example.com/');
console.time("report");
process.report.getReport();
console.timeEnd("report");
await fetch('https://example.com/');
await fetch('https://example.com/');
console.time("report");
process.report.getReport();
console.timeEnd("report");
console.time("report");
process.report.getReport();
console.timeEnd("report");

How often does it reproduce? Is there a required condition?

Always, I can reproduce on windows and on linux

What is the expected behavior? Why is that the expected behavior?

The report to not take more than a few milliseconds

What do you see instead?

The report takes 20 additional seconds to generate per fetch request that occured before it (up to 2 fetch, after that it keeps taking 40s)

report: 1.926ms
report: 20.129s
report: 41.714s
report: 40.311s

Additional information

This still happens even with node --report-exclude-network

@avivkeller
Copy link
Member

avivkeller commented Oct 28, 2024

I can't reproduce a increase of that scale. I ran your code, and I got:

report: 2.095ms
report: 14.098ms
report: 25.988ms

When running at a larger scale, everything seems consistent:

Report (after fetching 0 times): 15.347ms
Report (after fetching 1 times): 24.828ms
Report (after fetching 2 times): 29.105ms
Report (after fetching 3 times): 27.147ms
Report (after fetching 4 times): 31.67ms
Report (after fetching 5 times): 29.602ms
Report (after fetching 6 times): 31.013ms
Report (after fetching 7 times): 29.853ms
Report (after fetching 8 times): 31.42ms
Report (after fetching 9 times): 26.445ms
Report (after fetching 10 times): 24.994ms
Report (after fetching 11 times): 27.865ms
Report (after fetching 12 times): 30.993ms
Report (after fetching 13 times): 30.189ms
Report (after fetching 14 times): 30.977ms
Report (after fetching 15 times): 29.551ms
Report (after fetching 16 times): 26.54ms
Report (after fetching 17 times): 25.168ms
Report (after fetching 18 times): 27.516ms
Report (after fetching 19 times): 27.163ms
Report (after fetching 20 times): 29.243ms

@Tofandel
Copy link
Contributor Author

Tofandel commented Oct 28, 2024

Which env are you on? Native linux? I'm reproducing it on windows and WSL (it always takes +20seconds for me after each fetch in both envs), this 20 seconds have to come from something env related

@avivkeller
Copy link
Member

Which env are you on? Native linux?

Yes. My machine is:

  System:
    OS: Linux 6.10 Kali GNU/Linux Rolling 2024.3
    CPU: (12) x64 AMD Ryzen 5 5600G with Radeon Graphics
    Memory: 18.63 GB / 27.31 GB

@Tofandel
Copy link
Contributor Author

Tofandel commented Oct 28, 2024

When running at a larger scale, everything seems consistent:

That's actually in line with me except your time increase is only 15ms

If I run 5 fetch it doesn't take more than 40 seconds, so it seems it's only a factor for the 2 first fetch

@avivkeller avivkeller added the report Issues and PRs related to process.report. label Oct 28, 2024
@avivkeller
Copy link
Member

I am experiencing an increase, just at a much much smaller scale, so maybe something is going on?

@Tofandel
Copy link
Contributor Author

Yes, I'd like to help debugging If I can but I'm not sure how, I'm not a C++ dev and there doesn't seem to be any output from NODE_DEBUG_NATIVE and compiling from source takes very long (it's been compiling for 2 hours right now)

@avivkeller
Copy link
Member

, I'm not a C++ dev and there doesn't seem to be any output from NODE_DEBUG_NATIVE and compiling from source takes very long (it's been compiling for 2 hours right now)

I can do some research, but I'm also not a CPP dev. As for compile times, as a rule-of-thumb, if you aren't already, make use of multithreading (make -j4 where 4 is the number of threads), and configure with ninja (./configure --ninja)

@Tofandel
Copy link
Contributor Author

I'm using make -j8 but it's still taking forever, I'll increase to -j16 next time

Is there caching between builds and only the first build will take this amount of time or is it going to take this long every time I make a single change?

@avivkeller
Copy link
Member

I always build with ninja and ccache, and there's caching between builds for me, IIRC that's due to ccache.

@richardlau
Copy link
Member

Does it reproduce if you configure process.report.excludeNetwork? (see #46060 for why that option was added.)

@Tofandel
Copy link
Contributor Author

Tofandel commented Oct 28, 2024

Yes it was in the description

This still happens even with node --report-exclude-network

Though it does look like the exact same issue and so it might not be fully solved

@avivkeller
Copy link
Member

I'm curious, what does the profiling output look like? (node --prof <file> followed by node --prof-process <log file>)

@Tofandel
Copy link
Contributor Author

Tofandel commented Oct 29, 2024

await fetch('https://example.com', {keepalive: false})
console.log(process._getActiveHandles()[1]);
console.time('report');
process.report.getReport();
console.timeEnd('report');
Here is the ouput of `NODE_DEBUG=net node --report-exclude-network test.mjs`
NET 763280: pipe false undefined
NET 763280: connect: find host example.com
NET 763280: connect: dns options { family: undefined, hints: 32 }
NET 763280: connect: autodetecting
NET 763280: _read - n 16384 isConnecting? true hasHandle? true
NET 763280: _read wait for connection
NET 763280: connect/multiple: will try the following addresses [
  { address: '93.184.215.14', family: 4 },
  { address: '2606:2800:21f:cb07:6820:80da:af6b:8b2c', family: 6 }
]
NET 763280: connect/multiple: attempting to connect to 93.184.215.14:443 (addressType: 4)
NET 763280: connect/multiple: setting the attempt timeout to 250 ms
NET 763280: connect/multiple: connection attempt to 93.184.215.14:443 completed with status 0
NET 763280: afterConnect
NET 763280: _read - n 16384 isConnecting? false hasHandle? true
NET 763280: Socket._handle.readStart
NET 763280: _read - n 16384 isConnecting? false hasHandle? true
<ref *1> TLSSocket {
  _tlsOptions: {
    allowHalfOpen: undefined,
    pipe: false,
    secureContext: SecureContext { context: SecureContext {} },
    isServer: false,
    requestCert: true,
    rejectUnauthorized: true,
    session: null,
    ALPNProtocols: <Buffer 08 68 74 74 70 2f 31 2e 31>,
    requestOCSP: undefined,
    enableTrace: undefined,
    pskCallback: undefined,
    highWaterMark: 16384,
    onread: undefined,
    signal: undefined
  },
  _secureEstablished: true,
  _securePending: false,
  _newSessionPending: false,
  _controlReleased: true,
  secureConnecting: false,
  _SNICallback: null,
  servername: 'example.com',
  alpnProtocol: 'http/1.1',
  authorized: true,
  authorizationError: null,
  encrypted: true,
  _events: [Object: null prototype] {
    close: [
      [Function: onSocketCloseDestroySSL],
      [Function (anonymous)],
      [Function (anonymous)]
    ],
    end: [ [Function: onReadableStreamEnd], [Function (anonymous)] ],
    error: [ [Function (anonymous)], [Function (anonymous)] ],
    newListener: [Function: keylogNewListener],
    connect: undefined,
    secure: [Function: onConnectSecure],
    session: [Function (anonymous)],
    secureConnect: undefined,
    readable: [Function (anonymous)]
  },
  _eventsCount: 7,
  connecting: false,
  _hadError: false,
  _parent: null,
  _host: 'example.com',
  _closeAfterHandlingError: false,
  _readableState: ReadableState {
    highWaterMark: 16384,
    buffer: [],
    bufferIndex: 0,
    length: 0,
    pipes: [],
    awaitDrainWriters: null,
    [Symbol(kState)]: 10004740
  },
  _writableState: WritableState {
    highWaterMark: 16384,
    length: 0,
    corked: 0,
    onwrite: [Function: bound onwrite],
    writelen: 0,
    bufferedIndex: 0,
    pendingcb: 0,
    [Symbol(kState)]: 17563908,
    [Symbol(kBufferedValue)]: null,
    [Symbol(kWriteCbValue)]: null
  },
  allowHalfOpen: false,
  _maxListeners: undefined,
  _sockname: null,
  _pendingData: null,
  _pendingEncoding: '',
  server: undefined,
  _server: null,
  ssl: TLSWrap {
    _parent: TCP {
      reading: [Getter/Setter],
      onconnection: null,
      [Symbol(owner_symbol)]: [Circular *1]
    },
    _parentWrap: null,
    _secureContext: SecureContext { context: SecureContext {} },
    reading: true,
    onkeylog: [Function: onkeylog],
    onhandshakestart: [Function: noop],
    onhandshakedone: [Function (anonymous)],
    onocspresponse: [Function: onocspresponse],
    onnewsession: [Function: onnewsessionclient],
    onerror: [Function: onerror],
    [Symbol(owner_symbol)]: [Circular *1]
  },
  _requestCert: true,
  _rejectUnauthorized: true,
  autoSelectFamilyAttemptedAddresses: [ '93.184.215.14:443' ],
  [Symbol(alpncallback)]: null,
  [Symbol(res)]: TLSWrap {
    _parent: TCP {
      reading: [Getter/Setter],
      onconnection: null,
      [Symbol(owner_symbol)]: [Circular *1]
    },
    _parentWrap: null,
    _secureContext: SecureContext { context: SecureContext {} },
    reading: true,
    onkeylog: [Function: onkeylog],
    onhandshakestart: [Function: noop],
    onhandshakedone: [Function (anonymous)],
    onocspresponse: [Function: onocspresponse],
    onnewsession: [Function: onnewsessionclient],
    onerror: [Function: onerror],
    [Symbol(owner_symbol)]: [Circular *1]
  },
  [Symbol(verified)]: true,
  [Symbol(pendingSession)]: null,
  [Symbol(async_id_symbol)]: 10,
  [Symbol(kHandle)]: TLSWrap {
    _parent: TCP {
      reading: [Getter/Setter],
      onconnection: null,
      [Symbol(owner_symbol)]: [Circular *1]
    },
    _parentWrap: null,
    _secureContext: SecureContext { context: SecureContext {} },
    reading: true,
    onkeylog: [Function: onkeylog],
    onhandshakestart: [Function: noop],
    onhandshakedone: [Function (anonymous)],
    onocspresponse: [Function: onocspresponse],
    onnewsession: [Function: onnewsessionclient],
    onerror: [Function: onerror],
    [Symbol(owner_symbol)]: [Circular *1]
  },
  [Symbol(lastWriteQueueSize)]: 0,
  [Symbol(timeout)]: null,
  [Symbol(kBuffer)]: null,
  [Symbol(kBufferCb)]: null,
  [Symbol(kBufferGen)]: null,
  [Symbol(shapeMode)]: true,
  [Symbol(kCapture)]: false,
  [Symbol(kSetNoDelay)]: true,
  [Symbol(kSetKeepAlive)]: true,
  [Symbol(kSetKeepAliveInitialDelay)]: 60,
  [Symbol(kBytesRead)]: 0,
  [Symbol(kBytesWritten)]: 0,
  [Symbol(connect-options)]: {
    rejectUnauthorized: true,
    ciphers: 'TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256:TLS_AES_128_GCM_SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA256:ECDHE-RSA-AES256-SHA384:DHE-RSA-AES256-SHA384:ECDHE-RSA-AES256-SHA256:DHE-RSA-AES256-SHA256:HIGH:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!SRP:!CAMELLIA',
    checkServerIdentity: [Function: checkServerIdentity],
    minDHSize: 1024,
    highWaterMark: 16384,
    path: undefined,
    servername: 'example.com',
    session: null,
    localAddress: null,
    ALPNProtocols: [ 'http/1.1' ],
    socket: undefined,
    port: 443,
    host: 'example.com',
    singleUse: true
  },
  [Symbol(no ref)]: false,
  [Symbol(writing)]: false,
  [Symbol(reset)]: false,
  [Symbol(blocking)]: false,
  [Symbol(parser)]: Parser {
    llhttp: [Object: null prototype] {
      memory: Memory [WebAssembly.Memory] {},
      _initialize: [Function: 8],
      __indirect_function_table: Table [WebAssembly.Table] {},
      llhttp_init: [Function: 9],
      llhttp_should_keep_alive: [Function: 47],
      llhttp_alloc: [Function: 11],
      malloc: [Function: 49],
      llhttp_free: [Function: 12],
      free: [Function: 12],
      llhttp_get_type: [Function: 13],
      llhttp_get_http_major: [Function: 14],
      llhttp_get_http_minor: [Function: 15],
      llhttp_get_method: [Function: 16],
      llhttp_get_status_code: [Function: 17],
      llhttp_get_upgrade: [Function: 18],
      llhttp_reset: [Function: 19],
      llhttp_execute: [Function: 20],
      llhttp_settings_init: [Function: 21],
      llhttp_finish: [Function: 22],
      llhttp_pause: [Function: 23],
      llhttp_resume: [Function: 24],
      llhttp_resume_after_upgrade: [Function: 25],
      llhttp_get_errno: [Function: 26],
      llhttp_get_error_reason: [Function: 27],
      llhttp_set_error_reason: [Function: 28],
      llhttp_get_error_pos: [Function: 29],
      llhttp_errno_name: [Function: 30],
      llhttp_method_name: [Function: 31],
      llhttp_status_name: [Function: 32],
      llhttp_set_lenient_headers: [Function: 33],
      llhttp_set_lenient_chunked_length: [Function: 34],
      llhttp_set_lenient_keep_alive: [Function: 35],
      llhttp_set_lenient_transfer_encoding: [Function: 36],
      llhttp_message_needs_eof: [Function: 46]
    },
    ptr: 76304,
    client: Client {
      _events: [Object: null prototype],
      _eventsCount: 4,
      _maxListeners: undefined,
      [Symbol(shapeMode)]: false,
      [Symbol(kCapture)]: false,
      [Symbol(nodejs.stream.destroyed)]: false,
      [Symbol(onDestroyed)]: null,
      [Symbol(closed)]: false,
      [Symbol(onClosed)]: [],
      [Symbol(dispatch interceptors)]: [Array],
      [Symbol(url)]: URL {},
      [Symbol(connector)]: [Function: connect],
      [Symbol(pipelining)]: 1,
      [Symbol(max headers size)]: 16384,
      [Symbol(default keep alive timeout)]: 4000,
      [Symbol(max keep alive timeout)]: 600000,
      [Symbol(keep alive timeout threshold)]: 2000,
      [Symbol(keep alive timeout)]: 4000,
      [Symbol(server name)]: null,
      [Symbol(local address)]: null,
      [Symbol(resuming)]: 0,
      [Symbol(need drain)]: 2,
      [Symbol(host header)]: 'host: example.com\r\n',
      [Symbol(body timeout)]: 300000,
      [Symbol(headers timeout)]: 300000,
      [Symbol(strict content length)]: true,
      [Symbol(maxRedirections)]: undefined,
      [Symbol(maxRequestsPerClient)]: undefined,
      [Symbol(kClosedResolve)]: null,
      [Symbol(max response size)]: -1,
      [Symbol(max concurrent streams)]: 100,
      [Symbol(http context)]: [Object],
      [Symbol(queue)]: [Array],
      [Symbol(running index)]: 0,
      [Symbol(pending index)]: 1,
      [Symbol(resume)]: [Function (anonymous)],
      [Symbol(on error)]: [Function (anonymous)],
      [Symbol(Intercepted Dispatch)]: [Function: Intercept],
      [Symbol(connecting)]: false,
      [Symbol(needDrain)]: true,
      [Symbol(socket)]: [Circular *1]
    },
    socket: [Circular *1],
    timeout: FastTimer {
      _state: 0,
      _idleTimeout: 300000,
      _idleStart: -1,
      _onTimeout: [Function: onParserTimeout],
      _timerArg: WeakRef {},
      [Symbol(kFastTimer)]: true
    },
    timeoutValue: 300000,
    timeoutType: 5,
    statusCode: 200,
    statusText: 'OK',
    upgrade: false,
    headers: [],
    headersSize: 0,
    headersMaxSize: 16384,
    shouldKeepAlive: true,
    paused: false,
    resume: [Function: bound resume],
    bytesRead: 0,
    keepAlive: '',
    contentLength: '648',
    connection: '',
    maxResponseSize: -1
  },
  [Symbol(listeners)]: [
    [ 'error', [Function (anonymous)] ],
    [ 'readable', [Function (anonymous)] ],
    [ 'end', [Function (anonymous)] ],
    [ 'close', [Function (anonymous)] ]
  ],
  [Symbol(socket request counter)]: 0,
  [Symbol(maxRequestsPerClient)]: undefined,
  [Symbol(client)]: Client {
    _events: [Object: null prototype] {
      drain: [Function: onDrain],
      connect: [Function (anonymous)],
      disconnect: [Function (anonymous)],
      connectionError: [Function (anonymous)]
    },
    _eventsCount: 4,
    _maxListeners: undefined,
    [Symbol(shapeMode)]: false,
    [Symbol(kCapture)]: false,
    [Symbol(nodejs.stream.destroyed)]: false,
    [Symbol(onDestroyed)]: null,
    [Symbol(closed)]: false,
    [Symbol(onClosed)]: [],
    [Symbol(dispatch interceptors)]: [ [Function (anonymous)] ],
    [Symbol(url)]: URL {
      href: 'https://example.com/',
      origin: 'https://example.com',
      protocol: 'https:',
      username: '',
      password: '',
      host: 'example.com',
      hostname: 'example.com',
      port: '',
      pathname: '/',
      search: '',
      searchParams: URLSearchParams {},
      hash: ''
    },
    [Symbol(connector)]: [Function: connect],
    [Symbol(pipelining)]: 1,
    [Symbol(max headers size)]: 16384,
    [Symbol(default keep alive timeout)]: 4000,
    [Symbol(max keep alive timeout)]: 600000,
    [Symbol(keep alive timeout threshold)]: 2000,
    [Symbol(keep alive timeout)]: 4000,
    [Symbol(server name)]: null,
    [Symbol(local address)]: null,
    [Symbol(resuming)]: 0,
    [Symbol(need drain)]: 2,
    [Symbol(host header)]: 'host: example.com\r\n',
    [Symbol(body timeout)]: 300000,
    [Symbol(headers timeout)]: 300000,
    [Symbol(strict content length)]: true,
    [Symbol(maxRedirections)]: undefined,
    [Symbol(maxRequestsPerClient)]: undefined,
    [Symbol(kClosedResolve)]: null,
    [Symbol(max response size)]: -1,
    [Symbol(max concurrent streams)]: 100,
    [Symbol(http context)]: {
      version: 'h1',
      defaultPipelining: 1,
      write: [Function: write],
      resume: [Function: resume],
      destroy: [Function: destroy],
      destroyed: [Getter],
      busy: [Function: busy]
    },
    [Symbol(queue)]: [ [Request] ],
    [Symbol(running index)]: 0,
    [Symbol(pending index)]: 1,
    [Symbol(resume)]: [Function (anonymous)],
    [Symbol(on error)]: [Function (anonymous)],
    [Symbol(Intercepted Dispatch)]: [Function: Intercept],
    [Symbol(connecting)]: false,
    [Symbol(needDrain)]: true,
    [Symbol(socket)]: [Circular *1]
  },
  [Symbol(error)]: null
}
report: 20.420s
NET 763280: _read - n 16384 isConnecting? false hasHandle? true
And the result from profiling
Statistical profiling result from isolate-0x73e9000-761109-v8.log, (18803 ticks, 99 unaccounted, 0 excluded).

 [Shared libraries]:
   ticks  total  nonlib   name
     16    0.1%          /usr/lib/x86_64-linux-gnu/libc.so.6
      4    0.0%          /usr/local/bin/node
      1    0.0%          /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.30

 [JavaScript]:
   ticks  total  nonlib   name
      1    0.0%    0.0%  JS: ~Client node:internal/deps/undici/undici:7698:18
      1    0.0%    0.0%  JS: ~<anonymous> node:internal/crypto/util:119:32
      1    0.0%    0.0%  Builtin: ToBooleanForBaselineJump
      1    0.0%    0.0%  Builtin: LoadIC
      1    0.0%    0.0%  Builtin: KeyedLoadIC

 [C++]:
   ticks  total  nonlib   name
  18602   98.9%   99.0%  __poll@@GLIBC_2.2.5
     30    0.2%    0.2%  _IO_fwrite@@GLIBC_2.2.5
     10    0.1%    0.1%  __write@@GLIBC_2.2.5
      8    0.0%    0.0%  epoll_pwait@@GLIBC_2.6
      7    0.0%    0.0%  _IO_file_xsputn@@GLIBC_2.2.5
      6    0.0%    0.0%  std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)@@GLIBCXX_3.4.9
      6    0.0%    0.0%  __libc_malloc@@GLIBC_2.2.5
      2    0.0%    0.0%  brk@@GLIBC_2.2.5
      2    0.0%    0.0%  __pthread_rwlock_unlock@GLIBC_2.2.5
      1    0.0%    0.0%  syscall@@GLIBC_2.2.5
      1    0.0%    0.0%  std::_Hash_bytes(void const*, unsigned long, unsigned long)@@CXXABI_1.3.5
      1    0.0%    0.0%  pthread_cond_signal@@GLIBC_2.3.2
      1    0.0%    0.0%  __pthread_rwlock_rdlock@GLIBC_2.2.5
      1    0.0%    0.0%  __pthread_mutex_unlock@GLIBC_2.2.5

 [Summary]:
   ticks  total  nonlib   name
      5    0.0%    0.0%  JavaScript
  18678   99.3%   99.4%  C++
      4    0.0%    0.0%  GC
     21    0.1%          Shared libraries
     99    0.5%          Unaccounted

 [C++ entry points]:
   ticks    cpp   total   name
  18602   99.7%   98.9%  __poll@@GLIBC_2.2.5
     30    0.2%    0.2%  _IO_fwrite@@GLIBC_2.2.5
      7    0.0%    0.0%  _IO_file_xsputn@@GLIBC_2.2.5
      6    0.0%    0.0%  std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)@@GLIBCXX_3.4.9
      6    0.0%    0.0%  __libc_malloc@@GLIBC_2.2.5
      5    0.0%    0.0%  __write@@GLIBC_2.2.5
      2    0.0%    0.0%  __pthread_rwlock_unlock@GLIBC_2.2.5
      1    0.0%    0.0%  brk@@GLIBC_2.2.5
      1    0.0%    0.0%  __pthread_rwlock_rdlock@GLIBC_2.2.5
      1    0.0%    0.0%  __pthread_mutex_unlock@GLIBC_2.2.5

 [Bottom up (heavy) profile]:
  Note: percentage shows a share of a particular caller in the total
  amount of its parent calls.
  Callers occupying less than 1.0% are not shown.

   ticks parent  name
  18602   98.9%  __poll@@GLIBC_2.2.5
  18602  100.0%    JS: ~getReport node:internal/process/report:37:12
  18602  100.0%      Script: ~<anonymous> file:///home/tofandel/test.mjs:1:1
  18602  100.0%        Builtin: AsyncFunctionAwaitResolveClosure
  18602  100.0%          Builtin: CallApiCallbackGeneric

@Tofandel
Copy link
Contributor Author

Tofandel commented Oct 29, 2024

If I run process._getActiveHandles()[1].destroy() before the report (which contains a TLSSocket) then it is generated straight away, so it doesn't like the fact that there is open sockets, the question is why is the tls socket still open when the request is finished? And why an open socket causes getReport to hang for 20 seconds?

@Tofandel
Copy link
Contributor Author

Tofandel commented Oct 29, 2024

If I run

import https from 'node:https'

await new Promise((r) => https.get('https://example.com', () => r()));
console.time('report');
process.report.getReport();
console.timeEnd('report');

Then the program only exits after 5minutes (which is the default timeout)

NET 769960: pipe false null
NET 769960: connect: find host example.com
NET 769960: connect: dns options { family: undefined, hints: 32 }
NET 769960: connect: autodetecting
NET 769960: _read - n 65536 isConnecting? true hasHandle? true
NET 769960: _read wait for connection
NET 769960: connect/multiple: will try the following addresses [
  { address: '93.184.215.14', family: 4 },
  { address: '2606:2800:21f:cb07:6820:80da:af6b:8b2c', family: 6 }
]
NET 769960: connect/multiple: attempting to connect to 93.184.215.14:443 (addressType: 4)
NET 769960: connect/multiple: setting the attempt timeout to 250 ms
NET 769960: connect/multiple: connection attempt to 93.184.215.14:443 completed with status 0
NET 769960: afterConnect
NET 769960: _read - n 65536 isConnecting? false hasHandle? true
NET 769960: Socket._handle.readStart
NET 769960: _read - n 65536 isConnecting? false hasHandle? true
report: 20.724s
NET 769960: _read - n 65536 isConnecting? false hasHandle? true
NET 769960: _final: not ended, call shutdown()
NET 769960: afterShutdown destroyed=false
NET 769960: destroy
NET 769960: close
NET 769960: close handle
NET 769960: emit close

Is it normal that https.get causes node to never exit (unless I set headers: { 'Connection': 'close' })?

@jeremyVignelles
Copy link

Seems related to WSL based on the comments:
npm/cli#7868
npm/cli#7814

@Tofandel Tofandel reopened this Oct 29, 2024
@Tofandel
Copy link
Contributor Author

Tofandel commented Oct 29, 2024

I can reproduce it directly on Windows as well and not just WSL I think it's directly related to windows network stack because it's been reported as happening on docker as well, which I'm guessing were all docker running in windows

If I change the request to go to localhost for example, then this issue doesn't happen, the only difference I see is one read (or timeout) is still happening after the getReport takes place

localhost:

NET 777218: pipe false null
NET 777218: connect: find host localhost
NET 777218: connect: dns options { family: undefined, hints: 32 }
NET 777218: connect: autodetecting
NET 777218: _read - n 65536 isConnecting? true hasHandle? true
NET 777218: _read wait for connection
NET 777218: connect/multiple: only one address found, switching back to single connection
NET 777218: connect: attempting to connect to 127.0.0.1:80 (addressType: 4)
NET 777218: afterConnect
NET 777218: _read - n 65536 isConnecting? false hasHandle? true
NET 777218: Socket._handle.readStart
NET 777218: _read - n 65536 isConnecting? false hasHandle? true
report: 2.206ms
NET 777218: _final: not ended, call shutdown()
NET 777218: afterShutdown destroyed=false
NET 777218: destroy
NET 777218: close
NET 777218: close handle
NET 777218: emit close

Example.com

NET 778100: pipe false null
NET 778100: connect: find host example.com
NET 778100: connect: dns options { family: undefined, hints: 32 }
NET 778100: connect: autodetecting
NET 778100: _read - n 65536 isConnecting? true hasHandle? true
NET 778100: _read wait for connection
NET 778100: connect/multiple: will try the following addresses [
  { address: '93.184.215.14', family: 4 },
  { address: '2606:2800:21f:cb07:6820:80da:af6b:8b2c', family: 6 }
]
NET 778100: connect/multiple: attempting to connect to 93.184.215.14:443 (addressType: 4)
NET 778100: connect/multiple: setting the attempt timeout to 250 ms
NET 778100: connect/multiple: connection attempt to 93.184.215.14:443 completed with status 0
NET 778100: afterConnect
NET 778100: _read - n 65536 isConnecting? false hasHandle? true
NET 778100: Socket._handle.readStart
NET 778100: _read - n 65536 isConnecting? false hasHandle? true
report: 20.108s
NET 778100: _read - n 65536 isConnecting? false hasHandle? true
NET 778100: _final: not ended, call shutdown()
NET 778100: afterShutdown destroyed=false
NET 778100: destroy
NET 778100: close
NET 778100: close handle
NET 778100: emit close

If I turn on a VPN or connect to a different network the issue still persists but with different timings

@Tofandel
Copy link
Contributor Author

Tofandel commented Oct 29, 2024

I made an intersting discovery, I had the twingate service running on the machine (which is some kind of VPN for tunneling, though only for specific domain names or ip, which should not have covered this specific ip), with it off it now takes "only" 2seconds on WSL and 5s on windows instead of 20seconds for both, it's still way too long though

I tried again and indeed as soon as I enable any kind of VPN the issue is very pronounced

@fuhlich
Copy link

fuhlich commented Oct 30, 2024

Linux Tofandel 5.15.146.1-microsoft-standard-WSL2 #1 SMP Thu Jan 11 04:09:03 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

```js
console.time("report");
process.report.getReport();
console.timeEnd("report");
await fetch('https://example.com/');
console.time("report");
process.report.getReport();
console.timeEnd("report");
await fetch('https://example.com/');
await fetch('https://example.com/');
console.time("report");
process.report.getReport();
console.timeEnd("report");
console.time("report");
process.report.getReport();
console.timeEnd("report");

If I replace await fetch with await https.get with https = require('https') it runs through quick. The problem probably is in the fetch function.

@Tofandel
Copy link
Contributor Author

Tofandel commented Oct 30, 2024

No I already rulled that out, both https.get and fetch have the issue, the issue has to do with sockets that are still open when getReport runs

Your usage of https.get is just probably wrong because it does not return a promise and cannot be awaited you need to promisify it, so likely your get report runs before the request actually happens

See my comment a bit above

import https from 'node:https'

await new Promise((r) => https.get('https://example.com', () => r()));
console.time('report');
process.report.getReport();
console.timeEnd('report');

You can try this

@Tofandel
Copy link
Contributor Author

Tofandel commented Oct 30, 2024

I can also rule out a WSL only issue as it happens on a native Ubuntu 20 as well (on a completely different network, on a server in a datacenter)

But only the first time it is run. Also the node process not exiting with https.get starts happening on node 20 (and node 22) but doesn't happen on node 18 (on ubuntu still)

So yes it seems the common factor is any request that makes a DNS query still causes getReport to hang

@fuhlich
Copy link

fuhlich commented Oct 30, 2024

I overlooked that, my mistake.

@Tofandel
Copy link
Contributor Author

Tofandel commented Oct 30, 2024

After debugging from source, the very long time comes entirely from

uv_walk(env->event_loop(), WalkHandle, static_cast<void*>(&writer));

This is what #46060 (comment) was talking about, and looking at the exclude-network option that was added, it doesn't address this section at all

Specifically any open TCP or UDP UV still tries to query reverse dns records there

case UV_TCP:
case UV_UDP:
ReportEndpoints(h, writer);
break;

if (uv_getnameinfo(h->loop, &endpoint, nullptr, addr, NI_NUMERICSERV) == 0) {

@avivkeller avivkeller added the libuv Issues and PRs related to the libuv dependency or the uv binding. label Oct 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
confirmed-bug Issues with confirmed bugs. libuv Issues and PRs related to the libuv dependency or the uv binding. report Issues and PRs related to process.report.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants