Skip to content

Improve error logging, proxy robustness, and operational quality of life #7

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 48 commits into from
Sep 10, 2024

Conversation

dsriseah
Copy link
Contributor

@dsriseah dsriseah commented Aug 20, 2024

This PR adds robust logging to nc-multiplex and cleans up the code base, correcting numerous oversights related to logging. Most notably it hardens the proxy against crashes, adds much improved logging, and provides a way to autorestore datasets on server start.

TESTING

  1. Stop nc-multiplex if it is running.
  2. Pull the dev-sri/timestamps branch in your nc-multiplex directory
  3. Use the new startup script start-nc-multiplex.sh (see comments for pm2 instructions)
  4. npm ci
  5. run your startup script
  6. In another terminal, run tail -f log.txt to monitor the output
  7. browse to http://host/manage to load databases
  8. watch for issues

ADVANCED TESTING

Suggest tail -f log.txt in a terminal window to see what's happening in the log while you try these.

  1. while nc-multiplex is running, kill the process to check that the log shows a somewhat graceful exit. pm2 should restart it on the public servers.
  2. while nc-multiplex is running, kill a child process and see what happens (this behavior may be undefined)
  3. try using the wsping.jssh script (in .vscode/utilities) to connect via websocket to nc-multiplex root, and see if the log reports it or crashes (it should not be reported)

INSTALLATION ON DIGITAL OCEAN

Instructions from inside the nc-multiplex directory.

  1. pm2 list to find the running instance (usually 'start.sh' or 'do-start.sh')
  2. pm2 stop [instance]
  3. git fetch to update branch information
  4. git status to check what branch you're on
  5. git checkout [branch] where [branch] is dev (if installing) or dev-sri/timestamps (if testing)
  6. git pull to grab latest branch information
  7. pm2 delete [instance] to remove the old startup script
  8. mv log.txt log-yyyy-mmdd.txt followed by touch log.txt
  9. pm2 start ./start-nc-multiplex.sh to start the new script
  10. pm2 save
  11. tail -n 100 -f log.txt to inspect the running log content as you try things out

BACKGROUND

Joshua reported that hist NetCreate servers on Digital Ocean were crashing or becoming unresponsive. The cause of this crashing could not be determined by examining the log.txt file that captures output from nc-multiplex.js.

The preliminary thought was that the server was somehow rebooting due to the reported EADDRESS_IN_USE in the log, but because the logging did not capture adequate information and the server was rebooted manually every time the problem occurred, there was no way to find out when it happened. The system logs also did not seem to reveal any surprise reboots.

The next thought was that pm2 was rerunning the startup script do-start.sh on a cron job, but these seemed fishy because a process manager should only run when the instance stops working, not constantly try to rerun it. There was an error in the configuration of pm2 due to the change in NodeJS version (see RELATED ISSUES above), but this was not thought to be a cause for a restart error because it prevented restarting.

Because we can not find the trigger or timing of the errors, this augmented version of nc-multiplex has a lot of additional logging to make it easier to determine when the next crash happens.

RELATED ISSUES

  1. A related fix in NetCreate is addressed in PR#242, which adds the missing ursys build instructions to npm run package. This command is necessary for correct functioning of nc-multiplex and may be the cause of unresponsive server errors. However, given the lack of issue tracking, this is speculative.

  2. A problem with the Digital Ocean servers with the pm2 process manager for NodeJS was related to a recent change in node version from 18.16.0 to 18.18.2 as defined in .nvmrc. There was an old process script that referred to the old 18.16.0 path .nvm/node/versions/18.16.0/bin, which no longer existed. This was detected in the ubuntu system logs. It's not likely the cause of the reported server issues, though.

  3. The use of npm run package as a way to initialize the netcreate-itest subrepo does not build source maps which will show up as errors in the netcreate console. We will modify the packaging script.

COMPLETE CHANGE LOG

LOGGING IMPROVEMENTS

  • Fixed the start.sh script to not overwrite the log file every time. It now appends, and also captures stderr as well as stdout. new script provided now as part of the repo.
  • Added timestamping to critical sections of the nc-multiplex output so we could tell when particular events occurred.
  • Added runtime instance detection to the /manage page
  • Updated memory checks to report system memory in addition to the heapSize, as heap memory is the allocation memory in the Javascript runtime of nc-multiplex. It is useless in this context. Fixed incorrect units.
  • Added memory "heartbeat" checks now run every 15 minutes and write to the log so we can monitor the life of the server over time
  • Add Client ip logging to /manage, /, and /graph/:graph/:file? express route handling
  • The server start time is written to the file .nc-server-start.txt

CRASH PROTECTION

  • Child instances now monitor crash of parent process (nc-multiplex) and shut down, reporting in the log.
  • Server now automatically restores the last set of running databases if the nc-process-state.json file is found and valid. It will crash if the datasets are not found or there is a loading error and log the message.
  • Malformed requests for proxied file requests by http-proxy-middleware are detected and reported
  • Removed suspicious throw Error() in nc-multiplex because this could cause an actual crash we can't detect
  • Detect situation where websocket upgrade messages are pinging the home route /, causing a crash.
  • Updated the http-proxy-middleware express package to a version that doesn't have the crashing behavior (1.0.6 -> 3.0.0)

UI IMPROVEMENTS

  • The /manage page now resets the cookie duration on refresh properly, and now includes a countdown
  • When cookie authorization expires, you are redirect to the login page instead of showing a broken page
  • The /manage page displays more pertinent memory statistics and now includes the server start.

CODE IMPROVEMENTS

  • Cleaned up the code so it flows in a logical order with fewer surprises and gaps in commenting.
  • Rewrote misleading names of variables, functions, and constants to reflect their operation more clearly
  • Removed difficult-to-read status output and debug-related messages unrelated to operation
  • Conformed all log outfit to blend seamlessly with the log output of NetCreate itself
  • Augmented the instance launcher script so it appends cosmetic CLI parameters making it easier to see what each instance is doing when using the ps command

UTILITIES

All utilities are in the hidden .vscode/utilities directory, and must be copied into the nc-multiplex root directory to run.

  • Utility log-less.sh which with strip color ANSI markers from the log and pipe it into less
  • Utility log-rotate.sh renames log.txt to log.txt.0, log.txt.1 to log.txt.2 and so on such that the lowest-number log is more recent then the next highest number.
  • Utility log-archive.sh uses a different log archiving strategy, creating a log-YYYY-MMDD-HHMM.txt file where the date is the time of creating the log, and then creating a blank log.txt
  • Utility log-check provides a convenient way to scan for certain errors in the log.txt file using several grep commands.
  • Utility memdump.jssh provides way of checking for instances and system memory conveniently from the command line, filtering only for netcreate instances launched by nc-multiplex
  • Utility wsping.jssh that tries to open a websocket connection to a server's home page and send a message. This would reliably crash older versions of nc-multiplex.

@dsriseah
Copy link
Contributor Author

dsriseah commented Aug 20, 2024

Unresolved Mystery No. 1

Finding the source of error message given a reported error from Joshua

...nc-multiplex: --> not running yet, starting new 2024PD6thGrade
...nc-launch-config: [97;42mfound primary repo ./netcreate-itest[0m
...nc-launch-instance:
...nc-launch-instance: STARTING DB 2024PD6thGrade
...nc-launch-instance:
...nc-launch-instance: 1. Setting netcreate-config.js.
...nc-launch-instance: 2. Starting server
Error: listen EADDRINUSE: address already in use :::3001
    at Server.setupListenHandle [as _listen2] (node:net:1817:16)
    at listenInCluster (node:net:1865:12)
    at Server.listen (node:net:1953:7)
    at Function.listen (/root/nc-multiplex/netcreate-itest/node_modules/express/lib/application.js:635:24)
    at module.exports (/root/nc-multiplex/netcreate-itest/brunch-server.js:132:7)
    at /root/nc-multiplex/nc-launch-instance.js:33:5
    at new Promise (<anonymous>)
    at promiseServer (/root/nc-multiplex/nc-launch-instance.js:32:10)
    at startServer (/root/nc-multiplex/nc-launch-instance.js:50:9)
    at process.<anonymous> (/root/nc-multiplex/nc-launch-instance.js:46:3) {
  code: 'EADDRINUSE',
  errno: -98,
  syscall: 'listen',
  address: '::',
  port: 3001
}

### NETCREATE STARTUP ERROR: '-98'
Error: listen EADDRINUSE: address already in use :::3001
### PROGRAM STOP
...

This is located in brunch-server.js as part of APP.listen() with this catch() block:

  }).on('error', function (err) {
    let errstring = `### NETCREATE STARTUP ERROR: '${err.errno}'\n`;
    switch (err.errno) {
      case 'EADDRINUSE':
        errstring += `Another program is already using port ${config.port}.\n`;
        errstring += `Go to "http://localhost:${config.port}" to check if NetCreate is already running.\n\n`;
        errstring += `Still broken? See https://github.com/daveseah/netcreate-2018/issues/4\n`;
        break;
      default:
        errstring += `${err}`;
        console.log(err);
    }
    console.log(`\n\n${errstring}\n### PROGRAM STOP\n`);
    throw new Error(err.errno);
  });

Interestingly, we are seeing this being emitted from the default case handler. As the APP.Listen() block is quite long, any error at all could possibly be throwing this, and since the error is thrown again it could be the source of a misattributed error.

There could be a bug in this routine. It's originating in promiseServer() in the instance launcher.

@dsriseah
Copy link
Contributor Author

Unresolved Mystery No. 2

image

What's weird about this one is that the displayed number of active graphs is not related to running instances, but on a "instance token" array that is a pool of spawned children. They are literally just javascript numbers. This array should always have the "base" deployment that's used to serve static assets which is 0, and the kill route that kills the child process and releases the instance token has a special check to make sure it's not 0 that's ever removed.

@dsriseah dsriseah changed the title Improve Debug Logging DRAFT: Improve Debug Logging Aug 20, 2024
@dsriseah
Copy link
Contributor Author

Ben captured a crash on his ubuntu server

NC_MUX   - * MEMORY HEARTBEAT 2024/08/20 TUE 12:42:24 heapUsed 15,325mb / heapTotal 16,936mb
NC_MUX   - * PID:953   .nvm/versions/v18.18.2/bin/node ./nc-launch-instance.js base:0/3000/4000
NC_MUX   - * PID:1337   .nvm/versions/v18.18.2/bin/node ./nc-launch-instance.js SampleEventNetwork:1/3001/4001
NC_MUX   - * PID:1354   .nvm/versions/v18.18.2/bin/node ./nc-launch-instance.js demo:2/3002/4002
NC_MUX   - * PID:1424   .nvm/versions/v18.18.2/bin/node ./nc-launch-instance.js monaug19:4/3004/4004
NC_MUX   - 2024/08/20 TUE 12:52:21 home page GET on /
NC_MUX   - .. no home.html, sending default
[HPM] Upgrading to WebSocket
/root/nc-multiplex/nc-multiplex-sri.js:749
      if (req.params.file) return true; // legit file
                     ^

TypeError: Cannot read properties of undefined (reading 'file')
    at app.use.createProxyMiddleware.router (/root/nc-multiplex/nc-multiplex-sri.js:749:22)
    at Object.match (/root/nc-multiplex/node_modules/http-proxy-middleware/dist/context-matcher.js:31:16)
    at HttpProxyMiddleware.shouldProxy (/root/nc-multiplex/node_modules/http-proxy-middleware/dist/http-proxy-middleware.js:69:35)
    at HttpProxyMiddleware.<anonymous> (/root/nc-multiplex/node_modules/http-proxy-middleware/dist/http-proxy-middleware.js:53:22)
    at Generator.next (<anonymous>)
    at /root/nc-multiplex/node_modules/http-proxy-middleware/dist/http-proxy-middleware.js:8:71
    at new Promise (<anonymous>)
    at __awaiter (/root/nc-multiplex/node_modules/http-proxy-middleware/dist/http-proxy-middleware.js:4:12)
    at HttpProxyMiddleware.handleUpgrade (/root/nc-multiplex/node_modules/http-proxy-middleware/dist/http-proxy-middleware.js:52:53)
    at Server.emit (node:events:529:35)

Node.js v18.18.2
_MUX_CNF - ESC[97;42mfound primary repo ./netcreate-itestESC[0m




--------------------------------------------------------------------------------
NC_MUX   - nc-multiplex started: 2024/08/20 TUE 12:54:39
NC_MUX   -
NC_MUX   - *** sending port:3000, db:base, netport:4000 to forked process
NC_MUX   - * MEMORY HEARTBEAT 2024/08/20 TUE 12:54:39 heapUsed 13,933mb / heapTotal 18,396mb
NC_MUX   - * PID:953   .nvm/versions/v18.18.2/bin/node ./nc-launch-instance.js base:0/3000/4000
NC_MUX   - * PID:1337   .nvm/versions/v18.18.2/bin/node ./nc-launch-instance.js SampleEventNetwork:1/3001/4001
NC_MUX   - * PID:1354   .nvm/versions/v18.18.2/bin/node ./nc-launch-instance.js demo:2/3002/4002
NC_MUX   - * PID:1424   .nvm/versions/v18.18.2/bin/node ./nc-launch-instance.js monaug19:4/3004/4004
NC_MUX   - * PID:6748   .nvm/versions/v18.18.2/bin/node ./nc-launch-instance.js base:0/3000/4000
[HPM] Proxy created: (pathname, req) => {
      // only match if there is a trailing '/'
      if (req.params.file) return true; // legit file
      if (req.params.graph && req.originalUrl.endsWith('/')) return true; // legit graph
      return false;
    }  -> http://localhost:3000/
[HPM] Proxy created: /  -> http://localhost:3000/
NC_MUX   - 2024/08/20 TUE 12:54:39
NC_MUX   - NC-MULTIPLEX Express Server running on port 80.
NC_MUX   - NODE VERSION: v18.18.2 OK
_MUX_CNF - ESC[97;42mfound primary repo ./netcreate-itestESC[0m
_MUX_LN  - 2024/08/20 TUE 12:54:39
_MUX_LN  - STARTING DB base
_MUX_LN  - 1. Setting netcreate-config.js.
_MUX_LN  - 2. Starting server
Error: listen EADDRINUSE: address already in use :::3000
    at Server.setupListenHandle [as _listen2] (node:net:1817:16)
    at listenInCluster (node:net:1865:12)
    at Server.listen (node:net:1953:7)
    at Function.listen (/root/nc-multiplex/netcreate-itest/node_modules/express/lib/application.js:635:24)
    at module.exports (/root/nc-multiplex/netcreate-itest/brunch-server.js:132:7)
    at /root/nc-multiplex/nc-launch-instance.js:38:5
    at new Promise (<anonymous>)
    at promiseServer (/root/nc-multiplex/nc-launch-instance.js:37:10)
    at startServer (/root/nc-multiplex/nc-launch-instance.js:52:9)
    at process.<anonymous> (/root/nc-multiplex/nc-launch-instance.js:48:3) {
  code: 'EADDRINUSE',
  errno: -98,
  syscall: 'listen',
  address: '::',
  port: 3000
}


### NETCREATE STARTUP ERROR: '-98'
Error: listen EADDRINUSE: address already in use :::3000
### PROGRAM STOP

/root/nc-multiplex/netcreate-itest/brunch-server.js:221
    throw new Error(err.errno);
    ^

Error: -98
    at Server.<anonymous> (/root/nc-multiplex/netcreate-itest/brunch-server.js:221:11)
    at Server.emit (node:events:517:28)
    at emitErrorNT (node:net:1844:8)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21)

Node.js v18.18.2
NC_MUX   - 2024/08/20 TUE 12:54:40 home page GET on /
NC_MUX   - .. no home.html, sending default
NC_MUX   - * MEMORY HEARTBEAT 2024/08/20 TUE 13:09:39 heapUsed 9,283mb / heapTotal 11,304mb

The EADDRINUSE error is due to the child processes continuing to run after nc-multiplex has crashed due to the request processing in the http-proxy-middleware filter function shown. However, WHY this is even happening (and why websocket upgrading is happening) is a mystery.

@dsriseah
Copy link
Contributor Author

Adding hardening to process control, reporting of weird URLs.

@dsriseah
Copy link
Contributor Author

At this point, the server may be stable enough by not crashing, but it remains to be seen if additional crashes happen.

… still works.

This file was renamed to nc-launch-instance.jssh because it's not a pure js file, but a shell script
@dsriseah dsriseah changed the title DRAFT: Improve Debug Logging Improve error logging, proxy robustness, and operational quality of life Aug 26, 2024
@daveseah
Copy link
Contributor

daveseah commented Aug 27, 2024

LOGGING NEWS

Joshua ran trials on 198 today, August 27, and they went well! Sri ran through the log and found one instance of a crash:

image

and another two instances from this specific IP address which isn't recognized as one of ours:

image

This seems to support that websocket connection attempts by random bots was the cause of increased server failure rates.

UPDATE NEWS

198 was running the older branch of dev-sri/timestamps that doesn't include database autorefresh on restart. Sri updated 198 to the latest for trials tomorrow, updating the pm2 configuration as well to use the new startup script start-nc-multiplex.sh

@daveseah
Copy link
Contributor

Looking at that excerpt again, there are a lot of UADDR dropping from the network at the same time at the time of that error. Unfortunately, the NetCreate server itself doesn't timestamp these connection status messages.

Coincidence?

@dsriseah
Copy link
Contributor Author

New Errors

This was a different form of req.params===undefined than we've seen before (noted twice, server restored itself in about 15 seconds and the activity seemed to progress)
image

  • this is happening after a websocket upgrade again, so double-check code that it's correct

And there is a weird error with a particular UADDR socket
image

  • this seems to have happened after a server restart on the same crash as above, and the socket wasn't dropped for some reason. Will have to research it to see what's going on.

@dsriseah
Copy link
Contributor Author

dsriseah commented Aug 29, 2024

Digging further into better ways of handling errors like this, the original router code and the way it redirects things is hard to work with. I tried to not change it appreciably in making my annotations but I think it needs a rewrite, which means a whole new test cycle.

Joshua reports that with his Thursday Aug 19 2024 sessions, he noticed no problems; the server restart seems to take about 15 seconds and client reloads should just work. What's unknown is whether socket connections are somehow not getting dropped.

I'm unable to reproduce the behavior on my Mac, so I'll have to use 159 to make some changes to try to reproduce this particular crash, probably next week.

@dsriseah
Copy link
Contributor Author

dsriseah commented Sep 4, 2024

NEW ERRORS

This log shows excerpts of a crash.

There are two major errors noted in the logs

ERR_INVALID_HTTP_TOKEN triggered by IP 64.39.98.3 trying to login hundreds of times. The error is originating in http-proxy, the ancient proxy library that is used by http-proxy-middleware and it outside our control

NC_MUX   - 2024/09/04 WED 08:40:51 GET /login (client ::ffff:64.39.98.30)
TypeError [ERR_INVALID_HTTP_TOKEN]: Header name must be a valid HTTP token ["x-forwarded-for header"]
    at ClientRequest.setHeader (node:_http_outgoing:650:3)
    at new ClientRequest (node:_http_client:291:14)
    at Object.request (node:http:100:10)
    at Array.stream (/root/nc-multiplex/node_modules/http-proxy/lib/http-proxy/passes/web-incoming.js:126:74)
    at ProxyServer.<anonymous> (/root/nc-multiplex/node_modules/http-proxy/lib/http-proxy/index.js:81:21)
    at HttpProxyMiddleware.<anonymous> (/root/nc-multiplex/node_modules/http-proxy-middleware/dist/http-proxy-middleware.js:30:32)
    at Generator.next (<anonymous>)
    at fulfilled (/root/nc-multiplex/node_modules/http-proxy-middleware/dist/http-proxy-middleware.js:5:58)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

Then, ERR_NOT_NETMESG is thrown out of our own code in NetCreate itself (not nc-multiplex), occurring several times. Each error represents a proxied server crashing, because after these errors stop appearing in the log, the HEARTBEAT no longer lists any running instances.

SRV_NET  - socket DEL UADDR_01 from network
SRV_NET  - socket ADD UADDR_02 to network

/root/nc-multiplex/netcreate-itest/app/unisys/common-netmessage-class.js:65
        throw ERR_NOT_NETMESG;
        ^
:ERR:PKT      -obj does not seem to be a NetMessage
(Use `node --trace-uncaught ...` to show where the exception was thrown)

Node.js v18.18.2
SRV_NET  - socket ADD UADDR_02 to network

/root/nc-multiplex/netcreate-itest/app/unisys/common-netmessage-class.js:65
        throw ERR_NOT_NETMESG;
        ^
:ERR:PKT      -obj does not seem to be a NetMessage
(Use `node --trace-uncaught ...` to show where the exception was thrown)
.
. other errors .
.

NC_MUX   - * MEMORY HEARTBEAT 2024/09/04 WED 16:33:39 - nodeHeap 21,850 / 23,848kb (91.63%) - freeMem 583mb
NC_MUX   - * MEMORY HEARTBEAT 2024/09/04 WED 16:48:39 - nodeHeap 21,984 / 23,848kb (92.19%) - freeMem 588mb
NC_MUX   - * MEMORY HEARTBEAT 2024/09/04 WED 17:03:39 - nodeHeap 22,030 / 23,848kb (92.38%) - freeMem 590mb

After all the servers have crashed, there are various HPM proxy request errors that appear despite the /manage page being accessed.

NC_MUX   - 2024/09/04 WED 17:45:47 GET /manage (client ::ffff:129.79.197.124)
[HPM] Error occurred while trying to proxy request /images/netcreate-logo.svg from 198.211.109.198 to http://localhost:3000 (ECONNREFUSED)
 (https://nodejs.org/api/errors.html#errors_common_system_errors)
NC_MUX   - 2024/09/04 WED 17:46:00 GET /manage (client ::ffff:129.79.197.124)
[HPM] Error occurred while trying to proxy request /images/netcreate-logo.svg from 198.211.109.198 to http://localhost:3000 (ECONNREFUSED)
 (https://nodejs.org/api/errors.html#errors_common_system_errors)
[HPM] Error occurred while trying to proxy request /favicon.ico from 198.211.109.198 to http://localhost:3000 (ECONNREFUSED) (https://node
js.org/api/errors.html#errors_common_system_errors)
NC_MUX   - 2024/09/04 WED 17:46:11 GET /manage (client ::ffff:129.79.197.124)
[HPM] Error occurred while trying to proxy request /images/netcreate-logo.svg from 198.211.109.198 to http://localhost:3000 (ECONNREFUSED)
 (https://nodejs.org/api/errors.html#errors_common_system_errors)

This appears to be an attepmt to access a cgi-bin script...a blind access attempt:

NC_MUX   - * MEMORY HEARTBEAT 2024/09/04 WED 17:18:39 - nodeHeap 22,148 / 23,848kb (92.87%) - freeMem 590mb
[HPM] Error occurred while trying to proxy request /cgi-bin/luci/;stok=/locale from 198.211.109.198:80 to http://localhost:3000 (ECONNREFU
SED) (https://nodejs.org/api/errors.html#errors_common_system_errors)

Also, the management page does not show running instances, meaning that nc-multiplex was unable to detect the crash of the child servers.

image

ATTACHMENTS

@dsriseah
Copy link
Contributor Author

dsriseah commented Sep 6, 2024

ERRORS for 198 on SEP 6

NC_MUX   - 2024/09/06 FRI 09:40:54 GET / (client ::ffff:185.191.126.213)
NC_MUX   - .. no home.html, sending default
NC_MUX   - 2024/09/06 FRI 09:41:19 ERROR in m_RouterLogic: req.params is undefined
NC_MUX   - 2024/09/06 FRI 09:41:19 req.ip: undefined
NC_MUX   - 2024/09/06 FRI 09:41:19 !!! /graph/ not allowed to spawn (AUTO_NEW=ALOW_SPAWN=false)
NC_MUX   - 2024/09/06 FRI 09:41:19 !!! m_RouterLogic error: <undefined-req-params>
[HPM] Upgrading to WebSocket
NC_MUX   - 2024/09/06 FRI 09:41:19 ??? ProxyRewrite req.originalUrl is undefined
[HPM] Upgrading to WebSocket
NC_MUX   - 2024/09/06 FRI 09:41:19 ERROR in m_RouterLogic: req.params is undefined
NC_MUX   - 2024/09/06 FRI 09:41:19 req.ip: undefined
NC_MUX   - 2024/09/06 FRI 09:41:19 !!! /graph/ not allowed to spawn (AUTO_NEW=ALOW_SPAWN=false)
NC_MUX   - 2024/09/06 FRI 09:41:19 !!! m_RouterLogic error: <undefined-req-params>
[HPM] Upgrading to WebSocket
NC_MUX   - 2024/09/06 FRI 09:41:19 ??? ProxyRewrite req.originalUrl is undefined
[HPM] Upgrading to WebSocket
NC_MUX   - 2024/09/06 FRI 09:41:19 ERROR in m_RouterLogic: req.params is undefined
NC_MUX   - 2024/09/06 FRI 09:41:19 req.ip: undefined
NC_MUX   - 2024/09/06 FRI 09:41:19 !!! /graph/ not allowed to spawn (AUTO_NEW=ALOW_SPAWN=false)
NC_MUX   - 2024/09/06 FRI 09:41:19 !!! m_RouterLogic error: <undefined-req-params>
[HPM] Upgrading to WebSocket
NC_MUX   - 2024/09/06 FRI 09:41:19 ??? ProxyRewrite req.originalUrl is undefined
[HPM] Upgrading to WebSocket
NC_MUX   - 2024/09/06 FRI 09:41:19 ERROR in m_RouterLogic: req.params is undefined
NC_MUX   - 2024/09/06 FRI 09:41:19 req.ip: undefined
NC_MUX   - 2024/09/06 FRI 09:41:19 !!! /graph/ not allowed to spawn (AUTO_NEW=ALOW_SPAWN=false)
NC_MUX   - 2024/09/06 FRI 09:41:19 !!! m_RouterLogic error: <undefined-req-params>
[HPM] Upgrading to WebSocket
NC_MUX   - 2024/09/06 FRI 09:41:19 ??? ProxyRewrite req.originalUrl is undefined
[HPM] Upgrading to WebSocket
[HPM] Error occurred while trying to proxy request / from 198.211.109.198 to http://localhost:3000 (HPE_INVALID_CHUNK_SIZE) (https://nodejs.org/api/errors.html#errors_common_system_errors)
NC_MUX   - 2024/09/06 FRI 09:41:19 ERROR in m_RouterLogic: req.params is undefined
NC_MUX   - 2024/09/06 FRI 09:41:19 req.ip: undefined
NC_MUX   - 2024/09/06 FRI 09:41:19 !!! /graph/ not allowed to spawn (AUTO_NEW=ALOW_SPAWN=false)
NC_MUX   - 2024/09/06 FRI 09:41:19 !!! m_RouterLogic error: <undefined-req-params>
[HPM] Upgrading to WebSocket
NC_MUX   - 2024/09/06 FRI 09:41:19 ??? ProxyRewrite req.originalUrl is undefined
[HPM] Upgrading to WebSocket
[HPM] Error occurred while trying to proxy request / from localhost to http://localhost:3000 (HPE_INVALID_CHUNK_SIZE) (https://nodejs.org/api/errors.html#errors_common_system_errors)
node:events:495
      throw er; // Unhandled 'error' event
      ^

Error [ERR_STREAM_WRITE_AFTER_END]: write after end
    at new NodeError (node:internal/errors:405:5)
    at _write (node:internal/streams/writable:322:11)
    at Writable.write (node:internal/streams/writable:337:10)
    at ClientRequest.<anonymous> (/root/nc-multiplex/node_modules/http-proxy/lib/http-proxy/passes/ws-incoming.js:115:16)
    at ClientRequest.emit (node:events:517:28)
    at HTTPParser.parserOnIncomingClient [as onIncoming] (node:_http_client:700:27)
    at HTTPParser.parserOnHeadersComplete (node:_http_common:119:17)
    at Socket.socketOnData (node:_http_client:541:22)
    at Socket.emit (node:events:517:28)
    at addChunk (node:internal/streams/readable:335:12)
Emitted 'error' event on Socket instance at:
    at Socket.onerror (node:internal/streams/readable:796:14)
    at Socket.emit (node:events:517:28)
    at emitErrorNT (node:internal/streams/destroy:151:8)
    at emitErrorCloseNT (node:internal/streams/destroy:116:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
  code: 'ERR_STREAM_WRITE_AFTER_END'
}

Node.js v18.18.2
_LAUNCH  -  2024/09/06 FRI 09:41:19 crash/disconnect detected by '20248thELATraits' ... closing server 
_LAUNCH  -  2024/09/06 FRI 09:41:19 crash/disconnect detected by 'PercyJackson' ... closing server 

on 159:

NC_MUX   - 2024/09/05 THU 16:36:58 ??? USE / is a websocket connection attempt from ::ffff:79.137.198.113 
NC_MUX   - 2024/09/05 THU 16:36:58 ??? USE / req.params is undefined 

@dsriseah
Copy link
Contributor Author

dsriseah commented Sep 9, 2024

Fixed bug in OutOfMemory report, which didn't convert to MB. Also added finer grained out-of-memory checks.

@dsriseah
Copy link
Contributor Author

Monday Sep 9...198 shows this crash followed by recovery, which is similar to what we've seen above.
image

@dsriseah dsriseah merged commit 8f85b37 into dev Sep 10, 2024
@dsriseah dsriseah deleted the dev-sri/timestamps branch September 10, 2024 01:47
@dsriseah dsriseah restored the dev-sri/timestamps branch September 10, 2024 01:48
@benloh benloh mentioned this pull request Sep 10, 2024
@benloh benloh deleted the dev-sri/timestamps branch September 11, 2024 03:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants