benloh
Collaborator

@benloh benloh commented Sep 2, 2020

Problem

Kalani reported:

We’re getting lots of server disconnect errors from students who seem to have reasonable internet connections. Is there a way to extend the disconnect-ping timing so that students on a slow connection are less likely to get errors and can still make changes?

Class total is 150 students (not all at one time; the most I’ve seen in Google Analytics is 10-15). Most are in dorms or hotels (the latter of which is an issue for speed purposes), but it shouldn’t be changing IP addresses mid-session. It seems more like a ping time out, the way it’s been described.

Issues that students have reported in basic categories:

  • Long load times from students who are just seeing spinning balls and white screens. I’m offering those students alternatives because there are only 2 or 3 and it’s an Internet-speed issue that can’t be solved by us.
  • Post-load server disconnection issues. These students are getting the whole network loaded, and then when they try to edit stuff, they get a disconnect. (Again, PEBKAC, so I am going on what the student is describing and don’t have as much detail as I’d like). I think that an extension of the timeout would help.
  • We have 1 person, as yet unidentified, who is taking the whole network down. Any students who load the network after that student get a white screen with a text-only proxy error. One student in particular has reported this twice and I’m hoping that it’s this student’s tech creating issues but they’re in quarantine right now, so I have no idea when I’ll be able to tackle that issue. In the meantime, uptimerobot.com is checking for the proxy keyword and alerting me so I can cycle the network instance when that happens. If we can tie IP or access code to the drops, we can ID the student (I hope).

Finally, there are some duplicate nodes appearing, which I think are probably related to server disconnects (5 or 6 “Justinian” nodes showing up, for instance, though I’ve cleaned those up).

Solution

This pull request does 4 things:

  1. Extend the wait time for heartbeats. Both the server and the client will wait for 10 seconds before declaring a disconnect. Heartbeats should be generated by the server every 5 seconds, so this gives us a large window.

  2. Show users a specific "Client Disconnected" or "Server Disconnected" error with a timestamp.

"Client Disconnected" -- Client did not receive a "ping" heartbeat from the server within the time allowed. Usually this is a result of the client losing the internet connection.

"Server Disconnected" -- Either the server shut down, or the server did not receive a "pong" response from the client within the time allowed. Usually this is a result of the server initiating the disconnect as it shuts down.

  3. Log the missing "pong" message to the server logs along with the UADDR so you can identify the machine that's down, e.g.

11:30:39 tacitus SRV-NET - UADDR_02 pong not received before time ran out -- CLIENT CONNECTION DEAD!

  4. Add GZIP compression for all files. In some cases this results in an 80% reduction in the data that has to be sent over the wire.

This isn't a perfect solution but perhaps it'll give us a little more information about what's going on.
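The timing in item 1 and the messages in item 2 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `HeartbeatWatchdog`, `disconnectLabel`, `formatDeadClientLog`, and the reason strings are hypothetical names. Only the 5 second interval, the 10 second wait, the two message labels, and the log text come from the description above.

```javascript
// Timings quoted in the PR description.
const HEARTBEAT_INTERVAL_MS = 5000;  // server sends a "ping" every 5 seconds
const HEARTBEAT_TIMEOUT_MS = 10000;  // each side waits 10 seconds before declaring a disconnect

// Hypothetical helper: remembers when the last heartbeat arrived and
// reports whether the wait time has been exceeded.
class HeartbeatWatchdog {
  constructor(timeoutMs = HEARTBEAT_TIMEOUT_MS, now = Date.now) {
    this.timeoutMs = timeoutMs;
    this.now = now;            // injectable clock, so the logic can be checked without real timers
    this.lastSeen = now();
  }

  // Call when a "ping" (on the client) or a "pong" (on the server) arrives.
  beat() {
    this.lastSeen = this.now();
  }

  // True once more than timeoutMs has passed since the last heartbeat.
  isExpired() {
    return this.now() - this.lastSeen > this.timeoutMs;
  }
}

// Hypothetical reason strings mapped to the two labels named in the PR.
const DISCONNECT_LABELS = {
  'ping-timeout': 'Client Disconnected',  // client saw no "ping" in time
  'pong-timeout': 'Server Disconnected',  // server saw no "pong" in time
  'server-closed': 'Server Disconnected'  // server shut down
};

function disconnectLabel(reason) {
  return DISCONNECT_LABELS[reason];
}

// Message body of the server log entry; the time and host prefix are
// assumed to be added by the server's own logger.
function formatDeadClientLog(uaddr) {
  return `${uaddr} pong not received before time ran out -- CLIENT CONNECTION DEAD!`;
}

// Usage with a fake clock (milliseconds).
let t = 0;
const watchdog = new HeartbeatWatchdog(HEARTBEAT_TIMEOUT_MS, () => t);
t = HEARTBEAT_INTERVAL_MS;
watchdog.beat();                      // heartbeat arrives on schedule
t = 12000;
console.log(watchdog.isExpired());    // 7 seconds of silence: still connected
t = 16000;
console.log(watchdog.isExpired());    // 11 seconds of silence: declare a disconnect
console.log(formatDeadClientLog('UADDR_02'));
```

Because the wait is twice the heartbeat interval, a single lost or late heartbeat should not trigger a disconnect on its own.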

To Test

  1. git checkout dev-bl/net-debug
  2. npm run dev (or ./nc.js --dataset=xxxx)
  3. Login with a client
  4. Disconnect the client wifi -- the client should display a "Client Disconnected" message.
  5. On the server, you should see a log entry indicating which client was disconnected.
  6. Reconnect the client wifi.
  7. ctrl-c on the server to stop the server.
  8. The client should display a "Server Disconnected" message.

@benloh benloh requested a review from jdanish September 2, 2020 17:41
@benloh benloh changed the base branch from master to dev September 2, 2020 17:41
@benloh
Collaborator Author

benloh commented Sep 2, 2020

@kalanicraig @jdanish Please try this build to see if it helps with the error reporting and drops.

@benloh
Collaborator Author

benloh commented Sep 2, 2020

Added GZIP compression. This reduces the size of netc-lib.js from 6.5MB to 1.2MB.
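For a rough sense of where a saving like that comes from, here is a small sketch using Node's built-in `zlib` module. It is not how this PR wires compression into the server; the sample text and the `reduction` helper are made up for illustration, and only the 6.5MB and 1.2MB figures come from this comment.

```javascript
const zlib = require('zlib');

// Stand-in for a bundled JS file: repetitive source text, which tends to compress well.
const sample = 'function handleDataUpdate(data) { return data.nodes.length; }\n'.repeat(2000);
const raw = Buffer.from(sample, 'utf8');
const gzipped = zlib.gzipSync(raw);

// Fraction of bytes saved by compression.
function reduction(originalSize, compressedSize) {
  return 1 - compressedSize / originalSize;
}

console.log(`raw: ${raw.length} bytes, gzipped: ${gzipped.length} bytes`);
console.log(`sample reduction: ${(reduction(raw.length, gzipped.length) * 100).toFixed(1)}%`);

// The same formula applied to the sizes quoted above (6.5MB -> 1.2MB).
console.log(`netc-lib.js reduction: ${(reduction(6.5, 1.2) * 100).toFixed(0)}%`);
```

The sample compresses far better than a real bundle would, since it is one line repeated. For netc-lib.js, the quoted sizes work out to a reduction of about 82%.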

@benloh
Collaborator Author

benloh commented Sep 2, 2020

Added UADDR to server log.

Saves about 1 second of processing time on a graph with 500 nodes.
netc-lib.js reduced from 5.4MB down to 3.1MB. Gzipped is 743KB.
@benloh
Collaborator Author

benloh commented Sep 9, 2020

@jdanish I just pushed a number of performance enhancements to net-debug.

Slow Network Improvements

You'll need to run npm ci to install the new terser library and remove the old uglify-* libraries.

@jdanish
Collaborator

jdanish commented Sep 9, 2020

Looks great!! One bug and one minor thought that I'll log in a sec. Thanks!

Collaborator

@jdanish jdanish left a comment

Just filed #137 and #138 ... otherwise looks good at a glance but would want to test some more.

…in NodeTable and EdgeTable.

This reduces NodeTable display from about 14 seconds down to 5s, with Markdown only eating up about 1.2s.
…d in a bad edge count referencing source/target objects instead of source/target ids.
@benloh
Collaborator Author

benloh commented Sep 10, 2020

@jdanish More fixes!

NodeTable / EdgeTable Improvements

@jdanish
Collaborator

jdanish commented Sep 10, 2020

Thanks @benloh

Unfortunately, I am getting a compile error? It looks like the prior function was not closed but I wanted to make sure there wasn't something else?

20:55:50 - error: Compiling of app/view/netcreate/components/EdgeTable.jsx failed. L425:27 Unexpected token, expected ;
423 | this.handleDataUpdate(D3DATA);
424 |
425 | componentWillUnmount() {
    | ^
426 | this.AppStateChangeOff('D3DATA', this.handleDataUpdate);
427 | this.AppStateChangeOff('TEMPLATE', this.OnTemplateUpdate);
428 | }
Stack trace was suppressed. Run with LOGGY_STACKS=1 to see the trace.

@benloh
Collaborator Author

benloh commented Sep 10, 2020

@jdanish Ack! Apologies. I missed committing a line because I thought it was just an empty line, but of course it had a paren. Should be up now.

@benloh
Collaborator Author

benloh commented Sep 14, 2020

@jdanish Just checking in on the testing. If this seems to work well, I'd like to merge this and tag it 1.3.1.

@jdanish
Collaborator

jdanish commented Sep 14, 2020 via email

@benloh benloh merged commit efe54df into dev Sep 17, 2020
@benloh benloh mentioned this pull request Sep 17, 2020
@benloh benloh deleted the dev-bl/net-debug branch November 8, 2021 18:06
benloh added a commit that referenced this pull request Sep 5, 2024
Created, Updated, Revision Handling and Table Filter Fixes