UI causes browser to run the system out of memory #5414

Closed
rbasralian opened this issue Apr 19, 2024 · 5 comments · Fixed by #5501

@rbasralian
Contributor

Something about the UI apparently causes Safari to use almost-inconceivable amounts of memory (the DHC UI is the only thing I had open in Safari here). It seems related to either stopping/restarting the DHC container or letting my computer go to sleep and wake back up (or maybe both). Given enough time, it causes my Mac to pop up a "your system is out of memory" dialog.

[screenshot: Safari memory usage]

Whatever causes this also causes the server to repeatedly log "No AuthenticationRequestHandler registered for type Bearer":

[screenshot: repeated "No AuthenticationRequestHandler registered for type Bearer" log messages]

Here is a server-side thread dump from when this was occurring: thread_dump.txt

Version info (this is a locally-built image off of commit 61ae61b6):

Engine Version: 0.34.0-SNAPSHOT
Web UI Version: 0.69.1
Java Version: 21.0.2
Barrage Version: 0.6.0
Browser Name: Safari 17.4
OS Name: macOS 10.15.7

I don't have browser logs from when this was occurring but will try to get them if it happens again.

rbasralian added the bug and triage labels Apr 19, 2024
@mattrunyon
Contributor

Did you have anything running in your UI?

@dsmmcken
Contributor

That's a lot of no auth request handler errors happening extremely fast, looking at the timestamps.

@niloc132
Member

The JS API is designed to make it impossible to accidentally queue up many requests like this on its own - until an auth refresh succeeds or fails, the next one won't be passed to setTimeout. Is it possible that something in the UI is using setInterval or the like, which might cause this when the tab resumes and dozens/hundreds of enqueued callbacks run to completion?

For what it's worth, the thread dump doesn't show any activity - and specifically nothing that shows a grpc call is presently being handled to refresh a token.
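
For illustration, here is a minimal TypeScript sketch of the scheduling pattern described above, in which the next refresh is only handed to setTimeout after the current attempt settles. The names refreshAuthToken and REFRESH_INTERVAL_MS are hypothetical, not the JS API's actual identifiers:

    // Sketch only - not the actual JS API code. At most one auth refresh is
    // ever pending, because the next timer is only set once the current
    // attempt succeeds or fails.
    const REFRESH_INTERVAL_MS = 10_000;

    // Hypothetical stand-in for the real token-refresh gRPC call.
    async function refreshAuthToken(): Promise<void> {
      // ...issue the refresh request and store the new token...
    }

    function scheduleNextRefresh(): void {
      setTimeout(() => {
        refreshAuthToken()
          .catch((err) => console.error('auth refresh failed', err))
          .finally(() => scheduleNextRefresh()); // chain only after settling
      }, REFRESH_INTERVAL_MS);
    }

    scheduleNextRefresh();

A setInterval-based refresh, by contrast, keeps firing on its own schedule regardless of whether the previous attempt has completed, which is the kind of pattern the comment above asks about.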

@niloc132
Member

We've found a way this could happen in the JS API: it requires a table to fail before the disconnect, and the disconnect has to be long enough for the auth token to expire. The client then gets into a bad state, trying to reconnect the subscription for the failed table in a microtask "loop", resulting in rapid calls to the server ... which then each fail with auth issues.

I'll move this to deephaven-core and handle it there.
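
As a rough illustration of the loop (hypothetical names, not the actual JS API code), the reconnect path effectively did something like this:

    // Sketch only. Because the retry is queued as a microtask rather than via
    // setTimeout, the browser never yields between attempts, and each attempt
    // also sends another request to the server.
    function resubscribe(trySubscribe: () => boolean): void {
      if (!trySubscribe()) {
        // Once the table has failed, trySubscribe() can never succeed, yet a
        // new attempt is queued immediately - effectively an infinite loop.
        queueMicrotask(() => resubscribe(trySubscribe));
      }
    }

The fix described in the commit below breaks the loop by checking the table's state before attempting to subscribe or refetch.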

niloc132 transferred this issue from deephaven/web-client-ui Apr 25, 2024
niloc132 assigned niloc132 and unassigned mattrunyon Apr 25, 2024
niloc132 added the jsapi label Apr 25, 2024
rcaudy added this to the 2. April 2024 milestone Apr 26, 2024
@niloc132
Member

Here's a simple set of steps to reproduce locally, simulating a network issue:

  • Start deephaven-core on one port, assuming 10000 for now
  • Forward another port to 10000 so that the app is available on a second port - I used ssh localhost -L 10001:localhost:10000 to also be able to connect to the app on 10001
  • Connect to the web IDE on port 10001 in a browser, http://localhost:10001
  • Start a command for a table that is doomed to fail, for example:
    from deephaven import time_table
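    # I is null when i == 15, so I.split(`,`) throws and the table fails after ~15 seconds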
    t = time_table("PT1s").update(["I = i == 15 ? null : `` + i", "J = I.split(`,`)"])
  • Wait for the table to load, then error out
  • End the SSH tunnel. Usually this involves ctrl-d to end the shell, then ctrl-c to ask ssh to disconnect without waiting for all forwarded ports to finish
  • Wait for the web IDE to acknowledge the disconnect, usually just a few seconds
  • Reconnect the ssh tunnel promptly - losing the auth token isn't required, as I said it was above (that is only needed to produce the repeated "No auth handler for type Bearer" errors)

Expected: the table stays broken, but the rest of the page keeps working.
Actual: the page freezes, with a constant stream of error messages on the server.

Note that this might not be quite the same as what @rbasralian originally found, but it does have many of the same characteristics - fixing this will hopefully prevent the constant reconnect loop originally seen (but not yet reproduced).

niloc132 added a commit that referenced this issue May 17, 2024
DHC reconnects are able to restore server streams to an existing session, but
most of the JS API was written to assume that a lost connection requires
rebuilding objects on the server by replaying operations. This fix handles the
case where a table failed and then a network error occurred, leaving the table
stuck and unable to reconnect because it had already failed.

Two bugs prevented this from working; in both, after some operation couldn't
be scheduled, a microtask would immediately try again, effectively producing
an infinite loop in the browser. Table subscriptions are fixed by first
checking whether the table is running and so can be subscribed. Table refetch
is fixed by using null for its fetcher: during a refetch, if the fetcher is
null, either fail right away with the existing failure message or succeed
right away.

This fix currently makes it possible for a failed table on a reconnected worker
to not signal that it is still failed - this will be addressed in a follow-up.

Fixes #5414
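
A hedged TypeScript sketch of the refetch guard described in the commit message above; the names (TableStub, Fetcher, failMessage) are illustrative, not the actual JS API identifiers:

    // Sketch only - illustrates "if the fetcher is null, fail or succeed right
    // away" instead of queueing another microtask retry.
    type Fetcher = () => Promise<void>;

    class TableStub {
      constructor(
        private fetcher: Fetcher | null,    // null: nothing to replay on reconnect
        private failMessage: string | null, // set once the table has failed
      ) {}

      async refetch(): Promise<void> {
        if (this.fetcher == null) {
          if (this.failMessage != null) {
            // Fail right away with the existing failure message.
            throw new Error(this.failMessage);
          }
          // Succeed right away - there is nothing to rebuild on the server.
          return;
        }
        await this.fetcher();
      }
    }

The subscription side is analogous: check whether the table is in a running state before subscribing, rather than retrying in a microtask.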
stanbrub pushed a commit that referenced this issue May 17, 2024