unhandled :gun_error for websocket connection needing to be established before pushing #12

Closed
the-mikedavis opened this issue Feb 3, 2021 · 2 comments
@the-mikedavis
Collaborator

unknown message {:gun_error, #PID<0.16322.1>, {:badstate, 'Connection needs to be upgraded to Websocket before the gun:ws_send/1 function can be used.'}}
heard in Slipstream.Connection
please open an issue in NFIBrokerage/slipstream with this message and
any available information.

Starting to feel the need for #4 so that we can understand how a connection gets into a state where this happens.

the-mikedavis self-assigned this Feb 3, 2021
@the-mikedavis
Collaborator Author

closed by 9d8fd35

@the-mikedavis
Collaborator Author

the-mikedavis commented Feb 4, 2021

[Image: gun-error-trace-capture, a Honeycomb trace of the connection's spans]

Of course, the first try found us a nice :gun_error once the changes from #14 (see also #13) were merged. Here is what the trace above shows, walking it from bottom to top (i.e. in reverse chronological order):

  • SendHeartbeat command that got translated into a ChannelClosed event with reason: :heartbeat_timeout
  • gun connection is up (this comes from a retry mechanism in gun)
  • gun connection is down
  • :gun_error saying we could not send the heartbeat from 2 spans up in the trace (2 bullets down in this list)
  • gun connection is up
  • SendHeartbeat command that would fail with :gun_error on the send

I chaos-monkeyed this failure state by performing a rollout restart (Kubernetes) of the back-end service this front-end connects to (the server for this client). That back-end service uses a RollingUpdate deployment strategy.

Downtime from this bug would have been minimal, but leaving the above case unhandled allows the connection to sit 'dead' for up to the heartbeat-timeout interval. By handling it, we fall back from gun's retry strategy to Slipstream's retry strategy, which is potentially faster because recovery no longer has to wait for the heartbeat-timeout mechanism to notice the dead connection.


☝️ that graph there comes from https://honeycomb.io, with connection telemetry shipped by our NFIBrokerage/slipstream_honeycomb adapter
