Describe the bug
Running in Google Cloud Run, chat shuts down for no apparent reason. Log example:
2025-05-28 23:50:04.885 ART  02:50:04.885 streaming response from 'claude-sonnet-4-0' took 2.36s [LLM]
2025-05-28 23:50:14.649 ART  on_chat_end called
Other processes that take 10 seconds do not kill the chat; in any case, 10 seconds would be an unreasonable timeout value for an LLM-centric framework. This is absolutely not a timeout signal from the AskActionMessage that had just been sent (see below). It happens while an AskUserMessage is being prepared based on the user's response to a previous AskUserMessage or AskActionMessage. As shown below, it also happens when the app should simply be waiting for user input.
Edit: Note that the "unable to connect to server" message is also not displayed in this case. There is no indication to the user that the server is no longer responding.
To Reproduce
I have no idea. I thought this was due to LLM functions taking too long to complete, but apparently it is not. See screenshot below. It is impossible to identify the cause because, apart from the logging statement in my own @on_chat_end code, nothing is emitted by Chainlit.
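For anyone trying to narrow this down, here is a minimal sketch of the kind of extra logging that could be added around the @on_chat_end handler. It is plain Python with no Chainlit imports; the `SessionTimeline` class, its method names, and the event labels are all hypothetical, and it assumes the app has some session identifier it can pass in.

```python
import logging
import time

logger = logging.getLogger("chat_lifecycle")


class SessionTimeline:
    """Hypothetical helper: records labelled events per session so that the
    idle gap before an unexpected chat end can be logged."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._events = {}  # session_id -> list of (timestamp, label)

    def mark(self, session_id, label):
        """Record that `label` happened now for this session."""
        self._events.setdefault(session_id, []).append((self._clock(), label))

    def end(self, session_id):
        """Log and return (last_label, idle_seconds, age_seconds), or None
        if nothing was recorded for this session."""
        events = self._events.pop(session_id, [])
        if not events:
            logger.warning("chat %s ended with no recorded events", session_id)
            return None
        now = self._clock()
        first_ts, _ = events[0]
        last_ts, last_label = events[-1]
        idle = now - last_ts
        age = now - first_ts
        logger.warning(
            "chat %s ended %.3fs after '%s' (session age %.3fs)",
            session_id, idle, last_label, age,
        )
        return last_label, idle, age
```

The idea would be to call `mark(...)` from the chat start and message handlers (and right before and after each LLM call), and `end(...)` from the @on_chat_end handler. If the session age at the time of the end clusters around the configured Cloud Run timeout, that would point at the platform rather than at Chainlit.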
Edit: the only explanation I can come up with is that GCR times out WebSocket connections. Increasing the instance timeout to 3600s appears to mitigate the issue, but it does not solve it (a one-hour cap on chat sessions is not really acceptable). Per the GCR docs, "WebSockets clients connecting to Cloud Run should handle reconnecting to the server if the request times out or the server disconnects". If this is indeed the cause of the issue, the only reliable fix is for Chainlit to implement reconnecting. However, I have not seen the 504 response codes that I would expect to be associated with timed-out requests (this may be because the connection is WS rather than HTTP).
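To illustrate what "handle reconnecting" could look like, here is a rough sketch of a reconnect loop with capped exponential backoff. This is not Chainlit code and not how its client works; `connect` is a hypothetical callable that opens the connection, blocks while it is being served, returns normally on a deliberate close, and raises `ConnectionError` when the connection drops. All parameter names and defaults are assumptions.

```python
import random
import time


def backoff_delays(base=1.0, factor=2.0, cap=30.0, jitter=0.0, rng=random.random):
    """Yield an endless sequence of delays: base, base*factor, ..., capped at `cap`.

    With jitter > 0 each delay is stretched by up to `jitter` (as a fraction)
    so that many clients do not reconnect at the same instant.
    """
    delay = base
    while True:
        yield min(delay, cap) * (1.0 + jitter * rng())
        delay = min(delay * factor, cap)


def run_with_reconnect(connect, max_attempts=5, sleep=time.sleep, **backoff):
    """Call the hypothetical `connect()` until it returns normally.

    Re-raises the last ConnectionError once `max_attempts` attempts have failed.
    """
    delays = backoff_delays(**backoff)
    failures = 0
    while True:
        try:
            return connect()
        except ConnectionError:
            failures += 1
            if failures >= max_attempts:
                raise
            # Wait before the next attempt; the wait grows with each failure.
            sleep(next(delays))
```

A real fix would also need the server to restore the session state after the reconnect; otherwise the client would just reattach to a chat that on_chat_end has already torn down.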
Expected behavior
Chat does not end for no reason. Agent is usable in production. Chainlit logs why it is terminating a chat server-side, and this behavior, which is generally undesirable, can be turned off.
Screenshots
This is an AskActionMessage with timeout=3600. The 404 errors appear because the server has terminated the chat (see log above).
GCP
python:3.11-slim
chainlit 2.5.5