-
Notifications
You must be signed in to change notification settings - Fork 906
Replies: 2 comments · 3 replies
-
@thy09 Could you please help take a look on this? |
Beta Was this translation helpful? Give feedback.
All reactions
-
@jwk-qbllp Can you provide more logs? The log "has been running for 60 seconds" is just a warning in a monitor thread, it doesn't affect the running of the node. As for the signal 9 problem, I have no idea why is it happen if we don't have more context. |
Beta Was this translation helpful? Give feedback.
All reactions
-
@thy09 Here are the complete logs: Instance status: Container events: Container logs: 2024-03-15 20:39:07 +0000 25 execution WARNING GenerateClaimsResponse in line 0 has been running for 120 seconds, stacktrace of thread 139993131493120: 2024-03-15 20:39:36 +0000 25 execution.flow INFO Node GenerateClaimsResponse completes. [2024-03-15 20:41:35 +0000] [16] [CRITICAL] WORKER TIMEOUT (pid:25) |
Beta Was this translation helpful? Give feedback.
All reactions
-
so far the maximum timeout of a single request is 5 min for MIR deployment, and seems your flow contains many nodes and runs longer than 5min, is this long request processing time expected in your scenario? |
Beta Was this translation helpful? Give feedback.
All reactions
-
Yes our run might take longer than 5 minutes. Do we have any options to increase this timeout? |
Beta Was this translation helpful? Give feedback.
-
We are having some strange behavior with a promptflow that is not completing due to a WORKER TIMEOUT. This flow has a handful of LLM tool steps that do not take longer than 180 seconds to run, but we have a node that always triggers a termination signal after 60 seconds. This happens after multiple successful LLM node runs prior to this step. This node always sends a signal 9 termination code after 60 seconds.
GenerateClaimsResponse Runs fine and no timeouts.
"system_metrics":{"completion_tokens":1838,"duration":139.077701,"prompt_tokens":4677,"total_tokens":6515},
Node ClaimsComparison this node always failed in a deployed endpoint but runs fine as a debug run.
"system_metrics":{"completion_tokens":1808, "duration":125.590981"prompt_tokens":4586,"total_tokens":6394}
This is the node that gets the termination signal after 60 seconds.
Here are the logs:
WARNING ClaimsComparison in line 0 has been running for 60 seconds
…
[CRITICAL] WORKER TIMEOUT (pid:25)
[WARNING] Worker with pid 25 was terminated due to signal 9
We have tried many sizes of compute nodes, and it does not appear to affect this timeout behavior.
Any ideas on this?
Beta Was this translation helpful? Give feedback.
All reactions