-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xtrigger timeout #6319
Comments
So you're suggesting a separate timeout for xtriggers, or reducing the general process pool timeout? If the system load is such that job submission and event handlers are taking PT10M to complete, then it's probably reasonable that xtriggers do too. |
I'm suggesting a separate timeout for xtriggers. Xtriggers are supposed to be simple short-running functions that return True/False and are run over and over until they are satisfied. They are not intended to be used as long-lived pollers that sit there waiting for a condition to be met. The PT10M timeout encourages the development of long-lived pollers. Setting a shorter default would discourage this behaviour (which is not the intended usage and really not a good idea as it will choke up your subprocpool). Async xtriggers will provide us with a way to make long-lived xtriggers a viable prospect, by making the "waiting on external condition" bit async. However, I don't think we will implement async xtriggers within the same framework due to their very different execution environment. |
We definitely should not encourage "long-lived poller" xtriggers. However, I think legit one-off-check xtriggers can still be on par (in terms of execution time) with other process pool processes . Well, perhaps not batched remote submissions, but certainly event handlers. Do you think that's not the case? |
Are you just arguing that xtriggers shouldn't be "special" or are you arguing that a 10 minute xtrigger is "reasonable"? |
I'm kind of struggling to see exactly why xtriggers should time out earlier than other processes in the process pool. A legit xtrigger could easily take as long to execute as job submission or event handler processes. None of these should be taking anything like 10 minutes (the default timeout) but I think we allow that in case of extreme system load. |
Ok, so you are arguing that xtriggers should not be "special", not that a PT10M xtrigger is reasonable. System load aside, we know that some subprocesses can take quite a while under normal circumstances, e.g. remote file-installation and remote log-retrieval will take as long as data transfer requires and could be impacted by local network. So the PT10M might make sense in some situations, however, it definitely does not make sense for xtriggers (which were intended to be short-lived polling functions). So given that some subprocess might be reasonably expected to take longer to run than an xtrigger, is it not reasonable to implement a separate time limit for xtriggers?! |
I've recently seen a workflow with an xtrigger which ran for 10 minutes.
It's likely that this clogged up the subproc pool causing delays to other commands. We should instate a default timeout for xtriggers that is more compatible with their intended usage in order to feedback this advice to workflow developers earlier in the process to avoid ending up in this situation. This would also help to protect against hanging xtriggers of course.
The text was updated successfully, but these errors were encountered: