Timeout handlers do not execute when the corresponding abort on X timeout = True #5997

Open
MetRonnie opened this issue Feb 23, 2024 · 3 comments
Labels: bug ("Something is wrong :(")
Milestone: cylc-8.x

Comments

@MetRonnie
Member

[scheduler]
    [[events]]
        abort on workflow timeout = True
        workflow timeout = PT1S
        abort handlers = echo "ALPHA"
        workflow timeout handlers = echo "TANGO"

The workflow timeout handler does not run when abort on workflow timeout is set.

Originally posted by @MetRonnie in #5959 (comment)

@MetRonnie MetRonnie added the bug Something is wrong :( label Feb 23, 2024
@MetRonnie MetRonnie added this to the cylc-8.x milestone Feb 23, 2024
@oliver-sanders
Member

The reason is that when the workflow aborts, it terminates the processes in the subprocpool, INCLUDING the event handler.

Changing this behaviour will require careful thought, as it could trigger events we don't want. E.g. preparing tasks may go into the submit-failed state erroneously.
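The failure mode described above can be pictured with a toy model. This is a simplification under stated assumptions: the timeout handler is treated as a command that is queued (or still running) in the same pool that the abort path tears down. ToyPool, submit, process, terminate and workflow_timeout are hypothetical names for illustration only, not Cylc's real SubProcPool API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPool:
    """Hypothetical stand-in for the scheduler's subprocess pool."""
    queued: list = field(default_factory=list)  # commands not yet finished
    ran: list = field(default_factory=list)     # commands that completed

    def submit(self, cmd: str) -> None:
        # Handlers are handed to the pool rather than run inline.
        self.queued.append(cmd)

    def process(self) -> None:
        # Normally happens on later main-loop iterations.
        self.ran.extend(self.queued)
        self.queued.clear()

    def terminate(self) -> list:
        # Abort path: discard everything still pending and report it.
        killed, self.queued = self.queued, []
        return killed


def workflow_timeout(pool: ToyPool, abort_on_timeout: bool) -> None:
    """Hypothetical timeout callback mirroring the reported behaviour."""
    pool.submit('echo "TANGO"')  # the workflow timeout handler
    if abort_on_timeout:
        pool.terminate()  # pool torn down before the handler completes
        return
    pool.process()
```

With abort_on_timeout=False the handler ends up in pool.ran; with abort_on_timeout=True it is discarded by terminate() and never completes, which is the symptom reported in this issue.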

@hjoliver
Member

hjoliver commented Feb 26, 2024

Can we just reuse the cylc stop --now (but not --now --now) code for the abort shutdown?

cylc stop:
  -n, --now             Shut down without waiting for active tasks to
                        complete. If this option is specified once, wait for
                        task event handler, job poll/kill to complete. If this
                        option is specified more than once, tell the workflow
                        to terminate immediately.
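The help text above can be restated as data, which makes the proposal concrete: an abort would behave like now_count == 1, waiting for event handlers and job poll/kill but not for active tasks. stop_waits_for is a hypothetical helper, not part of Cylc, and the behaviour for a plain cylc stop (now_count == 0) is an assumption inferred from the wording of the --now help.

```python
def stop_waits_for(now_count: int) -> set:
    """Paraphrase of the `cylc stop -n/--now` help text (hypothetical).

    now_count is how many times -n/--now was passed.
    """
    if now_count < 0:
        raise ValueError("now_count cannot be negative")
    if now_count >= 2:
        return set()  # "--now --now": terminate immediately
    waits = {"task event handlers", "job poll/kill"}
    if now_count == 0:
        # Assumption: a plain `cylc stop` also waits for active tasks,
        # since the help text says only --now skips them.
        waits.add("active tasks")
    return waits
```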

@oliver-sanders
Copy link
Member

oliver-sanders commented Jul 15, 2024

Abort events take down the scheduler by raising a SchedulerError rather than requesting a shutdown. This is a much more abrupt stop, which also results in a non-zero exit code:

# "cylc play" needs to exit with error status here.
raise SchedulerError(f'"{abort_conf}" is set')

In the abort case, we want to wait for abort/timeout handlers, but I guess we might not want to wait for log file retrieval, etc. (it could be a really critical shutdown).
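One possible shape for that abort path is sketched below, assuming the handlers can be awaited as child processes with a deadline before the exception is raised. It waits only for the handler commands, kills any that overrun, skips housekeeping such as job log retrieval, and still raises so the exit status stays non-zero. run_handler, abort_with_handlers and the timeout parameter are hypothetical, and the SchedulerError class here is a local stand-in rather than an import from cylc; in the real scheduler the handlers go through the subprocpool, not through this code.

```python
import asyncio
import sys


class SchedulerError(Exception):
    """Local stand-in for Cylc's SchedulerError (not imported from cylc)."""


async def run_handler(argv):
    """Run one handler command and return its stripped stdout."""
    proc = await asyncio.create_subprocess_exec(
        *argv, stdout=asyncio.subprocess.PIPE
    )
    try:
        out, _ = await proc.communicate()
    except asyncio.CancelledError:
        # Deadline passed: kill the child rather than orphaning it.
        try:
            proc.kill()
        except ProcessLookupError:
            pass  # it exited on its own in the meantime
        await proc.wait()
        raise
    return out.decode().strip()


async def abort_with_handlers(handlers, abort_conf, timeout=10.0):
    """Hypothetical abort path: wait up to `timeout` seconds for the
    event handlers only, skip job log retrieval etc., then raise."""
    outputs = []
    if handlers:  # asyncio.wait() rejects an empty collection
        tasks = [asyncio.create_task(run_handler(h)) for h in handlers]
        done, pending = await asyncio.wait(tasks, timeout=timeout)
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)
        # Keep handler order; skip handlers that failed to start.
        outputs = [
            t.result() for t in tasks if t in done and t.exception() is None
        ]
    # Abort by raising, as today, so "cylc play" exits non-zero.
    raise SchedulerError(f'"{abort_conf}" is set', outputs)


if __name__ == "__main__":
    tango = [sys.executable, "-c", "print('TANGO')"]
    try:
        asyncio.run(abort_with_handlers([tango], "abort on workflow timeout"))
    except SchedulerError as exc:
        print(exc.args)
```

The deadline is the design choice that matters here: a critical shutdown should not hang on a misbehaving handler, so anything still running after timeout seconds is killed rather than waited on.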
