server: handle startup errors more nicely #4865

Open
oliver-sanders opened this issue May 9, 2022 · 1 comment
Labels
could be better Not exactly a bug, but not ideal.

oliver-sanders commented May 9, 2022

If an error occurs during the startup of the workflow network server, we get a long, nasty traceback irrespective of the error type.

The reason for this is that an exception raised inside a thread cannot be caught from the parent thread, so the original error is never handled there and Cylc eventually falls over when the timeout on the "barrier" is hit.
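A minimal sketch of the failure mode, using only the standard library rather than Cylc's actual code. The `start_server` function and the `OSError` are hypothetical stand-ins for the server thread and the ZMQ error; the point is that the parent only ever sees the barrier timing out, not the real cause:

```python
import threading


def start_server(barrier):
    # Hypothetical stand-in for the server startup: it fails before it
    # ever reaches barrier.wait(), so the parent is left waiting.
    raise OSError("Address already in use")


def run():
    # Two parties: the parent thread and the server thread.
    barrier = threading.Barrier(2, timeout=0.5)
    thread = threading.Thread(
        target=start_server, args=(barrier,), daemon=True
    )
    thread.start()
    try:
        # The try/except here cannot see the OSError raised in the thread.
        barrier.wait()
    except threading.BrokenBarrierError as exc:
        # All the parent learns is that the barrier timed out.
        return type(exc).__name__
    return "started"
```

Calling `run()` reports a `BrokenBarrierError`, while the thread's own traceback is printed separately by Python's default thread exception hook.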

There is one exception which is relatively likely to occur on startup:

zmq.error.ZMQError: Address already in use

This can happen if all of the ports in the specified range are occupied or, potentially, if multiple workflows/processes try to claim the same port simultaneously (I haven't confirmed the latter). To replicate this, add the following to your global.cylc:

[scheduler]    
    [[run hosts]]    
        ports = 43042 .. 43042

Ideally we would somehow catch this exception in the parent thread and pass it through our standard error handling, which ought to tidy up the error a bit, making it more obvious what the issue is.

There's some info on catching exceptions in threads here:

https://stackoverflow.com/questions/2829329/catch-a-threads-exception-in-the-caller-thread

Looks like the ThreadPoolExecutor passes the exceptions around nicely - https://stackoverflow.com/a/12808634
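A rough sketch of that approach, not a proposed patch. `PortInUseError`, `start_server` and `start_in_thread` are hypothetical names standing in for `zmq.error.ZMQError` and the real startup code, and it assumes the startup step can be written as a function that returns once the socket is bound:

```python
from concurrent.futures import ThreadPoolExecutor


class PortInUseError(Exception):
    """Hypothetical stand-in for zmq.error.ZMQError."""


def start_server(port):
    # Hypothetical startup step that fails to bind its port.
    raise PortInUseError(f"Address already in use (port {port})")


def start_in_thread(port):
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(start_server, port)
        try:
            # result() re-raises the worker's exception in the caller.
            future.result(timeout=5)
        except PortInUseError as exc:
            # The parent can now hand this to its normal error handling.
            return f"Cannot start server: {exc}"
    return "started"
```

Here the parent thread gets the original exception from `future.result()`, so it can report a short message instead of a traceback from the barrier.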

tl;dr: we don't want the exception traceback to arise from the barrier.wait if possible.

Pull requests welcome!

@oliver-sanders oliver-sanders added the could be better Not exactly a bug, but not ideal. label May 9, 2022
@oliver-sanders oliver-sanders added this to the cylc-8.x milestone May 9, 2022
@oliver-sanders
Member Author

(note same situation on master as in #4274)
