server: handle startup errors more nicely #4865

oliver-sanders · 2022-05-09T15:26:32Z

If an error occurs during the startup of the workflow network server, we get a long, nasty traceback irrespective of the error type.

The reason for this is that exceptions raised inside threads cannot be caught from the parent thread, so Cylc eventually falls over when the timeout on the "barrier" is hit.
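The failure mode can be reproduced in isolation. The following is a minimal sketch, not Cylc's actual code: `start_server` and `run` are hypothetical names, and the half-second timeout is arbitrary. The worker thread raises before it reaches the barrier, so the parent thread only ever sees the barrier timing out.

```python
import threading


def start_server(barrier):
    # Hypothetical stand-in for the server startup: it fails before it
    # can reach barrier.wait(), as a failed port bind would.
    raise OSError("Address already in use")


def run():
    # Two parties: the parent thread and the server thread.
    barrier = threading.Barrier(2, timeout=0.5)
    thread = threading.Thread(
        target=start_server, args=(barrier,), daemon=True
    )
    thread.start()
    try:
        # The parent never sees the OSError; it only sees the timeout.
        barrier.wait()
    except threading.BrokenBarrierError as exc:
        return type(exc).__name__
    return "ok"
```

By default the thread's own traceback is only printed to stderr; the exception that actually propagates in the parent is the `BrokenBarrierError`, which says nothing about the port.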

There is one exception which is relatively likely to occur on startup:

zmq.error.ZMQError: Address already in use

This can happen if all of the ports in the specified range are occupied or, potentially, if multiple workflows/processes try to claim the same port simultaneously (I haven't confirmed this). To replicate this, add the following to your global.cylc:

[scheduler]    
    [[run hosts]]    
        ports = 43042 .. 43042

Ideally we would somehow catch this exception in the parent thread and pass it through our standard error handling, which ought to tidy the error up a bit, making it more obvious what the issue is.

There's some info on catching exceptions in threads here:

https://stackoverflow.com/questions/2829329/catch-a-threads-exception-in-the-caller-thread

Looks like the ThreadPoolExecutor passes the exceptions around nicely - https://stackoverflow.com/a/12808634
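As a rough illustration of the ThreadPoolExecutor approach, here is a minimal sketch. `PortInUseError`, `start_server` and `start_with_executor` are hypothetical names invented for this example and are not part of Cylc or pyzmq. It assumes the startup step (binding the port) can be run as a function that returns, separately from the long-running serve loop.

```python
from concurrent.futures import ThreadPoolExecutor


class PortInUseError(Exception):
    """Hypothetical error type, used only for this illustration."""


def start_server(port):
    # Hypothetical startup step that fails, as a bind to an occupied
    # port would.
    raise PortInUseError(f"Address already in use: {port}")


def start_with_executor(port):
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(start_server, port)
        try:
            # result() re-raises the worker's exception in the caller.
            future.result(timeout=5)
        except PortInUseError as exc:
            # The parent thread can now report the error as it likes.
            return f"Could not start server: {exc}"
    return "started"
```

Because `Future.result()` re-raises the original exception in the calling thread, the parent gets the real error type and message rather than a barrier timeout.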

TL;DR: we don't want the exception traceback to arise from the barrier.wait if possible.

Pull requests welcome!

oliver-sanders · 2022-05-09T15:26:53Z

(note same situation on master as in #4274)

oliver-sanders added the "could be better" label (Not exactly a bug, but not ideal.) May 9, 2022

oliver-sanders added this to the cylc-8.x milestone May 9, 2022

oliver-sanders mentioned this issue Apr 12, 2023

play: handle port error better #4958

Closed
