The wakeup mechanism in UCX may hang for up to 20 seconds when using the progress thread, which can make it look as if the application has deadlocked, for example when running:
There are currently two ways to prevent the hang:

- Specify `--progress-mode thread-polling` to the benchmark above (or `UCXPY_PROGRESS_MODE=thread-polling` for other applications); or
- Specify a low keepalive interval: `UCX_TCP_KEEPINTVL=1ms UCX_KEEPALIVE_INTERVAL=1ms`.
Ideally, neither workaround would be needed. Using `thread-polling` mode means the progress thread runs at 100% CPU all the time, whereas a low keepalive interval may cause other issues, possibly including premature timeouts.
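For applications launched from Python, either workaround can be applied through the environment of the child process. The following is only a minimal sketch: the helper `workaround_env` and its `"polling"`/`"keepalive"` labels are hypothetical names made up for illustration, and only the variable names and values come from the text above.

```python
import os

# Variable names and values are taken from the issue text above.
# The helper and its "polling"/"keepalive" labels are hypothetical.
POLLING_ENV = {"UCXPY_PROGRESS_MODE": "thread-polling"}
KEEPALIVE_ENV = {
    "UCX_TCP_KEEPINTVL": "1ms",
    "UCX_KEEPALIVE_INTERVAL": "1ms",
}


def workaround_env(workaround, base_env=None):
    """Return a copy of base_env with one of the two workarounds applied.

    base_env defaults to the current process environment and is never
    modified in place.
    """
    overrides = {"polling": POLLING_ENV, "keepalive": KEEPALIVE_ENV}
    if workaround not in overrides:
        raise ValueError(f"unknown workaround: {workaround!r}")
    env = dict(os.environ if base_env is None else base_env)
    env.update(overrides[workaround])
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` together with the application's own command line. Only one workaround is applied per call, so the CPU cost of polling and the timeout risk of a short keepalive interval can be evaluated separately.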
Increase the timeout of the Python benchmarks with `thread` progress
mode to avoid hitting rapidsai#15. This
can be reverted once that issue is resolved.