You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: Exception in callback Worker._handle_stimulus_from_task(<Task finishe... to SSL")\n')>) at /opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/worker.py:1844
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: handle: <Handle Worker._handle_stimulus_from_task(<Task finishe... to SSL")\n')>) at /opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/worker.py:1844>
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: Traceback (most recent call last):
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/asyncio/events.py", line 80, in _run
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: self._context.run(self._callback, *self._args)
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/worker.py", line 193, in wrapper
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: return method(self, *args, **kwargs)
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/worker.py", line 1854, in _handle_stimulus_from_task
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: super()._handle_stimulus_from_task(task)
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/worker_state_machine.py", line 3341, in _handle_stimulus_from_task
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: self.handle_stimulus(stim)
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/worker.py", line 193, in wrapper
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: return method(self, *args, **kwargs)
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/worker.py", line 1866, in handle_stimulus
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: super().handle_stimulus(*stims)
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/worker_state_machine.py", line 3354, in handle_stimulus
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: instructions = self.state.handle_stimulus(*stims)
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/worker_state_machine.py", line 1279, in handle_stimulus
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: instructions += self._transitions(recs, stimulus_id=stim.stimulus_id)
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/worker_state_machine.py", line 2531, in _transitions
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: process_recs(recommendations.copy())
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/worker_state_machine.py", line 2525, in process_recs
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: a_recs, a_instructions = self._transition(
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/worker_state_machine.py", line 2443, in _transition
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: recs, instructions = func(self, ts, *args, stimulus_id=stimulus_id)
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/worker_state_machine.py", line 1921, in _transition_cancelled_error
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: assert ts.previous in ("executing", "long-running")
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: AssertionError
Notably, the following log messages directly preceded the error:
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: 2022-08-11 10:00:05,215 - distributed.core - INFO - Event loop was unresponsive in Worker for 45.99s. This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: 2022-08-11 10:00:05,225 - distributed.worker - INFO - Stopping worker at tls://10.0.3.173:42801
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: 2022-08-11 10:00:05,240 - distributed.core - ERROR - Exception while handling op get_data
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: Traceback (most recent call last):
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/comm/tcp.py", line 223, in read
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: frames_nbytes = await stream.read_bytes(fmt_size)
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: tornado.iostream.StreamClosedError: Stream is closed
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: During handling of the above exception, another exception occurred:
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: Traceback (most recent call last):
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/core.py", line 769, in _handle_comm
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: result = await result
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/worker.py", line 1692, in get_data
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: response = await comm.read(deserializers=serializers)
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/comm/tcp.py", line 239, in read
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: convert_stream_closed_error(self, e)
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/comm/tcp.py", line 140, in convert_stream_closed_error
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: if "UNKNOWN_CA" in exc.reason:
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: TypeError: argument of type 'NoneType' is not iterable
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: 2022-08-11 10:00:05,268 - distributed.worker - ERROR - IOStream is not idle; cannot convert to SSL
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: Traceback (most recent call last):
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/core.py", line 1362, in connect
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: await done.wait()
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/asyncio/locks.py", line 214, in wait
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: await fut
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: asyncio.exceptions.CancelledError
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: During handling of the above exception, another exception occurred:
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: Traceback (most recent call last):
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/worker.py", line 1983, in gather_dep
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: response = await get_data_from_worker(
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/worker.py", line 2725, in get_data_from_worker
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: return await retry_operation(_get_data, operation="get_data_from_worker")
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/utils_comm.py", line 383, in retry_operation
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: return await retry(
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/utils_comm.py", line 368, in retry
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: return await coro()
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/worker.py", line 2702, in _get_data
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: comm = await rpc.connect(worker)
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/core.py", line 1367, in connect
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: await connect_attempt
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/core.py", line 1307, in _connect
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: comm = await connect(
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/comm/core.py", line 291, in connect
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: comm = await asyncio.wait_for(
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: return fut.result()
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/comm/tcp.py", line 449, in connect
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: stream = await self.client.connect(
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/tornado/tcpclient.py", line 288, in connect
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: stream = await stream.start_tls(
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/tornado/iostream.py", line 1270, in start_tls
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: raise ValueError("IOStream is not idle; cannot convert to SSL")
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: ValueError: IOStream is not idle; cannot convert to SSL
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: Task exception was never retrieved
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: future: <Task finished name='Task-338' coro=<Server._handle_comm() done, defined at /opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/core.py:675> exception=TypeError("argument of type 'NoneType' is not iterable")>
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: Traceback (most recent call last):
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/comm/tcp.py", line 223, in read
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: frames_nbytes = await stream.read_bytes(fmt_size)
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: tornado.iostream.StreamClosedError: Stream is closed
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: During handling of the above exception, another exception occurred:
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: Traceback (most recent call last):
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/core.py", line 769, in _handle_comm
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: result = await result
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/worker.py", line 1692, in get_data
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: response = await comm.read(deserializers=serializers)
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/comm/tcp.py", line 239, in read
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: convert_stream_closed_error(self, e)
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: File "/opt/conda/envs/coiled/lib/python3.10/site-packages/distributed/comm/tcp.py", line 140, in convert_stream_closed_error
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: if "UNKNOWN_CA" in exc.reason:
Aug 11 10:00:05 ip-10-0-3-173 cloud-init[1264]: TypeError: argument of type 'NoneType' is not iterable
The text was updated successfully, but these errors were encountered:
ts.previous == "flight".
This is tightly related to #6705:
The network stack raised, which caused gather_dep to return GatherDepFailureEvent, which is quite uncommon in real life and virtually untested.
(https://cloud.coiled.io/dask-engineering/clusters/48635/details on
hendrik-debug-worker-057e4c65bd
)Notably, the following log messages directly preceded the error:
The text was updated successfully, but these errors were encountered: