Non-graceful exit from dask example #34

Open
kpedro88 opened this issue Oct 8, 2024 · 2 comments
kpedro88 commented Oct 8, 2024

The example at https://github.com/CoffeaTeam/lpcjobqueue#with-dask provides the expected output, but afterward produces this error:

2024-10-07 18:59:19,866 - tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOMainLoop object at 0x7f65de5b27d0>>, <Task finished name='Task-10753' coro=<SpecCluster._correct_state_internal() done, defined at /usr/local/lib/python3.11/site-packages/distributed/deploy/spec.py:346> exception=RuntimeError('cannot schedule new futures after shutdown')>)
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/tornado/ioloop.py", line 750, in _run_callback
    ret = callback()
          ^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tornado/ioloop.py", line 774, in _discard_future_result
    future.result()
RuntimeError: cannot schedule new futures after shutdown
2024-10-07 18:59:19,891 - distributed.scheduler - ERROR - 
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/distributed/utils.py", line 806, in wrapper
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/distributed/scheduler.py", line 4595, in add_worker
    await self.handle_worker(comm, address)
  File "/usr/local/lib/python3.11/site-packages/distributed/scheduler.py", line 6121, in handle_worker
    await self.handle_stream(comm=comm, extra={"worker": worker})
  File "/usr/local/lib/python3.11/site-packages/distributed/core.py", line 886, in handle_stream
    msgs = await comm.read()
           ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/distributed/comm/tcp.py", line 225, in read
    frames_nosplit_nbytes_bin = await stream.read_bytes(fmt_size)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError
Last-ditch attempt to close HTCondor job 43109063 in finalizer! You should confirm the job exits!

kpedro88 commented Oct 8, 2024

The coffea-based simple_example has a similar problem (after finishing as expected):

2024-10-07 19:16:21,690 - tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOMainLoop object at 0x7f12b479aaa0>>, <Task finished name='Task-21336' coro=<SpecCluster._correct_state_internal() done, defined at /usr/local/lib/python3.10/site-packages/distributed/deploy/spec.py:346> exception=RuntimeError('cannot schedule new futures after shutdown')>)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/tornado/ioloop.py", line 738, in _run_callback
    ret = callback()
  File "/usr/local/lib/python3.10/site-packages/tornado/ioloop.py", line 762, in _discard_future_result
    future.result()
RuntimeError: cannot schedule new futures after shutdown
Last-ditch attempt to close HTCondor job 43109076 in finalizer! You should confirm the job exits!
Last-ditch attempt to close HTCondor job 43109074 in finalizer! You should confirm the job exits!
Last-ditch attempt to close HTCondor job 43109071 in finalizer! You should confirm the job exits!
Last-ditch attempt to close HTCondor job 43109070 in finalizer! You should confirm the job exits!
Last-ditch attempt to close HTCondor job 43109069 in finalizer! You should confirm the job exits!
Last-ditch attempt to close HTCondor job 43109068 in finalizer! You should confirm the job exits!
Last-ditch attempt to close HTCondor job 43109067 in finalizer! You should confirm the job exits!
Last-ditch attempt to close HTCondor job 43109066 in finalizer! You should confirm the job exits!
Last-ditch attempt to close HTCondor job 43109065 in finalizer! You should confirm the job exits!
2024-10-07 19:16:22,058 - distributed.deploy.adaptive_core - INFO - Adaptive stop


kpedro88 commented Oct 8, 2024

Here's another, more verbose version from a different simple_example run:

2024-10-07 19:23:44,537 - distributed.deploy.adaptive_core - ERROR - cannot schedule new futures after shutdown
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/distributed/utils.py", line 801, in wrapper
    return await func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/distributed/deploy/adaptive.py", line 204, in scale_down
    await f
  File "/usr/local/lib/python3.10/site-packages/distributed/deploy/spec.py", line 574, in scale_down
    await self
  File "/usr/local/lib/python3.10/site-packages/distributed/deploy/spec.py", line 418, in _
    await self._correct_state()
  File "/usr/local/lib/python3.10/site-packages/distributed/deploy/spec.py", line 359, in _correct_state_internal
    await asyncio.gather(*tasks)
  File "/srv/.env/lib/python3.10/site-packages/lpcjobqueue/cluster.py", line 156, in close
    raise ex
  File "/srv/.env/lib/python3.10/site-packages/lpcjobqueue/cluster.py", line 146, in close
    is_gone = await asyncio.get_event_loop().run_in_executor(
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 821, in run_in_executor
    executor.submit(func, *args), loop=self)
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 167, in submit
    raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
2024-10-07 19:23:44,600 - tornado.application - ERROR - Exception in callback <function AdaptiveCore.__init__.<locals>._adapt at 0x7f477be81870>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/tornado/ioloop.py", line 921, in _run
    await val
  File "/usr/local/lib/python3.10/site-packages/distributed/deploy/adaptive_core.py", line 124, in _adapt
    await core.adapt()
  File "/usr/local/lib/python3.10/site-packages/distributed/deploy/adaptive_core.py", line 243, in adapt
    await self.scale_down(**recommendations)
  File "/usr/local/lib/python3.10/site-packages/distributed/utils.py", line 801, in wrapper
    return await func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/distributed/deploy/adaptive.py", line 204, in scale_down
    await f
  File "/usr/local/lib/python3.10/site-packages/distributed/deploy/spec.py", line 574, in scale_down
    await self
  File "/usr/local/lib/python3.10/site-packages/distributed/deploy/spec.py", line 418, in _
    await self._correct_state()
  File "/usr/local/lib/python3.10/site-packages/distributed/deploy/spec.py", line 359, in _correct_state_internal
    await asyncio.gather(*tasks)
  File "/srv/.env/lib/python3.10/site-packages/lpcjobqueue/cluster.py", line 156, in close
    raise ex
  File "/srv/.env/lib/python3.10/site-packages/lpcjobqueue/cluster.py", line 146, in close
    is_gone = await asyncio.get_event_loop().run_in_executor(
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 821, in run_in_executor
    executor.submit(func, *args), loop=self)
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 167, in submit
    raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
2024-10-07 19:23:45,615 - tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOMainLoop object at 0x7f477be5eb30>>, <Task finished name='Task-22453' coro=<SpecCluster._correct_state_internal() done, defined at /usr/local/lib/python3.10/site-packages/distributed/deploy/spec.py:346> exception=RuntimeError('cannot schedule new futures after shutdown')>)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/tornado/ioloop.py", line 738, in _run_callback
    ret = callback()
  File "/usr/local/lib/python3.10/site-packages/tornado/ioloop.py", line 762, in _discard_future_result
    future.result()
RuntimeError: cannot schedule new futures after shutdown
Last-ditch attempt to close HTCondor job 43109087 in finalizer! You should confirm the job exits!
Last-ditch attempt to close HTCondor job 43109086 in finalizer! You should confirm the job exits!
Last-ditch attempt to close HTCondor job 43109085 in finalizer! You should confirm the job exits!
Last-ditch attempt to close HTCondor job 43109084 in finalizer! You should confirm the job exits!
Last-ditch attempt to close HTCondor job 43109083 in finalizer! You should confirm the job exits!
Last-ditch attempt to close HTCondor job 43109082 in finalizer! You should confirm the job exits!
Last-ditch attempt to close HTCondor job 43109079 in finalizer! You should confirm the job exits!
Last-ditch attempt to close HTCondor job 43109078 in finalizer! You should confirm the job exits!
Last-ditch attempt to close HTCondor job 43109077 in finalizer! You should confirm the job exits!
2024-10-07 19:23:45,980 - distributed.deploy.adaptive_core - INFO - Adaptive stop
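
For context, the innermost frames of the verbose traceback show `ThreadPoolExecutor.submit` raising because the executor has already been shut down by the time `close` calls `run_in_executor`. Below is a minimal sketch of that mechanism in isolation. It is not the lpcjobqueue code: `check_job_gone`, `close_worker`, and `demo` are hypothetical names, and the executor is shut down explicitly here, which is only an assumption about what may be happening during teardown.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def check_job_gone() -> bool:
    """Hypothetical stand-in for a blocking job-status query."""
    return True


async def close_worker(executor: ThreadPoolExecutor) -> str:
    """Run the blocking check in the executor, as an async close() might."""
    loop = asyncio.get_running_loop()
    try:
        is_gone = await loop.run_in_executor(executor, check_job_gone)
        return f"closed cleanly: {is_gone}"
    except RuntimeError as ex:
        # submit() raises once the executor has been shut down
        return f"RuntimeError: {ex}"


def demo() -> tuple[str, str]:
    executor = ThreadPoolExecutor(max_workers=1)
    before = asyncio.run(close_worker(executor))  # executor still alive
    executor.shutdown(wait=True)                  # assumed teardown step
    after = asyncio.run(close_worker(executor))   # scheduled too late
    return before, after


if __name__ == "__main__":
    for line in demo():
        print(line)
```

If the same ordering applies in the real run, any worker `close` that is scheduled after the executor is gone would fail this way, which would be consistent with the jobs being cleaned up by the "last-ditch" finalizer instead.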
