When I created a cluster on an HPC system using Slurm and dask-gateway-server, I ran into a problem. My understanding of the process is as follows: when dask-gateway-server receives the new_cluster command from the client, it converts it into an sbatch command. I edited the dask_gateway_server/backends/jobqueue/slurm.py file and printed the variables cmd, env, and script in get_submit_cmd_env_stdin; the output is as follows:
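(For reference, the edit amounted to something like the sketch below. The method name and the variable names cmd, env, and script come from the issue itself; the signature and the abbreviated body are assumptions, not the real implementation.)

```python
# dask_gateway_server/backends/jobqueue/slurm.py -- sketch of the debug edit
def get_submit_cmd_env_stdin(self, cluster, worker=None):
    # ... unchanged logic that builds the sbatch command line, the job
    # environment, and the submission script ...
    print("cmd:", cmd)        # the sbatch command line
    print("env:", env)        # environment variables passed to the job
    print("script:", script)  # submission script piped to sbatch over stdin
    return cmd, env, script
```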
When a Slurm node receives this command and begins execution, and the job lands on a node other than the edge node, it tries to find the dask.crt and dask.pem files referenced in the environment variables above. Those files do not exist on that node, so the Slurm task fails with the following error:
2023-05-29 17:09:58,047 - distributed.preloading - INFO - Import preload module: dask_gateway.scheduler_preload
/opt/dask/lib/python3.10/site-packages/distributed/cli/dask_scheduler.py:140: FutureWarning: dask-scheduler is deprecated and will be removed in a future release; use `dask scheduler` instead
warnings.warn(
2023-05-29 17:09:58,049 - distributed.scheduler - INFO - -----------------------------------------------
2023-05-29 17:09:58,050 - distributed.preloading - INFO - Creating preload: dask_gateway.scheduler_preload
2023-05-29 17:09:58,050 - distributed.preloading - INFO - Import preload module: dask_gateway.scheduler_preload
2023-05-29 17:09:58,050 - distributed.scheduler - INFO - End scheduler
Traceback (most recent call last):
File "/opt/dask/bin/dask-scheduler", line 8, in <module>
sys.exit(main())
File "/opt/dask/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/opt/dask/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/opt/dask/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/dask/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/opt/dask/lib/python3.10/site-packages/distributed/cli/dask_scheduler.py", line 249, in main
asyncio.run(run())
File "/opt/dask/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/dask/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/opt/dask/lib/python3.10/site-packages/distributed/cli/dask_scheduler.py", line 209, in run
scheduler = Scheduler(
File "/opt/dask/lib/python3.10/site-packages/distributed/scheduler.py", line 3464, in __init__
self.connection_args = self.security.get_connection_args("scheduler")
File "/opt/dask/lib/python3.10/site-packages/distributed/security.py", line 342, in get_connection_args
"ssl_context": self._get_tls_context(tls, ssl.Purpose.SERVER_AUTH),
File "/opt/dask/lib/python3.10/site-packages/distributed/security.py", line 299, in _get_tls_context
ctx = ssl.create_default_context(purpose=purpose, cafile=ca)
File "/opt/dask/lib/python3.10/ssl.py", line 766, in create_default_context
context.load_verify_locations(cafile, capath, cadata)
FileNotFoundError: [Errno 2] No such file or directory
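One way to confirm the mismatch is to check, from inside a Slurm job running on a compute node, whether the staged TLS files are actually visible there. A minimal sketch (the paths must be supplied by hand; substitute the dask.crt and dask.pem paths printed in the env output above):

```python
# check_tls_paths.py -- submit via sbatch to a compute node (sketch only).
# Usage: python check_tls_paths.py /path/to/dask.crt /path/to/dask.pem
import os
import sys

for path in sys.argv[1:]:
    print(path, "->", "exists" if os.path.exists(path) else "MISSING")
```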
Hi, I am also facing the same issue. Can someone please help with this? My understanding is that dask-gateway sets an environment variable pointing to the staging location of dask.crt, but it never copies dask.crt to that location.
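If that is what is happening, it seems worth checking whether the gateway's staging directory resolves to a filesystem the Slurm compute nodes can also see; as I understand it, the certificates are written there on the gateway host at submission time rather than copied by the job itself. A sketch, assuming the staging_directory option of the Slurm cluster config (the option name and its {home} template are assumptions about the jobqueue backend, not verified here):

```python
# dask_gateway_config.py (server side) -- sketch only.
# dask.crt / dask.pem are written under this directory per cluster, so it
# must live on a filesystem shared with every Slurm compute node.
c.SlurmClusterConfig.staging_directory = "{home}/.dask-gateway/"
```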
@jcrist @consideRatio @TomAugspurger @jacobtomlinson @martindurant