Closed
Description
What happened? (You can include a screenshot if it helps explain)
Running Everest does not seem to work on a compute cluster. The server starts on a node, but no jobs are submitted. After a while, an error showed up in the terminal:
ERROR:everest_main:Everest run failed with: Traceback (most recent call last):
File "/path/to/lib64/python3.11/site-packages/everest/detached/jobs/everserver.py", line 327, in main
status, message = _get_optimization_status(run_model.exit_code, shared_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/path/to/lib64/python3.11/site-packages/everest/detached/jobs/everserver.py", line 391, in _get_optimization_status
messages = _failed_realizations_messages(shared_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/path/to/lib64/python3.11/site-packages/everest/detached/jobs/everserver.py", line 401, in _failed_realizations_messages
failed = shared_data[SIM_PROGRESS_ENDPOINT]["status"]["failed"]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
KeyError: 'status'
Traceback (most recent call last):
File "/path/to/lib64/python3.11/site-packages/everest/detached/jobs/everserver.py", line 327, in main
status, message = _get_optimization_status(run_model.exit_code, shared_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/path/to/lib64/python3.11/site-packages/everest/detached/jobs/everserver.py", line 391, in _get_optimization_status
messages = _failed_realizations_messages(shared_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/path/to/lib64/python3.11/site-packages/everest/detached/jobs/everserver.py", line 401, in _failed_realizations_messages
failed = shared_data[SIM_PROGRESS_ENDPOINT]["status"]["failed"]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
KeyError: 'status'
This was on lsf, but was also reported for slurm. Not sure if the error message was the same for slurm. Everest could run on the same config with local.
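The traceback ends in an unguarded nested lookup, `shared_data[SIM_PROGRESS_ENDPOINT]["status"]["failed"]`, which raises when no simulation progress was ever recorded. Below is a minimal sketch of a guarded lookup, not the actual Everest code or fix. It assumes `shared_data` is a plain dict of dicts and that `"failed"` holds a count; the value of `SIM_PROGRESS_ENDPOINT` and the helper name `failed_realizations_count` are hypothetical.

```python
from typing import Any

# Hypothetical stand-in: the real value of this constant is not shown in the traceback.
SIM_PROGRESS_ENDPOINT = "sim_progress"


def failed_realizations_count(shared_data: dict[str, Any]) -> int:
    """Return the failed-realization count, or 0 if no status was recorded.

    Assumes "failed" holds an integer count (not confirmed by the traceback).
    """
    # Each level falls back to an empty dict instead of raising KeyError.
    progress = shared_data.get(SIM_PROGRESS_ENDPOINT) or {}
    status = progress.get("status") or {}
    return status.get("failed", 0)


# The situation in the traceback: the endpoint entry exists but has no "status" key.
print(failed_realizations_count({SIM_PROGRESS_ENDPOINT: {}}))

# A populated entry is read as before.
print(failed_realizations_count({SIM_PROGRESS_ENDPOINT: {"status": {"failed": 2}}}))
```

A guard like this would only avoid the secondary KeyError. It would not explain why no jobs are submitted on lsf in the first place.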
What did you expect to happen?
No response
steps to reproduce
Run math_func with:
simulator:
  queue_system: lsf
Environment where bug has been observed
- python 3.11
- python 3.12
- macosx
- rhel7
- rhel8
- local queue
- lsf queue
- slurm queue
- openPBS queue
Status
Done