job.result() doesn't raise RuntimeJobFailureError for failed job #311

jyu00 · 2022-05-06T22:18:06Z

Describe the bug

job.result() doesn't raise RuntimeJobFailureError even though the job failed.

My program sends a bad job, calls job.result() then job.status(). Here's the qiskit-ibm-runtime tracing:

>>>> calling job.result
base.on_open:DEBUG:2022-05-06 22:04:20,368: Websocket connection established for job c9qpmos6pc307mhm4v90
base.on_close:DEBUG:2022-05-06 22:04:28,770: Websocket connection for job c9qpmos6pc307mhm4v90 closed. status code=1000, message=
base.stream:DEBUG:2022-05-06 22:04:28,770: Websocket run_forever finished.
base.disconnect:DEBUG:2022-05-06 22:04:28,770: Client closing websocket connection with code None.
session._log_request_info:DEBUG:2022-05-06 22:04:28,771: Endpoint: https://runtime-us-east-dev.quantum-computing.ibm.com/jobs/c9qpmos6pc307mhm4v90. Method: GET. 
runtime.job_get:DEBUG:2022-05-06 22:04:28,803: Runtime job get response: {'id': 'c9qpmos6pc307mhm4v90', 'hub': 'rte-test', 'group': 'lp-all', 'project': 'aot', 'backend': 'test_lp1_dd_aot', 'state': {'status': 'Running'}, 'params': {'run_options': None, 'bad_run_jobs': 0, 'num_fast_jobs': 0, 'num_snail_jobs': 0, 'bad_compile_jobs': 1, 'num_slow_jobs': 0, 'num_large_jobs': 0, 'custom_circuits': None, 'use_job_builder': True}, 'program': {'id': 'aot-test-W3xljYElz3'}, 'created': '2022-05-06T22:04:19.756389Z', 'runtime': 'qiskit-program:latest', 'status': 'Running'}
session._log_request_info:DEBUG:2022-05-06 22:04:28,803: Endpoint: https://runtime-us-east-dev.quantum-computing.ibm.com/jobs/c9qpmos6pc307mhm4v90/results. Method: GET. 
>>>> calling job.status
session._log_request_info:DEBUG:2022-05-06 22:04:28,834: Endpoint: https://runtime-us-east-dev.quantum-computing.ibm.com/jobs/c9qpmos6pc307mhm4v90. Method: GET. 
runtime.job_get:DEBUG:2022-05-06 22:04:28,869: Runtime job get response: {'id': 'c9qpmos6pc307mhm4v90', 'hub': 'rte-test', 'group': 'lp-all', 'project': 'aot', 'backend': 'test_lp1_dd_aot', 'state': {'status': 'Failed', 'reason': 'Error'}, 'params': {'bad_compile_jobs': 1, 'bad_run_jobs': 0, 'num_slow_jobs': 0, 'num_large_jobs': 0, 'num_snail_jobs': 0, 'run_options': None, 'num_fast_jobs': 0, 'custom_circuits': None, 'use_job_builder': True}, 'program': {'id': 'aot-test-W3xljYElz3'}, 'created': '2022-05-06T22:04:19.756389Z', 'runtime': 'qiskit-program:latest', 'status': 'Failed'}
session._log_request_info:DEBUG:2022-05-06 22:04:28,870: Endpoint: https://runtime-us-east-dev.quantum-computing.ibm.com/jobs/c9qpmos6pc307mhm4v90/results. Method: GET. 
job status is JobStatus.ERROR

job.result() calls wait_for_final_state, which waits for the websocket to close and then calls status() to get the latest status. In this case, that status call appears to have returned Running. wait_for_final_state assumes the status must be one of the final states because the websocket has closed, but since it was Running rather than Failed, RuntimeJobFailureError was never raised.

The job.status() call that happens after shows the job had indeed failed.
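To make the race concrete, here is a minimal sketch that imitates the sequence in the log above. It does not use the qiskit-ibm-runtime code: `FakeJobApi`, `naive_result`, and the local `RuntimeJobFailureError` class are hypothetical stand-ins, and the only assumption carried over from the log is that the first status read after the stream closes is stale (Running) while the next one is Failed.

```python
class RuntimeJobFailureError(Exception):
    """Local stand-in for the real exception so the sketch is self-contained."""


class FakeJobApi:
    """Hypothetical server stub whose first status read is stale."""

    def __init__(self, statuses):
        self._statuses = list(statuses)

    def job_status(self):
        # Hand out the queued statuses in order, then keep repeating the last one.
        if len(self._statuses) > 1:
            return self._statuses.pop(0)
        return self._statuses[0]


def naive_result(api):
    # The stream is assumed to have closed already, so the status is read
    # exactly once and trusted to be final.
    status = api.job_status()
    if status == "Failed":
        raise RuntimeJobFailureError("job failed")
    return status


api = FakeJobApi(["Running", "Failed"])
first = naive_result(api)   # stale read: no exception is raised
second = api.job_status()   # a later read sees the real outcome
print(first, second)
```

With a single status read, the failure only becomes visible on the second call, which matches the job.result() then job.status() behaviour described above.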

Steps to reproduce

I was only able to reproduce this in CI, and I only tested against staging.

Expected behavior
RuntimeJobFailureError should be raised.

Suggested solutions
It looks like the server should have returned Failed instead of Running. So either the server needs to be fixed, or the client needs to double-check in wait_for_final_state that the status is indeed a final one.
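A rough sketch of the client-side option, assuming the client keeps polling the status endpoint after the websocket closes until it sees a final status. The function name `wait_for_final_status`, the `get_status` callable, the `interval` parameter, and the `FINAL_STATES` set are all hypothetical; only Running and Failed appear in the log above, and the other status strings are assumptions.

```python
import time

# Assumed set of final statuses; only "Failed" is confirmed by the log above.
FINAL_STATES = {"Completed", "Failed", "Cancelled"}


def wait_for_final_status(get_status, timeout=None, interval=0.1):
    """Poll get_status() until it returns a final status or the timeout expires."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        status = get_status()
        if status in FINAL_STATES:
            return status
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"status is still {status!r} after {timeout} seconds")
        time.sleep(interval)


# Usage: the first two reads are stale, the third reflects the real outcome.
responses = iter(["Running", "Running", "Failed"])
final = wait_for_final_status(lambda: next(responses), timeout=5, interval=0.01)
print(final)
```

The caller can then raise RuntimeJobFailureError when the returned status is Failed. The timeout keeps the loop from spinning forever if the server never reports a final status.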

Additional Information

qiskit-ibm-runtime version: 0.4.0
Python version: 3.8.7
Operating system:


rathishcholarajan · 2022-05-09T21:02:35Z

@renier Can we fix this on server side and make sure websocket connection closes only when the job has reached one of the final states?

renier · 2022-05-10T00:10:00Z

That’s how it works already. However, the stream gets the status update right away, so the update might not yet be persisted in the db.


jyu00 added the bug Something isn't working label May 6, 2022

rathishcholarajan added the api action Needs new API or changes to existing APIs before this ticket can be worked upon. label May 24, 2022

rathishcholarajan mentioned this issue May 26, 2022

Poll status API after stream closes in wait_for_final_state until it is final #341

Merged

rathishcholarajan closed this as completed in #341 May 27, 2022
