
job.result() doesn't raise RuntimeJobFailureError for failed job #311

Closed
jyu00 opened this issue May 6, 2022 · 2 comments · Fixed by #341
Labels
api action Needs new API or changes to existing APIs before this ticket can be worked upon. bug Something isn't working

Comments

@jyu00
Collaborator

jyu00 commented May 6, 2022

Describe the bug

job.result() doesn't raise RuntimeJobFailureError even though the job failed.

My program submits a bad job, calls job.result(), and then calls job.status(). Here's the qiskit-ibm-runtime tracing:

>>>> calling job.result
base.on_open:DEBUG:2022-05-06 22:04:20,368: Websocket connection established for job c9qpmos6pc307mhm4v90
base.on_close:DEBUG:2022-05-06 22:04:28,770: Websocket connection for job c9qpmos6pc307mhm4v90 closed. status code=1000, message=
base.stream:DEBUG:2022-05-06 22:04:28,770: Websocket run_forever finished.
base.disconnect:DEBUG:2022-05-06 22:04:28,770: Client closing websocket connection with code None.
session._log_request_info:DEBUG:2022-05-06 22:04:28,771: Endpoint: https://runtime-us-east-dev.quantum-computing.ibm.com/jobs/c9qpmos6pc307mhm4v90. Method: GET. 
runtime.job_get:DEBUG:2022-05-06 22:04:28,803: Runtime job get response: {'id': 'c9qpmos6pc307mhm4v90', 'hub': 'rte-test', 'group': 'lp-all', 'project': 'aot', 'backend': 'test_lp1_dd_aot', 'state': {'status': 'Running'}, 'params': {'run_options': None, 'bad_run_jobs': 0, 'num_fast_jobs': 0, 'num_snail_jobs': 0, 'bad_compile_jobs': 1, 'num_slow_jobs': 0, 'num_large_jobs': 0, 'custom_circuits': None, 'use_job_builder': True}, 'program': {'id': 'aot-test-W3xljYElz3'}, 'created': '2022-05-06T22:04:19.756389Z', 'runtime': 'qiskit-program:latest', 'status': 'Running'}
session._log_request_info:DEBUG:2022-05-06 22:04:28,803: Endpoint: https://runtime-us-east-dev.quantum-computing.ibm.com/jobs/c9qpmos6pc307mhm4v90/results. Method: GET. 
>>>> calling job.status
session._log_request_info:DEBUG:2022-05-06 22:04:28,834: Endpoint: https://runtime-us-east-dev.quantum-computing.ibm.com/jobs/c9qpmos6pc307mhm4v90. Method: GET. 
runtime.job_get:DEBUG:2022-05-06 22:04:28,869: Runtime job get response: {'id': 'c9qpmos6pc307mhm4v90', 'hub': 'rte-test', 'group': 'lp-all', 'project': 'aot', 'backend': 'test_lp1_dd_aot', 'state': {'status': 'Failed', 'reason': 'Error'}, 'params': {'bad_compile_jobs': 1, 'bad_run_jobs': 0, 'num_slow_jobs': 0, 'num_large_jobs': 0, 'num_snail_jobs': 0, 'run_options': None, 'num_fast_jobs': 0, 'custom_circuits': None, 'use_job_builder': True}, 'program': {'id': 'aot-test-W3xljYElz3'}, 'created': '2022-05-06T22:04:19.756389Z', 'runtime': 'qiskit-program:latest', 'status': 'Failed'}
session._log_request_info:DEBUG:2022-05-06 22:04:28,870: Endpoint: https://runtime-us-east-dev.quantum-computing.ibm.com/jobs/c9qpmos6pc307mhm4v90/results. Method: GET. 
job status is JobStatus.ERROR

job.result() calls wait_for_final_state, which waits for the websocket to close and then calls status() to get the latest status. In this case, that status call appears to have returned Running. wait_for_final_state assumes the status is a final one (since the websocket has closed), but because it was Running instead of Failed, RuntimeJobFailureError is never raised.

The job.status() call that happens after shows the job had indeed failed.
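The flawed assumption can be reproduced with a small self-contained simulation. Everything here (FakeJob, the simplified JobStatus enum, and the local RuntimeJobFailureError class) is a hypothetical stand-in rather than the qiskit-ibm-runtime API; it only mimics a status endpoint that lags behind the already-closed websocket.

```python
from enum import Enum, auto


class JobStatus(Enum):
    # Simplified stand-in for the client's job status values.
    RUNNING = auto()
    ERROR = auto()


class RuntimeJobFailureError(Exception):
    """Local stand-in for the library exception of the same name."""


class FakeJob:
    """Hypothetical job whose status reads lag behind the closed websocket."""

    def __init__(self, statuses):
        # Successive status() calls return these values in order; the last
        # value repeats once the list is exhausted.
        self._statuses = list(statuses)

    def status(self):
        if len(self._statuses) > 1:
            return self._statuses.pop(0)
        return self._statuses[0]

    def wait_for_final_state(self):
        # The websocket is assumed to have closed already (code 1000).
        # Flawed logic: a single status read is trusted to be final.
        status = self.status()
        if status is JobStatus.ERROR:
            raise RuntimeJobFailureError("job failed")
        return status
```

With FakeJob([JobStatus.RUNNING, JobStatus.ERROR]), wait_for_final_state() returns JobStatus.RUNNING without raising, while the status() call that follows returns JobStatus.ERROR, matching the trace.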

Steps to reproduce

I have only been able to reproduce this in CI, and have only tested against staging.

Expected behavior
RuntimeJobFailureError should be raised.

Suggested solutions
It looks like the server should have returned Failed instead of Running. So either the server needs to be fixed, or the client needs to double-check in wait_for_final_state that the status is indeed a final one.
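A minimal sketch of the client-side option, assuming the client may re-poll the job status a bounded number of times after the websocket closes. get_status, max_polls, and poll_interval are hypothetical names, and the enum and exception are local stand-ins rather than the library's own classes.

```python
import time
from enum import Enum, auto


class JobStatus(Enum):
    # Simplified stand-in for the client's job status values.
    RUNNING = auto()
    DONE = auto()
    ERROR = auto()


FINAL_STATES = {JobStatus.DONE, JobStatus.ERROR}


class RuntimeJobFailureError(Exception):
    """Local stand-in for the library exception of the same name."""


def wait_for_final_state(get_status, max_polls=10, poll_interval=0.5):
    """Poll until the reported status is final; call after the stream closes.

    get_status is a hypothetical callable returning the job's current
    JobStatus (e.g. a wrapper around GET /jobs/<id>).
    """
    status = None
    for attempt in range(max_polls):
        status = get_status()
        if status in FINAL_STATES:
            if status is JobStatus.ERROR:
                raise RuntimeJobFailureError("job failed")
            return status
        # The stored status can lag the stream, so wait and re-read.
        if attempt < max_polls - 1:
            time.sleep(poll_interval)
    raise TimeoutError(f"no final status after {max_polls} polls (last: {status})")
```

Bounding the number of polls keeps a genuinely stuck job from hanging result() forever; the trade-off is a short extra delay whenever the stored status lags the stream.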

Additional Information

  • qiskit-ibm-runtime version: 0.4.0
  • Python version: 3.8.7
  • Operating system:
jyu00 added the bug (Something isn't working) label May 6, 2022
@rathishcholarajan
Member

@renier Can we fix this on the server side and make sure the websocket connection closes only when the job has reached one of the final states?

@renier
Contributor

renier commented May 10, 2022

That’s how it works already. However, the stream receives the status update immediately, so the new status might not yet be persisted in the db.
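The race described here can be sketched with an in-memory store and stream. JobStore, JobStream, and finish_job are hypothetical and are not the actual server implementation; the sketch only illustrates why persisting the final status before notifying the stream and closing the websocket would let a client that reacts to the close read Failed rather than Running.

```python
class JobStore:
    """Hypothetical in-memory stand-in for the job database."""

    def __init__(self):
        self._statuses = {}

    def save_status(self, job_id, status):
        self._statuses[job_id] = status

    def get_status(self, job_id):
        return self._statuses.get(job_id)


class JobStream:
    """Hypothetical stand-in for one job's websocket stream."""

    def __init__(self, store, job_id):
        self.store = store
        self.job_id = job_id
        self.published = None
        self.status_seen_at_close = None

    def publish_and_close(self, status):
        self.published = status
        # A client reacting to the close immediately reads the stored status.
        self.status_seen_at_close = self.store.get_status(self.job_id)


def finish_job(store, stream, job_id, status, persist_first=True):
    """Record a final status and notify the stream in the chosen order."""
    if persist_first:
        store.save_status(job_id, status)  # durable write first
        stream.publish_and_close(status)   # then notify and close
    else:
        stream.publish_and_close(status)   # client may read a stale row
        store.save_status(job_id, status)
```

Seeding the store with "Running" and calling finish_job(..., "Failed", persist_first=False) leaves status_seen_at_close as "Running", as in the trace; with the default persist_first=True it is "Failed".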

rathishcholarajan added the api action (Needs new API or changes to existing APIs before this ticket can be worked upon.) label May 24, 2022
blakejohnson pushed a commit to blakejohnson/qiskit-ibm-runtime that referenced this issue May 26, 2023
* Fix error message when min noise factors is not reached

* Add noise factors to error message

* Update ValueError message on ZNE extrapolator setting

* Print extrapolator setting in repr format for ZNE error message

* Remove quotations marks after repr printing in ZNE extrapolator setting error message

Co-authored-by: Mariana Bernagozzi <Mariana.Bernagozzi@ibm.com>