A workflow engine #333

Closed
ajavadia opened this issue Sep 5, 2019 · 5 comments

@ajavadia
Member

ajavadia commented Sep 5, 2019

IBMQ needs a workflow engine to manage the execution and retrieval of jobs.

This would be a higher-level construct than Job, which is the unit of interaction with the backends (from the API's point of view). But jobs have arbitrary limitations that should be masked from the high-level user. For example, each backend places its own limit on the number of experiments contained in a job. So currently you have to manually batch many circuits into jobs, manually store the jobs in some data structure, and manually retrieve and post-process the results. If one job fails, it can be difficult to recover the complete results without manually resubmitting that job and replacing the corresponding entry.
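The manual workflow described above looks roughly like the sketch below. This is a hedged illustration, not code from this issue: run_in_batches is a hypothetical helper, and it assumes the per-job limit is exposed via backend.configuration().max_experiments.

from qiskit import execute

def run_in_batches(circuits, backend):
    # Each backend caps the number of experiments per job.
    max_exps = backend.configuration().max_experiments
    jobs = []
    # Manually split the circuit list into backend-sized chunks.
    for start in range(0, len(circuits), max_exps):
        jobs.append(execute(circuits[start:start + max_exps], backend=backend))
    # Manually collect results; a single failed job breaks the whole set.
    return [job.result() for job in jobs]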

The workflow engine automates this process. It could be called a JobManager. At a high level, the user specifies which "tasks" they would like to run. Even if a task has a million circuits, the workflow engine will take care of splitting them, farming them out to the backends, reporting progress to the user, and joining the results into a single Result for the task.

Simple pseudo-code (up for discussion):

from qiskit.providers.ibmq import JobManager  # proposed class, does not exist yet
jm = JobManager()

# build a million circuits
from qiskit.circuit.random import random_circuit
circs = []
for _ in range(1000000):
    circs.append(random_circuit(num_qubits=5, depth=4))

# farm out the jobs
from qiskit import IBMQ
p = IBMQ.load_account()
device = p.get_backend('ibmq_tenerife')
jm.run(circs, backend=device)

# inquire about the status of my runs
jm.status()  # says how many jobs, and the status of each

# join result of (potentially) multiple jobs
res = jm.result()
@ajavadia ajavadia added the type: feature request New feature or request label Sep 5, 2019
@ajavadia ajavadia changed the title from "A workflow manager" to "A workflow engine" Sep 5, 2019
@woodsp-ibm
Member

This would be great if it relieved Aqua of these types of transport/payload concerns and made them more widely available. Having a reliable, guaranteed operation would allow us to rely on such an engine instead of carrying that code in Aqua. I am cross-referencing this PR as further input for the engine: qiskit-community/qiskit-aqua#694. While we did some level of splitting there, it turns out it can become less than optimal, as the PR outlines.

@diego-plan9 diego-plan9 added this to the 0.4 milestone Oct 8, 2019
@ajavadia
Member Author

ajavadia commented Oct 9, 2019

Long term it would be nice to be able to somehow record this in the online database too. One very nice feature of IBMQProvider is that it allows you to retrieve results/qobj/properties from past executions and analyze them. If, for some big experiment, your circuits had to be split into 5 jobs, it can be hard to relate those 5 jobs later on. So it would be nice if the JobManager somehow recorded this relationship between the jobs in the database and presented them to the user in one unified form upon later retrieval.
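For context, here is a hedged sketch of what per-job retrieval looks like today, assuming the current IBMQ provider interface (get_backend, retrieve_job, result); the job IDs are placeholders that a user of a split experiment has to record by hand:

from qiskit import IBMQ

provider = IBMQ.load_account()
backend = provider.get_backend('ibmq_16_melbourne')

# Without a recorded relationship, the 5 job IDs must be tracked manually.
job_ids = ['<id_1>', '<id_2>', '<id_3>', '<id_4>', '<id_5>']
results = [backend.retrieve_job(job_id).result() for job_id in job_ids]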

@jyu00
Collaborator

jyu00 commented Oct 15, 2019

@ajavadia a few design questions:

  1. Say the circuits are divided into 3 jobs. Jobs 1 and 2 are submitted, but job 3 fails to submit (e.g. no credits left). Do you think the user would still want the results of jobs 1 and 2? I think the answer is yes, since the circuits might not be related at all, but I'd like your input.

  2. To your point of jm.result() returning a single Result instance, I think that would be rather hacky. Result is meant for just a single job, and it has metadata like job_id that points to a single job. In addition, Result cannot be instantiated directly (only via from_dict()), which would make it harder to manipulate.

  3. Is the JobManager meant to manage just one set of circuits, or can I call jm.run() multiple times with different sets of circuits? I think the latter would make things more confusing (e.g. if we say circuit 21 failed, is it the 21st from the beginning or the 21st of the most recent set?). Plus, the user can always create another JobManager instance for a new set of circuits.

Finally, to your point about relating the jobs together, we can use the job name to do that. All jobs from the same set of circuits will share a job name prefix, and the user can use backends.jobs() to retrieve them all.
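A hedged sketch of that retrieval path, assuming backends.jobs() accepts a job_name filter (the prefix value is illustrative):

from qiskit import IBMQ

provider = IBMQ.load_account()
# Retrieve every job whose name starts with the shared prefix.
related_jobs = provider.backends.jobs(limit=50, job_name='million_circuits_run')
statuses = [job.status() for job in related_jobs]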

@jyu00 jyu00 mentioned this issue Oct 18, 2019
@ajavadia
Member Author

Hi @jyu00

1- Yes, I think partial jobs/results should be supported. Terra recently started to support partial results for cases where only a subset of circuits in a job succeed (Qiskit/qiskit#3217). This would be analogous to that.
It would also be nice if jobmanager.report() could say something like "circuits 200-300 failed to submit; reason: no credits left". I think it currently says something about the job failure, but doesn't tell the user which circuits in the original huge list this corresponds to.

2- Hm, yeah I see. But that sort of defeats the purpose here because the point of this is to hide "API details" from a user who doesn't care how jobs are split up, what the limits are, etc. They just want to execute 1000 circuits and look at the result.

So how about: when I do jobmanager.result(), maybe it could return a JobManagerResult instance which is agnostic to how many jobs it took to create it? This would basically be a wrapper around the list of results you currently return, but would support methods like jm_result.get_counts(circuit). It would not have a single job_id associated with it (it could have a list of job_ids, maybe); see the sketch after this list.

3- I'm good with whatever you decide here.
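A minimal sketch of the JobManagerResult idea from point 2, assuming the circuits are split into equal-sized chunks (except possibly the last); this is illustrative pseudocode for the proposal, not an existing class:

class JobManagerResult:
    """Wrapper over per-job Results that hides how the circuits were split."""

    def __init__(self, results, circuits_per_job):
        self._results = results            # one Result per submitted job
        self._per_job = circuits_per_job   # circuits contained in each job

    def get_counts(self, index):
        """Counts for the index-th circuit of the original (unsplit) list."""
        job_index, offset = divmod(index, self._per_job)
        return self._results[job_index].get_counts(offset)

    def job_ids(self):
        """All job IDs that contributed to this combined result."""
        return [result.job_id for result in self._results]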

@jyu00
Collaborator

jyu00 commented Nov 4, 2019

Implemented via the PRs mentioned in this issue.
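For reference, the interface that eventually shipped in qiskit-ibmq-provider is typically used along the lines below. The class and method names are taken from that package rather than from this thread, so treat the sketch as a hedged illustration:

from qiskit import IBMQ, transpile
from qiskit.circuit.random import random_circuit
from qiskit.providers.ibmq.managed import IBMQJobManager

provider = IBMQ.load_account()
backend = provider.get_backend('ibmq_16_melbourne')

# Circuits must be transpiled for the target backend before submission.
circuits = transpile([random_circuit(5, 4) for _ in range(300)], backend=backend)

job_manager = IBMQJobManager()
job_set = job_manager.run(circuits, backend=backend, name='workflow-demo')

print(job_set.report())         # per-job status, analogous to jm.status()
results = job_set.results()     # combined results across the split jobs
counts = results.get_counts(0)  # indexed by position in the original list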
