A workflow engine #333

Closed
ajavadia opened this issue Sep 5, 2019 · 5 comments

@ajavadia
Member

ajavadia commented Sep 5, 2019

IBMQ needs a workflow engine to manage the execution and retrieval of jobs.

This would be a higher-level construct than Job, which is the unit of interaction with the backends (from the API's point of view). But jobs have arbitrary limitations that should be masked from the high-level user. For example, each backend places its own limit on the number of experiments contained in a job. So currently you have to manually batch many circuits into jobs, manually store the jobs in some data structure, and manually retrieve and post-process the results. If one job fails, it can be difficult to recover the complete results without manually resubmitting that job and replacing the corresponding entry.
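The manual workflow described above looks roughly like the sketch below. This is a hedged illustration, not code from this issue: run_in_batches is a hypothetical helper, and it assumes the per-job limit is exposed via backend.configuration().max_experiments.

from qiskit import execute

def run_in_batches(circuits, backend):
    # Each backend caps the number of experiments per job.
    max_exps = backend.configuration().max_experiments
    jobs = []
    # Manually split the circuit list into backend-sized chunks.
    for start in range(0, len(circuits), max_exps):
        jobs.append(execute(circuits[start:start + max_exps], backend=backend))
    # Manually collect results; a single failed job breaks the whole set.
    return [job.result() for job in jobs]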

The workflow engine automates this process. It could be called a JobManager. At a high level, the user specifies which "tasks" they would like to run. Even if a task has a million circuits, the workflow engine will take care of splitting them, farming them out to the backends, reporting progress to the user, and joining the results into a single Result for the task.

Simple pseudo-code (up for discussion):

from qiskit.providers.ibmq import JobManager  # proposed class, does not exist yet
jm = JobManager()

# build a million circuits
from qiskit.circuit.random import random_circuit
circs = []
for _ in range(1000000):
    circs.append(random_circuit(num_qubits=5, depth=4))

# farm out the jobs
from qiskit import IBMQ
p = IBMQ.load_account()
device = p.get_backend('ibmq_tenerife')
jm.run(circs, backend=device)

# inquire about the status of my runs
jm.status()  # says how many jobs, and the status of each

# join result of (potentially) multiple jobs
res = jm.result()
@ajavadia ajavadia added the type: feature request New feature or request label Sep 5, 2019
@ajavadia ajavadia changed the title from "A workflow manager" to "A workflow engine" Sep 5, 2019
@woodsp-ibm
Member

This would be great if it relieved Aqua of these types of transport/payload concerns and made them more widely available. Having a reliable, guaranteed operation would allow us to rely on such an engine instead of carrying that code in Aqua. I am cross-referencing this PR as further input for the engine: qiskit-community/qiskit-aqua#694. While we did some level of splitting there, it turns out it can become less than optimal, as the PR outlines.

@diego-plan9 diego-plan9 added this to the 0.4 milestone Oct 8, 2019
@ajavadia
Member Author

ajavadia commented Oct 9, 2019

Long term it would be nice to be able to somehow record this in the online database too. One very nice feature of IBMQProvider is that it allows you to retrieve results/qobj/properties from past executions and analyze them. If, for some big experiment, your circuits had to be split into 5 jobs, it can be hard to relate those 5 jobs later on. So it would be nice if the JobManager somehow recorded this relationship between the jobs in the database and presented them to the user in one unified form upon later retrieval.
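For context, here is a hedged sketch of what per-job retrieval looks like today, assuming the current IBMQ provider interface (get_backend, retrieve_job, result); the job IDs are placeholders that a user of a split experiment has to record by hand:

from qiskit import IBMQ

provider = IBMQ.load_account()
backend = provider.get_backend('ibmq_16_melbourne')

# Without a recorded relationship, the 5 job IDs must be tracked manually.
job_ids = ['<id_1>', '<id_2>', '<id_3>', '<id_4>', '<id_5>']
results = [backend.retrieve_job(job_id).result() for job_id in job_ids]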

@jyu00
Collaborator

jyu00 commented Oct 15, 2019

@ajavadia a few design questions:

  1. Say the circuits are divided into 3 jobs. Jobs 1 and 2 are submitted, but job 3 fails to submit (e.g. no credits left). Do you think the user would still want the results of jobs 1 and 2? I think the answer is yes, since the circuits might not be related at all, but I'd like your input.

  2. To your point of jm.result() returning a single Result instance, I think that would be rather hacky. Result is meant for just a single job, and it has metadata like job_id that points to a single job. In addition, Result cannot be instantiated directly (only via from_dict()), which would make it harder to manipulate.

  3. Is the JobManager meant to manage just one set of circuits, or can I call jm.run() multiple times with different sets of circuits? I think the latter would make things more confusing (e.g. if we say circuit 21 failed, is it the 21st from the beginning or the 21st of the most recent set?). Plus, the user can always create another JobManager instance for a new set of circuits.

Finally, to your point about relating the jobs together, we can use the job name to do that. All jobs from the same set of circuits will share a job name prefix, and the user can use backends.jobs() to retrieve them all.
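A hedged sketch of that retrieval path, assuming backends.jobs() accepts a job_name filter (the prefix value is illustrative):

from qiskit import IBMQ

provider = IBMQ.load_account()
# Retrieve every job whose name starts with the shared prefix.
related_jobs = provider.backends.jobs(limit=50, job_name='million_circuits_run')
statuses = [job.status() for job in related_jobs]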

@jyu00 jyu00 mentioned this issue Oct 18, 2019
@ajavadia
Member Author

Hi @jyu00

1- Yes, I think partial jobs/results should be supported. Terra recently started to support partial results for cases where only a subset of circuits in a job succeed (Qiskit/qiskit#3217). This would be analogous to that.
It would also be nice if jobmanager.report() could say something like "circuits 200-300 failed to submit; reason: no credits left". I think it currently says something about the job failure, but doesn't tell the user which circuits in the original huge list this corresponds to.

2- Hm, yeah I see. But that sort of defeats the purpose here because the point of this is to hide "API details" from a user who doesn't care how jobs are split up, what the limits are, etc. They just want to execute 1000 circuits and look at the result.

So how about: when I do jobmanager.result(), maybe it could return a JobManagerResult instance which is agnostic to how many jobs it took to create it? This would basically be a wrapper around the list of results you currently return, but would support methods like jm_result.get_counts(circuit). It would not have a single job_id associated with it (it could have a list of job_ids, maybe); see the sketch after this list.

3- I'm good with whatever you decide here.
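A minimal sketch of the JobManagerResult idea from point 2, assuming the circuits are split into equal-sized chunks (except possibly the last); this is illustrative pseudocode for the proposal, not an existing class:

class JobManagerResult:
    """Wrapper over per-job Results that hides how the circuits were split."""

    def __init__(self, results, circuits_per_job):
        self._results = results            # one Result per submitted job
        self._per_job = circuits_per_job   # circuits contained in each job

    def get_counts(self, index):
        """Counts for the index-th circuit of the original (unsplit) list."""
        job_index, offset = divmod(index, self._per_job)
        return self._results[job_index].get_counts(offset)

    def job_ids(self):
        """All job IDs that contributed to this combined result."""
        return [result.job_id for result in self._results]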

@jyu00
Collaborator

jyu00 commented Nov 4, 2019

Implemented via the PRs mentioned in this issue.
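For reference, the interface that eventually shipped in qiskit-ibmq-provider is typically used along the lines below. The class and method names are taken from that package rather than from this thread, so treat the sketch as a hedged illustration:

from qiskit import IBMQ, transpile
from qiskit.circuit.random import random_circuit
from qiskit.providers.ibmq.managed import IBMQJobManager

provider = IBMQ.load_account()
backend = provider.get_backend('ibmq_16_melbourne')

# Circuits must be transpiled for the target backend before submission.
circuits = transpile([random_circuit(5, 4) for _ in range(300)], backend=backend)

job_manager = IBMQJobManager()
job_set = job_manager.run(circuits, backend=backend, name='workflow-demo')

print(job_set.report())         # per-job status, analogous to jm.status()
results = job_set.results()     # combined results across the split jobs
counts = results.get_counts(0)  # indexed by position in the original list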
