Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add open batches to pulser-pasqal #701

Merged
merged 33 commits into from
Sep 20, 2024
Merged

Add open batches to pulser-pasqal #701

merged 33 commits into from
Sep 20, 2024

Conversation

oliver-gordon
Copy link
Collaborator

On the PCS we support open batches, to continuously add new jobs and there is support for this in the SDK but this has not been translated over to pulser-pasqal yet.

Due to the abstraction of batches in this lib I've attempted to keep away the term batch and just flow with open_submission, since our known reference is a submission_id.

When creating a new submission, a user is to either pass the submission_id and we can append new jobs to an existing batch or a boolean flag to indicate whether or not we want the submission being created to be left as open

@oliver-gordon oliver-gordon requested a review from HGSilveri June 28, 2024 12:27
@HGSilveri HGSilveri marked this pull request as draft June 28, 2024 12:31
@HGSilveri HGSilveri changed the title [DRAFT] Add open submissions to pulser-pasqal Add open submissions to pulser-pasqal Jun 28, 2024
Copy link
Collaborator

@HGSilveri HGSilveri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this first draft @oliver-gordon ! I have a left few comments, but let me focus on a design question here.

Does it make sense to use the same run() method for both starting a submission and adding jobs to one? I suggested below that it made more sense as a separate backend method, but it would also work as a method of the RemoteResults class you get after you call run().

EDIT: That was a bad idea, let's please disregard it.

pulser-core/pulser/backend/qpu.py Outdated Show resolved Hide resolved
pulser-core/pulser/backend/qpu.py Outdated Show resolved Hide resolved
pulser-core/pulser/backend/remote.py Outdated Show resolved Hide resolved
pulser-core/pulser/backend/qpu.py Outdated Show resolved Hide resolved
pulser-pasqal/pulser_pasqal/backends.py Outdated Show resolved Hide resolved
@MatthieuMoreau0
Copy link
Collaborator

Due to the abstraction of batches in this lib I've attempted to keep away the term batch and just flow with open_submission, since our known reference is a submission_id.

Should we not introduce the notion of batch at all in this lib? If it becomes the main entrypoint to submit jobs, shouldn't we use the same language as on other components of our platform - where the notion of batch is central. I am worried it would create some confusion for users when switching from pulser-pasqal to the user portal. Curious what you guys think on that front

@MatthieuMoreau0
Copy link
Collaborator

MatthieuMoreau0 commented Jul 1, 2024

My main suggestion for the user interface compared to what is done here is to not ask the user for the submission_id explicitly but handle it for them; we only need the user to let us know when they want to open the batch and when they want to close it.

We could provide a open_batch/start_session method which would create an open batch and store as an attribute of the class the batch ID. In the run function flow we would then check for that attribute to know whether we should create a new closed batch or add jobs to the current batch. We could add a close_batch/stop_session which makes an API call to close the batch and reset the batch ID stored in the backend class.

Example snippet

Open batch

backend = QPUBackend(seq, connection)
backend.start_session()
# these jobs are automatically added to the batch
results = backend.run(job_params, wait=True)
# some more jobs added to the same batch
results_2 = backend.run(job_params_2, wait=True)
backend.stop_session()

Closed batch (no change)

backend = QPUBackend(seq, connection)
# Create a closed batch with some jobs
results = backend.run(job_params, wait=True)
# Create another batch with some jobs
results_2 = session.run(job_params_2, wait=True)

We could also provide a context manager to handle the batch lifecycle

Starting the context manager would create a batch and store the associated batch_id as an attribute of the backend instance. Exiting the context manager closes the batch automatically and resets the batch_id attribute.

If the run method is used outside the context manager, it creates a closed batch with the jobs passed as argument.

e.g the user interface would look something like

Open batch example:

backend = QPUBackend(seq, connection)
# opening the context manager creates a batch. 
# Here I used the wording "start session" but it could be called "open_batch"
with backend.start_session():
     # these jobs are automatically added to the batch
     results = backend.run(job_params, wait=True)
     # some more jobs added to the same batch
     results_2 = backend.run(job_params_2, wait=True)
#we exited the context manager so the batch is now closed

Closed batch (no change)

backend = QPUBackend(seq, connection)
# Create a closed batch with some jobs
results = backend.run(job_params, wait=True)
# Create another batch with some jobs
results_2 = session.run(job_params_2, wait=True)

Thoughts about this general interface? What about the context manager? It's simply syntactic sugar but it can help prevent users forgetting to close their batch

@oliver-gordon
Copy link
Collaborator Author

@MatthieuMoreau0 Should we not introduce the notion of batch at all in this lib? That would not be for this MR, where the intention is to introduce functionality. To introduce the concept of a batch would be to modify all the existing interfaces

@HGSilveri
Copy link
Collaborator

Due to the abstraction of batches in this lib I've attempted to keep away the term batch and just flow with open_submission, since our known reference is a submission_id.

Should we not introduce the notion of batch at all in this lib? If it becomes the main entrypoint to submit jobs, shouldn't we use the same language as on other components of our platform - where the notion of batch is central. I am worried it would create some confusion for users when switching from pulser-pasqal to the user portal. Curious what you guys think on that front

Indeed, I have avoided referring to "batches" so far, but in reality "submission" is still only an internal parameter too, so we could start calling it "batch" from the get go. However, I'm not sure about this because it seems to me that the way the user will interact with the Batch is different - they won't be taking the batch, adding jobs to it and closing, but rather using the backend to do all these things. So on the one hand, calling a submission a "batch" will provide a useful link to a concept of pasqal-cloud, but on the other hand it might induce confusion.

@HGSilveri
Copy link
Collaborator

Open batch example:

backend = QPUBackend(seq, connection)
# opening the context manager creates a batch. 
# Here I used the wording "start session" but it could be called "open_batch"
with backend.start_session():
     # these jobs are automatically added to the batch
     results = backend.run(job_params, wait=True)
     # some more jobs added to the same batch
     results_2 = backend.run(job_params_2, wait=True)
#we exited the context manager so the batch is now closed

Closed batch (no change)

backend = QPUBackend(seq, connection)
# Create a closed batch with some jobs
results = backend.run(job_params, wait=True)
# Create another batch with some jobs
results_2 = session.run(job_params_2, wait=True)

Thoughts about this general interface? What about the context manager? It's simply syntactic sugar but it can help prevent users forgetting to close their batch

I quite like this, I find it elegant, relatively simple and it meets the criteria of having the actions performed by the backend. One question though: when you call run() for the second time in an open batch, does the results_2 object give a view of only the results of the new jobs, or also include the results of previously submitted jobs? In other others, does it complement results or does it replace it? I suspect you were going for the first option and it is also the one that makes most sense to me

@awennersteen
Copy link
Member

I personally would welcome having the concept of Batch in Pulser. However it would be in a way that I can re-use in downstreams libraries, and ideally we'd have it consistent with the cloud backend and emulation backends (even if that means usually wrapping the results of the emulator as a single job in a batch).

Due to that, I think I Agree with Oliver that it might be out of scope here.

@MatthieuMoreau0
Copy link
Collaborator

I was not suggesting to introduce batches in this MR specifically either; and definitely not as an object the user should manipulate. I was wondering mostly about the wording because as Henrique mentioned submission_id is currently an internal implementation detail the user is not aware of so we could rename it "batch" freely if we want to share the cloud naming conventions

@MatthieuMoreau0
Copy link
Collaborator

when you call run() for the second time in an open batch, does the results_2 object give a view of only the results of the new jobs, or also include the results of previously submitted jobs?

I don't know tbh. Whatever makes most sense given the current implementation of backends in Pulser

@awennersteen
Copy link
Member

awennersteen commented Jul 3, 2024

In Henriques example with open batches what if we do:

backend = QPUBackend(seq, connection)
# opening the context manager creates a batch. 
with batch as backend.start_session():
     # we can (optionally?) pass the context to run
     results = backend.run(job_params, wait=True, batch=batch)
     # some more jobs added to the same batch, and the results
     # are just the results from that single run
     results_2 = backend.run(job_params_2, wait=True, batch)

With the actual batch implementation could come in a followup, so that we can think of that interface a bit/not scope creep this mr

@HGSilveri
Copy link
Collaborator

HGSilveri commented Jul 3, 2024

In Henriques example with open batches what if we do:

*Matthieu's example, I don't want to take away his credit 🙂

backend = QPUBackend(seq, connection)
# opening the context manager creates a batch. 
with batch as backend.start_session():
     # we can (optionally?) pass the context to run
     results = backend.run(job_params, wait=True, batch=batch)
     # some more jobs added to the same batch, and the results
     # are just the results from that single run
     results_2 = backend.run(job_params_2, wait=True, batch)

With the actual batch implementation could come in a followup, so that we can think of that interface a bit/not scope creep this mr

I'm not convinced with exposing a batch object here, I feel like it strays away from the backend interface philosophy and it defeats the purpose of trying to simplify things for the more "junior" Pulser users. Let's remember that pasqal-cloud is not going away, so the option to work with batches will remain available for whoever needs them.

For the purposes of this PR (ie just allowing an open batch/submission), I feel like we can get away with the backend storing internally the open batch ID and just adding jobs to it on every run() call that happens within the context (instead of starting a new batch).

@oliver-gordon
Copy link
Collaborator Author

I suppose there are some assumptions that the runtime doesn't end for us to retain an ID as a class property? What if a user is running incredibly short lived scripts? or executing something again to add another job after already creating and submitting a batch? We would be missing the submission_id property

I can't speak for the regular usecases of pulser-pasqal users but, this one seems rather plausible?

@HGSilveri
Copy link
Collaborator

HGSilveri commented Jul 3, 2024

I suppose there are some assumptions that the runtime doesn't end for us to retain an ID as a class property? What if a user is running incredibly short lived scripts? or executing something again to add another job after already creating and submitting a batch? We would be missing the submission_id property

I can't speak for the regular usecases of pulser-pasqal users but, this one seems rather plausible?

I would go back to the argument that using pasqal-cloud directly is still an option. This implementation provides a restricted access to the feature and that's okay.
With pulser-pasqal, if a user wants to have use an open batch they have to stay within the context and the moment the exit it, the batch is closed. In exchange for forcing the user to work within these bounds, I think this approach provides a simpler interface (eg not having to deal with batch IDs) and it is foolproof (eg. the user doesn't have to know/remember to close the batch). In my opinion, this is a fair tradeoff.

@oliver-gordon
Copy link
Collaborator Author

I can live with this. I'll push my change to the draft today to see if we're aligned on how it will look

@MatthieuMoreau0
Copy link
Collaborator

MatthieuMoreau0 commented Jul 3, 2024

I suppose there are some assumptions that the runtime doesn't end for us to retain an ID as a class property? What if a user is running incredibly short lived scripts? or executing something again to add another job after already creating and submitting a batch? We would be missing the submission_id property

To cover this the start_session function could accept an optional argument batch_id. If set, it would not open a new batch but simply store that batch_id as an attribute to be reused when calling the run function to add other jobs to that existing batch

I think we can start without that optional argument in this MR and wait for user feedback.

@oliver-gordon oliver-gordon force-pushed the og/open-submissions branch 2 times, most recently from 8a79168 to 3f3a07c Compare July 9, 2024 09:54
Copy link
Collaborator

@MatthieuMoreau0 MatthieuMoreau0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

I would like to reiterate the question about naming things submission vs batch especially now that these are public attributes/methods using these names. imo calling things batch makes the most sense.

tests/test_backend.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@HGSilveri HGSilveri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this still needs some work.
Apart from the comments I just left, I would suggest that the open submission functionality is moved to RemoteBackend instead of being defined just in QPUBackend. Also, the ability to use open submissions should depend on whether the RemoteConnection supports it

pulser-core/pulser/backend/qpu.py Outdated Show resolved Hide resolved
pulser-core/pulser/backend/qpu.py Outdated Show resolved Hide resolved
pulser-core/pulser/backend/qpu.py Outdated Show resolved Hide resolved
pulser-core/pulser/backend/qpu.py Outdated Show resolved Hide resolved
pulser-core/pulser/backend/qpu.py Outdated Show resolved Hide resolved
pulser-core/pulser/backend/remote.py Outdated Show resolved Hide resolved
pulser-pasqal/pulser_pasqal/pasqal_cloud.py Outdated Show resolved Hide resolved
oliver.gordon added 2 commits July 31, 2024 14:41
@HGSilveri HGSilveri added this to the v0.20 Release milestone Sep 11, 2024
@HGSilveri
Copy link
Collaborator

HGSilveri commented Sep 17, 2024

Still needed:

@HGSilveri HGSilveri changed the title Add open submissions to pulser-pasqal Add open batches to pulser-pasqal Sep 17, 2024
@HGSilveri
Copy link
Collaborator

@oliver-gordon @MatthieuMoreau0 I managed to replace Submission -> Batch in a backwards compatible way everywhere except in RemoteResults (basically, the notion of "submission" is contained in the RemoteResults class). There are still links between submission and batch in RemoteResults though (eg the RemoteResults.batch_id property).
I had to replicate SubmissionStatus as BatchStatus and keep both because there are users that might have scripts where they call RemoteResults.get_status() and they expect a SubmissionStatus instance as the return type.
Is this acceptable for you?

@HGSilveri HGSilveri marked this pull request as ready for review September 19, 2024 11:18
Copy link
Collaborator

@MatthieuMoreau0 MatthieuMoreau0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few suggestions to deprecate "submission" but otherwise lgtm

pulser-core/pulser/backend/remote.py Show resolved Hide resolved
pulser-core/pulser/backend/remote.py Show resolved Hide resolved
pulser-core/pulser/backend/remote.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@MatthieuMoreau0 MatthieuMoreau0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes!

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@HGSilveri HGSilveri merged commit c12306a into develop Sep 20, 2024
9 checks passed
@HGSilveri HGSilveri mentioned this pull request Sep 20, 2024
HGSilveri added a commit that referenced this pull request Sep 20, 2024
**Main changes:**
- Reworking the NoiseModel interface (#710)
- Allow modification of the EOM setpoint without disabling EOM mode (#708)
- Enable definition of effective noise operators in all basis (#716)
- Add leakage (#720)
- Support differentiability through Torch tensors (#703)
- Add from_abstract_repr to Device and VirtualDevice (#727) 
- [FEAT] Handle batches with partial results (#707)
- Add open batches to pulser-pasqal (#701)
@HGSilveri HGSilveri deleted the og/open-submissions branch November 12, 2024 11:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants