Lazy pulse qobj loading for large backend #8885
Conversation
Thank you for opening a new pull request. Before your PR can be merged it will first need to pass continuous integration tests and be reviewed. Sometimes the review process can be slow, so please be patient. While you're waiting, please feel free to review other open PRs. While only a subset of people are authorized to approve pull requests for merging, everyone is encouraged to review open pull requests. Doing reviews helps reduce the burden on the core team and helps make the project's code better for everyone. One or more of the following people are requested to review this:
While I cannot comment on the code right now, I can say that as a user this is hugely appreciated 🎉

All the IBM provider backends (except for the one with the q) will have this same problem too, since they all pull in the pulse schedules from defaults for `Target` construction.
Pull Request Test Coverage Report for Build 3908064617
💛 - Coveralls
Do they have their own mechanism for `Target` construction, or do they just use some code in Terra? If they rely on Terra code I can update `InstructionProperties` in a follow-up or in this PR.
It's a copy and paste of the same basic code that's in the |
Added a new commit 2690308 to support lazy conversion for V2. This commit allows
I'm a big fan of this: it'll fix most of the overhead for circuits that aren't using pulse and avoid the need to parse the entire payload, which can be super expensive. Thanks for doing this! Looking through the code as a first pass nothing major stands out to me, but I'll do a more thorough pass after #8839 merges and this is rebased.

The one thing I think it will be good to have validation on is that compilation across a multiprocessing boundary works as expected. I don't see anything super concerning on that front, but since we're moving towards lazy loading of the defaults, having tests that exercise that across a process boundary seems like a good thing to add. We have this class:

If you add some tests which require using the pulse calibrations from the defaults file, exercise them both in `transpile()` with multiple circuits and in `PassManager.run()` with multiple circuits; the two have different parallel dispatch methods (although the current `transpile()` one will go away in 0.25.0 and just use `PassManager.run()` internally).
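To make the serialization concern concrete, here is a minimal, hypothetical sketch (`LazyEntry` is an invented name, not Qiskit's actual class) of a lazily-converting entry that ships only its raw payload through pickle, which is the same path `multiprocessing` uses to move objects across a process boundary:

```python
import pickle


class LazyEntry:
    """Toy stand-in for a calibration entry that defers an expensive
    conversion until first access (hypothetical, not Qiskit's class)."""

    def __init__(self, raw_payload):
        self._raw = raw_payload   # cheap-to-pickle source data
        self._converted = None    # populated lazily on first .get()

    def get(self):
        # Convert on first access and cache the result.
        if self._converted is None:
            self._converted = {k: v * 2 for k, v in self._raw.items()}
        return self._converted

    def __getstate__(self):
        # Ship only the raw payload across the process boundary so each
        # worker re-converts lazily instead of pickling the parsed object.
        return {"_raw": self._raw, "_converted": None}

    def __setstate__(self, state):
        self.__dict__.update(state)


# A pickle round-trip exercises the same serialization path as a process pool.
entry = LazyEntry({"amp": 1, "sigma": 2})
clone = pickle.loads(pickle.dumps(entry))
assert clone.get() == {"amp": 2, "sigma": 4}
```

A test in this shape catches regressions where the lazily-held state is either not picklable or is accidentally forced eagerly during serialization.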
Force-pushed from a46ba04 to bd3b0ec ("…future deprecation of instruction schedule map.")
Thanks Matthew. I rebased the branch, added a new test file for the calibration entry classes, and updated the transpile test with the lazy loading test case. Because we don't have any preset pass that uses the backend calibrations (we have the RZX calibration builder, but it is not in the preset passes), I wrote a fake pass for testing.
Overall this LGTM; I like the new structure, and the definition of the interface for a calibration entry in an instruction schedule map is a good addition. Thanks for adding the additional test coverage with the parallel context; I didn't expect any issue with it, but I think it's good to have.

I had a question inline and spotted what looks like an unrelated change (not blocking). The only thing missing is maybe some release notes. We're changing a bit of the `InstructionScheduleMap` interface, particularly around the types supported. You might also want a feature note for the lazy loading, because the behavior will be a bit different (although equivalent) and provides a nice speedup, which would be good to talk about in the release documentation (also maybe talking about the new calibration entry interface, although that might not really be intended to be public, so we can skip it).
```diff
 del self._map[instruction][qubits]
 if not self._map[instruction]:
-    self._map.pop(instruction)
+    del self._map[instruction]
```

```diff
 self._qubit_instructions[qubits].remove(instruction)
 if not self._qubit_instructions[qubits]:
-    self._qubit_instructions.pop(qubits)
+    del self._qubit_instructions[qubits]
```
This seems like a bit of an unrelated change; it's better to keep the diff limited to the logical change of the PR. The logic looks the same to me, though, so it's not anything worth blocking over.
Yeah, strictly speaking this is unrelated to the main purpose, but I realized this logic is slower according to this Stack Overflow answer, so I made the change to improve the overall performance of the instruction schedule map.
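The difference is easy to reproduce: `d.pop(key)` goes through an attribute lookup and a method call, while `del d[key]` compiles to a single `DELETE_SUBSCR` bytecode. A minimal `timeit` sketch (the two forms are semantically identical when the key exists and no default is passed):

```python
import timeit

SETUP = "d = {i: i for i in range(1000)}"

# Re-insert the key each round so every iteration actually deletes something.
t_pop = timeit.timeit("d.pop(500); d[500] = 0", setup=SETUP, number=100_000)
t_del = timeit.timeit("del d[500]; d[500] = 0", setup=SETUP, number=100_000)
print(f"pop: {t_pop:.3f}s  del: {t_del:.3f}s")  # del is typically a bit faster

# Both forms remove the mapping; only the constant factor differs.
d = {"a": 1}
assert d.pop("a") == 1 and "a" not in d
d = {"a": 1}
del d["a"]
assert "a" not in d
```

The win per call is tiny, but it adds up when the map holds thousands of calibration entries.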
Force-pushed from 41217fd to 880a252.
LGTM, thanks for the fast update.

Oops, before we tag this as automerge, did you want to add a release note?
Thanks for catching the missing release note. I had forgotten to write it; it's added now.
* Update instmap to support lazy qobj conversion
* Avoid deepcopy and mutating the source dict
* Support V2 lazy load
* Update Gate.from_dict
* Move CalibrationEntry and CalibrationPublisher to dedicated file for future deprecation of instruction schedule map
* Revert typehint change
* Add test for cal entries
* Add test for parallel transpile and fix converter
* Add API for lazy get
* Add release note
Summary

Loading backend `PulseDefaults` with 100+ qubits is significantly slow, because it immediately converts all pulse qobjs into `Schedule`s when it is called for the first time. This induces a significant performance regression in transpile, because all preset passes require the default object, i.e. `InstructionScheduleMap`, to invoke the pulse gate pass for user calibrations. This PR allows us to convert on an as-needed basis for speedup.

Fix #7914

Blocked by #8839
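The as-needed conversion can be sketched with a toy container (hypothetical names and a trivial stand-in transform; the real pieces are `InstructionScheduleMap` and the qobj-to-`Schedule` converters): entries are stored unparsed, and the expensive conversion runs only for the calibrations that `get` actually touches:

```python
class LazyScheduleMap:
    """Toy stand-in for an instruction schedule map with lazy conversion."""

    def __init__(self):
        self._entries = {}     # name -> ("raw", payload) or ("done", value)
        self.conversions = 0   # counts how many expensive conversions ran

    def add_raw(self, name, payload):
        # Loading is cheap: just stash the unconverted payload.
        self._entries[name] = ("raw", payload)

    def get(self, name):
        kind, value = self._entries[name]
        if kind == "raw":
            # The expensive parse happens only on first access, then is cached.
            self.conversions += 1
            value = sorted(value)          # stand-in for qobj -> Schedule
            self._entries[name] = ("done", value)
        return value


m = LazyScheduleMap()
for i in range(100):
    m.add_raw(f"gate{i}", [i, 0])

assert m.get("gate3") == [0, 3]
assert m.get("gate3") == [0, 3]   # cached: no second conversion
assert m.conversions == 1          # the other 99 entries were never parsed
```

This is why the speedup scales with backend size: load time no longer pays for every calibration up front, only for the handful a given transpilation actually requests.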
Details and comments

This PR allows `InstructionScheduleMap` to take an unconverted pulse qobj and invoke the conversion logic when the `InstructionScheduleMap.get` method is called for a particular calibration. This is mainly done in commit 64a3712.

According to the profile, I also removed the use of deep copy (this had been added because qobj data is a nested dict, to avoid mutating it) in commit 0e1b5f2 (roughly 3x speedup).
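The deepcopy removal can be illustrated with a small sketch (invented data shapes, not the actual qobj schema): instead of deep-copying the whole nested dict and mutating the copy, build fresh dicts for just the parts that change, leaving the source untouched:

```python
import copy

nested = {
    "pulses": [
        {"name": f"p{i}", "samples": list(range(50))} for i in range(200)
    ]
}


def with_deepcopy(src):
    # Old approach: deep-copy the entire nested dict, then mutate the copy.
    out = copy.deepcopy(src)
    for p in out["pulses"]:
        p["name"] = p["name"].upper()
    return out


def without_copy(src):
    # New approach: build new dicts only for the pieces that change; the
    # unchanged inner lists are shared, and the source is never mutated.
    return {"pulses": [{**p, "name": p["name"].upper()} for p in src["pulses"]]}


assert with_deepcopy(nested) == without_copy(nested)
assert nested["pulses"][0]["name"] == "p0"   # source stays unmodified
```

Skipping the recursive copy of every untouched sample list is where the bulk of the ~3x saving comes from.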
This is the output of the airspeed velocity test:

As you can see, we achieved a roughly 50x speedup with these two commits. Note that the scheduling test suffers a performance regression as a side effect of the lazy conversion (because it internally `.get`s schedules). I guess the same problem will occur in the V2 converter #8759, but the converter should be able to avoid the performance regression by implementing a similar lazy conversion mechanism. Regarding the circuit scheduler, in principle we should improve the mechanism for building `Schedule`s, as proposed in #8029.