Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove _legacy_format_cache slot from CircuitInstruction #10417

Merged

Conversation

jakelishman
Copy link
Member

Summary

This was originally added as a mitigation for performance loss in consumers of QuantumCircuit.data that were still using the legacy interface. It is now becoming more critical to minimise memory usage in the QuantumCircuit object, so the extra slot is a luxury we can no longer afford.

This represents a 12.5% reduction in the inherent memory footprint of CircuitInstruction, though in practice the impact will be smaller, since the operation and qubits fields will near-universally have associated weight themselves (the clbits field is usually the empty tuple, with is a natural singleton untracked by the GC in CPython). Other ongoing work on making more objects singletons will reduce those weights, however.

Details and comments

100% of the performance impact of this change within Terra should be mitigated by #10416, but equally, those places are unlikely to be too highly performance critical already.

The memory-usage improvement from this PR will have more relative impact when measured along with #10314 - the absolute improvement in memory usage won't change, but making more operations singletons will drastically improve the relative effects.

This was originally added as a mitigation for performance loss in
consumers of `QuantumCircuit.data` that were still using the legacy
interface.  It is now becoming more critical to minimise memory usage in
the `QuantumCircuit` object, so the extra slot is a luxury we can no
longer afford.

This represents a 12.5% reduction in the inherent memory footprint of
`CircuitInstruction`, though in practice the impact will be smaller,
since the `operation` and `qubits` fields will near-universally have
associated weight themselves (the `clbits` field is _usually_ the empty
tuple, with is a natural singleton untracked by the GC in CPython).
Other ongoing work on making more objects singletons will reduce those
weights, however.
@jakelishman jakelishman added type: qa Issues and PRs that relate to testing and code quality Changelog: None Do not include in changelog labels Jul 12, 2023
@jakelishman jakelishman requested a review from a team as a code owner July 12, 2023 14:40
@coveralls
Copy link

Pull Request Test Coverage Report for Build 5533006024

  • 4 of 5 (80.0%) changed or added relevant lines in 1 file are covered.
  • 17 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.02%) to 85.974%

Changes Missing Coverage Covered Lines Changed/Added Lines %
qiskit/circuit/quantumcircuitdata.py 4 5 80.0%
Files with Coverage Reduction New Missed Lines %
crates/qasm2/src/lex.rs 5 90.38%
crates/qasm2/src/parse.rs 12 95.71%
Totals Coverage Status
Change from base Build 5528323034: -0.02%
Covered Lines: 71730
Relevant Lines: 83432

💛 - Coveralls

@Cryoris
Copy link
Contributor

Cryoris commented Jul 12, 2023

I ran a small test on PEC-style circuits including the singleton PR and it reduced the memory requirements from 2.5GB to 1.7GB, so a 32% reduction 🙂

Copy link
Contributor

@Cryoris Cryoris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, it might be good to keep an eye on the asv benchmarks to see if this adds any unexpected overhead though 🙂

@jakelishman
Copy link
Member Author

jakelishman commented Jul 12, 2023

If it's a 32% reduction, there's something going wrong with the code that we're benchmarking. It can't be that large unless something in that benchmarking code is causing the legacy cache to get populated, which means that this PR would cause a runtime issue for that code.

edit: or the benchmark is not measuring the memory accurately, and this PR is somehow getting credit for a reclamation from the GC that the other form of the code should also be able to attain as well.

@jakelishman
Copy link
Member Author

I should say, I still think that this is good to merge, but the exact memory profiling is confusing me, and I think it's strongly indicative that however we're measuring the memory usage of a QuantumCircuit isn't producing a tight bound. It could be slack overallocations from pymalloc, or a whole bunch of stuff, but fundamentally, the most this PR should be able to reduce the size of the sum of all Python objects in a circuit (assuming no population of the legacy cache) is by 12.5%.

@jakelishman jakelishman added this pull request to the merge queue Jul 19, 2023
Merged via the queue into Qiskit:main with commit a804df5 Jul 19, 2023
@jakelishman jakelishman deleted the remove-legacy-slot-circuitinstruction branch July 19, 2023 21:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: None Do not include in changelog performance type: qa Issues and PRs that relate to testing and code quality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants