Fix hash of `Parameter` and `ParameterExpression` #10875

jakelishman · 2023-09-21T12:27:07Z

Summary

This fixes the construction paths for Parameter and its hash such that it will now correctly hash equal to any ParameterExpressions that it compares equal to. This is a requirement of the Python data model for hashmaps and hashsets, which previously we were breaking. In order to achieve this, we slightly modify the hash key such that the Parameter instances are no longer a part of the hash of ParameterExpression, which means we can use the same hashing strategy for both.
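To make the invariant concrete, here is a minimal sketch using toy stand-in classes. `Expr` and `Param` are hypothetical names for illustration and are not Qiskit's implementation. The sketch assumes equality is decided by the symbol plus the set of parameter keys, and has both classes hash from that same key, so a parameter and an expression that compare equal also land in the same dictionary slot.

```python
import uuid


class Expr:
    """Toy stand-in for an expression: a symbol string plus parameter keys."""

    def __init__(self, symbol, keys):
        self._symbol = symbol
        self._keys = frozenset(keys)

    def _key(self):
        # The hash key deliberately contains no parameter *instances*,
        # only plain (name, uuid) tuples and the symbol.
        return (self._keys, self._symbol)

    def __eq__(self, other):
        if isinstance(other, Expr):
            return self._key() == other._key()
        return NotImplemented

    def __hash__(self):
        return hash(self._key())


class Param(Expr):
    """Toy stand-in for a parameter: an expression with exactly one key."""

    def __init__(self, name):
        self._uuid = uuid.uuid4()
        super().__init__(name, [(name, self._uuid)])
        self._hash = hash(self._key())  # cached, same strategy as Expr

    def __hash__(self):
        return self._hash


a = Param("a")
as_expr = Expr("a", a._keys)    # an expression that compares equal to `a`
lookup = {a: "bound value"}
print(lookup[as_expr])          # lookup succeeds only if the hashes agree
```

Two `Param("a")` objects still compare unequal in this sketch because their UUIDs differ, mirroring the behaviour described in the docstring under review.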

This rearrangement has the benefit of removing the requirement for the __new__ overrides on Parameter and ParameterVectorElement, ~~and the pickle overrides for ParameterVectorElement~~. edit: Neko caught a case where the vector and index of a ParameterVectorElement weren't restored after unpickling in the first iteration of this PR, which our internal test suite didn't catch. I've restored the special handling to the pickle there, and added a regression test.

Details and comments

With _hash still cached in Parameter, I didn't see any meaningful changes to performance in the benchmarks I ran locally (removing the cache of it does cause a noticeable slowdown in QuantumCircuit.assign_parameters for circuits with many parameters, though). I saw a 4% improvement in the (mostly meaningless) microbenchmark

from qiskit.circuit import Parameter
it = [None] * 100_000
%timeit [Parameter("a") for _ in it]

most likely due to no longer having the __new__ call or the super().__init__ resolution in the construction path, but that doesn't really mean anything. ParameterVector construction saw approximately the same improvement (maybe slightly smaller).

Fix #9299

qiskit-bot · 2023-09-21T12:27:11Z

One or more of the following people are requested to review this:

@Qiskit/terra-core
@mtreinish
@nkanazawa1989

This fixes the construction paths for `Parameter` and its hash such that it will now correctly hash equal to any `ParameterExpression`s that it compares equal to. This is a requirement of the Python data model for hashmaps and hashsets, which previously we were breaking. In order to achieve this, we slightly modify the hash key such that the `Parameter` instances are no longer a part of the hash of `ParameterExpression`, which means we can use the same hashing strategy for both. This rearrangement has the benefit of removing the requirement for the `__new__` overrides on `Parameter` and `ParameterVectorElement`.

qiskit/circuit/parameter.py

Co-authored-by: Matthew Treinish <mtreinish@kortar.org>

qiskit/circuit/parameter.py

kevinsung · 2023-09-21T17:03:04Z

qiskit/circuit/parameter.py

+                same name do not compare equal to help catch shadowing bugs when two circuits
+                containing the same named parameters are spuriously combined.  Setting the ``uuid``
+                field when creating two parameters to the same thing (along with the same name)
+                allows them to be equal.  This is useful during serialization and deserialization.
        """
        self._name = name


Might be unrelated to this PR, but why is the _name attribute needed? Isn't that information already included in _symbol_expr?

Off the top of my head, I think you're completely right and there are no real problems with dropping this field to reduce a pointer's worth of memory usage per parameter, and slightly reduce the hashing work we need to do.

Because there's a (minor) risk that we break something by inferring name from the sympy/symengine symbol rather than storing it ourselves, perhaps let's try that in a follow-up PR so we can isolate it if it does happen to cause bugs without losing this bugfix.

Ok I created #10880

wshanks

This is nice. It makes Parameter feel a little more normal to me, though I still find the way it subclasses ParameterExpression but is needed to create a ParameterExpression weird.

qiskit/circuit/parameter.py

qiskit/pulse/schedule.py

wshanks · 2023-09-21T18:19:33Z

qiskit/circuit/parameter.py

        return self._hash

+    # We have to manually control the pickling so that the hash is computable before the unpickling


It took me a while to understand why the unpickling was putting this parameter into a hashmap. The issue is the self._parameter_symbols = {self: symbol} line in __init__. I am not sure how pickle avoids an infinite loop there when it pickles the attributes and slots of an object, but it must do something clever. When unpickling, it wants to recreate this dictionary but self is not fully initialized with a _hash when it does.
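The failure mode described above can be reproduced with a small stand-in class. `FragileParam` is a hypothetical name, not Qiskit's `Parameter`; the sketch only assumes a cached `_hash` and a dictionary keyed by `self`, with no custom pickling hooks.

```python
import pickle


class FragileParam:
    """Hypothetical stand-in with a cached hash and a self-keyed dict."""

    def __init__(self, name):
        self._name = name
        self._hash = hash(name)
        self._symbols = {self: name}  # the instance is a key in its own state

    def __eq__(self, other):
        return isinstance(other, FragileParam) and self._name == other._name

    def __hash__(self):
        return self._hash


# Dumping works: pickle memoises the instance by identity, so the cycle is fine.
payload = pickle.dumps(FragileParam("theta"))

# Loading rebuilds the inner dict before the instance's attributes are
# restored, so hashing the half-built instance cannot find `_hash`.
try:
    pickle.loads(payload)
    error = None
except AttributeError as exc:
    error = exc

print(type(error).__name__)
```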

One alternative approach could be to override _parameter_symbols with a property that returns {self: self._symbol_expr} so that self is not in the data of the class instance. I think that would almost be enough to avoid __getstate__ and __setstate__. However, I think we might still want to override __getstate__ to null out _hash and have __hash__ recompute it if it is None. The reason is to avoid pickling in one Python process and unpickling in another with a different hash seed and not having the restored _hash match what would be calculated for a fresh hash() call.
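A rough sketch of that alternative, again with a hypothetical stand-in class (`LazyParam` is not Qiskit code): the symbols mapping becomes a property so the instance never stores itself, `__getstate__` blanks the cached hash, and `__hash__` recomputes it on first use in whichever process unpickles the object.

```python
import pickle
import uuid


class LazyParam:
    """Hypothetical stand-in; the hash is computed lazily and never pickled."""

    def __init__(self, name):
        self._name = name
        self._uuid = uuid.uuid4()
        self._hash = None

    @property
    def _parameter_symbols(self):
        # Built on demand, so no self-reference lives in the instance state.
        return {self: self._name}

    def __eq__(self, other):
        if isinstance(other, LazyParam):
            return (self._name, self._uuid) == (other._name, other._uuid)
        return NotImplemented

    def __hash__(self):
        if self._hash is None:
            self._hash = hash((self._name, self._uuid))
        return self._hash

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_hash"] = None  # do not ship a hash tied to this process's seed
        return state


p = LazyParam("theta")
hash(p)  # populate the cache before pickling
restored = pickle.loads(pickle.dumps(p))
```

The cost is the one raised in the reply that follows: the property rebuilds the dictionary on every access.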

Self-referential objects or just reference cycles in general aren't all that uncommon in Python - they're handled by all the standard library serialisers etc. Even printing works on standard objects:

>>> a = a[0] = [[]]
>>> a
[[...]]

(where that makes a a list that contains itself). I think a pretty large number of reference cycles can be handled in pickle serialisation because it inserts references to the various objects before it initialises each of them, and keeps a memo of the things it still has to restore state to, which avoids looping. In the cases where there's a particular partial order between an object's attributes, that's when you need __setstate__.
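Both halves of that point can be sketched in a few lines. The class below is a hypothetical stand-in (`OrderedParam` is not Qiskit's `Parameter`), and the state tuple is an assumption chosen for illustration: a plain cycle round-trips untouched, while the class needs `__setstate__` to restore the hash before the instance is used as a dictionary key.

```python
import pickle
import uuid

# A plain reference cycle round-trips without any special handling.
cycle = []
cycle.append(cycle)
cycle_copy = pickle.loads(pickle.dumps(cycle))


class OrderedParam:
    """Hypothetical stand-in whose attributes must be restored in order."""

    def __init__(self, name):
        self._name = name
        self._uuid = uuid.uuid4()
        self._hash = hash((name, self._uuid))
        self._symbols = {self: name}  # self-reference used as a dict key

    def __eq__(self, other):
        if isinstance(other, OrderedParam):
            return (self._name, self._uuid) == (other._name, other._uuid)
        return NotImplemented

    def __hash__(self):
        return self._hash

    def __getstate__(self):
        # Leave out both the cached hash and the self-keyed dict.
        return (self._name, self._uuid)

    def __setstate__(self, state):
        self._name, self._uuid = state
        # The hash must exist before `self` is used as a key.
        self._hash = hash((self._name, self._uuid))
        self._symbols = {self: self._name}


p = OrderedParam("theta")
q = pickle.loads(pickle.dumps(p))
```

Recomputing the hash in `__setstate__` also means the restored value matches a fresh `hash()` call in the unpickling process.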

We could potentially make _parameter_symbols a property, but as you say, we'd still need to recompute the cached _hash on pickle restoration, so I'm not certain there's a huge reason to add the property - if nothing else, adding it as a descriptor means it needs to be recalculated each time it's accessed, and there's an overhead just from having the indirection.

Perhaps self-reference is not uncommon, but it has the potential to result in code that is hard to understand, and I think that has become the case here. Can we break the cycle by making Parameter no longer inherit from ParameterExpression?

It's obviously subjective, but personally I thought that this all interacted without much trouble with Python; it was just broken before because I think people weren't aware that there were data-model invariants they were required to uphold. I think it maybe feels more complex because the previous implementation was broken; this PR includes a few additional comments to highlight the invariants.

It's a little unusual to have a concrete class depend on a concrete subclass of itself (but common to have an abstract class depend on a concrete subclass), but it wouldn't be super easy to walk that back now; there's a lot of extant code that depends on being able to retrieve the .parameters as a set from an expression.

(edit: also, I was saying that "self-referential objects aren't all that uncommon" just in response to Will's surprise that pickle handles them (generally) well, more than saying it in support of the architecture.)

releasenotes/notes/fix-parameter-hash-d22c270090ffc80e.yaml

Cryoris

LGTM, thanks for adding all the comments in the code 😉

jakelishman added the "Changelog: New Feature" and "Changelog: Bugfix" labels Sep 21, 2023

jakelishman added this to the 0.45.0 milestone Sep 21, 2023

jakelishman requested a review from kevinsung September 21, 2023 12:27

jakelishman requested review from eggerdj, wshanks and a team as code owners September 21, 2023 12:27

jakelishman force-pushed the fix-parameter-hash branch from 844c1eb to 4913f11 Compare September 21, 2023 12:40

mtreinish reviewed Sep 21, 2023

qiskit/circuit/parameter.py (outdated)

Add type hint

e6acceb

Co-authored-by: Matthew Treinish <mtreinish@kortar.org>

kevinsung reviewed Sep 21, 2023

qiskit/circuit/parameter.py

kevinsung reviewed Sep 21, 2023

wshanks reviewed Sep 21, 2023

kevinsung mentioned this pull request Sep 21, 2023

Remove _name attribute of Parameter #10880

Closed

Tweak documentation wording

44711b4

kevinsung approved these changes Sep 25, 2023

wshanks approved these changes Sep 27, 2023

Cryoris approved these changes Oct 10, 2023

Cryoris added this pull request to the merge queue Oct 10, 2023

Merged via the queue into Qiskit:main with commit 8651d34 Oct 10, 2023
13 checks passed

jakelishman deleted the fix-parameter-hash branch October 10, 2023 18:43

kt474 mentioned this pull request Oct 20, 2023

Update qpy qiskit 0.45 Qiskit/qiskit-ibm-provider#736

Merged

wshanks mentioned this pull request Jan 27, 2024

Parameter instances can compare equal while having different hash values #11654

Closed

wshanks mentioned this pull request Apr 5, 2024

Add hash function for AffineScalarFunc class lmfit/uncertainties#189

Closed
