Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix hash of Parameter and ParameterExpression #10875

Merged
merged 3 commits into from
Oct 10, 2023

Conversation

jakelishman
Copy link
Member

@jakelishman jakelishman commented Sep 21, 2023

Summary

This fixes the construction paths for Parameter and its hash such that it will now correctly hash equal to any ParameterExpressions that it compares equal to. This is a requirement of the Python data model for hashmaps and hashsets, which previously we were breaking. In order to achieve this, we slightly modify the hash key such that the Parameter instances are no longer a part of the hash of ParameterExpression, which means we can use the same hashing strategy for both.

This rearrangement has the benefit of removing the requirement for the __new__ overrides on Parameter and ParameterVectorElement, and the pickle overrides for ParameterVectorElement. edit: Neko caught a case where the vector and index of a ParameterVectorElement weren't restored after unpickling in the first iteration of this PR, which our internal test suite didn't catch. I've restored the special handling to the pickle there, and added a regression test.

Details and comments

With _hash still cached in Parameter, I didn't see any meaningful changes to performance in the benchmarks I ran locally (removing the cache of it does cause a noticeable slowdown in QuantumCircuit.assign_parameters for circuits with many parameters, though). I saw a 4% improvement in the (mostly meaningless) microbenchmark

from qiskit.circuit import Parameter
it = [None] * 100_000
%timeit [Parameter("a") for _ in it]

most likely due to no longer having the __new__ call or the super().__init__ resolution in the construction path, but that doesn't really mean anything. ParameterVector construction saw approximately the same improvement (maybe slightly smaller).

Fix #9299

@jakelishman jakelishman added Changelog: New Feature Include in the "Added" section of the changelog Changelog: Bugfix Include in the "Fixed" section of the changelog labels Sep 21, 2023
@jakelishman jakelishman added this to the 0.45.0 milestone Sep 21, 2023
@qiskit-bot
Copy link
Collaborator

One or more of the the following people are requested to review this:

  • @Qiskit/terra-core
  • @mtreinish
  • @nkanazawa1989

This fixes the construction paths for `Parameter` and its hash such that
it will now correctly hash equal to any `ParameterExpression`s that it
compares equal to.  This is a requirement of the Python data model for
hashmaps and hashsets, which previously we were breaking.  In order to
achieve this, we slightly modify the hash key such that the `Parameter`
instances are no longer a part of the hash of `ParameterExpression`,
which means we can use the same hashing strategy for both.

This rearrangement has the benefit of removing the requirement for the
`__new__` overrides on `Parameter` and `ParameterVectorElement`.
Co-authored-by: Matthew Treinish <mtreinish@kortar.org>
same name do not compare equal to help catch shadowing bugs when two circuits
containing the same named parameters are spurious combined. Setting the ``uuid``
field when creating two parameters to the same thing (along with the same name)
allows them to be equal. This is useful during serialization and deserialization.
"""
self._name = name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be unrelated to this PR, but why is the _name attribute needed? Isn't that information already included in _symbol_expr?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off the top of my head, I think you're completely right and there's no real problems with dropping this field to reduce a pointer's worth of memory usage per parameter, and slightly reduce the hashing work we need to do.

Because there's a (minor) risk that we break something by inferring name from the sympy/symegine symbol rather than storing it ourselves, perhaps let's try that in a follow-up PR so we can isolate it if it does happen to cause bugs without losing this bugfix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I created #10880

Copy link
Contributor

@wshanks wshanks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice. It makes Parameter feel a little more normal to me, though I still find the way it subclasses ParameterExpression but is needed to create a ParameterExpression weird.

qiskit/circuit/parameter.py Outdated Show resolved Hide resolved
qiskit/pulse/schedule.py Show resolved Hide resolved
return self._hash

# We have to manually control the pickling so that the hash is computable before the unpickling
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a while to understand why the unpickling was putting this parameter into a hashmap. The issue is the self._parameter_symbols = {self: symbol} line in __init__. I am not sure how pickle avoids an infinite loop there when it pickles the attributes and slots of an object, but it must do something clever. When unpickling, it wants to recreate this dictionary but self is not fully initialized with a _hash when it does.

One alternative approach could be to override _parameter_symbols with a property that returns {self: self._symbol_expr} so that self is not in the data of the class instance. I think that would almost be enough to avoid __getstate__ and __setstate__. However, I think we might still want to override __getstate__ to null out _hash and have __hash__ recompute it if it is None. The reason is to avoid pickling in one Python process and unpickling in another with a different hash seed and not having the restored _hash match what would be calculated for a fresh hash() call.

Copy link
Member Author

@jakelishman jakelishman Sep 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self-referential objects or just reference cycles in general aren't all that uncommon in Python - they're handled by all the standard library serialisers etc. Even printing works on standard objects:

>>> a = a[0] = [[]]
>>> a
[[...]]

(where that makes a a list that contains itself). I think a pretty large number of reference cycles can be handled in pickle serialisation just by it inserting references to the various objects before it initialises each of them, and keeps a memo list of things it's still got to restore state to to avoid looping. In the cases where there's a particular partial order between an object's attributes, that's when you need __setstate__.

We could potentially make _parameter_symbols a property, but as you say, we'd still need to recompute the cached _hash on pickle restoration, so I'm not certain there's a huge reason to add the property - if nothing else, adding it as a descriptor means it needs to be recalculated each time it's accessed, and there's an overhead just from having the indirection.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps self-reference is not uncommon, but it has the potential to result in code that is hard to understand, and I think that has become the case here. Can we break the cycle by making Parameter no longer inherit from ParameterExpression?

Copy link
Member Author

@jakelishman jakelishman Sep 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's obviously subjective, but personally I thought that this all interacted without much trouble with Python; it was just broken before because I think people weren't aware that there were data-model invariants they were required to uphold. I think it maybe feels more complex because the previous implementation was broken; this PR includes a few additional comments to highlight the invariants.

It's a little unusual to have a concrete class depend on a concrete subclass of itself (but common to have an abstract class depend on a concrete subclass), but it wouldn't be super easy to walk that back now; there's a lot of extant code that depends on being able to retrieve the .parameters as a set from an expression.

(edit: also, I was saying that "self-referential objects aren't all that uncommon" just in response to Will's surprise that pickle handles them (generally) well, more than saying it in support of the architecture.)

Copy link
Contributor

@Cryoris Cryoris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for adding all the comments in the code 😉

@Cryoris Cryoris added this pull request to the merge queue Oct 10, 2023
Merged via the queue into Qiskit:main with commit 8651d34 Oct 10, 2023
13 checks passed
@jakelishman jakelishman deleted the fix-parameter-hash branch October 10, 2023 18:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: Bugfix Include in the "Fixed" section of the changelog Changelog: New Feature Include in the "Added" section of the changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parameter has inconsistent hash
6 participants