Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for custom metadata serializers to QPY load() and dump() #7550

Merged
merged 8 commits into from
Mar 11, 2022

Conversation

mtreinish
Copy link
Member

Summary

This commit adds a new argument to load(), metadata_serializer, and
dump(), metadata_deserializer, which is used to specify custom
serialization functions to use for serializing the python dictionary for
each circuit's metadata attribute. By default QPY expects the serialized
metadata payload to be encoded as a utf8 json string. But since there
are no user constraints on the metadata dictionary's contents a simple
json.dump() of the circuit's metadata attribute may not be feasible. To
address this issue these new arguments are added to enable users to
specify a custom function to encode the dictionary as binary data
however is required for their circuit's metadata and then read that
custom format as required when deserializing.

The QPY format specification does not change here and still explicitly
lists a UTF8 encoded JSON string as the contents. A fixed format is
needed in the format specification for potential interoperability for
other tools generating QPY data (none exist currently to my knowledge
but nothing precludes it which is why the format is publicly
documented). These new arguments are specific to only the qiskit
implementation and should only be used when generating qpy files and
loading them via qiskit.

Details and comments

This commit adds a new argument to load(), metadata_serializer, and
dump(), metadata_deserializer, which is used to specify custom
serialization functions to use for serializing the python dictionary for
each circuit's metadata attribute. By default QPY expects the serialized
metadata payload to be encoded as a utf8 json string. But since there
are no user constraints on the metadata dictionary's contents a simple
json.dump() of the circuit's metadata attribute may not be feasible. To
address this issue these new arguments are added to enable users to
specify a custom function to encode the dictionary as binary data
however is required for their circuit's metadata and then read that
custom format as required when deserializing.

The QPY format specification does not change here and still explicitly
lists a UTF8 encoded JSON string as the contents. A fixed format is
needed in the format specification for potential interoperability for
other tools generating QPY data (none exist currently to my knowledge
but nothing precludes it which is why the format is publicly
documented). These new arguments are specific to only the qiskit
implementation and should only be used when generating qpy files and
loading them via qiskit.
@mtreinish mtreinish requested a review from a team as a code owner January 20, 2022 17:34
@mtreinish mtreinish added Changelog: New Feature Include in the "Added" section of the changelog mod: qpy Related to QPY serialization labels Jan 20, 2022
@coveralls
Copy link

coveralls commented Jan 20, 2022

Pull Request Test Coverage Report for Build 1969081540

  • 11 of 13 (84.62%) changed or added relevant lines in 2 files are covered.
  • 3 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.005%) to 83.456%

Changes Missing Coverage Covered Lines Changed/Added Lines %
qiskit/qpy/binary_io/circuits.py 7 9 77.78%
Files with Coverage Reduction New Missed Lines %
qiskit/pulse/library/waveform.py 3 89.36%
Totals Coverage Status
Change from base Build 1965911954: -0.005%
Covered Lines: 52391
Relevant Lines: 62777

💛 - Coveralls

Copy link
Member

@jakelishman jakelishman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a raw implementation I don't see any problems, but I'm a bit concerned about breaking the binary format guarantees. At the very least, the module documentation needs updating to mention that the METADATA field might not be utf8-encoded JSON data, but given that it's really an arbitrary injected dependency, I wonder if it's best to put a new field into the metadata header in a new QPY format, just to indicate whether it needs a custom dependency. I'm not 100% sold on that, because it feels like it couldn't fully specify the data anyway if the serialiser and deserialiser are speaking valid JSON, but with special rules for custom objects, but I'm most concerned with the case that the data isn't valid JSON at all.

If we don't add a header field (and even if we do), it might be good to try/catch the deserialisation in order to provide a better error message if we can't read the data (e.g. "did you remember to pass your custom deserialiser?")

:func:`.qpy_serialization.dump` function for specifying a custom
serialization function for use when serializing the
:attr`.QuantumCircuit.metadata` attribute. By default the
:func:`~qiskit.circuit.qpy_serialization.dump` function will attempt to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for laziness: you can do both ~ and ., so :func:`~.qpy_serialization.dump` does the lookup with the . semantics, and displays it using the ~ truncation semantics (so it just shows dump). No change needed here, though.

Comment on lines 11 to 12
dictionary even those with contents not JSON serializable by the default
encoder will lead to circuits that can't be serialized. The new
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dictionary even those with contents not JSON serializable by the default
encoder will lead to circuits that can't be serialized. The new
dictionary, even those with contents not JSON serializable by the default
encoder, will lead to circuits that can't be serialized. The new

Comment on lines 21 to 23
- |
Added a new kwarg, ``metadata_deserializer`` to the
:func:`.qpy_serialization.load` function for specifying a custom
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a little repetitive to describe the changes to dump and load separately.

@mtreinish
Copy link
Member Author

As a raw implementation I don't see any problems, but I'm a bit concerned about breaking the binary format guarantees. At the very least, the module documentation needs updating to mention that the METADATA field might not be utf8-encoded JSON data, but given that it's really an arbitrary injected dependency, I wonder if it's best to put a new field into the metadata header in a new QPY format, just to indicate whether it needs a custom dependency. I'm not 100% sold on that, because it feels like it couldn't fully specify the data anyway if the serialiser and deserialiser are speaking valid JSON, but with special rules for custom objects, but I'm most concerned with the case that the data isn't valid JSON at all.

Yeah, I was worried about that. I figured it would be good to just support that as an out of specification feature and if a user returns non-utf8 json bytes (like I did in the unittest) it's on them as being out of spec. That being said the other idea I had was to change the argument type to be an JSONDecoder and JSONEncoder subclass instead and then change the call to just set that on json.dump() and json.load() that way it will always be json.

@jakelishman
Copy link
Member

Out-of-spec features have a tendency to morph into features that we actually have to support. For safety's sake, I think I prefer requiring the data to be JSON with your idea of passing a JSONEncoder/JSONDecoder instance - it's easier for us to relax the requirement in the future than to tighten it.

This commit pivots to leveraging custom JSONEncoder and JSONDecoder
classes for the optional serializer and deserializer arguments instead
of custom callables that return bytes. In the previous commit we used
callables that would be passed the dictionary and returned bytes.
However, while this was flexible it was pretty easy for a user to create
QPY out of spec. While this was documented it is probably easier to just
constrain the argument to always producing valid json. Using custom
encoder and decoder classes let users still handle custom types but also
restrict things to always be JSON. This should fix the potential risk in
the previous iteration but still fix the underlying issue.
Copy link
Member

@jakelishman jakelishman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks stricter and safer now - I guess it promotes slightly less efficient serialisation, but being stricter about what's allowed in the format feels more reliable, since nothing will crash if you try and load an unknown QPY file now.

Just one comment about the release note.

@mtreinish mtreinish self-assigned this Mar 4, 2022
@mtreinish mtreinish added this to the 0.20 milestone Mar 4, 2022
jakelishman
jakelishman previously approved these changes Mar 9, 2022
Copy link
Member

@jakelishman jakelishman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, one minor comment aside.

Comment on lines 47 to 55
def default(self, o): # pylint: disable=invalid-name
if isinstance(o, CustomObject):
return {"__type__": "Custom", "value": o.string}
return json.JSONEncoder.default(self, o)

class CustomDeserializer(json.JSONDecoder):
"""Custom json decoder to handle CustomObject."""

def object_hook(self, o): # pylint: disable=invalid-name,method-hidden
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the pylint markers in the release note? I didn't think we'd be checking those blocks. If not, possibly best to get rid of them to make things cleaner for readers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good catch, no we don't. I just copied this from the unittests verbatim Let's remove them

Co-authored-by: Jake Lishman <jake@binhbar.com>
@mergify mergify bot merged commit 56c2b86 into Qiskit:main Mar 11, 2022
@mtreinish mtreinish deleted the custom-qpy-serializers branch April 4, 2022 14:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: New Feature Include in the "Added" section of the changelog mod: qpy Related to QPY serialization priority: medium
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants