-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for custom metadata serializers to QPY load() and dump() #7550
Conversation
This commit adds a new argument to load(), metadata_serializer, and dump(), metadata_deserializer, which is used to specify custom serialization functions to use for serializing the python dictionary for each circuit's metadata attribute. By default QPY expects the serialized metadata payload to be encoded as a utf8 json string. But since there are no user constraints on the metadata dictionary's contents a simple json.dump() of the circuit's metadata attribute may not be feasible. To address this issue these new arguments are added to enable users to specify a custom function to encode the dictionary as binary data however is required for their circuit's metadata and then read that custom format as required when deserializing. The QPY format specification does not change here and still explicitly lists a UTF8 encoded JSON string as the contents. A fixed format is needed in the format specification for potential interoperability for other tools generating QPY data (none exist currently to my knowledge but nothing precludes it which is why the format is publicly documented). These new arguments are specific to only the qiskit implementation and should only be used when generating qpy files and loading them via qiskit.
Pull Request Test Coverage Report for Build 1969081540
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a raw implementation I don't see any problems, but I'm a bit concerned about breaking the binary format guarantees. At the very least, the module documentation needs updating to mention that the METADATA
field might not be utf8-encoded JSON data, but given that it's really an arbitrary injected dependency, I wonder if it's best to put a new field into the metadata header in a new QPY format, just to indicate whether it needs a custom dependency. I'm not 100% sold on that, because it feels like it couldn't fully specify the data anyway if the serialiser and deserialiser are speaking valid JSON, but with special rules for custom objects, but I'm most concerned with the case that the data isn't valid JSON at all.
If we don't add a header field (and even if we do), it might be good to try/catch the deserialisation in order to provide a better error message if we can't read the data (e.g. "did you remember to pass your custom deserialiser?")
:func:`.qpy_serialization.dump` function for specifying a custom | ||
serialization function for use when serializing the | ||
:attr`.QuantumCircuit.metadata` attribute. By default the | ||
:func:`~qiskit.circuit.qpy_serialization.dump` function will attempt to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for laziness: you can do both ~
and .
, so :func:`~.qpy_serialization.dump`
does the lookup with the .
semantics, and displays it using the ~
truncation semantics (so it just shows dump
). No change needed here, though.
dictionary even those with contents not JSON serializable by the default | ||
encoder will lead to circuits that can't be serialized. The new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dictionary even those with contents not JSON serializable by the default | |
encoder will lead to circuits that can't be serialized. The new | |
dictionary, even those with contents not JSON serializable by the default | |
encoder, will lead to circuits that can't be serialized. The new |
- | | ||
Added a new kwarg, ``metadata_deserializer`` to the | ||
:func:`.qpy_serialization.load` function for specifying a custom |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems a little repetitive to describe the changes to dump
and load
separately.
Yeah, I was worried about that. I figured it would be good to just support that as an out of specification feature and if a user returns non-utf8 json bytes (like I did in the unittest) it's on them as being out of spec. That being said the other idea I had was to change the argument type to be an |
Out-of-spec features have a tendency to morph into features that we actually have to support. For safety's sake, I think I prefer requiring the data to be JSON with your idea of passing a |
This commit pivots to leveraging custom JSONEncoder and JSONDecoder classes for the optional serializer and deserializer arguments instead of custom callables that return bytes. In the previous commit we used callables that would be passed the dictionary and returned bytes. However, while this was flexible it was pretty easy for a user to create QPY out of spec. While this was documented it is probably easier to just constrain the argument to always producing valid json. Using custom encoder and decoder classes let users still handle custom types but also restrict things to always be JSON. This should fix the potential risk in the previous iteration but still fix the underlying issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks stricter and safer now - I guess it promotes slightly less efficient serialisation, but being stricter about what's allowed in the format feels more reliable, since nothing will crash if you try and load an unknown QPY file now.
Just one comment about the release note.
releasenotes/notes/custom-serializers-qpy-0097ab79f239fcfc.yaml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, one minor comment aside.
def default(self, o): # pylint: disable=invalid-name | ||
if isinstance(o, CustomObject): | ||
return {"__type__": "Custom", "value": o.string} | ||
return json.JSONEncoder.default(self, o) | ||
|
||
class CustomDeserializer(json.JSONDecoder): | ||
"""Custom json decoder to handle CustomObject.""" | ||
|
||
def object_hook(self, o): # pylint: disable=invalid-name,method-hidden |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the pylint markers in the release note? I didn't think we'd be checking those blocks. If not, possibly best to get rid of them to make things cleaner for readers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good catch, no we don't. I just copied this from the unittests verbatim Let's remove them
releasenotes/notes/custom-serializers-qpy-0097ab79f239fcfc.yaml
Outdated
Show resolved
Hide resolved
releasenotes/notes/custom-serializers-qpy-0097ab79f239fcfc.yaml
Outdated
Show resolved
Hide resolved
Co-authored-by: Jake Lishman <jake@binhbar.com>
Summary
This commit adds a new argument to load(), metadata_serializer, and
dump(), metadata_deserializer, which is used to specify custom
serialization functions to use for serializing the python dictionary for
each circuit's metadata attribute. By default QPY expects the serialized
metadata payload to be encoded as a utf8 json string. But since there
are no user constraints on the metadata dictionary's contents a simple
json.dump() of the circuit's metadata attribute may not be feasible. To
address this issue these new arguments are added to enable users to
specify a custom function to encode the dictionary as binary data
however is required for their circuit's metadata and then read that
custom format as required when deserializing.
The QPY format specification does not change here and still explicitly
lists a UTF8 encoded JSON string as the contents. A fixed format is
needed in the format specification for potential interoperability for
other tools generating QPY data (none exist currently to my knowledge
but nothing precludes it which is why the format is publicly
documented). These new arguments are specific to only the qiskit
implementation and should only be used when generating qpy files and
loading them via qiskit.
Details and comments