Use metaclass to subclass errors to allow better pickling #9593
When pickling with dill, as happens under the hood when processing a HuggingFace dataset, I found that spaCy could not be pickled (related issue: huggingface/datasets#3178). The cause was a `dill.Pickler` that recursively pickles all objects (`Pickler(file, recurse=True).dump(obj)`). This is preferred in `datasets`: such recursive pickling allows for fingerprinting, i.e. deterministically keeping track of all the processing done to a dataset. If the fingerprint is the same as in an earlier processing attempt, a cache can be used and the processing (e.g. tokenizing with spaCy) does not have to be done again. This is very useful and saves both time and processing.

The culprit in the spaCy codebase that could not be pickled in this way was this part:
spaCy/spacy/errors.py
Lines 4 to 15 in f1bc655
The class `ErrorsWithCodes` cannot be pickled here because it does not have a fixed signature in the global scope (its superclass is not known beforehand). After digging through the Internet (thank you Stack Overflow!) I found that a metaclass is the way to go, as implemented in this PR. In my opinion, this is also structurally clearer than before: it uses regular subclassing with a metaclass instead of dynamically creating a class based on a subclass with a decorator. The metaclass defines the magic methods that are required to access the class's attributes, rather than accessing instance attributes.

I modified the error test accordingly, and it completes successfully.
After this modification, recursive dill pickling works flawlessly as well.
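For readers who want to see the shape of the idea, here is a minimal sketch of the metaclass pattern described above. It is not the exact code from this PR: `DemoErrors`, its messages, and the `roundtrips` helper are hypothetical names made up for illustration, and the standard-library `pickle` is used as a stand-in, so dill's recursive mode is not exercised here.

```python
import pickle


class ErrorsWithCodes(type):
    """Metaclass that prefixes each message with its attribute name (the code)."""

    def __getattribute__(cls, code):
        msg = super().__getattribute__(code)
        # Leave dunder attributes such as __name__ or __module__ untouched;
        # the interpreter and pickle rely on them.
        if code.startswith("__"):
            return msg
        return f"[{code}] {msg}"


class DemoErrors(metaclass=ErrorsWithCodes):
    # Hypothetical placeholder messages, not spaCy's real error texts.
    E001 = "Something went wrong."
    E002 = "Unexpected value: {value}"


def roundtrips(obj):
    """Return True if obj survives a pickle round trip as the same object."""
    try:
        return pickle.loads(pickle.dumps(obj)) is obj
    except (pickle.PicklingError, AttributeError):
        return False


if __name__ == "__main__":
    print(DemoErrors.E001)
    print(DemoErrors.E002.format(value=3))
    print(roundtrips(DemoErrors))
```

Because `DemoErrors` is an ordinary class defined at module level, a pickler can refer to it by its module and name instead of having to serialize a class that only exists inside a function call. The codes are attached on class attribute access, so `DemoErrors.E001` evaluates to `"[E001] Something went wrong."` without creating an instance.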
closes #9584
Types of change
Enhancement
Checklist