This repository has been archived by the owner on Dec 16, 2022. It is now read-only.
[FOR DISCUSSION] Automatically capture params on object construction #4413
As I was writing parts of the guide, one thing that bothered me was that we don't have a good story around model saving and loading when you're not using config files. Also, I saw examples of Hugging Face code where you could just call `.save_pretrained()` on your in-python object and have things just work. For us to do that, we have to be able to save objects for which we don't have implementation code, but I think we can still do it. I got to thinking about this problem this morning, and was playing around with something that actually works, pretty simply.

Basic idea: use a single metaclass on `FromParams` to make it so that all `FromParams` objects (including subclasses) have a `_params` field, which stores the config that they would have been created with. Then the model can just implement a simple `.save()` method which saves `_params` as a JSON object, then calls the archive code. We could probably even just remove the archive code altogether and have it live as methods on `Model`, because this would make it much easier.

I implemented a quick prototype that works for simple classes. I put it as a pull request instead of an issue so that the code could be more easily commented on. There are three major hurdles that I see to getting this to actually work, which may or may not be surmountable:

1. We allow subclasses to take `**kwargs`, and inspect the superclass to grab those arguments. This one should be pretty straightforward - we can directly use the existing logic to get the parameter list, and that might just be enough by itself.
2. We sometimes use non-standard constructors, like `from_partial_objects`, or what we do in `Vocabulary`. We have to know a priori what things are registered that way, and override them accordingly. This might be possible if we have this metaclass inspect the `Registry` somehow.
3. We need to know which parameters don't need to get saved in the config file, such as the `Vocabulary` for the `Model` object (because it's actually specified higher up in the config). We don't have a way of programmatically detecting this, and I'm not sure how to do it. The goal of this is just for use in saving models, so we could feasibly special-case a thing or two (we'd want to serialize the vocabulary separately anyway), but that's not ideal. This one needs some more thought.

A possible side-benefit of this is being able to pickle and recover some objects for use in multi-processing using just the saved `_params`.

@epwalsh, @dirkgr, what do you think?
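
For readers who want something concrete, the basic idea can be sketched roughly as follows. This is a standalone toy, not the prototype in this pull request and not the library's real `FromParams`: the names `CaptureParamsMeta`, `Capturing`, `Encoder`, `ToyModel`, and `to_config`, and the use of a `"type"` key for the class name, are all assumptions made for illustration. It also ignores the three hurdles listed above (`**kwargs` forwarding, non-standard constructors, and parameters that should not be saved).

```python
import inspect
import json


class CaptureParamsMeta(type):
    """Hypothetical metaclass that records constructor arguments on each instance."""

    def __call__(cls, *args, **kwargs):
        # Bind the call to the __init__ signature so positional arguments
        # are stored under their parameter names. None stands in for `self`.
        signature = inspect.signature(cls.__init__)
        bound = signature.bind(None, *args, **kwargs)
        bound.apply_defaults()
        instance = super().__call__(*args, **kwargs)
        # Drop the first bound argument (the `self` placeholder).
        instance._params = dict(list(bound.arguments.items())[1:])
        return instance


class Capturing(metaclass=CaptureParamsMeta):
    """Stand-in for FromParams; every subclass inherits the metaclass."""


class Encoder(Capturing):
    def __init__(self, hidden_size: int, dropout: float = 0.1):
        self.hidden_size = hidden_size
        self.dropout = dropout


class ToyModel(Capturing):
    def __init__(self, encoder: Encoder, num_labels: int = 2):
        self.encoder = encoder
        self.num_labels = num_labels

    def save(self, path: str) -> None:
        # Write the captured config as JSON; a real version would go on
        # to call the archive code as well.
        with open(path, "w") as f:
            json.dump(to_config(self), f)


def to_config(value):
    """Recursively turn captured params into a JSON-friendly dict."""
    if hasattr(value, "_params"):
        nested = {name: to_config(v) for name, v in value._params.items()}
        return {"type": type(value).__name__, **nested}
    return value


model = ToyModel(Encoder(128), num_labels=3)
config_json = json.dumps(to_config(model))
print(config_json)
```

Because the capture happens in the metaclass `__call__`, a subclass that calls `super().__init__(...)` still ends up with a single `_params` reflecting the outermost constructor call, which is the config one would want to write out.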