Currently, deepnog ships one model per eggNOG level and network architecture.
If we ever decide to retrain certain models, users would need to come up with their own strategies to tell models apart, or to pin a specific model (e.g., for reproducibility), such as manually moving files around, renaming them accordingly, etc.
Retraining, however, could sometimes make sense. For example, we might want to use different data splits, or increase the share of training sequences relative to test sequences to squeeze a little more performance out of the model.
We should at least introduce some versioning, model identifiers, etc., that are stored with the model. This could be a simple string inside the model_dict, and could even be "backported" to existing models.
Ideally, automatic model download should also be version-aware. Currently, a user who has already downloaded a model will not receive any updated model.
To summarize some key points of the recent discussion:
Models will receive a metadata field that holds the following information:

- UUID as model identifier
- date & timestamp of training
- training parameters (incl. learning rate, scheduler, number of epochs, etc.)
- orthology DB name
- taxonomic level in DB
- metadata format version (v1 for now, v2 if this ever needs to be extended)
Technically, this can be implemented as a dict that is serialized into the .pth model file.
This can be backported to old models.
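A minimal sketch of how this could look, using plain `torch.save`/`torch.load`; the field names, values, and file names below are illustrative assumptions, not deepnog's actual schema:

```python
# Illustrative sketch only: field names and paths are hypothetical.
import uuid
from datetime import datetime, timezone
from pathlib import Path

import torch

metadata = {
    "model_id": str(uuid.uuid4()),                          # UUID as model identifier
    "trained_at": datetime.now(timezone.utc).isoformat(),   # date & timestamp of training
    "training_params": {                                    # training hyperparameters
        "learning_rate": 1e-2,
        "scheduler": "ExponentialLR",
        "n_epochs": 2,
    },
    "database": "eggNOG5",                                  # orthology DB name
    "tax_level": 2,                                         # taxonomic level in DB
    "metadata_version": 1,                                  # v1 for now, v2 if extended
}

# The usual checkpoint contents (state dict, class labels, ...) would sit
# next to the metadata inside the same dict.
model_dict = {
    "metadata": metadata,
    # "model_state_dict": net.state_dict(),
    # "classes": classes,
}
torch.save(model_dict, "eggnog5_2_deepencoding.pth")

# Backporting: load an existing checkpoint, attach metadata, save again.
old_path = Path("old_model.pth")
if old_path.exists():
    old = torch.load(old_path, map_location="cpu")
    old.setdefault("metadata", metadata)
    torch.save(old, old_path)
```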
Model filenames will carry a version hint (e.g., the date, or v1, v2, etc.), and a "latest" pointer will reference the most up-to-date version.
The client subcommand deepnog infer will gain a use_latest boolean flag to fetch the latest model (otherwise, the currently installed one is used). A warning/info message could be issued when newer models are available.
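One way the version-aware lookup could work, assuming a remote "latest" pointer file that maps model names to their newest versioned files; the URL, file layout, and function name below are hypothetical:

```python
# Hypothetical sketch: URL, file layout, and names are assumptions.
import json
from pathlib import Path
from urllib.request import urlopen

MODEL_DIR = Path.home() / ".deepnog" / "models"
BASE_URL = "https://example.org/deepnog/models"  # placeholder URL

def resolve_model(name: str, use_latest: bool = False) -> Path:
    """Return the local path of the requested model.

    With use_latest=False, an already installed model is kept as-is
    (reproducibility); with use_latest=True, the "latest" pointer is
    consulted and a newer versioned file is downloaded if necessary.
    """
    local = MODEL_DIR / f"{name}.pth"
    if not use_latest and local.exists():
        return local
    # latest.json maps model names to versioned filenames,
    # e.g. {"eggnog5_2_deepencoding": "eggnog5_2_deepencoding_v2.pth"}
    with urlopen(f"{BASE_URL}/latest.json") as fh:
        latest = json.load(fh)[name]
    target = MODEL_DIR / latest
    if not target.exists():
        MODEL_DIR.mkdir(parents=True, exist_ok=True)
        print(f"Info: downloading newer model {latest}")  # user notice
        with urlopen(f"{BASE_URL}/{latest}") as fh:
            target.write_bytes(fh.read())
    return target
```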