Model serialization #81

Merged
merged 57 commits into from
Oct 1, 2022

florian-huber
Member

@florian-huber florian-huber commented Sep 28, 2022

See description by @maximskorik:

Added functionality to export and import gensim.models.Word2Vec objects without pickling. A model loaded this way can be used to calculate Spec2Vec scores but cannot be trained further.
The model is written to disk as two files: a .json file for the model's metadata and a .npy file for its weights. If the weights are in scipy.sparse format, they are converted to a numpy.ndarray before saving (scipy uses pickle to save matrices); when the model is read back, the weights are converted to their original format.
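A minimal sketch of the two-file layout described above, assuming metadata that is JSON-serializable and weights that are either a dense array or a scipy.sparse matrix. The function names `save_model_sketch` and `load_model_sketch` and the `weights_format` key are hypothetical and chosen for illustration; this is not the code of the serialization module in this PR.

```python
import json

import numpy as np
from scipy import sparse


def save_model_sketch(metadata, weights, json_path, npy_path):
    """Write metadata to a .json file and weights to a .npy file, without pickle."""
    meta = dict(metadata)
    if sparse.issparse(weights):
        # Remember the sparse format (e.g. "csr") so it can be restored on load.
        meta["weights_format"] = weights.format
        weights = weights.toarray()
    else:
        meta["weights_format"] = "dense"
    with open(json_path, "w", encoding="utf-8") as handle:
        json.dump(meta, handle)
    # allow_pickle=False makes numpy refuse to fall back to pickling.
    np.save(npy_path, np.asarray(weights), allow_pickle=False)


def load_model_sketch(json_path, npy_path):
    """Read both files back and restore the original weights format."""
    with open(json_path, encoding="utf-8") as handle:
        meta = json.load(handle)
    weights = np.load(npy_path, allow_pickle=False)
    weights_format = meta.pop("weights_format")
    if weights_format != "dense":
        # Convert the dense array back to the recorded sparse format.
        weights = sparse.csr_matrix(weights).asformat(weights_format)
    return meta, weights
```

Passing `allow_pickle=False` on both sides means that loading a file never executes pickled code, which is presumably the motivation for avoiding pickle in the first place.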

  • added a serialization module, which exposes the export_model and import_model functions;
  • added a Word2VecLight class, which follows the interface of the original Word2Vec just enough to calculate similarity scores;
  • added a data subdirectory to tests, which stores the test files for serialization; moved pesticides.mgf to that directory;
  • added scipy as a dependency of the library.
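To illustrate what "just enough to calculate similarity scores" could mean, here is a rough sketch of a read-only vector lookup together with a cosine similarity between two summed document vectors. The class `LightVectorsSketch` and the helpers `document_vector` and `cosine_similarity` are hypothetical names for this example only; the actual Word2VecLight class in this PR may be structured differently, and the plain sum used here is a simplification.

```python
import numpy as np


class LightVectorsSketch:
    """Read-only word-vector lookup (illustrative, not Word2VecLight itself)."""

    def __init__(self, vocabulary, vectors):
        # Map each word to the row of the weight matrix that holds its vector.
        self.key_to_index = {word: i for i, word in enumerate(vocabulary)}
        self.vectors = np.asarray(vectors, dtype=float)

    def __contains__(self, word):
        return word in self.key_to_index

    def __getitem__(self, word):
        return self.vectors[self.key_to_index[word]]


def document_vector(model, words):
    """Sum the vectors of all known words; unknown words are skipped."""
    known = [model[word] for word in words if word in model]
    if not known:
        return np.zeros(model.vectors.shape[1])
    return np.sum(known, axis=0)


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    denominator = np.linalg.norm(u) * np.linalg.norm(v)
    if denominator == 0:
        return 0.0
    return float(np.dot(u, v) / denominator)
```

Because the object only stores a vocabulary and a weight matrix, it can be rebuilt from the .json and .npy files alone, but it carries none of the training state, which is consistent with the model being untrainable.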

Testing

  • added test_model_serialization.py to tests; all newly introduced scripts have 100% coverage according to pytest --cov;
  • an integration test that saves and reloads the model passes (this test is not pushed to this PR).

Closes #78

@florian-huber
Member Author

Thanks again for the great work @maximskorik !
Since there haven't been additions for a while, some tests failed due to old code examples in the documentation or outdated dependencies. This should be fixed now and I will merge your code into the main branch. A new release should follow soon.

Successfully merging this pull request may close these issues.

[feature request] add JSON to export and import the model
2 participants