[feature request] add JSON to export and import the model #78

Closed
maximskorik opened this issue Aug 1, 2022 · 0 comments · Fixed by #80 or #81

@maximskorik

Description

I am working on adding the Spec2Vec package to the data processing pipeline at @RECETOX. The implementation would pass a trained model to Spec2Vec to compute similarity scores. Because loading a Pickle file from an untrusted source can execute arbitrary code, we decided to implement a custom JSON exporter and importer for the model inside our pipeline. We thought you might be interested in adding this functionality to the package itself (e.g., for #77).

Advantages: loading a model involves no unpickling, so it is safe to compute scores based on publicly available models, etc.
Disadvantages: such a model is no longer trainable and can only be used to compute scores.

Implementation

  • Create a lightweight subclass of gensim.models.Word2Vec that stores only the data Spec2Vec needs to compute the scores.
  • Implement export and import for this subclass. The model will be stored in two files: a .json file for metadata, and a .npy or .npz file for the model's embeddings, stored as a numpy.ndarray or a scipy.sparse matrix respectively.
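
A minimal sketch of the two-file layout described above, covering the dense (.npy) case only. The function names `export_model` and `import_model`, the metadata keys, and the example tokens are hypothetical placeholders, not part of Spec2Vec or gensim. The sketch works on a plain vocabulary list and embedding matrix rather than on a Word2Vec subclass.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def export_model(vocab, vectors, json_path, npy_path):
    """Write metadata to a JSON file and the dense embeddings to a .npy file."""
    vectors = np.asarray(vectors)
    if vectors.ndim != 2 or vectors.shape[0] != len(vocab):
        raise ValueError("expected one embedding row per vocabulary entry")
    # Hypothetical metadata layout: just enough to validate the array on import.
    metadata = {
        "index_to_key": list(vocab),
        "vector_size": int(vectors.shape[1]),
        "dtype": str(vectors.dtype),
    }
    Path(json_path).write_text(json.dumps(metadata), encoding="utf-8")
    # allow_pickle=False makes NumPy refuse to write object arrays via pickle.
    np.save(npy_path, vectors, allow_pickle=False)


def import_model(json_path, npy_path):
    """Read the vocabulary and embeddings back without unpickling anything."""
    metadata = json.loads(Path(json_path).read_text(encoding="utf-8"))
    # allow_pickle=False makes NumPy refuse files that contain pickled objects.
    vectors = np.load(npy_path, allow_pickle=False)
    expected_shape = (len(metadata["index_to_key"]), metadata["vector_size"])
    if vectors.shape != expected_shape:
        raise ValueError("embedding shape does not match the JSON metadata")
    return metadata["index_to_key"], vectors


# Round trip with made-up tokens and random embeddings.
keys = ["token_a", "token_b", "token_c"]
embeddings = np.random.default_rng(0).random((3, 4)).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp:
    json_file = Path(tmp) / "model.json"
    npy_file = Path(tmp) / "model.npy"
    export_model(keys, embeddings, json_file, npy_file)
    loaded_keys, loaded_vectors = import_model(json_file, npy_file)
```

Passing `allow_pickle=False` on both save and load is what keeps Pickle out of the path. For the sparse (.npz) case, `scipy.sparse.save_npz` and `scipy.sparse.load_npz` could presumably fill the same role, but that is not shown here.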