I am working on adding the Spec2Vec package to the data processing pipeline at @RECETOX. The implementation would pass a trained model to Spec2Vec to compute similarity scores. Because loading a pickle file can execute arbitrary code, we decided to implement a custom JSON exporter and importer for the model inside our pipeline. We thought you might be interested in adding this functionality to the package itself (e.g., for #77).
Advantages: no security vulnerabilities, so it will be safe to compute scores based on publicly available models, etc.
Disadvantages: such a model will no longer be trainable and can only be used to compute scores.
Implementation
Create a lightweight subclass of gensim.models.Word2Vec that stores only the data Spec2Vec requires to compute the scores.
Implement exporting and importing for this subclass. The model will be stored in two files: a .json file for metadata, and a .npy or .npz file for the model's embeddings, stored as a numpy.ndarray or a scipy.sparse matrix, respectively.
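The two-file layout described above could look roughly like the sketch below. It is only an illustration under stated assumptions: it covers the dense .npy case, it works on a plain vocabulary list and embedding matrix instead of the proposed Word2Vec subclass, and the names export_embeddings, import_embeddings, and the metadata keys are hypothetical, not part of Spec2Vec or gensim.

```python
import json
import os
import tempfile

import numpy as np


def export_embeddings(vocab, vectors, json_path, npy_path):
    """Write metadata to a JSON file and dense embeddings to a .npy file.

    Hypothetical helper: the metadata keys are an assumption of this sketch.
    """
    vectors = np.asarray(vectors)
    if vectors.ndim != 2 or vectors.shape[0] != len(vocab):
        raise ValueError("expected one embedding row per vocabulary entry")
    metadata = {
        "vocab": list(vocab),
        "vector_size": int(vectors.shape[1]),
        "dtype": str(vectors.dtype),
    }
    with open(json_path, "w", encoding="utf-8") as handle:
        json.dump(metadata, handle)
    # allow_pickle=False makes numpy refuse to write object arrays as pickles.
    np.save(npy_path, vectors, allow_pickle=False)


def import_embeddings(json_path, npy_path):
    """Read the two files back and check that they agree with each other."""
    with open(json_path, encoding="utf-8") as handle:
        metadata = json.load(handle)
    # allow_pickle=False (the default) rejects files that contain pickled objects.
    vectors = np.load(npy_path, allow_pickle=False)
    expected_shape = (len(metadata["vocab"]), metadata["vector_size"])
    if vectors.shape != expected_shape:
        raise ValueError("embedding matrix does not match the JSON metadata")
    return metadata["vocab"], vectors


def roundtrip_demo():
    """Export a toy model to a temporary directory and load it again."""
    vocab = ["peak@100.0", "loss@18.0"]  # made-up tokens for illustration
    vectors = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], dtype=np.float32)
    with tempfile.TemporaryDirectory() as tmp:
        json_path = os.path.join(tmp, "model.json")
        npy_path = os.path.join(tmp, "model.npy")
        export_embeddings(vocab, vectors, json_path, npy_path)
        loaded_vocab, loaded_vectors = import_embeddings(json_path, npy_path)
    return vocab, vectors, loaded_vocab, loaded_vectors
```

Neither file needs unpickling on load, which is the point of the proposal. A sparse variant would swap the .npy file for a .npz file and is left out here.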