This repo examines the application of poincaré embedding in encoding the similarity between python modules (e.g. those found in common libraries such as NumPy, SciPy, Sklearn), measured by the shortest path in the tree constructed based on the hierarchy of these modules, crawled from Python documentation. Modules with the same name, e.g. “scipy.stats._continuous_distns.lomax_gen.fit” and “sklearn.linear_model.base.LinearModel.fit”, are distinguished as different nodes in the tree. During training (multithreaded async SGD), edges connecting modules with the same name are optionally added, with the assumption that the same name signals similar functionality (e.g. estimating the fit of data to a model/ distribution). To evaluate the effectiveness of the representation learned, Pearson correlation is computed for embedding distance versus shortest path distance in the original tree, between nodes that are under the same library (e.g. Numpy).
Adapted from PyTorch implementation of Poincaré Embeddings for Learning Hierarchical Representations by Facebook AI Research
- Python 3 with NumPy
- PyTorch
- Scikit-Learn
- NLTK (to generate the WordNet data)
If you find this code useful for your research, please cite the following paper in your publication:
@incollection{nickel2017poincare,
title = {Poincar\'{e} Embeddings for Learning Hierarchical Representations},
author = {Nickel, Maximilian and Kiela, Douwe},
booktitle = {Advances in Neural Information Processing Systems 30},
editor = {I. Guyon and U. V. Luxburg and S. Bengio and H. Wallach and R. Fergus and S. Vishwanathan and R. Garnett},
pages = {6341--6350},
year = {2017},
publisher = {Curran Associates, Inc.},
url = {http://papers.nips.cc/paper/7213-poincare-embeddings-for-learning-hierarchical-representations.pdf}
}
This code is licensed under CC-BY-NC 4.0.