-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Poincare Model implementation #1696
Conversation
gensim/models/poincare.py
Outdated
Whether the input array contains any duplicates. | ||
|
||
""" | ||
seen = set() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return len(array) != len(set(array))
simpler. Probably not even worth adding a method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
gensim/models/poincare.py
Outdated
|
||
Parameters | ||
---------- | ||
train_data : iterable of (str, str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
str
is ambiguous for Python 2 vs 3. Better to say unicode
or bytes
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both unicode
and bytes
are allowed here. Wherever this is true, I've used str
, wherever a specific type is required returned, I've used unicode
/bytes
. Does that sound okay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, perfect. Are you sure it works correctly with bytes, though? I suppose if we train on bytes we'll end up with a bytes based model. I wonder if that's common for other gensim models. Won't that cause unexpected behavior with some KeyedVectors
calls?
gensim/models/poincare.py
Outdated
node_relations = defaultdict(set) # Mapping from node index to its related node indices | ||
|
||
logger.info("Loading relations from train data..") | ||
for hypernym_pair in self.train_data: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rename hypernym_pair
to something more generic such as relation
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
gensim/models/poincare.py
Outdated
Vectors of all nodes `u` in the batch. | ||
Expected shape (batch_size, dim). | ||
vectors_v : numpy.array | ||
Vectors of all hypernym nodes `v` and negatively sampled nodes `v'`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just "nodes"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
gensim/models/poincare.py
Outdated
|
||
|
||
class PoincareRelations(object): | ||
"""Class to stream hypernym relations for `PoincareModel` from a tsv-like file.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just "relations", here and elsewhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
gensim/models/poincare.py
Outdated
"""Class to stream hypernym relations for `PoincareModel` from a tsv-like file.""" | ||
|
||
def __init__(self, file_path, encoding='utf8', delimiter='\t'): | ||
"""Initialize instance from file containing one hypernym pair per line. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hypernym pair -> relation (here and elsewhere)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
72d6fc4
to
930dfd4
Compare
I've added the rst files and made some fixes for python2 bugs. The only failing test is the one that requires autograd now (due to it being missing from test dependencies). With autograd added to test dependencies, the build errors (due to some MKL error, as you mentioned). |
@jayantj maybe remove this test (because we can't run it correctly in CI)? |
gensim/models/poincare.py
Outdated
def __init__( | ||
self, train_data, size=50, alpha=0.1, negative=10, workers=1, | ||
epsilon=1e-5, burn_in=10, burn_in_alpha=0.01, init_range=(-0.001, 0.001), seed=0): | ||
"""Initialize and train a Poincare embedding model from an iterable of transitive closure relations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the transitive closure a requirement? If not, let's just say "iterable of relations".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
@menshikh-iv I've instead added a skiptest in case autograd is not installed, that way we can continue to check if the test runs locally, making development easier. Does that seem okay? |
169fbfc
to
dfc19cb
Compare
Pure Python implementation of the Poincare model from [1].
TODO -
Follow up PR: #1700
[1] Poincaré Embeddings for Learning Hierarchical Representations