[MRG] Add Poincare model #1757
Merged
…misaligned header
Evaluation of existing Poincaré embedding implementations
* Initial classes and loading data for poincare model * Initial implementation of training using autograd * faster negative sampling, bugfix in vector updates * allows poincare dist function to be differentiable by autograd * batched gradient descent initial implementation * minor changes to batch poincare distance computation * Adds calculation of gradients for poincare model * Correct implementation of clipping of updated vectors * Fixes error in gradient computation * Better messages while training * Renames PoincareDistance to PoincareExample for clarity * Compares computed gradients to autograd gradients every few iterations * Avoids doing some numpy computations twice * Avoids creating copies of numpy vectors * Only calls nan_to_num when gamma has at least one value equal to 1 * Simply sets nan gradients to zero instead of nan_to_num * Adds batch-wise implementation of training and gradient computations * Minor correction in clipping * Fixes typo in clip_vectors * Prints average loss every few iterations instead of current loss * Adds weighted negative sampling * Ensures positive edges are not returned by negative sampling * Poincare model stores node indices in relations instead of node keys * Minor renaming; uses node indices for batch training instead of node keys * Changes shapes of vectors passed to PoincareBatch * Minor bugfixes related to batch size * Corrects implementation of negative sampling for batch training * Adds option to check gradients in batchwise training * Checks gradients only every few iterations * Handles multiple occurrence of same node across and within batches * Removes unused section of code * Implements slightly different clipping method * Fixes bugs with wrong reshape in batchwise training * Example-wise training takes into account multiple occurrences of same node in an example too * Batchwise training prints average loss over many iterations instead of current batch * Fixes bug in updating vector for batchwise training * Faster 
implementation of negative sampling * Negative sampling for a node follows different paths depending on fraction of positive relations * Uses a buffer for negative samples to reduce calls to np.random.choice * Cleans up poincare.py, removes unused code * Adds shapes to PoincareBatch, more documentation * Adds more documentation to PoincareModel * Stores indices for nodes in a batch in PoincareBatch for better encapsulation * More documentation for poincare module * Implements burn-in for poincare model * Slightly better logging for poincare model * Uses np.random.random and np.searchsorted for random sampling rather than np.random.choice * Removes duplicates in negative samples * Moves helper classes in poincare after PoincareModel * Change in PoincareModel API to allow initializing from an iterable, separate class for streaming from file * Adds failing test for handling encoding in PoincareData * Fixes encoding handling in PoincareData * Adds docstrings to PoincareData, PoincareData streams tuples now * More unittests for PoincareModel * Changes handle_duplicates to staticmethod, adds test * Adds batch size and print_every parameters to train method * Renames print_check to should_print * Adds separate parameter for checking gradients * Minor fixes for coding style * Removes default values from docstrings, redundant * Adds example to PoincareModel init docstring * Extracts buffer for negatives out into a separate class * More detailed logging, fix to check_gradients * Minor fixes to documentation in poincare.py * Adds tests for gradients checking * Raise AssertionError if gradients check fails * Adds failing tests for saving/loading PoincareModel instances * Fixes bug with saving/loading PoincareModel to disk * Adds test and fix for raising error on invalid input data * Adds test and fix for no duplicates and positives in negative sample * Bugfix with NegativesBuffer having less than items left * Uses larger data for poincare tests, adds data files * Bugfix with 
incorrect use of random state * Minor fixes in documentation style * Renames PoincareData to PoincareRelations * Change in the order of conditions checked before resampling * Imports datapath from test.utils instead of defining own * Adds working examples and a more detailed description in docstring * Renames term_relations to node_relations * Removes unused imports * Moves iter parameter to train instead of __init__, renames to epochs * Fixes term_relations in tests * Adds option to disable gradient check, disabled by default * Extracts gradient checking code into a separate method * Conditionally import autograd only if gradient checking is enabled * Marks private methods in poincare module with leading underscore * Adds init_range as an API parameter to PoincareModel * Marks private properties with a leading underscore * Fixes bug with burn-in happening on subsequent calls to train * Adds test for training multiple times * Adds autograd to test dependencies * Renames wv to kv in PoincareModel * add numpy==1.12 as test dependency * add missing quote * try to run tests without autograd * fix PEP8 in poincare.py * fix PEP8 in test_poincare * PoincareRelations handles python2 correctly * Bugfix with int division for python2 * Imports mock module for tests correctly in python2 * Cleaner implementation of __iter__ for PoincareRelations * Adds rst file and updates apiref.rst for poincare module * Adds clarifying comment to PoincareRelations.__iter__ * Updates rst file for poincare * Renames hypernym pair to relations everywhere * Simpler way of detecting duplicates * Minor documentation updates in poincare.py * Skips gradients test if autograd not installed, adds test for bytes input data * Fix flake8 (noqa + remove unused var) * Fix missing mock dependency for win * Fix links in docstrings * Changes error message for negative sampling failing * Adds option to specify dtype for PoincareModel and corresponding unittest * Extends test for dtype to check after training, 
updates docstring
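The commit notes above mention weighted negative sampling done with a uniform random draw plus `np.searchsorted`, with positive edges and duplicates kept out of the sample. As a minimal standard-library sketch of that idea (not the code from this PR; the function name `sample_negatives` and its parameters are hypothetical, and `bisect` stands in for `np.searchsorted`):

```python
import bisect
import itertools
import random


def sample_negatives(weights, positives, k, rng):
    """Draw k distinct node indices with probability proportional to `weights`,
    never returning an index listed in `positives`.

    Hypothetical helper for illustration only.
    """
    # Cumulative weights turn a uniform draw into a weighted draw.
    cumulative = list(itertools.accumulate(weights))
    total = cumulative[-1]

    # Fail early if there are not enough eligible nodes, so the loop below terminates.
    available = sum(1 for i, w in enumerate(weights) if w > 0 and i not in positives)
    if k > available:
        raise ValueError("not enough candidate nodes to sample from")

    chosen = []
    seen = set(positives)
    while len(chosen) < k:
        # Binary search over the cumulative weights (the role np.searchsorted plays).
        idx = bisect.bisect_right(cumulative, rng.random() * total)
        # Skip positives, duplicates, and the rare rounding case idx == len(weights).
        if idx < len(weights) and idx not in seen:
            seen.add(idx)
            chosen.append(idx)
    return chosen


rng = random.Random(0)
negatives = sample_negatives([1, 2, 3, 4], positives={0}, k=2, rng=rng)
```

Rejecting and redrawing is simple but slows down when most of the weight sits on positive nodes, which may be why the notes describe different sampling paths depending on the fraction of positive relations.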
* Initial classes and loading data for poincare model * Initial implementation of training using autograd * faster negative sampling, bugfix in vector updates * allows poincare dist function to be differentiable by autograd * batched gradient descent initial implementation * minor changes to batch poincare distance computation * Adds calculation of gradients for poincare model * Correct implementation of clipping of updated vectors * Fixes error in gradient computation * Better messages while training * Renames PoincareDistance to PoincareExample for clarity * Compares computed gradients to autograd gradients every few iterations * Avoids doing some numpy computations twice * Avoids creating copies of numpy vectors * Only calls nan_to_num when gamma has at least one value equal to 1 * Simply sets nan gradients to zero instead of nan_to_num * Adds batch-wise implementation of training and gradient computations * Minor correction in clipping * Fixes typo in clip_vectors * Prints average loss every few iterations instead of current loss * Adds weighted negative sampling * Ensures positive edges are not returned by negative sampling * Poincare model stores node indices in relations instead of node keys * Minor renaming; uses node indices for batch training instead of node keys * Changes shapes of vectors passed to PoincareBatch * Minor bugfixes related to batch size * Corrects implementation of negative sampling for batch training * Adds option to check gradients in batchwise training * Checks gradients only every few iterations * Handles multiple occurrence of same node across and within batches * Removes unused section of code * Implements slightly different clipping method * Fixes bugs with wrong reshape in batchwise training * Example-wise training takes into account multiple occurrences of same node in an example too * Batchwise training prints average loss over many iterations instead of current batch * Fixes bug in updating vector for batchwise training * Faster 
implementation of negative sampling * Negative sampling for a node follows different paths depending on fraction of positive relations * Uses a buffer for negative samples to reduce calls to np.random.choice * Cleans up poincare.py, removes unused code * Adds shapes to PoincareBatch, more documentation * Adds more documentation to PoincareModel * Stores indices for nodes in a batch in PoincareBatch for better encapsulation * More documentation for poincare module * Implements burn-in for poincare model * Slightly better logging for poincare model * Uses np.random.random and np.searchsorted for random sampling rather than np.random.choice * Removes duplicates in negative samples * Moves helper classes in poincare after PoincareModel * Change in PoincareModel API to allow initializing from an iterable, separate class for streaming from file * Adds failing test for handling encoding in PoincareData * Fixes encoding handling in PoincareData * Adds docstrings to PoincareData, PoincareData streams tuples now * More unittests for PoincareModel * Changes handle_duplicates to staticmethod, adds test * Adds batch size and print_every parameters to train method * Renames print_check to should_print * Adds separate parameter for checking gradients * Minor fixes for coding style * Removes default values from docstrings, redundant * Adds example to PoincareModel init docstring * Extracts buffer for negatives out into a separate class * More detailed logging, fix to check_gradients * Minor fixes to documentation in poincare.py * Adds support for most_similar to PoincareKeyedVectors * Refactors most_similar and loss_fn to use PoincareKeyedVectors.poincare_dists * Adds tests for gradients checking * Raise AssertionError if gradients check fails * Adds failing tests for saving/loading PoincareModel instances * Fixes bug with saving/loading PoincareModel to disk * Adds test and fix for raising error on invalid input data * Adds test and fix for no duplicates and positives in negative 
sample * Bugfix with NegativesBuffer having less than items left * Uses larger data for poincare tests, adds data files * Bugfix with incorrect use of random state * Minor fixes in documentation style * Renames PoincareData to PoincareRelations * Change in the order of conditions checked before resampling * Imports datapath from test.utils instead of defining own * Adds working examples and a more detailed description in docstring * Renames term_relations to node_relations * Removes unused imports * Moves iter parameter to train instead of __init__, renames to epochs * Fixes term_relations in tests * Adds option to disable gradient check, disabled by default * Extracts gradient checking code into a separate method * Conditionally import autograd only if gradient checking is enabled * Marks private methods in poincare module with leading underscore * Adds init_range as an API parameter to PoincareModel * Marks private properties with a leading underscore * Fixes bug with burn-in happening on subsequent calls to train * Adds test for training multiple times * Adds autograd to test dependencies * Renames wv to kv in PoincareModel * add numpy==1.12 as test dependency * add missing quote * Moves methods for evaluating poincare embeddings to poincare.py * Updates docstrings for newly added classes * Moves trie-related methods to LexicalEntailmentEvaluation * Moves code for loading PoincareEmbedding into notebook * Removes PoincareEmbedding class, adds functionality to PoincareKeyedVectors * Updates eval nb with code and evaluation results for gensim models * Minor documentation updates + bugfix in distance * Adds methods for rank and nodes_closer_than to PoincareKeyedVectors * Adds methods to return closest child, parent, and ancestor and descendant chain for an input node * Updates LE and reconstruction results for gensim models in eval nb * Adds notebook detailing Poincare embedding operations and report * Adds images for poincare embedding report * Updates image links 
in poincare report nb * try to run tests without autograd * fix PEP8 in poincare.py * fix PEP8 in test_poincare * PoincareRelations handles python2 correctly * Bugfix with int division for python2 * Imports mock module for tests correctly in python2 * Cleaner implementation of __iter__ for PoincareRelations * Adds rst file and updates apiref.rst for poincare module * Adds clarifying comment to PoincareRelations.__iter__ * Adds functions for visualization to poincare_visualization.py * Suppresses certain numpy warnings while training model * Updates rst file for poincare * Updates poincare report nb with reduced code, section on training, better visualization labels and titles * Renames hypernym pair to relations everywhere * Simpler way of detecting duplicates * Minor documentation updates in poincare.py * Skips gradients test if autograd not installed, adds test for bytes input data * Adds results of gensim models on link prediction to eval notebook * Adds link prediction results to report, more information about training * Adds further details to concept and motivation sections, section on future work, and images * Fix flake8 (noqa + remove unused var) * Fix missing mock dependency for win * Fix links in docstrings * Refactors KeyedVectors into KeyedVectorsBase and EuclideanKeyedVectors * Changes error message for negative sampling failing * Adds option to specify dtype for PoincareModel and corresponding unittest * Extends test for dtype to check after training, updates docstring * Adds tests for new methods in PoincareKeyedVectors * Fixes bug in closest_child implementation * Adds similarity and distance to KeyedVectorsBase interface, implementation and tests for similarity for PoincareKeyedVectors * Minor fixes to Poincare report notebook * Adds method to compute all distances to KeyedVectorsBase, moves most_similar from EuclideanKeyedVectors to KeyedVectorsBase * Allows PoincareKeyedVectors.distances to accept an optional list of words * Adds implementation 
of PoincareKeyedVectors.similarities and tests * Adds restrict_vocab option to most_similar and tests for EuclideanKeyedVectors.most_similar * Adds docstring for tests * Adds implementation of EuclideanKeyedVectors.distances and tests, updates docstrings * Moves most_similar_to_given to KeyedVectorsBase, adds tests * Moves similar_by_vector and similar_by_word to KeyedVectorsBase, adds tests * Adds failing tests for similar_by_word and similar_by_vector to PoincareKeyedVector tests * Moves multiple methods out of KeyedVectorsBase back to EuclideanKeyedVectors, removes tests * Adds test for most_similar with vector input for EuclideanKeyedVectors * Adds failing test for vector input for most_similar for PoincareKeyedVectors * Allows passing in vector input to most_similar and distances methods in PoincareKeyedVectors * Removes precompute_max_distance and uses simpler formula for similarity in PoincareKeyedVectors * Renames PoincareKeyedVectors.poincare_dists to PoincareKeyedVectors.poincare_distance_batch * Fixes error with unclosed file in PoincareRelations * Adds tests and method for computing poincare distance between two input vectors * Adds methods and tests for finding position and difference in hierarchical positions of input vectors * Fixes unused import, pep8 and docstring issues * More intuitive naming of arguments for methods in PoincareKeyedVectors * Uses w1 and w2 consistently across KeyedVectors methods * Removes most_similar from KeyedVectorsBase * Adds failing tests for words_closer_than and rank for EuclideanKeyedVectors and PoincareKeyedVectors * Adds distances method to KeyedVectorsBase and EuclideanKeyedVectors, fixes tests * Makes default argument for distances immutable * Uses conditional import for pygtrie in LexicalEntailmentEvaluation * Renames position_in_hierarchy to norm with minor change in behaviour, updates tests * Renames poincare_distance and poincare_distance_batch to vector_distance and vector_distance_batch * Forces float division 
for positive_fraction in _sample_negatives * Removes unused method from PoincareKeyedVectors * Updates report notebook with usage examples of new API methods * Minor pep8 fix * Fixes pep8 issues, unused imports and typo * Adds example of saving and loading model to notebook * Updates docstrings in poincare.py * Moves poincare visualization methods to new gensim.viz module * Updates rst files for poincare viz * Adds newline at the end of poincare.py in viz package * Adds link to original paper to poincare notebook * fix viz.poincare & update docs dependencies * add link to init file * fix PEP8 * fixes for poincare.py
* Initial classes and loading data for poincare model * Initial implementation of training using autograd * faster negative sampling, bugfix in vector updates * allows poincare dist function to be differentiable by autograd * batched gradient descent initial implementation * minor changes to batch poincare distance computation * Adds calculation of gradients for poincare model * Correct implementation of clipping of updated vectors * Fixes error in gradient computation * Better messages while training * Renames PoincareDistance to PoincareExample for clarity * Compares computed gradients to autograd gradients every few iterations * Avoids doing some numpy computations twice * Avoids creating copies of numpy vectors * Only calls nan_to_num when gamma has at least one value equal to 1 * Simply sets nan gradients to zero instead of nan_to_num * Adds batch-wise implementation of training and gradient computations * Minor correction in clipping * Fixes typo in clip_vectors * Prints average loss every few iterations instead of current loss * Adds weighted negative sampling * Ensures positive edges are not returned by negative sampling * Poincare model stores node indices in relations instead of node keys * Minor renaming; uses node indices for batch training instead of node keys * Changes shapes of vectors passed to PoincareBatch * Minor bugfixes related to batch size * Corrects implementation of negative sampling for batch training * Adds option to check gradients in batchwise training * Checks gradients only every few iterations * Handles multiple occurrence of same node across and within batches * Removes unused section of code * Implements slightly different clipping method * Fixes bugs with wrong reshape in batchwise training * Example-wise training takes into account multiple occurrences of same node in an example too * Batchwise training prints average loss over many iterations instead of current batch * Fixes bug in updating vector for batchwise training * Faster 
implementation of negative sampling * Negative sampling for a node follows different paths depending on fraction of positive relations * Uses a buffer for negative samples to reduce calls to np.random.choice * Cleans up poincare.py, removes unused code * Adds shapes to PoincareBatch, more documentation * Adds more documentation to PoincareModel * Stores indices for nodes in a batch in PoincareBatch for better encapsulation * More documentation for poincare module * Implements burn-in for poincare model * Slightly better logging for poincare model * Uses np.random.random and np.searchsorted for random sampling rather than np.random.choice * Removes duplicates in negative samples * Moves helper classes in poincare after PoincareModel * Change in PoincareModel API to allow initializing from an iterable, separate class for streaming from file * Adds failing test for handling encoding in PoincareData * Fixes encoding handling in PoincareData * Adds docstrings to PoincareData, PoincareData streams tuples now * More unittests for PoincareModel * Changes handle_duplicates to staticmethod, adds test * Adds batch size and print_every parameters to train method * Renames print_check to should_print * Adds separate parameter for checking gradients * Minor fixes for coding style * Removes default values from docstrings, redundant * Adds example to PoincareModel init docstring * Extracts buffer for negatives out into a separate class * More detailed logging, fix to check_gradients * Minor fixes to documentation in poincare.py * Adds support for most_similar to PoincareKeyedVectors * Refactors most_similar and loss_fn to use PoincareKeyedVectors.poincare_dists * Adds tests for gradients checking * Raise AssertionError if gradients check fails * Adds failing tests for saving/loading PoincareModel instances * Fixes bug with saving/loading PoincareModel to disk * Adds test and fix for raising error on invalid input data * Adds test and fix for no duplicates and positives in negative 
sample * Bugfix with NegativesBuffer having less than items left * Uses larger data for poincare tests, adds data files * Bugfix with incorrect use of random state * Minor fixes in documentation style * Renames PoincareData to PoincareRelations * Change in the order of conditions checked before resampling * Imports datapath from test.utils instead of defining own * Adds working examples and a more detailed description in docstring * Renames term_relations to node_relations * Removes unused imports * Moves iter parameter to train instead of __init__, renames to epochs * Fixes term_relations in tests * Adds option to disable gradient check, disabled by default * Extracts gradient checking code into a separate method * Conditionally import autograd only if gradient checking is enabled * Marks private methods in poincare module with leading underscore * Adds init_range as an API parameter to PoincareModel * Marks private properties with a leading underscore * Fixes bug with burn-in happening on subsequent calls to train * Adds test for training multiple times * Adds autograd to test dependencies * Renames wv to kv in PoincareModel * add numpy==1.12 as test dependency * add missing quote * Moves methods for evaluating poincare embeddings to poincare.py * Updates docstrings for newly added classes * Moves trie-related methods to LexicalEntailmentEvaluation * Moves code for loading PoincareEmbedding into notebook * Removes PoincareEmbedding class, adds functionality to PoincareKeyedVectors * Updates eval nb with code and evaluation results for gensim models * Minor documentation updates + bugfix in distance * Adds methods for rank and nodes_closer_than to PoincareKeyedVectors * Adds methods to return closest child, parent, and ancestor and descendant chain for an input node * Updates LE and reconstruction results for gensim models in eval nb * Adds notebook detailing Poincare embedding operations and report * Adds images for poincare embedding report * Updates image links 
in poincare report nb * try to run tests without autograd * fix PEP8 in poincare.py * fix PEP8 in test_poincare * PoincareRelations handles python2 correctly * Bugfix with int division for python2 * Imports mock module for tests correctly in python2 * Cleaner implementation of __iter__ for PoincareRelations * Adds rst file and updates apiref.rst for poincare module * Adds clarifying comment to PoincareRelations.__iter__ * Adds functions for visualization to poincare_visualization.py * Suppresses certain numpy warnings while training model * Updates rst file for poincare * Updates poincare report nb with reduced code, section on training, better visualization labels and titles * Renames hypernym pair to relations everywhere * Simpler way of detecting duplicates * Minor documentation updates in poincare.py * Skips gradients test if autograd not installed, adds test for bytes input data * Adds results of gensim models on link prediction to eval notebook * Adds link prediction results to report, more information about training * Adds further details to concept and motivation sections, section on future work, and images * Fix flake8 (noqa + remove unused var) * Fix missing mock dependency for win * Fix links in docstrings * Refactors KeyedVectors into KeyedVectorsBase and EuclideanKeyedVectors * Changes error message for negative sampling failing * Adds option to specify dtype for PoincareModel and corresponding unittest * Extends test for dtype to check after training, updates docstring * Adds tests for new methods in PoincareKeyedVectors * Fixes bug in closest_child implementation * Adds similarity and distance to KeyedVectorsBase interface, implementation and tests for similarity for PoincareKeyedVectors * Minor fixes to Poincare report notebook * Adds method to compute all distances to KeyedVectorsBase, moves most_similar from EuclideanKeyedVectors to KeyedVectorsBase * Allows PoincareKeyedVectors.distances to accept an optional list of words * Adds implementation 
of PoincareKeyedVectors.similarities and tests * Adds restrict_vocab option to most_similar and tests for EuclideanKeyedVectors.most_similar * Adds docstring for tests * Adds implementation of EuclideanKeyedVectors.distances and tests, updates docstrings * Moves most_similar_to_given to KeyedVectorsBase, adds tests * Moves similar_by_vector and similar_by_word to KeyedVectorsBase, adds tests * Adds failing tests for similar_by_word and similar_by_vector to PoincareKeyedVector tests * Moves multiple methods out of KeyedVectorsBase back to EuclideanKeyedVectors, removes tests * Adds test for most_similar with vector input for EuclideanKeyedVectors * Adds failing test for vector input for most_similar for PoincareKeyedVectors * Allows passing in vector input to most_similar and distances methods in PoincareKeyedVectors * Removes precompute_max_distance and uses simpler formula for similarity in PoincareKeyedVectors * Renames PoincareKeyedVectors.poincare_dists to PoincareKeyedVectors.poincare_distance_batch * Fixes error with unclosed file in PoincareRelations * Adds tests and method for computing poincare distance between two input vectors * Adds methods and tests for finding position and difference in hierarchical positions of input vectors * Fixes unused import, pep8 and docstring issues * More intuitive naming of arguments for methods in PoincareKeyedVectors * Uses w1 and w2 consistently across KeyedVectors methods * Removes most_similar from KeyedVectorsBase * Adds failing tests for words_closer_than and rank for EuclideanKeyedVectors and PoincareKeyedVectors * Adds distances method to KeyedVectorsBase and EuclideanKeyedVectors, fixes tests * Makes default argument for distances immutable * Uses conditional import for pygtrie in LexicalEntailmentEvaluation * Renames position_in_hierarchy to norm with minor change in behaviour, updates tests * Renames poincare_distance and poincare_distance_batch to vector_distance and vector_distance_batch * Forces float division 
for positive_fraction in _sample_negatives * Removes unused method from PoincareKeyedVectors * Updates report notebook with usage examples of new API methods * Minor pep8 fix * Adds l2 regularization to poincare model training * Cleaner way to avoid dependency on autograd * Fixes pep8 issues, unused imports and typo * Adds example of saving and loading model to notebook * Updates docstrings in poincare.py * Updates docstring for regularization coefficient in PoincareModel * Moves poincare visualization methods to new gensim.viz module * Updates rst files for poincare viz * Adds newline at the end of poincare.py in viz package * Adds link to original paper to poincare notebook * Adds l2 regularization to poincare model training * Cleaner way to avoid dependency on autograd * Updates docstring for regularization coefficient in PoincareModel
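Several commit notes above refer to the Poincaré distance and to clipping updated vectors. For orientation, here is a minimal standard-library sketch of the standard Poincaré-ball distance, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))), and of rescaling a vector back inside the unit ball. It is not the implementation merged in this PR; the names `poincare_distance`, `clip_to_ball`, and the margin `EPSILON` are assumptions made for illustration.

```python
import math

EPSILON = 1e-5  # hypothetical margin keeping vectors strictly inside the unit ball


def poincare_distance(u, v):
    """Distance between two points of the open unit ball (Poincaré ball model)."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1 + 2 * sq_diff / ((1 - sq_norm_u) * (1 - sq_norm_v)))


def clip_to_ball(v, epsilon=EPSILON):
    """Rescale v so that its Euclidean norm is at most 1 - epsilon."""
    norm = math.sqrt(sum(x * x for x in v))
    max_norm = 1 - epsilon
    if norm <= max_norm:
        return list(v)
    scale = max_norm / norm
    return [x * scale for x in v]


origin = [0.0, 0.0]
point = [0.5, 0.0]
d = poincare_distance(origin, point)  # equals ln((1 + r) / (1 - r)) with r = 0.5
clipped = clip_to_ball([3.0, 4.0])    # pulled back to norm 1 - EPSILON
```

The distance grows without bound as either point approaches the boundary of the ball, which is why an update step that pushes a vector to norm 1 or beyond has to be clipped before the next distance computation.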
Final PR (merge all Poincare stuff to develop).
CC: @jayantj @janpom