-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Evaluation of existing Poincaré embedding implementations #1643
Conversation
"outputs": [], | ||
"source": [ | ||
"# Set this to the path of the directory that is to contain the poincare-embedding directory\n", | ||
"parent_directory = '/home/jayant/projects/'" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use path relative to the notebook, so that you don't need to hardcode your home dir path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
], | ||
"source": [ | ||
"# Download and apply patch\n", | ||
"patch_url = \"https://raw.githubusercontent.com/RaRe-Technologies/gensim/poincare_eval/docs/notebooks/poincare_data/poincare_burn_in_eps.patch\"\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The patch is in the repo so you can just reference the local file (poincare_data/poincare_burn_in_eps.patch
). Besides, the URL will get broken once the code is merged to master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, changed.
"text": [ | ||
"Results for WordNet Link Prediction task\n", | ||
"+--------------------------------------------+---------------------------------+\n", | ||
"| | Model Dimensions |\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Table header misaligned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
" \n", | ||
" \n", | ||
" @staticmethod\n", | ||
" def get_positive_relation_ranks(distances, positive_relations):\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a bit too much magic in this method implementation. Are you sure it actually does what it's expected to do? Have you tested it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree the code isn't straightforward. I changed distances
to all_distances
to make it slightly clearer. Not sure how it can be made clearer. And yes, I've tested it -
all_distances = np.array([5, 10, 15, 20, 0, 1, 2, 3])
positive_relations = [4, 6, 1, 7]
assert(ReconstructionEvaluation.get_positive_relation_ranks(all_distances, positive_relations) == [1, 2, 3, 2])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern wasn't about naming, but rather about the used constructions. Mainly
ranks = (negative_relation_distances < positive_relation_distances[:, np.newaxis]).sum(axis=1) + 1
is pretty non-obvious. I have tested it myself, though, and indeed does seem to work, so that's good.
" break\n", | ||
" return np.mean(ranks)\n", | ||
" \n", | ||
" def evaluate_map(self, max_n=None):\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the plan with evaluate_map
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I'd forgotten. Will implement.
58fc15e
to
0f48f20
Compare
@jayantj please ignore CI now, don't worry about it. |
0c5609c
to
0690746
Compare
"source": [ | ||
"# Prepare the WordNet data\n", | ||
"wordnet_file = os.path.join(data_directory, 'wordnet_noun_hypernyms.tsv')\n", | ||
"! python {parent_directory}/poincare-embedding/scripts/create_wordnet_noun_hierarchy.py {wordnet_file}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect path: poincare-embedding
-> poincare-cpp-embedding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After correcting the path I'm getting:
Traceback (most recent call last):
File "/data_ssd/poincare/gensim/docs/notebooks/poincare/poincare-cpp-embedding/scripts/create_wordnet_noun_hierarchy.py", line 4, in <module>
from nltk.corpus import wordnet as wn
ImportError: No module named nltk.corpus
We may need to add nltk
to requirements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... and click
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking more about it, it doesn't seem a great idea to make these gensim requirements just because of the notebook. Let's just do
import nltk
import click
nltk.download('wordnet')
in the first cell in the notebook so that it fails there if these libs are missing. It should be obvious for the user that they need to install these. (I don't expect non-expert users to ever run the notebook.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, makes sense. I added the imports at the beginning, and a link to a separate requirements.txt
in poincare/
for easy installation of missing imports.
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"You might need to install an updated version of `cmake` to be able to compile the source code. Please make sure that the binary `poincare-embedding` has been created before proceeding." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't ask the user to do something that's trivial to do automatically. Just check if the file exists and raise an exception if it doesn't. Otherwise, the execution flow continues and fails much later. At that point it's no longer obvious what the root cause is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
"if not os.path.exists(cpp_repo_name):\n", | ||
" ! git clone https://github.com/TatsuyaShirakawa/poincare-embedding.git {cpp_repo_name}\n", | ||
"\n", | ||
"patches_applied = False" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks wrong. If the notebook is re-run with force_setup == False
the patches will get applied twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
14060f3
to
b7d0095
Compare
…misaligned header
b7d0095
to
99089a5
Compare
#1624