Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evaluation of existing Poincaré embedding implementations #1643

Merged
merged 20 commits into from
Oct 30, 2017

Conversation

janpom
Copy link
Contributor

@janpom janpom commented Oct 23, 2017

@janpom janpom changed the title Evaluation of existing Pointcaré embedding implementations #1624 Evaluation of existing Pointcaré embedding implementations Oct 23, 2017
"outputs": [],
"source": [
"# Set this to the path of the directory that is to contain the poincare-embedding directory\n",
"parent_directory = '/home/jayant/projects/'"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use path relative to the notebook, so that you don't need to hardcode your home dir path.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

],
"source": [
"# Download and apply patch\n",
"patch_url = \"https://raw.githubusercontent.com/RaRe-Technologies/gensim/poincare_eval/docs/notebooks/poincare_data/poincare_burn_in_eps.patch\"\n",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The patch is in the repo so you can just reference the local file (poincare_data/poincare_burn_in_eps.patch). Besides, the URL will get broken once the code is merged to master.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, changed.

"text": [
"Results for WordNet Link Prediction task\n",
"+--------------------------------------------+---------------------------------+\n",
"| | Model Dimensions |\n",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Table header misaligned.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

" \n",
" \n",
" @staticmethod\n",
" def get_positive_relation_ranks(distances, positive_relations):\n",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a bit too much magic in this method implementation. Are you sure it actually does what it's expected to do? Have you tested it?

Copy link
Contributor

@jayantj jayantj Oct 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree the code isn't straightforward. I changed distances to all_distances to make it slightly clearer. Not sure how it can be made clearer. And yes, I've tested it -

all_distances = np.array([5, 10, 15, 20, 0, 1, 2, 3])
positive_relations = [4, 6, 1, 7]
assert(ReconstructionEvaluation.get_positive_relation_ranks(all_distances, positive_relations) == [1, 2, 3, 2])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern wasn't about naming, but rather about the used constructions. Mainly

ranks = (negative_relation_distances < positive_relation_distances[:, np.newaxis]).sum(axis=1) + 1

is pretty non-obvious. I have tested it myself, though, and indeed does seem to work, so that's good.

" break\n",
" return np.mean(ranks)\n",
" \n",
" def evaluate_map(self, max_n=None):\n",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the plan with evaluate_map?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I'd forgotten. Will implement.

@menshikh-iv
Copy link
Contributor

@jayantj please ignore CI now, don't worry about it.

@gojomo gojomo changed the title Evaluation of existing Pointcaré embedding implementations Evaluation of existing Poincaré embedding implementations Oct 26, 2017
"source": [
"# Prepare the WordNet data\n",
"wordnet_file = os.path.join(data_directory, 'wordnet_noun_hypernyms.tsv')\n",
"! python {parent_directory}/poincare-embedding/scripts/create_wordnet_noun_hierarchy.py {wordnet_file}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect path: poincare-embedding -> poincare-cpp-embedding

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After correcting the path I'm getting:

Traceback (most recent call last):
  File "/data_ssd/poincare/gensim/docs/notebooks/poincare/poincare-cpp-embedding/scripts/create_wordnet_noun_hierarchy.py", line 4, in <module>
    from nltk.corpus import wordnet as wn
ImportError: No module named nltk.corpus

We may need to add nltk to requirements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and click as well.

Copy link
Contributor Author

@janpom janpom Oct 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking more about it, it doesn't seem a great idea to make these gensim requirements just because of the notebook. Let's just do

import nltk
import click

nltk.download('wordnet')

in the first cell in the notebook so that it fails there if these libs are missing. It should be obvious for the user that they need to install these. (I don't expect non-expert users to ever run the notebook.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, makes sense. I added the imports at the beginning, and a link to a separate requirements.txt in poincare/ for easy installation of missing imports.

"cell_type": "markdown",
"metadata": {},
"source": [
"You might need to install an updated version of `cmake` to be able to compile the source code. Please make sure that the binary `poincare-embedding` has been created before proceeding."
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't ask the user to do something that's trivial to do automatically. Just check if the file exists and raise an exception if it doesn't. Otherwise, the execution flow continues and fails much later. At that point it's no longer obvious what the root cause is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

"if not os.path.exists(cpp_repo_name):\n",
" ! git clone https://github.com/TatsuyaShirakawa/poincare-embedding.git {cpp_repo_name}\n",
"\n",
"patches_applied = False"
Copy link
Contributor Author

@janpom janpom Oct 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks wrong. If the notebook is re-run with force_setup == False the patches will get applied twice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@janpom janpom merged commit 3ea3730 into poincare Oct 30, 2017
@janpom janpom deleted the poincare_eval branch October 30, 2017 09:46
@jayantj jayantj mentioned this pull request Dec 4, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants