Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/node2vec for issue Word2Vec in StellarGraph #255 #536

Merged
merged 142 commits into from
May 7, 2020

Conversation

daokunzhang
Copy link
Contributor

This is the pull request for adding Keras Node2Vec layer to StellarGraph library.

I mainly made the following changes:

  1. Add the Node2Vec layer.
  2. Add the Node2VecNodeGenerator and Node2VecLinkGenerator mapper.
  3. Change the interface for the __init__ and run function of BiasedRandomWalk to make UnSupervisedSampler support BiasedRandomWalk
  4. Write the Keras-node2vec notebook examples for embedding learning and node classification
  5. Add the unit test for the added node2vec layer and mappers

…mbedding learning with keras node2vec implementation
…xample of node classification with keras node2vec implementation
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

You'll be able to see Jupyter notebook diff and discuss changes. Powered by ReviewNB.

self.edge_weight_label = edge_weight_label
self._check_weights(self.p, self.q, self.weighted, self.edge_weight_label)

def run(self, nodes=None, n=None, length=None, seed=None):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactor this function to reduce its Cognitive Complexity from 56 to the 15 allowed.

sample_counter += 1

# If the batch_size number of samples are accumulated, yield.
if sample_counter == batch_size:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid deeply nested control flow statements.


yield edge_ids, edge_labels

if self.bidirectional is False:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid deeply nested control flow statements.

"""

def __init__(self, G, nodes=None, length=2, number_of_walks=1, seed=None):
def __init__(
self,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Method "init" has 9 parameters, which is greater than the 7 authorized.

@codeclimate
Copy link

codeclimate bot commented Nov 29, 2019

Code Climate has analyzed commit 18274d2 and detected 2 issues on this pull request.

Here's the issue category breakdown:

Category Count
Security 2

View more on Code Climate.

@daokunzhang
Copy link
Contributor Author

Hi @kjun9

Thanks for your careful review! I have made necessary changes as per your suggestions. You can start the next round review now.

Best regards!
Daokun

@kjun9
Copy link
Contributor

kjun9 commented Apr 27, 2020

@daokunzhang

Though it is claimed that the same parameter is used to represent input and output embeddings, the implementation provided by authors directly deploys the gensim Word2Vec function, which is consistent with our gensim node2vec notebook. Here, I try to make a consistency with the gensim Word2Vec implementation, which uses different parameters to represent the so-called input and output embeddings.

Thanks for the explanation, that makes sense 👍 This looks good to me now, just one last thing to update that I can see is that in the demo notebooks, we're trying to keep the first markdown cell only contain the title (for example https://github.com/stellargraph/stellargraph/blob/develop/demos/node-classification/graphsage/graphsage-cora-node-classification-example.ipynb) since this lets us render the colab/binder buttons right underneath the title. So I'd suggest just moving the rest of the markdown in that first cell to a second markdown cell!

@daokunzhang
Copy link
Contributor Author

Hi @kjun9 ,

The notebooks have been formatted as you requested.

Thanks and best regards!
Daokun

@daokunzhang
Copy link
Contributor Author

@kjun9 ,

The code has been updated to use node ilocs in sampled node and link generaters #1267! Could you please help me review it again and merge it ?

Thanks,
Daokun

@kjun9
Copy link
Contributor

kjun9 commented May 7, 2020

@daokunzhang Hi daokun, sorry for the delay I'll take a look now!

Copy link
Contributor

@kjun9 kjun9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @daokunzhang looks great 👍

Before merging, could you revert the mode change on scripts/notebook_text_checker.py using chmod 664 like you did for the previous ones? I'm guessing you might have something in your setup that keeps automatically changing this 😅


Also, there's been a number of updates to how we document each notebook (they're all now included in our API docs https://stellargraph.readthedocs.io/en/latest/demos/index.html ) which means there's a couple of additional steps to add these new notebooks to the API documentation, but I think we can do this separately after landing this first - could you file an issue with a title like "Add keras node2vec demos to API documentation"? And feel free to paste some of the details below in the issue:

  1. Create new files that allow us to include these notebooks in the API docs:
  • docs/demos/embeddings/keras-node2vec-embeddings.nblink
  • docs/demos/node-classification/keras-node2vec-node-classification.nblink

Both of these should contain something like (you can look at another one as reference):

{
  "path": "../../../demos/embeddings/keras-node2vec-embeddings.ipynb"
}

and there should be symlinks for the images being used by the demo notebooks, similar to https://github.com/stellargraph/stellargraph/blob/develop/docs/demos/node-classification/Cora-features.png

  1. Update the tables in demo README and index.rst files to include an entry for each of these notebooks - I think this is done automatically (or semi-automatically?) using scripts/demo_indexing.py, I can double check exactly what should be done here.

@daokunzhang
Copy link
Contributor Author

daokunzhang commented May 7, 2020

Thanks @kjun9 ,

I have revised the scripts/notebook_text_checker.py file mode change problem. I have created the nblink files for the keras-node2vec-node-classification and keras-node2vec-embeddings demo files in the docs folder and added the word2vec_illustration image file to the docs folder. After that I added the component about keras node2vec to the scripts/demo_indexing.py file and rerun it to overwrite the ReadME files. You can check it.

For my part, I cannot merge this pull request. Maybe it is caused by that Yuriy ever requested changes but he haven't approved yet. Could you please help me merge it on your part if you are convenient?

Best regards!
Daokun

@kjun9
Copy link
Contributor

kjun9 commented May 7, 2020

Thanks @daokunzhang I can see the notebook in the API documentation 👍 I'll merge this now.

I think the demo-indexing could still be tweaked a bit, so I might still file an issue about that - I think this is the first time where we have two versions of an algorithm presented as demos (gensim vs stellargraph components) so I think it'd be nice for the table to communicate this more clearly, rather than showing them as two separate algorithms.

@daokunzhang
Copy link
Contributor Author

Thanks @kjun9 !

@kjun9
Copy link
Contributor

kjun9 commented May 7, 2020

#1534

@kjun9 kjun9 merged commit 318cd9c into develop May 7, 2020
@kjun9 kjun9 deleted the feature/node2vec branch May 7, 2020 07:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants