Readme.md and data processing #1

Open
JunhengH opened this issue Jan 15, 2018 · 4 comments

Comments

@JunhengH

Hi, could you please provide some information on how to run your NetMF code and how to prepare the data? Thank you very much.

@Davidham3

Hi, to run this NetMF code you need a Python 2.7 environment with numpy, scipy, theano, and scikit-learn. An easy way to install all of these packages is to use the Anaconda Python distribution rather than the official Python distribution.
The dataset should be a .mat file that contains a variable named 'network'. That variable should be a scipy sparse matrix in Compressed Sparse Column format, i.e. a 'csc_matrix'. You can download an example dataset called "blogcatalog" from the example_graphs folder of this repository: https://github.com/phanein/deepwalk. That dataset was also mentioned in the NetMF paper.
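To make the expected input format concrete, here is a minimal sketch of writing such a .mat file with scipy. The toy 4-node graph, the file name toy_graph.mat, and the assumption that the matrix is a symmetric adjacency matrix with unit weights are all illustrative and not taken from the NetMF repository; only the variable name 'network' and the csc_matrix type come from the description above.

```python
import os
import tempfile

import numpy as np
import scipy.io
import scipy.sparse

# Hypothetical toy graph: an undirected 4-node cycle 0-1-2-3-0.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
n_nodes = 4

# List every edge in both directions so the adjacency matrix is symmetric.
rows = [u for u, v in edges] + [v for u, v in edges]
cols = [v for u, v in edges] + [u for u, v in edges]
weights = np.ones(len(rows), dtype=np.float64)

# Adjacency matrix in Compressed Sparse Column format.
network = scipy.sparse.csc_matrix((weights, (rows, cols)), shape=(n_nodes, n_nodes))

# Save it under the variable name 'network', as described above.
mat_path = os.path.join(tempfile.mkdtemp(), "toy_graph.mat")
scipy.io.savemat(mat_path, {"network": network})

# Round-trip check: read the variable back and compare it with the original.
loaded = scipy.io.loadmat(mat_path)["network"]
print(loaded.shape, np.array_equal(loaded.toarray(), network.toarray()))
```

If your graph comes from an edge list file instead, the same pattern should apply: collect the row and column indices, build the csc_matrix, and save it with the key 'network'.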
Once everything above is ready, run 'python netmf.py -h' to see the help text, which explains how to set the parameters for NetMF. The author proposed two ways to implement NetMF. The first is for a small window size T and does not need eigenvectors to approximate the original matrix. The second is for a large window size T, where you need to specify the parameter h, which is the number of eigenvectors used to approximate the original matrix. Choose the variant that fits your application with the --small or --large flag. Here are two examples of running netmf.py:

  1. python netmf.py --input blogcatalog.mat --dim 128 --window 1 --small --output test
  2. python netmf.py --input blogcatalog.mat --dim 128 --window 10 --rank 1024 --large --output test
    In the examples above, '--output test' sets the output file name to 'test', but when the computation finishes the program generates a file called 'test.npy'. You load that file with numpy, which in Python looks like this:
    output = numpy.load('test.npy')
    This gives you the output matrix.
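As a small illustration of working with the loaded result, the sketch below writes a stand-in 'test.npy' and then inspects it. The 5 x 128 zero matrix is a placeholder, not real NetMF output, and the assumption that each row is the vector of one node (with --dim columns) is mine; the thread does not state the layout.

```python
import os
import tempfile

import numpy as np

# Stand-in for the real output file: a hypothetical 5-node, 128-dimensional matrix.
out_path = os.path.join(tempfile.mkdtemp(), "test.npy")
np.save(out_path, np.zeros((5, 128), dtype=np.float32))

# Load the matrix the same way as described above.
embedding = np.load(out_path)

# Assumed layout: one row per node, one column per embedding dimension.
n_nodes, dim = embedding.shape
first_node_vector = embedding[0]
print(n_nodes, dim, first_node_vector.shape)
```

Checking the shape right after loading is a quick way to confirm that the number of rows matches the number of nodes in your input graph.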

@JunhengH
Author

@Davidham3 Thank you so much for the detailed information!

@JunhengH
Author

@Davidham3
We reimplemented NetMF and reran your experiments on BlogCatalog, and the results seem a little different. Could you please send me your full configuration (dim, negative sampling rate, T, etc.) or your embedding vectors, so that I can rule out mistakes in my reproduction?

Another issue is that embedding the Flickr dataset, which has 80K nodes, is very memory-hungry (over 16 GB) during the eigendecomposition or the sparse matrix todense() operation. I would appreciate any suggestions.

My email is jhao@cs.ucla.edu. Thank you very much for your response and for getting in touch.
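For the Flickr memory question above, a rough back-of-the-envelope estimate shows why todense() alone exceeds 16 GB. The sketch assumes exactly 80,000 nodes (the post only says "80K") and counts a single dense n x n matrix, ignoring any temporary copies the code may create.

```python
def dense_matrix_gib(n_nodes, bytes_per_entry):
    """Memory needed for one dense n_nodes x n_nodes matrix, in GiB."""
    return n_nodes * n_nodes * bytes_per_entry / 2.0 ** 30

n = 80000  # assumed node count; the post only says "80K"

float64_gib = dense_matrix_gib(n, 8)  # 8 bytes per float64 entry
float32_gib = dense_matrix_gib(n, 4)  # 4 bytes per float32 entry

print("float64: %.1f GiB, float32: %.1f GiB" % (float64_gib, float32_gib))
# prints "float64: 47.7 GiB, float32: 23.8 GiB"
```

Under these assumptions, even a single float32 copy of the dense matrix does not fit in 16 GB, so the dense step itself is the bottleneck regardless of the eigendecomposition.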

@franz101

3 participants