Readme.md and data processing #1

JunhengH · 2018-01-15T00:31:39Z

Hi, could you please provide some information on your NetMF code and data preparation? Thank you very much.

Davidham3 · 2018-01-15T13:55:45Z

Hi, to run this NetMF code, you need a Python 2.7 environment with numpy, scipy, theano, and scikit-learn. An easy way to install all of these packages is to use the Anaconda Python distribution rather than the official Python distribution.
The dataset should be a .mat file containing a variable named 'network'. That variable should be a scipy sparse matrix in Compressed Sparse Column format, i.e. a 'csc_matrix'. You can download an example dataset called "blogcatalog" from the repository https://github.com/phanein/deepwalk, in its example_graphs folder. That dataset was also mentioned in the NetMF paper.
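To make the expected input format concrete, here is a minimal sketch of how such a .mat file could be produced for your own graph. The toy edge list and the file name `toy_graph.mat` are made up for illustration; the only parts taken from the description above are the variable name 'network' and the CSC format.

```python
import os
import tempfile

import numpy as np
import scipy.io
import scipy.sparse as sp

# Toy undirected graph with 4 nodes; this edge list is invented for illustration.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

# List every edge in both directions so the adjacency matrix is symmetric.
rows = [u for u, v in edges] + [v for u, v in edges]
cols = [v for u, v in edges] + [u for u, v in edges]
data = np.ones(len(rows))

# Adjacency matrix in Compressed Sparse Column format.
network = sp.csc_matrix((data, (rows, cols)), shape=(4, 4))

# Store it under the variable name 'network' (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "toy_graph.mat")
scipy.io.savemat(path, {"network": network})

# Reload the file to check that the variable comes back as a sparse matrix.
loaded = scipy.io.loadmat(path)["network"]
```

A file written this way should be usable as the `--input` argument, assuming the script only relies on the 'network' variable as described.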
Once this is set up, run 'python netmf.py -h' to see the usage instructions, which explain how to adjust the parameters for NetMF. The authors proposed two ways to implement NetMF: one for a small window size T, which does not need eigenvectors to approximate the original matrix, and one for a large window size T, in which you need to specify the parameter h, the number of eigenvectors used to approximate the original matrix. So you need to choose the variant that fits your application, using the --small or --large flag. Here are two examples of running netmf.py:

python netmf.py --input blogcatalog.mat --dim 128 --window 1 --small --output test
python netmf.py --input blogcatalog.mat --dim 128 --window 10 --rank 1024 --large --output test
In the examples above, '--output test' sets the output file name to 'test'. When the computation finishes, the program generates a file called 'test.npy', which you can load with numpy in Python:
import numpy
output = numpy.load('test.npy')
This gives you the output matrix.
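For readers wondering what the small-window variant computes, the following is a rough dense sketch based on my reading of the NetMF paper, not the code in netmf.py. The function name `netmf_small_sketch` and its parameter names are hypothetical, it assumes a graph with no isolated nodes, and it is only practical for tiny graphs because everything is dense.

```python
import numpy as np


def netmf_small_sketch(A, window=1, negative=1.0, dim=2):
    """Dense, illustrative sketch of small-window NetMF (hypothetical helper)."""
    A = np.asarray(A, dtype=float)
    degrees = A.sum(axis=1)            # node degrees (must all be non-zero)
    vol = degrees.sum()                # volume of the graph
    P = A / degrees[:, None]           # random-walk transition matrix D^{-1} A

    # Sum of the first `window` powers of P.
    power = np.eye(A.shape[0])
    S = np.zeros_like(A)
    for _ in range(window):
        power = power.dot(P)
        S += power

    # Scale by vol / (negative * window) and right-multiply by D^{-1}.
    M = (vol / (negative * window)) * S / degrees[None, :]

    # Element-wise truncated logarithm: log(max(M, 1)).
    M = np.log(np.maximum(M, 1.0))

    # Keep the top `dim` singular directions, scaled by sqrt of singular values.
    U, s, _ = np.linalg.svd(M)
    return U[:, :dim] * np.sqrt(s[:dim])


# Adjacency matrix of a 4-node cycle, used only as a toy input.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])
embedding = netmf_small_sketch(A, window=1, negative=1.0, dim=2)
```

The large-window variant differs in that it first approximates the matrix from the top h eigenpairs instead of summing the powers directly, which is why it needs the extra parameter.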

JunhengH · 2018-01-15T19:03:40Z

@Davidham3 Thank you so much for the detailed information!

JunhengH · 2018-01-16T23:22:45Z

@Davidham3
We reimplemented NetMF following your experiments on BlogCatalog, and the results seem a little different. Could you please send me your full configuration (dim, negative sampling rate, T, etc.) or your embedding vectors, so that I can rule out mistakes in my reproduction?

Another issue is that embedding the Flickr dataset, which has 80K nodes, is very memory-intensive (over 16G) during the eigendecomposition or the sparse matrix todense() operation. I would appreciate any suggestions.
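As a back-of-the-envelope check on why todense() is so costly at this scale, the sketch below estimates the size of a single dense square matrix. It assumes exactly 80,000 nodes (the "80K" above is approximate) and counts only one copy of the matrix, ignoring any temporaries; the helper name `dense_matrix_gib` is made up for illustration.

```python
def dense_matrix_gib(num_nodes, bytes_per_entry):
    """Size in GiB of one dense num_nodes x num_nodes matrix."""
    return num_nodes * num_nodes * bytes_per_entry / float(2 ** 30)


# Assumed node count of 80,000; 8 bytes per float64 entry, 4 per float32 entry.
float64_gib = dense_matrix_gib(80000, 8)   # roughly 47.7 GiB
float32_gib = dense_matrix_gib(80000, 4)   # roughly 23.8 GiB
```

Under these assumptions, even one dense copy already exceeds 16G in either precision.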

My email is jhao@cs.ucla.edu. Thank you very much for your response and connection.

franz101 · 2019-03-24T01:46:40Z

Getting an error here too...
https://colab.research.google.com/drive/1k5NLfvLniM4A_v0VWckUm8S1Nmdhzcn_
