LINE: Large-scale Information Network Embedding

LINE is an algorithmic framework for embedding very large-scale information networks. It is suitable for a variety of networks, including directed or undirected and binary or weighted ones. Based on PGL, we reproduce the LINE algorithms and reach results comparable to those reported in the paper.

Datasets

The Flickr network is a social network that contains 1715256 nodes and 22613981 edges.

You can download the data from here.

The Flickr dataset contains four files:

  • flickr-groupmemberships.txt.gz
  • flickr-groups.txt.gz
  • flickr-links.txt.gz
  • flickr-users.txt.gz

After downloading the data, uncompress the files into a directory such as ./data/flickr/. Note that the current directory is the root directory of the LINE model.

Then run the command below to preprocess the data.

python data_process.py

This produces three files in the ./data/flickr/ directory:

  • nodes.txt
  • edges.txt
  • nodes_label.txt
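
As a quick sanity check after preprocessing, the node and edge counts can be compared with the figures quoted above. The sketch below assumes that each line of edges.txt holds two whitespace-separated integer node IDs; that layout is an assumption, not something stated here, so inspect the file before relying on it. The helper names `read_edge_list` and `summarize` are hypothetical and are not part of this repository.

```python
import io


def read_edge_list(handle):
    """Parse (src, dst) pairs from a text handle, skipping blank or short lines.

    Assumed layout: two whitespace-separated integer IDs per line.
    """
    edges = []
    for line in handle:
        parts = line.split()
        if len(parts) < 2:
            continue
        edges.append((int(parts[0]), int(parts[1])))
    return edges


def summarize(edges):
    """Return (number of distinct nodes, number of edges)."""
    nodes = set()
    for src, dst in edges:
        nodes.add(src)
        nodes.add(dst)
    return len(nodes), len(edges)


# Tiny in-memory sample standing in for edges.txt.
sample = io.StringIO("0\t1\n1\t2\n\n2\t0\n")
sample_edges = read_edge_list(sample)
num_nodes, num_edges = summarize(sample_edges)
print(num_nodes, num_edges)
```

To run it on the real output, pass an open file handle for ./data/flickr/edges.txt instead of the in-memory sample, provided the file matches the assumed layout.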

Dependencies

  • paddlepaddle>=1.6
  • pgl

How to run

For example, use a GPU to train LINE on the Flickr dataset:

# multiclass task example
python line.py --use_cuda --order first_order --data_path ./data/flickr/ --save_dir ./checkpoints/model/

python multi_class.py --ckpt_path ./checkpoints/model/model_epoch_20 --percent 0.5

Hyperparameters

  • --use_cuda: Train on the GPU when this flag is set.
  • --order: Which LINE variant to train, first-order or second-order proximity (for example, first_order).
  • --percent: The fraction of the data used as training data (for example, 0.5).
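
To make the difference between the two proximity orders concrete, here is a minimal NumPy sketch of the per-edge loss with negative sampling, following the formulation in the LINE paper. It is not the code in line.py: the function `line_edge_loss`, the arrays `emb` and `ctx`, and the sampled negatives are all hypothetical names chosen for illustration. With first-order proximity both endpoints are scored with the same vertex embeddings; with second-order proximity the target node is scored with a separate context embedding table.

```python
import numpy as np


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)


def line_edge_loss(emb, ctx, src, dst, neg, order="first_order"):
    """Negative-sampling loss for one edge (src, dst).

    emb: vertex embeddings, shape (num_nodes, dim)
    ctx: context embeddings, shape (num_nodes, dim); only used for second order
    neg: array of sampled negative node IDs
    """
    # First order scores both endpoints in the same embedding space;
    # second order scores the target (and negatives) as contexts.
    target = emb if order == "first_order" else ctx
    pos = log_sigmoid(np.dot(emb[src], target[dst]))
    negs = log_sigmoid(-(target[neg] @ emb[src])).sum()
    return -(pos + negs)


rng = np.random.default_rng(0)
emb = rng.normal(scale=0.1, size=(5, 4))
ctx = np.zeros((5, 4))  # context vectors initialized to zero for the demo
negatives = np.array([2, 3])

loss_first = line_edge_loss(emb, ctx, 0, 1, negatives, "first_order")
loss_second = line_edge_loss(emb, ctx, 0, 1, negatives, "second_order")
print(loss_first, loss_second)
```

Because the demo context vectors are all zero, every second-order score is 0 and the loss reduces to three times log 2 (one positive term plus two negatives), which is a convenient check that the function behaves as intended.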

Experiment results

| Dataset | Model | Task | Metric | PGL Result | Reported Result |
| --- | --- | --- | --- | --- | --- |
| Flickr | LINE with first_order | multi-label classification | MacroF1 | 0.626 | 0.627 |
| Flickr | LINE with first_order | multi-label classification | MicroF1 | 0.637 | 0.639 |
| Flickr | LINE with second_order | multi-label classification | MacroF1 | 0.615 | 0.621 |
| Flickr | LINE with second_order | multi-label classification | MicroF1 | 0.630 | 0.635 |
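
For readers unfamiliar with the two metrics, the sketch below shows how MacroF1 and MicroF1 are conventionally computed for multi-label predictions. It illustrates the standard definitions and may differ in detail from what multi_class.py does; the function `f1_scores` and the toy label matrices are hypothetical.

```python
import numpy as np


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for binary indicator matrices.

    Rows are samples, columns are labels.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)

    # Per-label counts of true positives, false positives, false negatives.
    tp = (y_true & y_pred).sum(axis=0).astype(float)
    fp = (~y_true & y_pred).sum(axis=0).astype(float)
    fn = (y_true & ~y_pred).sum(axis=0).astype(float)

    def f1(tp, fp, fn):
        denom = 2 * tp + fp + fn
        # Define F1 as 0 when a label has no positives and no predictions.
        return np.where(denom > 0, 2 * tp / np.maximum(denom, 1.0), 0.0)

    macro = float(f1(tp, fp, fn).mean())              # average of per-label F1
    micro = float(f1(tp.sum(), fp.sum(), fn.sum()))   # F1 over pooled counts
    return macro, micro


y_true = [[1, 0], [1, 1], [0, 1], [1, 0]]
y_pred = [[1, 0], [1, 0], [0, 0], [1, 1]]
macro_f1, micro_f1 = f1_scores(y_true, y_pred)
print(macro_f1, micro_f1)
```

In this toy case the first label is predicted perfectly and the second is never predicted correctly, so MacroF1 treats the two labels equally while MicroF1 is pulled toward the label with more correct predictions.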