A Spark implementation of the BigCLAM algorithm that is resource-efficient and can be applied to very large networks

xiuechen/bigclamSpark-distribute

bigclamSpark-distribute

This project is inspired by https://github.com/thangdnsf/BigCLAM-ApacheSpark, which also implements the BigCLAM model proposed by Yang and Leskovec (2013).

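I replaced most of the collectAsMap and broadcast code with RDD joins, so that per-node data stays distributed across the executors instead of being collected on the driver. This makes the program more resource-efficient and robust.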
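The difference between the two patterns can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in Bigclam.scala: the object name JoinVsBroadcast, both function names, and the Array[Double] row type for a node's community weights are hypothetical, and the sketch assumes Spark's RDD API is on the classpath.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object JoinVsBroadcast {
  // Hypothetical types for this sketch: an edge is (src, dst), and each node
  // carries one row of community weights.
  type Edge = (Long, Long)
  type Factors = Array[Double]

  // Pattern A: collect the whole node table on the driver and broadcast it.
  // Simple, but the driver and every executor must hold all rows in memory.
  def neighborFactorsBroadcast(
      sc: SparkContext,
      edges: RDD[Edge],
      factors: RDD[(Long, Factors)]): RDD[(Long, Factors)] = {
    val table = sc.broadcast(factors.collectAsMap())
    edges.flatMap { case (src, dst) =>
      table.value.get(dst).map(row => (src, row)).iterator
    }
  }

  // Pattern B: key the edges by destination node and join with the node table.
  // The rows stay partitioned across the cluster; nothing is collected.
  def neighborFactorsJoin(
      edges: RDD[Edge],
      factors: RDD[(Long, Factors)]): RDD[(Long, Factors)] =
    edges
      .map { case (src, dst) => (dst, src) }
      .join(factors)
      .map { case (_, (src, row)) => (src, row) }
}
```

Both functions pair each source node with the weight row of each of its neighbors. The join version pays for a shuffle, but its memory use per machine no longer grows with the total number of nodes, which is the trade-off that matters for very large graphs.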

I have used this code at work to detect communities in a network with tens of millions of nodes, and it worked.

Important notes on the code:

1. In Bigclam.scala, the file at graphpath must contain the edges of the network, one edge per line. Lines are delimited by "\n" and the two node IDs on each line are delimited by "\t", like:

1\t2\n 3\t4\n

2. In Bigclam.scala, every node ID must be in the range 0 to num_nodes - 1, where num_nodes is the number of distinct nodes in the graph file.

3. Use sbt assembly to compile the program.
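Notes 1 and 2 together mean that an edge list with arbitrary node labels has to be relabeled before it is used. The following is a minimal single-machine sketch of that preparation step in plain Scala, not part of this repository: the object name EdgeListPrep and its three functions are hypothetical, and it assumes the raw input already has one tab-separated edge per line.

```scala
object EdgeListPrep {
  // Parse tab-separated "src\tdst" lines; blank lines are skipped.
  def parseEdges(lines: Iterator[String]): Vector[(String, String)] =
    lines.map(_.trim).filter(_.nonEmpty).map { line =>
      val parts = line.split("\t")
      require(parts.length == 2, s"expected two tab-separated fields: '$line'")
      (parts(0), parts(1))
    }.toVector

  // Assign each distinct node label a dense ID in 0 .. numNodes - 1,
  // in order of first appearance.
  def denseIds(edges: Seq[(String, String)]): Map[String, Int] =
    edges.flatMap { case (a, b) => Seq(a, b) }.distinct.zipWithIndex.toMap

  // Rewrite the edges with dense IDs, rendered as "src\tdst" lines.
  def relabel(edges: Seq[(String, String)]): Seq[String] = {
    val ids = denseIds(edges)
    edges.map { case (a, b) => s"${ids(a)}\t${ids(b)}" }
  }
}
```

For example, the edges 10-20 and 20-35 would be rewritten as 0-1 and 1-2. For a graph with tens of millions of nodes this in-memory version is unlikely to be suitable; the same idea could be expressed on RDDs, for instance with distinct and zipWithIndex followed by joins.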
