
How to generate a trillion-edge graph efficiently? And how to convert a file bigger than memory to G-Store format? #1

Open
huiyiliusha opened this issue Nov 30, 2017 · 4 comments
@huiyiliusha
  • Could you tell me how to generate a trillion-edge graph efficiently? If you have another generator, can you share it with me?
  • Could you tell me how to convert a file bigger than memory to G-Store format? Is it the following command?
./gstoreu -s 25 -i kron_25_16b.dat -c 2 -o tile_25_16b.dat
  • If the machine used for the experiments has only 8 GB of main memory, how big is the largest graph that it can convert to G-Store format?
    I'm looking forward to your reply. Thanks a lot.
@pradeep-k
Member

pradeep-k commented Nov 30, 2017

There is a separate generator for trillion-edge graphs. You can access the generator here:
https://github.com/pradeep-k/gConv/tree/master/g500_gen

Follow the instructions. In case of any issues, please let me know.


There are two converters right now. One is completely in-memory (both the input graph and the G-Store graph stay in memory). The other is a little tricky and needs a smaller amount of memory. See the next post below for details.

I will have a look at the options of G-Store and will provide you more information shortly.

@pradeep-k
Member

./gstoreu -s 25 -i kron_25_16b.dat -c 2 -o tile_25_16b.dat
worked fine. However, for this graph I had to change bit_shift0 to 24 (it was 26 and would crash). See the gstore.h file.
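For intuition, here is a tiny arithmetic sketch of why the shift matters for a scale-25 graph. It assumes, as described just below, that bit_shift0 decides how many intermediate files the vertex range is split into; it is not G-Store code, and the exact role of the constant in gstore.h may differ.

// Standalone sketch, not G-Store code: assumes the number of intermediate
// files is (number of vertices >> bit_shift0). For a scale-25 graph
// (2^25 vertices), a shift of 26 yields zero files, a shift of 24 yields two.
#include <cstdint>
#include <cstdio>

int main() {
    const uint64_t vertices = 1ULL << 25;        // kron_25_16
    const int shifts[] = {26, 24};
    for (int shift : shifts) {
        uint64_t files = vertices >> shift;      // intermediate file count
        printf("bit_shift0 = %d -> %llu intermediate file(s)\n",
               shift, (unsigned long long)files);
    }
    return 0;
}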

However, for the memory guarantee, I have to look into the details of the overall workflow of the converter.
Based on bit_shift0, the big edge-list file is broken into many intermediate files (by vertex range) and written to disk. Each of those files generates tiles at independent locations, so we then read each intermediate file and convert it to G-Store format.

Hence, in the first phase the memory requirement should be limited by the size of the buffer used for reading the original file, plus the per-file write buffer multiplied by the number of intermediate files. The second phase is limited by the size of one intermediate file. But since we had access to some large-memory machines, we did not rigorously test the memory limit of the converter.
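To make that flow concrete, below is a minimal sketch of the two-phase idea. It is not the actual G-Store converter: the edge layout (two 4-byte vertex IDs per edge), the file names, and the helper partition_edges() are assumptions for illustration only.

// Phase 1 of the described two-phase flow: stream the big edge list once
// and scatter each edge into an intermediate file chosen by its
// source-vertex range (range width set by bit_shift0). Error handling is
// omitted for brevity.
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

struct edge_t { uint32_t src; uint32_t dst; };   // assumed on-disk layout

void partition_edges(const char* input, int bit_shift0, int num_parts) {
    std::vector<FILE*> parts(num_parts);
    for (int i = 0; i < num_parts; ++i)
        parts[i] = fopen(("part_" + std::to_string(i) + ".dat").c_str(), "wb");

    FILE* in = fopen(input, "rb");
    std::vector<edge_t> buf(1 << 20);            // read buffer, ~8 MB
    size_t n;
    while ((n = fread(buf.data(), sizeof(edge_t), buf.size(), in)) > 0) {
        for (size_t j = 0; j < n; ++j) {
            int p = buf[j].src >> bit_shift0;    // vertex range -> partition
            fwrite(&buf[j], sizeof(edge_t), 1, parts[p]);
        }
    }
    fclose(in);
    for (FILE* f : parts) fclose(f);
}

// Phase 2 (not shown) then loads one intermediate file at a time and lays
// its edges out as 2^16 x 2^16 tiles, so peak memory stays bounded by a
// single intermediate file plus the buffers above.

int main() {
    // Example: scale-25 graph with bit_shift0 = 24 -> 2 intermediate files.
    partition_edges("kron_25_16b.dat", 24, 1 << (25 - 24));
    return 0;
}

With this split, phase-1 memory is roughly the read buffer plus one write buffer per intermediate file, and phase-2 memory is bounded by the largest intermediate file, matching the description above.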

Let me know if you need a strict memory guarantee; I can explore the various variables that control the memory and let you know.

pradeep-k self-assigned this Nov 30, 2017
@huiyiliusha
Author

Thank you very much for your reply; I will try the generator later.

As you say, the second converter requires only the G-Store graph to be in memory, while the input graph needs only a small amount of memory. So does converting the Kron-31-256 graph used in the paper (G-Store: High-Performance Graph Store for Trillion-Edge Processing) require more than 2 TB of memory? In other words, for a test of a Kron-31-256 graph in G-Store, including converting and computing, how much memory is required? In your tests, how much memory did the machine have?

Now, I have a machine with only 16 GB of main memory and I want to test G-Store on it, so I would like to know how big a graph it can convert and process, respectively.

@pradeep-k
Member

pradeep-k commented Dec 1, 2017

The description of the second converter was wrong in my first reply; I have corrected it.

For the memory requirement of the converter: I will calculate it and update you soon.

For the memory requirement of running the algorithms: G-Store is semi-external, so it needs memory for algorithmic metadata, tile metadata, and a few other internal data structures.

The streaming and caching of the graph file can be configured (use the -m option for total memory, or change the values of the variables in the code and recompile).

The conversion was done on a different machine than the one I ran the algorithms on. The converter ran on a 2 TB machine, but I am sure the converter does not need a 2 TB machine. I will let you know an approximate value for kron-31-256 soon.

Here is the formula to calculate the memory requirement for running the algorithms.
(1) Tile metadata: number of tiles * 8 bytes = 8 GB for kron-31-256.
    No. of tiles = P*P (directed) or P*P/2 (undirected), where P = number of vertices / 2**16.
(2) Algorithmic metadata:
    BFS = 2 GB for a 1-byte-per-vertex level array.
    PageRank (synchronous version) = 3 * 8 GB = 24 GB, since we keep 3 metadata values (4-byte float) per vertex.
(3) Streaming and caching memory = choose as per your wish. The default values should be 512 MB + 512 MB + 7.5 GB = 8 GB. Change this using -m1024 for 1 GB of total memory.
(4) Others:
    2 bitmap files = 2*P bits (very small).
    Other internal data structures = generally small, a few MB at most.

So, I believe you will be able to run BFS on a 16 GB machine, but PageRank might not run, as its algorithmic metadata needs more memory. A worked example of these numbers is sketched below.
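As a quick sanity check, here is a small standalone program that evaluates the formulas above for kron-31-256 (2^31 vertices). It is only an illustration of this comment's arithmetic, not G-Store code; the two cache budgets correspond to the default and to -m1024 from item (3).

// Worked example of the memory estimates above for kron-31-256.
// Only an illustration of the arithmetic in this comment, not G-Store code.
#include <cstdint>
#include <cstdio>

int main() {
    const uint64_t V = 1ULL << 31;            // kron-31-256: 2^31 vertices
    const uint64_t P = V >> 16;               // P = vertices / 2^16
    const uint64_t tiles = P * P;             // directed graph: P*P tiles
    const double GB = double(1ULL << 30);

    double tile_meta = tiles * 8.0 / GB;      // (1) 8 bytes per tile   -> 8 GB
    double bfs_meta  = V * 1.0 / GB;          // (2) 1 byte per vertex  -> 2 GB
    double pr_meta   = V * 12.0 / GB;         // (2) 3 x 4-byte floats  -> 24 GB

    // (3) streaming + caching budget: 8 GB default, 1 GB with -m1024.
    const double budgets[] = {8.0, 1.0};
    for (double cache : budgets) {
        printf("cache %.0f GB: BFS ~ %.0f GB, PageRank ~ %.0f GB\n",
               cache, tile_meta + bfs_meta + cache, tile_meta + pr_meta + cache);
    }
    return 0;
}

Shrinking the streaming/caching budget with -m is what brings BFS within a 16 GB machine, while PageRank's 24 GB of per-vertex metadata alone already exceeds it, consistent with the conclusion above.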

pradeep-k added a commit that referenced this issue Jul 31, 2019
pradeep-k added a commit that referenced this issue Jul 31, 2019
Merge pull request #1 from iHeartGraph/master