-
Notifications
You must be signed in to change notification settings - Fork 53
/
mainpage.dox
60 lines (60 loc) · 3.76 KB
/
mainpage.dox
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
/**
\mainpage The mainpage documentation
* FlashGraph is an SSD-based graph analysis framework that we designed to
* process graphs with billions of vertices and hundreds of billions of edges
* or even larger. We extend FlashGraph to support processing more data
* structures such as sparse matrices and dense matrices. As such, FlashGraph
* is now able to support a wide variety of data mining and machine learning
* algorithms. We address the entire data analysis framework with FlashGraph-ng.
*
* The current implementation of FlashGraph-ng has four main components:
* - _SAFS_: a user-space filesystem designed for large SSD arrays.
* It is capable of achieving the maximal I/O throughput from SSD arrays
* (millions of I/O per second for random I/O and tens of gigabytes per second
* for sequential I/O). The I/O performance is further magnified by
* the light-weight page cache if the workload can generate cache hits.
*
* - _FlashGraph_: a general-purpose graph analysis framework that exposes
* vertex-centric programming interface for users to express varieties of
* graph algorithms. FlashGraph scales graph computation to large graphs
* by keeping the edges of a graph on SSDs and computation state in memory.
* With smart I/O scheduling, FlashGraph is able to achieve performance
* comparable to state-of-art in-memory graph analysis frameworks and
* significantly outperforms state-of-art distributed graph analysis
* frameworks while being able to scale to graphs with billions of vertices
* and hundreds of billions of edges.
*
* - _FlashMatrix_: a matrix computation engine that provides a small set of
* generalized matrix operations on sparse matrices and dense matrices to
* express varieties of data mining and machine learning algorithms.
* For certain graph algorithms such as PageRank, which can be formulated as
* sparse matrix multiplication, FlashMatrix is able to significantly
* outperform FlashGraph.
*
* - _FlashR_: extends the existing R programming framework to process datasets
* at a scale of terabytes. FlashR integrates the matrices and generalized
* operators of FlashMatrix with R so that R users can implement many data
* mining and machine learning algorithms completely in R with performance
* comparable to optimized C implementations. FlashR reimplements
* many existing R functions for matrices to provide users a familiar
* R programming environment. FlashR is implemented as a regular R package.
*
* The figure below shows the architecture of FlashGraph-ng. At the bottem is
* SAFS, which sits on top of an array of SSDs and exposes a unified asynchronous
* I/O interface to the data analysis frameworks. On the left is FlashGraph,
* which exposes a vertex-centric programming interface for users to express
* a varieties of graph algorithms. FlashGraph contains a set of graph algorithm
* library written in C++. The graph library is integrated with R so that
* R users can invoke the graph algorithms in R directly. On the right,
* FlashMatrix provides both in-memory and external-memory vector and
* matrix implementations as well as a small set of generalized operators
* to perform computation on the vectors and matrices. FlashMatrix has
* an optimizer that optimizes a sequence of operations to achieve performance
* of an application comparable to a manually optimized C/C++ implementation.
* FlashR integrates the generalied operators in FlashMatrix to R and
* reimplements the existing R matrix operations with FlashMatrix.
* In the future, we will provide an R compiler so that R users can implement
* the user-defined functions in R and pass them to the generalized operators
* to perform actual computation.
* ![Architecture](architecture.png "The architecture of FlashGraph-ng")
*/