This tool loads the output of the LDBC-SNB Data Generator into Apache Flink DataSets for further processing. The LDBC data generator produces directed, labeled graphs that mimic the characteristics of real-world graphs. A detailed description of the generated schema and of the output file format can be found in the latest version of the official LDBC-SNB specification document.
https://raw.githubusercontent.com/ldbc/ldbc_snb_docs/master/figures/schema.pdf
The tool reads the LDBC output files from a given directory (either local or HDFS) and creates two datasets containing all vertices and edges. Vertices and edges are represented by tuples. A vertex stores an id which is unique among all vertices, a vertex label and key-value properties represented by a HashMap. An edge stores an id which is unique among all edges, an edge label, source and target vertex identifiers and key-value properties.
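To make the tuple layout described above concrete, here is a minimal sketch in plain Java with no Flink dependency. The class names `VertexSketch` and `EdgeSketch`, the `long` id type, the `Map<String, Object>` property type and the sample labels and values are all assumptions made for illustration; they are not the library's actual `LDBCVertex` and `LDBCEdge` classes.

```java
import java.util.HashMap;
import java.util.Map;

public class GraphElementSketch {

    /** Hypothetical stand-in for a vertex tuple: (id, label, properties). */
    static final class VertexSketch {
        final long id;                        // unique among all vertices
        final String label;                   // vertex label
        final Map<String, Object> properties; // key-value properties

        VertexSketch(long id, String label, Map<String, Object> properties) {
            this.id = id;
            this.label = label;
            this.properties = properties;
        }
    }

    /** Hypothetical stand-in for an edge tuple: (id, label, source, target, properties). */
    static final class EdgeSketch {
        final long id;                        // unique among all edges
        final String label;                   // edge label
        final long sourceId;                  // id of the source vertex
        final long targetId;                  // id of the target vertex
        final Map<String, Object> properties; // key-value properties

        EdgeSketch(long id, String label, long sourceId, long targetId,
                   Map<String, Object> properties) {
            this.id = id;
            this.label = label;
            this.sourceId = sourceId;
            this.targetId = targetId;
            this.properties = properties;
        }
    }

    public static void main(String[] args) {
        // Sample data is made up for illustration only.
        Map<String, Object> aliceProps = new HashMap<>();
        aliceProps.put("firstName", "Alice");

        VertexSketch alice = new VertexSketch(0L, "person", aliceProps);
        VertexSketch bob = new VertexSketch(1L, "person", new HashMap<>());

        // An edge id only has to be unique among edges, so it may equal a vertex id.
        EdgeSketch knows = new EdgeSketch(0L, "knows", alice.id, bob.id, new HashMap<>());

        System.out.println(knows.label + ": " + knows.sourceId + " -> " + knows.targetId);
    }
}
```

Because vertex ids and edge ids are unique only within their own dataset, an edge refers to its endpoints through the source and target vertex identifiers, never through its own id.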
Add the dependency to your Maven project:
<repositories>
  <repository>
    <id>dbleipzig</id>
    <name>Database Group Leipzig University</name>
    <url>https://wdiserv1.informatik.uni-leipzig.de:443/archiva/repository/dbleipzig/</url>
    <releases>
      <enabled>true</enabled>
    </releases>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>
<dependency>
  <groupId>org.s1ck</groupId>
  <artifactId>ldbc-flink-import</artifactId>
  <version>0.1</version>
</dependency>
Use it in your project:
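import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

// LDBCToFlink, LDBCVertex and LDBCEdge come from the ldbc-flink-import artifact
LDBCToFlink ldbcToFlink = new LDBCToFlink(
  "/path/to/ldbc/output", // or "hdfs://..."
  ExecutionEnvironment.getExecutionEnvironment());

DataSet<LDBCVertex> vertices = ldbcToFlink.getVertices();
DataSet<LDBCEdge> edges = ldbcToFlink.getEdges();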
Licensed under the GNU General Public License, v3: http://www.gnu.org/licenses/gpl-3.0.html