
First Run

Simon Schiff edited this page May 11, 2017 · 35 revisions

On this page you will find instructions for compiling and starting the applications that use PostgreSQL, Spark SQL, or Spark Streaming as a back end for STARQL. Before following these instructions, have a look at the Home page.

[Figure: example cluster visualization]

The picture above shows an example of a computer cluster consisting of several nodes: one master node and several slaves (slave00, slave01, ...). The master node is connected to the slave nodes over SSH. These instructions use the node names as shown in the picture; your nodes may have different names. You can look them up in `/etc/hosts`.
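A `/etc/hosts` file on such a cluster might look like the following sketch (the IP addresses are placeholders; substitute the addresses of your own nodes):

```
127.0.0.1    localhost
192.168.0.10 master
192.168.0.11 slave00
192.168.0.12 slave01
```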

Example Data

You can use the example data under STARQL/Example from the repository for a first run. It consists of a PostgreSQL backup, STARQL queries, an OWL file, and an OBDA file.

Make sure you have no database named sport. Then you can restore the PostgreSQL backup Data.sql with the commands:

user@master:~$ git clone git@github.com:SimonUzL/STARQL.git
user@master:~$ cd STARQL/Example/
user@master:~$ sudo -u postgres psql < Data.sql
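As a quick sanity check that the restore succeeded, you can list the tables of the new sport database (this assumes a standard PostgreSQL installation where the postgres system user may run psql):

```
user@master:~$ sudo -u postgres psql -d sport -c '\dt'
```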

PostgreSQL

Clone the repository, switch to the directory STARQL/Starqlpostgres/, and build the project with Maven.

user@master:~$ git clone git@github.com:SimonUzL/STARQL.git
user@master:~$ cd STARQL/Starqlpostgres/
user@master:~$ mvn clean compile assembly:single

Start it with:

user@master:~$ java -jar target/Starqlpostgres-0.0.1-SNAPSHOT.jar

Results are automatically saved as a table in the current database, named resultOfStarql plus the current time stamp in milliseconds.
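You can list the result tables afterwards with psql. Note that the exact stored table name (including its casing, since PostgreSQL folds unquoted identifiers to lower case) is an assumption here, so the pattern below may need adjusting:

```
user@master:~$ sudo -u postgres psql -d sport -c "SELECT tablename FROM pg_tables WHERE tablename ILIKE 'resultofstarql%';"
```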

Spark SQL

Clone the repository after you have set up and started Apache Hadoop and Spark correctly:

user@master:~$ git clone git@github.com:SimonUzL/STARQL.git

Switch to the directory STARQL/Historic/ and build the project with Maven:

user@master:~$ cd STARQL/Historic/
user@master:~$ mvn clean compile assembly:single
user@master:~$ sudo su starql
starql@master:~$ cp target/Historic-0.0.1-SNAPSHOT.jar /opt/spark-2.1.1-bin-hadoop2.7/

Now you can run the application with the spark-submit script:

starql@master:~$ cd /opt/spark-2.1.1-bin-hadoop2.7/
starql@master:~$ ./bin/spark-submit --class de.uzl.ifis.Historic.App --master spark://master:7077 Historic-0.0.1-SNAPSHOT.jar hdfs://master:9000/tmp/ [number of partitions]

Choose [number of partitions] as the maximum of spark.default.parallelism and spark.sql.shuffle.partitions.
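Picking the larger of the two settings can be sketched in shell. The values below are placeholders, not recommendations; read the real ones from your Spark configuration (e.g. conf/spark-defaults.conf):

```shell
# hypothetical values -- read the real ones from conf/spark-defaults.conf
DEFAULT_PARALLELISM=8      # spark.default.parallelism
SHUFFLE_PARTITIONS=200     # spark.sql.shuffle.partitions

# [number of partitions] is the larger of the two settings
PARTITIONS=$(( DEFAULT_PARALLELISM > SHUFFLE_PARTITIONS ? DEFAULT_PARALLELISM : SHUFFLE_PARTITIONS ))
echo "$PARTITIONS"
```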

Spark Streaming

Clone the repository after you have set up and started Apache Hadoop and Spark correctly:

user@master:~$ git clone git@github.com:SimonUzL/STARQL.git

Switch to the directory STARQL/Streaming/ and build the project with maven:

user@master:~$ cd STARQL/Streaming/
user@master:~$ mvn clean compile assembly:single
user@master:~$ sudo su starql
starql@master:~$ cp target/Streaming-0.0.1-SNAPSHOT.jar /opt/spark-2.1.1-bin-hadoop2.7/

You need a generator to produce stream data. It generates random data and writes it to port 9999; the data is compatible with the example data under STARQL/Example/:

user@master:~$ cd STARQL/Generator/
user@master:~$ mvn clean compile assembly:single

You can start it with:

user@master:~$ java -jar target/Generator-0.0.1-SNAPSHOT.jar [num of profiles] [frequency in ms]
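As an illustration (the argument values here are arbitrary examples, not recommendations), the following starts the generator with 100 profiles emitting every 1000 ms, and then inspects the raw stream with netcat from a second terminal:

```
user@master:~$ java -jar target/Generator-0.0.1-SNAPSHOT.jar 100 1000
user@master:~$ nc master 9999
```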

Switch to /opt/spark-2.1.1-bin-hadoop2.7/ and start the application:

user@master:~$ sudo su starql
starql@master:~$ cd /opt/spark-2.1.1-bin-hadoop2.7/
starql@master:~$ ./bin/spark-submit --class de.uzl.ifis.Streaming.App --master spark://master:7077 Streaming-0.0.1-SNAPSHOT.jar [place for static data from postgres] [num of partitions] spark://master:7077

Choose [num of partitions] as the maximum of spark.default.parallelism and spark.sql.shuffle.partitions. For [place for static data from postgres] you can specify any local folder when you use only one node. When you use more than one node, use the Hadoop file system; your path should then look like: hdfs://master:9000/<any path>.
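If the HDFS directory does not exist yet, you can create it first. This sketch assumes the hdfs command from your Hadoop installation is on the PATH and reuses the /tmp/ path from the Spark SQL example; adjust both to your setup:

```
user@master:~$ hdfs dfs -mkdir -p hdfs://master:9000/tmp/
```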

When the application is running, it will ask you for an IP address and a port. Use master (if master is the name of your master node) as the address and 9999 as the port.
