First Run
On this page you can find instructions for compiling and starting the applications that use PostgreSQL, Spark SQL or Spark Streaming as a back end for STARQL. Before you follow these instructions, have a look at the Home page.
You can use the example data under STARQL/Example
from the repository for a first run. It consists of a PostgreSQL backup, STARQL queries, an OWL file and an OBDA file.
Make sure you have no database with the name sport
. Then you can restore the PostgreSQL backup Data.sql
with the following commands:
user@master:~$ git clone git@github.com:SimonUzL/STARQL.git
user@master:~$ cd STARQL/Example/
user@master:~$ sudo -u postgres psql < Data.sql
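If you are unsure whether a database named sport already exists, or want to check that the restore succeeded, you can list the databases (this assumes the default postgres superuser, as in the restore command above):
user@master:~$ sudo -u postgres psql -l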
To use PostgreSQL as the back end, clone the repository, switch to the directory STARQL/Starqlpostgres/
and build the project with Maven:
user@master:~$ git clone git@github.com:SimonUzL/STARQL.git
user@master:~$ cd STARQL/Starqlpostgres/
user@master:~$ mvn clean compile assembly:single
Start it with:
user@master:~$ java -jar target/Starqlpostgres-0.0.1-SNAPSHOT.jar
Results are saved automatically as a table in the current database, named resultOfStarql
followed by the current timestamp in milliseconds.
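To inspect a result, you can list the tables of the sport database and then query the one you are interested in. The table name in the second command is only illustrative: the timestamp suffix differs on every run, and the name shown by \dt is authoritative (PostgreSQL may fold unquoted identifiers to lower case):
user@master:~$ sudo -u postgres psql -d sport -c "\dt"
user@master:~$ sudo -u postgres psql -d sport -c 'SELECT * FROM "resultOfStarql1502705369000";'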
To use Spark SQL as the back end, clone the repository after you have set up and started Apache Hadoop and Spark correctly:
user@master:~$ git clone git@github.com:SimonUzL/STARQL.git
Switch to the directory STARQL/Historic/
and build the project with Maven:
user@master:~$ cd STARQL/Historic/
user@master:~$ mvn clean compile assembly:single
user@master:~$ sudo su starql
starql@master:~$ cp target/Historic-0.0.1-SNAPSHOT.jar /opt/spark-2.1.1-bin-hadoop2.7/
Now you can run the application with the spark-submit script:
starql@master:~$ cd /opt/spark-2.1.1-bin-hadoop2.7/
starql@master:~$ ./bin/spark-submit --class de.uzl.ifis.Historic.App --master spark://master:7077 Historic-0.0.1-SNAPSHOT.jar hdfs://master:9000/tmp/ [number of partitions]
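Before submitting, you can check that the HDFS input path exists and contains your data (this assumes the hdfs client is on the PATH of the starql user):
starql@master:~$ hdfs dfs -ls hdfs://master:9000/tmp/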
For the [number of partitions] argument, choose the maximum of spark.default.parallelism
and spark.sql.shuffle.partitions
.
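If you are unsure which values are configured, you can look them up in your Spark configuration; the grep below assumes the properties are set in spark-defaults.conf (properties that are not set fall back to Spark's defaults, for example 200 for spark.sql.shuffle.partitions):
starql@master:~$ grep -E "spark.default.parallelism|spark.sql.shuffle.partitions" /opt/spark-2.1.1-bin-hadoop2.7/conf/spark-defaults.conf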
To use Spark Streaming as the back end, clone the repository after you have set up and started Apache Hadoop and Spark correctly:
user@master:~$ git clone git@github.com:SimonUzL/STARQL.git
Switch to the directory STARQL/Streaming/
and build the project with Maven:
user@master:~$ cd STARQL/Streaming/
user@master:~$ mvn clean compile assembly:single
user@master:~$ sudo su starql
starql@master:~$ cp target/Streaming-0.0.1-SNAPSHOT.jar /opt/spark-2.1.1-bin-hadoop2.7/
You need a generator to produce stream data. It writes random data to port 9999, and the data is compatible with the example data under STARQL/Example/. Build it with:
user@master:~$ cd STARQL/Generator/
user@master:~$ mvn clean compile assembly:single
You can start it with:
user@master:~$ java -jar target/Generator-0.0.1-SNAPSHOT.jar [num of profiles] [frequency in ms]
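For example, to simulate 10 profiles with a frequency of 1000 ms you could start it like this (both numbers are only illustrative):
user@master:~$ java -jar target/Generator-0.0.1-SNAPSHOT.jar 10 1000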
Switch to /opt/spark-2.1.1-bin-hadoop2.7/
and start the application:
user@master:~$ sudo su starql
starql@master:~$ cd /opt/spark-2.1.1-bin-hadoop2.7/
starql@master:~$ ./bin/spark-submit --class de.uzl.ifis.Streaming.App --master spark://master:7077 Streaming-0.0.1-SNAPSHOT.jar [place for static data from postgres] [num of partitions] spark://master:7077
For the [num of partitions] argument, choose the maximum of spark.default.parallelism
and spark.sql.shuffle.partitions
. For [place for static data from postgres]
you can specify any folder when you use only one node. When you use more than one node, use the Hadoop file system; your path should then look something like hdfs://master:9000/<any path>
.
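As a multi-node sketch, you could create a dedicated HDFS directory for the static data and pass it to spark-submit; the path starql-static and the partition count 8 below are only illustrative:
starql@master:~$ hdfs dfs -mkdir -p /starql-static
starql@master:~$ ./bin/spark-submit --class de.uzl.ifis.Streaming.App --master spark://master:7077 Streaming-0.0.1-SNAPSHOT.jar hdfs://master:9000/starql-static 8 spark://master:7077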
When the application is running, it will ask you for an IP address and a port. Use master
(where master is the hostname of your master node) as the address and 9999 as the port.
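Before entering the connection details, you can check that the generator is reachable from the Spark node; this assumes netcat is installed and the generator is already running:
starql@master:~$ nc -zv master 9999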