
Troubleshooting


On this page you can check your setup.

(Figure: cluster example visualization)

The picture above shows an example of a computer cluster consisting of several nodes: one master node and several slaves (slave00, slave01, ...). The master node is connected to the slave nodes over SSH. The instructions use the node names shown in the picture; your nodes may have different names. You can look up the names in `/etc/hosts`.
  1. Correct permissions
    All of these folders and their contents should be owned by the user starql in the group cluster on every node (if the ownership is wrong, see the fix sketched after this list):

    user@master:/opt$ ls -lha
    drwxr-xr-x 10 starql cluster 4,0K Jan 19 21:00 hadoop
    drwxr-xr-x  3 starql cluster 4,0K Jan 19 20:52 hadoop_tmp
    drwxr-xr-x 16 starql cluster 4,0K Mär  1 09:20 spark
  2. SSH
    From the master node you should be able to connect to the master node itself:

    user@master:~$ sudo su starql
    starql@master:~$ ssh master

    Now you should be connected to the master node.

    From the master node you should also be able to connect to every slave node (if any of these logins asks for a password or fails, see the SSH key setup sketched after this list):

    user@master:~$ sudo su starql
    starql@master:~$ ssh slave00
    starql@master:~$ ssh slave01
    ...
  3. PostgreSQL
    From every node you should be able to connect to PostgreSQL, which is running on the master node (if connections from the slaves are refused, see the configuration sketch after this list). Try this on every node:

    sudo -u postgres psql -h master -p 5432 -U postgres
  4. Apache Spark
    You can set the logging level back to INFO in log4j.properties (a sketch for applying this change is given after this list). To do this, change the line

    log4j.rootCategory=ERROR, console

    to

    log4j.rootCategory=INFO, console

    This can help a lot when something is not running as it should.

    You can visit the Spark master web UI in your browser at http://master:8080 (port 7077 is the master's RPC port used in the spark://master:7077 URL). It also shows helpful information. You should see at least one worker in the list there.

    If you get an out-of-memory exception while running Spark, Spark is probably not set up correctly. You can try to increase the default parallelism and the number of partitions used for shuffles in the spark-defaults.conf file:

    spark.default.parallelism      [number]
    spark.sql.shuffle.partitions   [number]

    I recommend using the values described under Spark Cluster Setup. If you still get an out-of-memory exception, try increasing the values.
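
If step 1 shows the wrong ownership, a possible fix is to change the owner recursively. This is only a sketch; it assumes the three directories live under /opt as listed above, and it has to be run on every node:

    # change owner and group recursively for the three directories from step 1
    sudo chown -R starql:cluster /opt/hadoop /opt/hadoop_tmp /opt/spark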
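
If the SSH logins in step 2 ask for a password or do not work at all, key-based authentication for the starql user is probably missing. The following is only a sketch, assuming the default key file ~/.ssh/id_rsa and the node names from the picture; run it as starql on the master node:

    # create a key pair without a passphrase (skip this if ~/.ssh/id_rsa already exists)
    ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa
    # install the public key on the master itself and on every slave
    ssh-copy-id starql@master
    ssh-copy-id starql@slave00
    ssh-copy-id starql@slave01
    # ... repeat for every further slave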
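
If the connection in step 3 works on the master but is refused from the slaves, PostgreSQL is probably not accepting remote connections. The following is only a sketch of the two settings to check on the master; the file locations depend on your PostgreSQL version and distribution, and the subnet is just an example:

    # postgresql.conf: make the server listen on all network interfaces
    listen_addresses = '*'

    # pg_hba.conf: allow password-authenticated connections from the cluster subnet
    host    all    all    192.168.0.0/24    md5

    # reload or restart PostgreSQL afterwards, for example
    sudo systemctl restart postgresql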
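
If there is no log4j.properties yet in step 4, Spark ships a template next to it. The following sketch assumes Spark is installed under /opt/spark as in step 1:

    # if log4j.properties does not exist yet, create it from the template that ships with Spark
    cp -n /opt/spark/conf/log4j.properties.template /opt/spark/conf/log4j.properties
    # switch the root logging level back from ERROR to INFO
    sed -i 's/^log4j.rootCategory=ERROR, console/log4j.rootCategory=INFO, console/' /opt/spark/conf/log4j.properties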
