Building Apache Spark

The instructions provided below specify the steps to build Apache Spark version 2.4.4 in Standalone Mode on Linux on IBM Z for the following distributions:

RHEL (7.5, 7.6, 7.7, 8.0)
SLES (12 SP4, 15 SP1)
Ubuntu (16.04, 18.04, 19.10)

General Notes:

When following the steps below please use a standard permission user unless otherwise specified.
A directory /<source_root>/ will be referred to in these instructions, this is a temporary writeable directory anywhere you'd like to place it.

Step 1 : Build using script

If you want to build Spark using manual steps, go to STEP 2.

Use the following commands to build Spark using the build script. Please make sure you have wget installed.

wget -q https://raw.githubusercontent.com/linux-on-ibm-z/scripts/master/ApacheSpark/2.4.4/build_spark.sh

# Build Spark 
bash build_spark.sh   [Provide -h option to print help menu, -j IBM to run with IBMSDK]

If the build completes successfully, go to STEP 4. In case of error, check logs for more details or go to STEP 2 to follow manual build steps.

Step 2. Building Apache Spark

2.1) Install the dependencies

export SOURCE_ROOT=/<source_root>/

RHEL (7.5, 7.6, 7.7, 8.0)
```
 sudo yum groupinstall -y 'Development Tools' 
 sudo yum install -y wget tar git libtool autoconf maven make patch 
```
- With AdoptOpenJDK 9
  - Download and install AdoptOpenJDK (OpenJDK8 with Eclipse OpenJ9) from here
- With IBM SDK
  - Download and Install IBM SDK from here

SLES (12 SP4, 15 SP1)

 sudo zypper install -y wget tar git libtool autoconf gcc make  gcc-c++ zip unzip gzip gawk patch

With AdoptOpenJDK 9
- Download and install AdoptOpenJDK (OpenJDK8 with Eclipse OpenJ9) from here
With IBM SDK
- Download and Install IBM SDK from here

Install maven

cd $SOURCE_ROOT
wget http://mirrors.estointernet.in/apache/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.tar.gz 
tar -xvf apache-maven-3.6.3-bin.tar.gz
export PATH=$PATH:$SOURCE_ROOT/apache-maven-3.6.3/bin

Ubuntu (16.04, 18.04, 19.10)
```
 sudo apt-get install -y wget tar git libtool autoconf build-essential maven patch
```
- With AdoptOpenJDK 9
  - Download and install AdoptOpenJDK (OpenJDK8 with Eclipse OpenJ9) from here
- With IBM SDK
  - Download and Install IBM SDK from here

2.2) Build LevelDB JNI

Download and configure Snappy

   cd $SOURCE_ROOT
   wget https://github.com/google/snappy/releases/download/1.1.3/snappy-1.1.3.tar.gz
   tar -zxvf  snappy-1.1.3.tar.gz
   export SNAPPY_HOME=`pwd`/snappy-1.1.3
   cd ${SNAPPY_HOME}
   ./configure --disable-shared --with-pic
   make
   sudo make install

Download the source code for LevelDB and LevelDB JNI

    cd $SOURCE_ROOT
    git clone -b s390x https://github.com/linux-on-ibm-z/leveldb.git
    git clone -b leveldbjni-1.8-s390x https://github.com/linux-on-ibm-z/leveldbjni.git

Set the environment variables

export JAVA_HOME=/<path to JDK>/
export PATH=$JAVA_HOME/bin:$PATH
export LEVELDB_HOME=`pwd`/leveldb
export LEVELDBJNI_HOME=`pwd`/leveldbjni
export LIBRARY_PATH=${SNAPPY_HOME}
export C_INCLUDE_PATH=${LIBRARY_PATH}
export CPLUS_INCLUDE_PATH=${LIBRARY_PATH}

Apply the LevelDB patch

cd ${LEVELDB_HOME}
git apply ${LEVELDBJNI_HOME}/leveldb.patch
make libleveldb.a

Build the jar file

cd ${LEVELDBJNI_HOME}
mvn clean install -P download -Plinux64-s390x -DskipTests
jar -xvf ${LEVELDBJNI_HOME}/leveldbjni-linux64-s390x/target/leveldbjni-linux64-s390x-1.8.jar
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$SOURCE_ROOT/leveldbjni/META-INF/native/linux64/s390x

2.3) Build ZSTD JNI

RHEL (7.5, 7.6, 7.7, 8.0)

curl https://bintray.com/sbt/rpm/rpm | sudo tee /etc/yum.repos.d/bintray-sbt-rpm.repo
sudo yum install -y sbt

SLES (12 SP4, 15 SP1)

cd $SOURCE_ROOT
wget https://piccolo.link/sbt-1.2.8.zip
unzip sbt-1.2.8.zip
export PATH=$PATH:$SOURCE_ROOT/sbt/bin/

Ubuntu (16.04, 18.04, 19.10)

echo "deb https://dl.bintray.com/sbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 2EE0EA64E40A89B84B2DF73499E82A75642AC823
sudo apt-get update
sudo apt-get install sbt

cd $SOURCE_ROOT
git clone https://github.com/luben/zstd-jni.git
cd zstd-jni
git checkout v1.3.8-2
sbt compile test package
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$SOURCE_ROOT/zstd-jni/target/classes/linux/s390x/

2.4) Set Environment Variables

export MAVEN_OPTS="-Xmx3g -XX:ReservedCodeCacheSize=1024m"
export HADOOP_USER_NAME="hadoop" ( IBM SDK only )
ulimit -s unlimited
ulimit -n 999999