Apache Livy on DC/OS
The default Livy installation assumes that Spark is installed on the Mesos agents, which is rarely the case on a generic DC/OS cluster. As a consequence, if you create a Spark session without specifying spark.mesos.executor.docker.image, Mesos will create LXC containers that try to load the Spark libraries and executables from the agents they run on, and raise errors about missing files. Instead, as shown in the code snippet below, you should point the new Spark session to a Docker image in which all the Spark libraries and executables (e.g., pyspark, sparkR) are installed. This Docker image will be used to start the Spark executor containers that run the tasks submitted to the session.
import json
import requests

host = 'http://<livy-host>:8998'

data = {
    'kind': 'spark',
    'conf': {
        'spark.mesos.executor.docker.image': 'heliumdatacommons/spark:1.0.9-2.1.0-1-hadoop-2.6',
        'spark.mesos.executor.home': '/opt/spark/dist',
    }
}
headers = {'Content-Type': 'application/json'}

# create a Spark session
r = requests.post(host + '/sessions', data=json.dumps(data), headers=headers)
print(r.json())
print(r.headers['location'])
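After the session is created, Mesos still needs to pull the Docker image and start the executors, so the session stays in the starting state for a while. The sketch below, which continues from the snippet above and assumes the session URL in the Location header and Livy's standard GET /sessions/{id}/state endpoint, polls until the session is ready:

import time

# the Location header returned above looks like '/sessions/0'
session_url = host + r.headers['location']

# poll the session state; it moves from 'starting' to 'idle'
# once the executors are up and ready to accept statements
while True:
    state = requests.get(session_url + '/state', headers=headers).json()['state']
    print('session state:', state)
    if state in ('idle', 'error', 'dead'):
        break
    time.sleep(5)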
The current Livy build runs on top of heliumdatacommons/spark:1.0.9-2.1.0-1-hadoop-2.6. Make sure you use the same Docker image when creating Spark sessions: inconsistent Spark versions between Livy and the Spark executors will cause compatibility issues and fail Spark tasks.
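One way to double-check the versions match is to submit a statement through Livy's statements API and inspect the Spark version reported by the session. This is a minimal sketch, reusing the hypothetical session_url and polling style from the snippets above:

# submit a Scala statement that returns the Spark version of the running session
stmt = requests.post(session_url + '/statements',
                     data=json.dumps({'code': 'sc.version'}),
                     headers=headers).json()
stmt_url = session_url + '/statements/' + str(stmt['id'])

# wait for the statement to finish, then print its output; the reported
# version should match the Spark bundled in the Docker image (2.1.0 here)
while True:
    result = requests.get(stmt_url, headers=headers).json()
    if result['state'] == 'available':
        print(result['output'])
        break
    time.sleep(2)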