Schedule a real-time analytic task & a source that emits events.
We will now configure a Source to emit data into the Kafka brokers. A real-time analytic task using Spark Streaming will then consume the data and write the results to the spatiotemporal-store. The spatiotemporal-store uses Elasticsearch to efficiently index observations by space, time, and all other attributes of each event. The JavaScript map app periodically queries the spatiotemporal-store to reflect the latest state of observations on a map.
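For orientation, here is a minimal sketch of how such a Spark Streaming job wires Kafka to Elasticsearch. It is illustrative only: the broker address, topic name, index/type names, batch interval, and app name are assumptions, not the values used by the actual taxi-stream job.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.elasticsearch.spark.streaming._

object TaxiStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("taxi-rat")
      // Point the elasticsearch-hadoop connector at the spatiotemporal-store
      // (hypothetical address).
      .set("es.nodes", "elasticsearch.marathon.mesos:9200")

    // One micro-batch per second (batch interval is an assumption).
    val ssc = new StreamingContext(conf, Seconds(1))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker.kafka.l4lb.thisdcos.directory:9092", // hypothetical
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "taxi-rat"
    )

    // Consume taxi events from a (hypothetical) 'taxi' topic.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](Seq("taxi"), kafkaParams)
    )

    // Index each event's JSON payload into Elasticsearch
    // ('taxis/taxi' index/type is a placeholder).
    stream.map(_.value).saveJsonToEs("taxis/taxi")

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The real job presumably does more (parsing, enrichment, spatiotemporal indexing), but the consume-from-Kafka, save-to-Elasticsearch skeleton is the same.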
Step 1: Review the 'taxi-stream' Spark Streaming task's Marathon configuration found at spatiotemporal-esri-analytics/taxi-stream.json. Breaking the Marathon app configuration file down (a hedged sketch follows this list):
- deploys a Spark Streaming 'taxi-stream' job using the mesosphere/spark:1.1.1-2.2.0-hadoop-2.7 Docker image.
- the --class gets bootstrapped in via a URI that is downloaded prior to the start of each worker task.
- each worker task is allocated 2 CPU shares & 1GB of memory.
- each worker task starts up with the spark-submit command and numerous application-specific parameters.
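A minimal sketch of what such a Marathon app definition can look like, assuming hypothetical values for the jar URI, main class, and spark-submit arguments (consult the actual spatiotemporal-esri-analytics/taxi-stream.json for the real ones):

```json
{
  "id": "/taxi-stream",
  "instances": 1,
  "cpus": 2,
  "mem": 1024,
  "container": {
    "type": "DOCKER",
    "docker": { "image": "mesosphere/spark:1.1.1-2.2.0-hadoop-2.7" }
  },
  "fetch": [
    { "uri": "https://s3.amazonaws.com/<bucket>/taxi-stream-assembly.jar" }
  ],
  "cmd": "./bin/spark-submit --class <main class> --master mesos://leader.mesos:5050 taxi-stream-assembly.jar <application-specific parameters>"
}
```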
Step 2: To schedule 'taxi-stream', go to the DC/OS dashboard and navigate to 'Services - Services'. To run a new Service, click the '+' button at the top right of the Services screen.
Step 3: Click the 'Single Container' option.
Step 4: Toggle the 'JSON EDITOR' button to on and copy & paste the contents of spatiotemporal-esri-analytics/taxi-stream.json into the JSON area.
Step 5: Click the 'REVIEW & RUN' button, review the service configuration & click the 'RUN SERVICE' button to schedule 'taxi-stream'.
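Alternatively, if you have the DC/OS CLI installed, the same service can be scheduled from a terminal with `dcos marathon app add spatiotemporal-esri-analytics/taxi-stream.json`.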
Step 6: On the 'Services' page, note that 'taxi-stream' is in 'Deploying' status. Note: The first time you deploy the service it will download the .jar file from S3, which will likely take a couple of minutes, so be patient.
Step 7: Once 'taxi-stream' shows a status of 'Running', click on 'taxi-stream' to see more information.
Step 8: 'taxi-stream' is a Spark Streaming job. Here we can see the host the Spark Streaming driver was scheduled to, as well as the status of the driver. To see the actual worker tasks, we must dive into the Mesos dashboard.
Step 9: Open the Mesos dashboard to view the tasks of 'taxi-stream'. Here we can see the driver task 'taxi-stream' and its corresponding worker tasks 'taxi-rat 0', 'taxi-rat 1', and 'taxi-rat 2'. Note: 'rat' is an abbreviation for real-time analytic task.
Step 10: To view the progress of the Spark Streaming job, click on the 'Sandbox' of the driver task 'taxi-stream'.
Step 11: In the Sandbox of a task we gain access to its output files, such as stdout, which lets us monitor the verbose output of the 'taxi-stream' task. Click on the 'stdout' link to view it. The stdout file shows the job saving 0 records to Elasticsearch; this is because we have not yet enabled a 'taxi-source' to emit events to Kafka for this Spark Streaming job to consume.
Congratulations: You have ...