GiovanniPaoloGibilisco/spark-log-processor

This repository contains two projects: the Spark Logger Parser and the Performance Estimator.

Spark Logger Parser

A tool built with Spark SQL and Hive that derives performance metrics from the log files generated by Spark. To use the tool you need a version of Spark that includes Hive support. To include this support in your Spark build, please see the Spark build documentation.

Usage

To generate log files from Spark you should add the following properties to your spark-defaults.conf:

  • spark.eventLog.enabled true
  • spark.eventLog.dir file:///some/directory/of/your/choice

Note that the Spark event log directory can also be on another file system such as HDFS (e.g. hdfs:///user/logs). To build the DAG from the logs, the latest development version of Spark (1.4.0-SNAPSHOT) should be used.
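
The configuration above can be applied from a shell. The following is a minimal sketch, not part of the tool itself: the configuration directory (taken from SPARK_CONF_DIR, falling back to ./conf) and the event log directory (/tmp/spark-events) are assumptions to adapt to your installation.

```shell
# Assumed locations; adjust both to match your Spark installation.
CONF_DIR="${SPARK_CONF_DIR:-$PWD/conf}"
EVENT_DIR="${EVENT_DIR:-/tmp/spark-events}"

# Spark expects the event log directory to exist before an application starts.
mkdir -p "$CONF_DIR" "$EVENT_DIR"

# Append the two event log properties to spark-defaults.conf.
cat >> "$CONF_DIR/spark-defaults.conf" <<EOF
spark.eventLog.enabled true
spark.eventLog.dir file://$EVENT_DIR
EOF
```

Because the snippet appends to the file, running it twice duplicates the two lines; check spark-defaults.conf first if the properties may already be set.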

Use the -u (--usage) parameter to show the usage guide.

Once the tool has been built, it can be invoked with the spark-submit script shipped with Spark.
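
As an illustration of such an invocation, the sketch below asks the tool for its usage guide. The jar path is hypothetical (use the fat jar that Maven writes under target/), and the sketch assumes the jar's manifest declares its main class so that no --class option is needed.

```shell
# Hypothetical jar path; replace with the fat jar produced by the build.
TOOL_JAR="${TOOL_JAR:-target/spark-log-processor.jar}"

# The command that will be run; -u prints the usage guide.
CMD="spark-submit $TOOL_JAR -u"
echo "$CMD"

# Run it only when spark-submit is on the PATH and the jar exists.
if command -v spark-submit >/dev/null 2>&1 && [ -f "$TOOL_JAR" ]; then
  spark-submit "$TOOL_JAR" -u
fi
```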

Build

The tool can be built using Maven with:

mvn clean package -Dmaven.test.skip=true

This generates a fat jar with all the needed dependencies.

Performance Estimator

The Performance Estimator is a tool that estimates the runtime of a Spark application, starting from an estimate of the time spent in each stage. It uses the DAG built by the Spark Logger Parser (and exported with the -e parameter).

Usage

To use the Performance Estimator, first build it, then run the executable jar with the following parameters:

  • -i to specify the input folder containing the DAGs of each job (or a single DAG file)
  • -p to specify the input file containing the performance information of the stages; this file can be exported by the Spark Logger Parser tool

Note: currently only DAGs containing stages can be processed.
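
A run with the two parameters above might look like the following sketch. The jar path, DAG folder, and performance file names are all hypothetical placeholders; only the -i and -p flags come from this guide.

```shell
# Hypothetical paths; replace with your built jar and exported files.
ESTIMATOR_JAR="${ESTIMATOR_JAR:-target/performance-estimator.jar}"
DAG_DIR="${DAG_DIR:-dags}"
PERF_FILE="${PERF_FILE:-stage-performance}"

# The command that will be run.
CMD="java -jar $ESTIMATOR_JAR -i $DAG_DIR -p $PERF_FILE"
echo "$CMD"

# Run it only when the jar and both inputs are actually present.
if [ -f "$ESTIMATOR_JAR" ] && [ -e "$DAG_DIR" ] && [ -f "$PERF_FILE" ]; then
  java -jar "$ESTIMATOR_JAR" -i "$DAG_DIR" -p "$PERF_FILE"
fi
```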

To build the tool, run mvn install.
