A script for launching batch Spark cluster jobs under an LSF or SGE scheduler.
Earlier versions of this script supported a wider range of options, but that support was removed in late 2020 when Janelia's HPC was upgraded; at the time, there was little interest in updating support for the removed options.
Clone this repository to a location that is accessible from the host(s) where you launch High Performance Compute (HPC) jobs:
cd ${HOME}
git clone https://github.com/JaneliaSciComp/spark-janelia
A Python 3.6+ environment is needed to run the spark-janelia script, so make sure one is also available on the host(s) where you launch HPC jobs.
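For example, you can confirm that a suitable interpreter is on your PATH on the submit host before generating a run:

python3 --version   # should report Python 3.6 or later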
You can generate a run (see details in next section) and launch it in one step like this:
spark-janelia <args> generate-and-launch-run --submitargs <submit_arguments>
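As an illustration, a hypothetical one-step launch with 10 workers might look like the following; the driver class and jar path are placeholders, not part of this repository:

spark-janelia --nnodes 10 generate-and-launch-run --submitargs "--class org.example.MyDriver /path/to/my-driver.jar arg1 arg2"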
The spark-janelia script generates a "run": a set of parameterized scripts that use your HPC scheduler to start up a Spark cluster, run a driver application on it, and then shut everything down.
By default, the generated run scripts are organized as follows:
${HOME}/.spark # --run_parent_dir default (can be changed on command line)
/<YYYYmmdd_HHMMSS> # timestamp of run generation
/conf # spark configuration files for run
/logs # log files for run
/scripts # parameterized scripts for run
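Because each run directory is timestamped, you can locate the most recently generated run with a simple listing (assuming the default --run_parent_dir):

ls -td ${HOME}/.spark/*/ | head -1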
To generate a run without launching it:
spark-janelia <args> generate-run --submitargs <submit_arguments>
--driverslots|-d {1-48} # override default number of slots (32) for driver
--hard_runtime|-t {hours:minutes} # override default hard runtime of 8 hours
--lsf_project|-P {project name} # sets project to be billed if not user's own group
--minworkers # override minimum number of running workers (1) before driver launches
--nnodes|-n {#} # override maximum number of workers (2)
--submitargs|-s {args ...} # arguments to pass to spark-submit when launching driver
--version|-v {2.3.1|3.0.1|current} # override default version of Spark (3.0.1) to use
# ls -d /misc/local/spark-*
--common_job_args {args} # additional arguments for all HPC job submissions (e.g. -q test)
--consolidate_logs # overrides --run_worker_dir and writes worker logs to central run logs directory
--gb_per_slot {#} # override default (15GB) memory allocated per slot
--hpc_type {lsf|sge} # override default HPC type of lsf
--java_home {path} # override environment ${JAVA_HOME}
--run_local_dirs {path,...,path} # override default storage directories used on worker nodes for shuffle and RDD data
--run_parent_dir {path} # override default parent directory for run scripts and logs
--run_worker_dir {path} # override default directory on worker nodes containing worker jar and log files
--spark_home {path} # override environment ${SPARK_HOME} and --version with this explicit path
--worker_opts {args} # additional arguments to pass to spark-class when launching workers
# e.g. -Dspark.worker.cleanup.enabled=true (to remove SPARK_WORKER_DIR on exit)
--worker_slots {#} # override default number of slots (32) for each worker
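As a sketch, the command below combines several of these options to generate (without launching) a smaller run; the driver class and jar path are placeholders, not part of this repository:

spark-janelia --nnodes 5 --worker_slots 16 --gb_per_slot 15 --hard_runtime 3:59 generate-run --submitargs "--class org.example.MyDriver /path/to/my-driver.jar"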
Note: Janelia cluster resource information is listed at http://wiki.int.janelia.org/wiki/display/ScientificComputing/Janelia+Compute+Cluster