BigJob Tutorial Part 1: Introduction

melrom edited this page Oct 26, 2012 · 3 revisions

This page is part of the BigJob Tutorial.

Introduction to BigJob

BigJob is a general-purpose, SAGA-based Pilot-Job framework. Pilot-Jobs support the use of container jobs with sophisticated workflow management to coordinate the launch and interaction of the actual computational tasks within the container. This decouples workload submission from resource assignment, giving a flexible execution model in which applications can scale out across multiple, possibly heterogeneous, resources. As a result, jobs can be executed without queuing each one individually.
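The decoupling described above can be illustrated with a toy model. This is only a conceptual sketch in plain Python 3 and does not use BigJob: the class ToyPilot and its methods are hypothetical names invented for this illustration. It assumes a pilot that already holds a fixed number of cores and packs submitted tasks into them in first-in, first-out order.

```python
from collections import deque


class ToyPilot:
    """Hypothetical stand-in for a container job that holds a fixed number of cores."""

    def __init__(self, cores):
        self.cores = cores
        self.pending = deque()

    def submit(self, name, cores_needed):
        # Submission only records the task; no batch queue is involved here.
        if cores_needed > self.cores:
            raise ValueError("task %r needs more cores than the pilot holds" % name)
        self.pending.append((name, cores_needed))

    def run(self):
        """Assign pending tasks to cores in waves and return the task names per wave."""
        waves = []
        while self.pending:
            free = self.cores
            wave = []
            # Take tasks in FIFO order for as long as the next one still fits.
            while self.pending and self.pending[0][1] <= free:
                name, needed = self.pending.popleft()
                free -= needed
                wave.append(name)
            waves.append(wave)
        return waves


pilot = ToyPilot(cores=12)
for task_name in ["a", "b", "c", "d"]:
    pilot.submit(task_name, 4)

print(pilot.run())  # [['a', 'b', 'c'], ['d']]
```

The resource (12 cores) is acquired once, while the four tasks are assigned to it afterwards; in a real pilot the second wave would start as soon as cores become free.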

Additional information about BigJob can be found on the website: http://saga-project.github.com/BigJob/. Comprehensive API documentation is available at http://saga-project.github.com/BigJob/apidoc/.

Below are descriptions of two important constructs used to build workflows with the Pilot-API.

Pilot Description

The Pilot Description defines the specification of the resource on which the jobs are managed. The following parameters need to be provided:

  • service_url - specifies the SAGA Bliss job adaptor and the hostname of the resource on which jobs are executed. For remote hosts, passwordless login must be enabled.
  • number_of_processes - specifies the total number of processes to allocate for running the jobs.
  • queue - specifies the job queue to be used.
  • working_directory - specifies the directory in which the Pilot-Job agent executes.
  • walltime - specifies the number of minutes for which the resources are requested.
  • file_transfer - specifies the files that need to be transferred for the jobs to execute successfully. Files common to all jobs are generally listed here.
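
import os

# Assumed setup: the list is created before the first pilot is appended.
pilot_compute_description = []
pilot_compute_description.append({ "service_url": "sge+ssh://localhost",
                                   "number_of_processes": 12,
                                   "allocation": "XSEDE12-SAGA",
                                   "queue": "development",
                                   "working_directory": os.getenv("HOME")+"/agent",
                                   "walltime": 10   # minutes
                                })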
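The service_url packs the job adaptor and the resource hostname into a single URL. As a minimal sketch, assuming Python 3 and only the standard library, the two parts can be pulled apart as follows. The helper split_service_url is a hypothetical name used for illustration and is not part of BigJob.

```python
from urllib.parse import urlparse


def split_service_url(service_url):
    """Return (adaptor scheme, hostname) for a URL such as "sge+ssh://localhost"."""
    parsed = urlparse(service_url)
    if not parsed.scheme or not parsed.hostname:
        raise ValueError("expected '<adaptor>://<hostname>', got %r" % service_url)
    return parsed.scheme, parsed.hostname


adaptor, host = split_service_url("sge+ssh://localhost")
print(adaptor)  # sge+ssh
print(host)     # localhost
```

A check like this can catch a malformed service_url before a pilot is requested.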

Compute Unit Description

The Compute Unit Description allows the user to specify the actual job parameters and data needed to execute the job.

  • executable - specifies the executable.
  • arguments - specifies the list of arguments to be passed to the executable.
  • environment - specifies the list of environment variables to be set for successful job execution.
  • working_directory - specifies the directory in which the job executes. If not specified, the Pilot-Job creates a default directory.
  • number_of_processes - specifies the number of processes to be assigned to the job execution.
  • spmd_variation - specifies the type of job. By default, it is a single job.
  • output - specifies the file in which the standard output of the job is stored.
  • error - specifies the file in which the standard error of the job is stored.
  • file_transfer - specifies the files that need to be transferred for the job to execute successfully. Files specific to the job are generally listed here.

compute_unit_description = { "executable": "/bin/echo",
                             "arguments": ["Hello", "$ENV1", "$ENV2"],
                             "environment": ["ENV1=env_arg1", "ENV2=env_arg2"],
                             "number_of_processes": 4,
                             "spmd_variation": "mpi",
                             "output": "stdout.txt",
                             "error": "stderr.txt"
                           }
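To see how the environment entries relate to the $ENV1 and $ENV2 references in arguments, the following sketch previews the command line locally. It is written for Python 3, does not call BigJob, and assumes that variable references in the arguments are expanded the way a shell would expand them. The helper preview_command is a hypothetical name introduced for this illustration.

```python
from string import Template

compute_unit_description = {
    "executable": "/bin/echo",
    "arguments": ["Hello", "$ENV1", "$ENV2"],
    "environment": ["ENV1=env_arg1", "ENV2=env_arg2"],
}


def preview_command(description):
    """Build the command string with $NAME references replaced from "environment"."""
    # Turn ["KEY=value", ...] into {"KEY": "value", ...}.
    env = dict(item.split("=", 1) for item in description.get("environment", []))
    # safe_substitute leaves unknown $NAME references untouched instead of raising.
    args = [Template(arg).safe_substitute(env)
            for arg in description.get("arguments", [])]
    return " ".join([description["executable"]] + args)


print(preview_command(compute_unit_description))  # /bin/echo Hello env_arg1 env_arg2
```

This only shows the substitution; how many times the command runs depends on number_of_processes and spmd_variation, which the sketch leaves out.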

Back: [Tutorial Home](BigJob Tutorial)    Next: BigJob Tutorial Part 2: Installation