runlib

Python utilities for massively parallel processing and data handling.

  1. Installation

This module can be installed with pip from the git repository:

pip install --no-deps git+http://git@github.com/fsteinmetz/runlib.git
  2. condor.py

A simple Python interface to massively parallel computing frameworks. Initially developed for HTCondor, it has been extended to support Sun Grid Engine (qsub).

Example:

from condor import CondorPool  # or QsubPool

def f(x):
    return x**2

if __name__ == '__main__':
    p = CondorPool()
    results = p.map(f, range(5))
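
The same map pattern can be tried locally before submitting jobs to a cluster. The sketch below does not use runlib at all: `local_map` is a hypothetical helper built on the standard library's thread pool, and it only assumes that the function and inputs passed to a pool's `map` can also be called directly.

```python
from concurrent.futures import ThreadPoolExecutor


def f(x):
    return x**2


def local_map(func, iterable, workers=4):
    """Hypothetical local stand-in for a pool's map(), using threads."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # executor.map yields results in the order of the inputs
        return list(executor.map(func, iterable))


if __name__ == '__main__':
    results = local_map(f, range(5))
    print(results)
```

Running the function this way first can surface ordinary bugs without waiting on the scheduler.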
  3. tmpfiles.py

Management of temporary files: processing inputs (TmpManager().input), processing outputs (TmpManager().output), plain temporary files (TmpManager().file) and temporary directories (TmpManager().directory). Features include cleanup after use, automatic decompression of input files (gz, bz, tar, zip), disk space checks and unique paths, among others.

Example:

    from tmpfiles import TmpManager  # assumed import path, by analogy with the condor example

    with TmpManager('/tmp/') as tm:  # instantiate the tmp manager on directory '/tmp/'

        # decompress a file to tmp directory and return the name
        # of the decompressed file
        input1 = tm.input('/data/file.gz')

        # if the input is an archive, returns a list of all the files in
        # the archive
        file_list = tm.input('/data/file.tar.gz')

        # returns a temporary file that will be cleaned up
        tmp = tm.file('filename.txt')

        # returns a temporary directory
        # (named tmp_dir to avoid shadowing the built-in dir())
        tmp_dir = tm.directory()

        # returns a filename in tmp directory
        # this file will be created afterwards, and moved to destination
        # upon commit()
        out = tm.output('/data/result.dat')

        # move all output files to their destination
        # (otherwise they are deleted with the other temporary files)
        tm.commit()

    # NOTE: all temporary files are cleaned up when leaving the 'with' context,
    # even if the Python code raises an error.
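
The stage-then-commit pattern described above can be illustrated with the standard library alone. This is a minimal sketch and not runlib's implementation: `MiniTmpManager` and its internals are hypothetical, it handles only `.gz` inputs, and it does not check disk space or guard against name clashes.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


class MiniTmpManager:
    """Hypothetical, simplified stand-in; not runlib's TmpManager."""

    def __init__(self, base_dir=None):
        self.base_dir = base_dir
        self._outputs = []  # (temporary path, final destination) pairs

    def __enter__(self):
        # mkdtemp creates a uniquely named directory, so runs do not collide
        self.tmpdir = Path(tempfile.mkdtemp(dir=self.base_dir))
        return self

    def input(self, path):
        """Decompress a .gz file into the temporary directory."""
        src = Path(path)
        dst = self.tmpdir / src.stem  # 'file.gz' -> 'file'
        with gzip.open(src, 'rb') as fin, open(dst, 'wb') as fout:
            shutil.copyfileobj(fin, fout)
        return dst

    def output(self, destination):
        """Return a temporary path that commit() later moves to destination."""
        tmp = self.tmpdir / Path(destination).name
        self._outputs.append((tmp, Path(destination)))
        return tmp

    def commit(self):
        """Move every registered output that was written to its destination."""
        for tmp, dest in self._outputs:
            if tmp.exists():
                shutil.move(str(tmp), str(dest))
        self._outputs.clear()

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit and when an exception propagates
        shutil.rmtree(self.tmpdir, ignore_errors=True)
        return False  # do not swallow exceptions
```

Writing outputs inside the temporary directory and moving them only on `commit()` means that an interrupted run leaves no partial file at the final destination.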