Skip to content

deploy Dask.distributed to GridEngine (sge/pbs) cluster systems

License

Notifications You must be signed in to change notification settings

mfouesneau/dasksge

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Dask on GridEngine system

Deploy a pool of Dask workers through SGE/PBS system using the drmaa library.

This package extends dask.distributed, a lightweight library for distributed computing in Python, which exports dask APIs to moderate sized clusters. One signle object will start scheduler and workers and will also takes care of stopping them once the work is done.

The default behavior is to run as many workers as requested with 1 thread each. The main class takes care of starting a scheduler similar to distributed.LocalCluster (which includes web monitor for instance) and submit an array of jobs to the SGE/PBS system to start dask-worker processes.

Note that the workers are using the command line dask-worker for simplicity.

Examples

A first example could be

sge = GridEngineScheduler()
sge.start_pool(10)
# ... work ...
sge.stop_pool()

But the provided scheduler can also be used as context manager:

with GridEngineScheduler(nworkers=10) as sge:
    # ... work ....

Installation

pip install 'git+ssh://git@github.com/mfouesneau/dasksge.git'

This work was inspired by Matthew Rocklin's blog post

About

deploy Dask.distributed to GridEngine (sge/pbs) cluster systems

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages