NOTE: The interfaces of flux-sched are being actively developed and are not yet stable. The GitHub issue tracker is the primary way to communicate with the developers.

Fluxion: An Advanced Graph-Based Scheduler for HPC

Welcome to Fluxion [1], an advanced job scheduling software tool for High Performance Computing (HPC). Fluxion combines graph-based resource modeling with efficient temporal plan management schemes to schedule a wide range of HPC resources (e.g., compute, storage, power, etc.) in a highly scalable, customizable and effective fashion.

Fluxion has been integrated with flux-core to provide it with both system-level batch job scheduling and nested workflow-level scheduling.

See our resource-query utility if you want to test your advanced HPC resource modeling and selection ideas with Fluxion in a simplified, easy-to-use environment.

Fluxion Scheduler in Flux

Fluxion introduces queuing and resource matching services that extend Flux with advanced batch scheduling. Jobs submitted via flux job submit are added to Fluxion's queues for scheduling.

At the core of its functionality lie its two service modules: sched-fluxion-qmanager and sched-fluxion-resource. The first module manages the job queues and enforces configurable queuing policies (e.g., first-come-first-served, EASY, conservative backfilling policies, etc.). The second module uses a graph to represent resources of arbitrary types and their complex relationships, and matches the resource requirements of a Flux jobspec to the compute and other resources on this graph. Both modules are loaded into a Flux instance and work in tandem to provide highly effective scheduling.

We recognize that no single scheduling policy can optimize the scheduling of every kind of workflow. In fact, one of the main design points of flux-sched is the ability to customize its scheduling behavior. Users can use environment variables or module-load-time options to select and tune the policies that govern how resources are selected and when jobs run.

Overall, the advanced job scheduling facility within Fluxion offers many opportunities for modern HPC and other workflows to meet their highly challenging scheduling objectives.

Building Fluxion

Fluxion requires an installed flux-core package. Instructions for installing flux-core can be found in the flux-core README.

Fluxion also requires the following packages to build:

| redhat | ubuntu | version | note |
| --- | --- | --- | --- |
| hwloc-devel | libhwloc-dev | >= 1.11.1 | |
| boost-devel | libboost-dev | == 1.53 or > 1.58 | 1 |
| boost-graph | libboost-graph-dev | == 1.53 or > 1.58 | 1 |
| boost-system | libboost-system-dev | == 1.53 or > 1.58 | 1 |
| boost-filesystem | libboost-filesystem-dev | == 1.53 or > 1.58 | 1 |
| boost-regex | libboost-regex-dev | == 1.53 or > 1.58 | 1 |
| libedit-devel | libedit-dev | >= 3.0 | |
| libxml2-devel | libxml2-dev | >= 2.9.1 | |
| python3-pyyaml | python3-yaml | >= 3.10 | |
| yaml-cpp-devel | libyaml-cpp-dev | >= 0.5.1 | |

Note 1 - Boost package versions 1.54-1.58 contain a bug that leads to a compilation error.
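The Boost constraint from the table and Note 1 can be checked before building. The following is a minimal sketch, not part of flux-sched: `boost_version_ok` is a hypothetical helper name, and it assumes the version is given as a `MAJOR.MINOR[.PATCH]` string that you have already obtained from your package manager.

```sh
# Hypothetical helper (not shipped with flux-sched): succeeds when a Boost
# version string satisfies the documented rule "== 1.53 or > 1.58".
boost_version_ok() {
    local major rest minor
    # Require at least MAJOR.MINOR.
    case "$1" in
        *.*) ;;
        *) return 2 ;;
    esac
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    # Reject empty or non-numeric components.
    case "$major" in ''|*[!0-9]*) return 2 ;; esac
    case "$minor" in ''|*[!0-9]*) return 2 ;; esac
    # Any major version above 1 is newer than 1.58.
    [ "$major" -gt 1 ] && return 0
    [ "$major" -eq 1 ] || return 1
    [ "$minor" -eq 53 ] || [ "$minor" -gt 58 ]
}

# Report on a few sample version strings.
for v in 1.53 1.56 1.58 1.59 1.71.0; do
    if boost_version_ok "$v"; then
        echo "$v: ok"
    else
        echo "$v: unsupported"
    fi
done
```

The helper returns 2 for strings it cannot parse, so a caller can tell "unsupported" apart from "could not check".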

The following optional dependencies enable additional testing:

| redhat | ubuntu | version |
| --- | --- | --- |
| valgrind-devel | valgrind | |
| jq | jq | |

Installing RedHat/CentOS Packages

sudo yum install hwloc-devel boost-devel boost-graph boost-system boost-filesystem boost-regex libedit-devel libxml2-devel python3-pyyaml yaml-cpp-devel

Installing Ubuntu Packages

sudo apt-get update
sudo apt install libhwloc-dev libboost-dev libboost-system-dev libboost-filesystem-dev libboost-graph-dev libboost-regex-dev libedit-dev libxml2-dev libyaml-cpp-dev python3-yaml

Clone flux-sched, the repo name for Fluxion, from an upstream repo and prepare for configure:

git clone <flux-sched repo of your choice>
cd flux-sched
./autogen.sh

Fluxion's configure script attempts to find flux-core in the same --prefix as specified on the command line. If --prefix is not specified, it defaults to the prefix that was used to install the first flux executable found in PATH. Therefore, if which flux returns the flux-core installation against which Fluxion should be compiled, ./configure may be run without any arguments. If flux-core is side-installed, set --prefix to the same prefix that was used to install the target flux-core.
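To preview which prefix the default would pick up, you can derive it from the first flux executable in PATH. This is a rough sketch that mirrors the behavior described above, not configure's actual implementation. `prefix_of` is a hypothetical helper, and it assumes the executable lives in `<prefix>/bin` and does not resolve symlinks.

```sh
# Hypothetical helper: print the install prefix implied by an executable
# path, e.g. /opt/flux/bin/flux -> /opt/flux (strips the file name and bin/).
prefix_of() {
    dirname "$(dirname "$1")"
}

# Look up the first flux in PATH; tolerate it being absent.
flux_path=$(command -v flux || true)
if [ -n "$flux_path" ]; then
    echo "default prefix would be: $(prefix_of "$flux_path")"
else
    echo "no flux executable found in PATH; pass --prefix explicitly"
fi
```

If the printed prefix is not the flux-core you intend to build against, pass --prefix explicitly rather than relying on PATH order.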

For example, if flux-core was installed in $FLUX_CORE_PREFIX:

./configure --prefix=${FLUX_CORE_PREFIX}
make
make check
make install

Flux Instance

The examples below walk through exercising functioning flux-sched modules (i.e., sched-fluxion-qmanager and sched-fluxion-resource) in a Flux instance. The following examples assume that flux-core and Fluxion were both installed into ${FLUX_CORE_PREFIX}. For greater insight into what is happening, add the -v flag to each flux command below.

Create a comms session composed of 3 brokers:

${FLUX_CORE_PREFIX}/bin/flux start -s3

This will create a new shell in which you can issue various flux commands such as the following.

Check to see whether the qmanager and resource modules are loaded:

flux module list
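The module check above can be scripted. The sketch below assumes that the output of flux module list contains each loaded module's name as a whitespace-delimited word; the exact column layout is not relied upon. `fluxion_modules_loaded` is a hypothetical helper name.

```sh
# Hypothetical helper: read a module listing on stdin and succeed only if
# both Fluxion module names appear in it as whole words.
fluxion_modules_loaded() {
    local listing
    listing=$(cat)
    printf '%s\n' "$listing" | grep -qwF 'sched-fluxion-qmanager' &&
        printf '%s\n' "$listing" | grep -qwF 'sched-fluxion-resource'
}

# Only query Flux when the flux command is actually available.
if command -v flux >/dev/null 2>&1; then
    if flux module list | fluxion_modules_loaded; then
        echo "both Fluxion modules are loaded"
    else
        echo "at least one Fluxion module is missing"
    fi
fi
```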

Submit jobs:

flux mini submit -N3 -n3 hostname
flux mini submit -N3 -n3 sleep 30

Examine the status of these jobs:

flux jobs -a

Examine the output of the first job:

flux job attach <jobid printed from the first submit>
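The submit and attach steps can be combined so that the job ID does not have to be copied by hand. This is a minimal sketch that relies on the submit command printing the job ID on stdout, as noted above. `submit_and_attach` is a hypothetical wrapper, not a flux command.

```sh
# Hypothetical wrapper: submit a job, capture the job ID printed by the
# submit command, then attach to that job to see its output.
submit_and_attach() {
    local jobid
    # Forward all arguments (e.g. -N3 -n3 hostname) to the submit command.
    jobid=$(flux mini submit "$@") || return 1
    echo "submitted job $jobid" >&2
    flux job attach "$jobid"
}

# Usage inside a running Flux instance:
#   submit_and_attach -N3 -n3 hostname
```

The job ID message goes to stderr so that stdout carries only the job's own output.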

Examine the ring buffer for details on what happened:

flux dmesg

Exit the Flux instance:

exit

[1] The name was inspired by Isaac Newton's Method of Fluxions, where fluxions and fluents are the key terms that define his calculus. As his calculus describes the motion of points in time for time-varying variables, our Fluxion scheduler uses scalable techniques to describe the motion of scheduled points in time for a diverse set of resources.

License

SPDX-License-Identifier: LGPL-3.0

LLNL-CODE-764420