
QCG AdvancedClient

Joris Borgdorff edited this page Aug 30, 2016 · 1 revision

Currently, a co-allocated MUSCLE application can only be submitted using the XML JobProfile (compare QCG-SimpleClient). Besides using a different job description format, you have to suffix the qcg-sub command with the QCG keyword:

$ qcg-sub muscle.xml QCG

Example (Fusion - Transport Turbulence Equilibrium)

  • Install your application on every cluster you wish to use
  • Register it on every cluster using the QCG Community Modules (QCE) mechanism (http://apps.man.poznan.pl/trac/qcg-computing/wiki/ComunityModules):
qcg-module-create -g plggmuscle Fusion/Turbulence

The module must bear the same name on every cluster. Inside the module you can set or prepend any environment variable and add dependencies on other modules, e.g.:

#%Module 1.0


proc ModulesHelp { } {
        puts stderr "\tName: Fusion/Turbulence"
        puts stderr "\tVersion: 0.1"
        puts stderr "\tMaintainer: plgmamonski"
}

module-whatis   "Fusion/Turbulence, 0.1"

#load all needed modules
module add muscle2

#sets TCL variable
set FUSION_KERNELS "/home/plgrid-groups/plggmuscle/fusionkernels"
#sets environment variable
setenv FUSION_KERNELS $FUSION_KERNELS
#add to the PATH native kernels
prepend-path PATH ${FUSION_KERNELS}/bin/

set curMod [module-info name]

if { [ module-info mode load ] } {
        puts stderr "$curMod load complete."
}

if { [ module-info mode remove ] } {
        puts stderr "$curMod unload complete."
}

In the module you can also set two environment variables interpreted by the MUSCLE framework: MUSCLE_CLASSPATH and MUSCLE_LIBPATH, which set the Java classpath and the search path for dynamically loadable libraries, respectively. Thanks to this mechanism you can use a single abstract CxA that does not contain any site-specific paths. You can also load the module in an interactive QCG job, e.g.:

bash-4.1$ module load Fusion/Turbulence
openmpi/openmpi-open64_4.5.2-1.4.5-2 load complete.
Fusion/Turbulence load complete.
bash-4.1$ muscle2 -ma -c $FUSION_KERNELS/cxa/testSimpleModelsB_shared.cxa.rb
Running both MUSCLE2 Simulation Manager and the Simulation
### Running MUSCLE2 Simulation Manager
  • Prepare XML job description:
<qcgJob appId="MAPPER" xmlns:jxb="http://java.sun.com/xml/ns/jaxb" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <task persistent="true" taskId="task">
    <requirements>
      <topology>
        <processes masterGroup="true" processesId="init:transp:dupCorep:turb">
          <processesCount>
            <value>1</value>
          </processesCount>
          <candidateHosts>
            <hostName>inula.man.poznan.pl</hostName>
          </candidateHosts>
        </processes>
        <processes processesId="equil:dupEquil">
          <processesCount>
            <value>1</value>
          </processesCount>
          <candidateHosts>
            <hostName>zeus.cyfronet.pl</hostName>
          </candidateHosts>
        </processes>
      </topology>
    </requirements>
    <execution type="mapper">
      <executable>
        <application name="muscle2"/>
      </executable>
      <arguments>
        <value>FusionSimpleModels.cxa.rb</value>
        <value>--verbose</value>
      </arguments>
      <stdout>
        <directory>
          <location type="URL">gsiftp://qcg.man.poznan.pl/~/MAPPER/${JOB_ID}.output</location>
        </directory>
      </stdout>
      <stderr>
        <directory>
          <location type="URL">gsiftp://qcg.man.poznan.pl/~/MAPPER/${JOB_ID}.error</location>
        </directory>
      </stderr>
      <stageInOut>
        <file name="FusionSimpleModels.cxa.rb" type="in">
          <location type="URL">gsiftp://qcg.man.poznan.pl/~/MAPPER/FusionSimpleModels.cxa.rb</location>
        </file>
        <file name="fusion-preprocess.sh" type="in">
          <location type="URL">gsiftp://qcg.man.poznan.pl/~/MAPPER/fusion-preprocess.sh</location>
        </file>
        <file name="fusion-postprocess.sh" type="in">
          <location type="URL">gsiftp://qcg.man.poznan.pl/~/MAPPER/fusion-postprocess.sh</location>
        </file>

        <directory name="data" type="in">
          <location type="URL">gsiftp://qcg.man.poznan.pl/~/MAPPER/data</location>
        </directory>
        <directory name="out" type="out">
          <location type="URL">gsiftp://qcg.man.poznan.pl/~/MAPPER/${JOB_ID}.out</location>
        </directory>

      </stageInOut>
      <environment>
        <variable name="QCG_MODULES_LIST">Fusion/Turbulence</variable>
        <variable name="QCG_PREPROCESS">fusion-preprocess.sh</variable>
        <variable name="QCG_POSTPROCESS">fusion-postprocess.sh</variable>
      </environment>
    </execution>
    <executionTime>
      <executionDuration>P0Y0M0DT0H30M</executionDuration>
    </executionTime>
  </task>
</qcgJob>

In the above example we:

  • run the simulation on two clusters, inula and zeus (<candidateHosts>), using advance reservations created automatically by the QCG-Broker in the co-allocation process,
  • request a maximum job walltime of 30 minutes (<executionDuration>),
  • specify the kernels to be run in the processesId attribute of the <processes> element (multiple kernels must be separated with a colon ":"),
  • specify the number of processes to be allocated using the advance reservation mechanism (<processesCount>). Alternatively, you can submit your job in "opportunistic mode" (i.e. when you expect all kernels to start immediately because there are enough free resources) by adding the <reservation> tag to every kernel group, e.g.:
</candidateHosts>
<reservation type="LOCAL">NO_RESERVATION</reservation>

When using the NO_RESERVATION keyword, you can also request a particular topology instead of <processesCount> (which means "give me N cores anywhere"), e.g. 5 machines with 24 cores and 24 processes per machine:

          <processesMap slotsPerNode="24">
            <processesPerNode>24</processesPerNode>
            <processesPerNode>24</processesPerNode>
            <processesPerNode>24</processesPerNode>
            <processesPerNode>24</processesPerNode>
            <processesPerNode>24</processesPerNode>
          </processesMap>
  • or use an advance reservation created by an administrator:
</candidateHosts>
<reservation type="LOCAL">my-reservation.id</reservation>
  • give the name of the module that has to be loaded before starting MUSCLE: <variable name="QCG_MODULES_LIST">Fusion/Turbulence</variable>,
  • specify the preprocessing and postprocessing scripts (QCG_PREPROCESS and QCG_POSTPROCESS). Example scripts:
$ cat fusion-preprocess.sh
#!/bin/bash
# copy all files from the data directory into the current directory
cp data/* .
$ cat fusion-postprocess.sh
#!/bin/bash
# copy all CPOs into the `out` directory
mkdir -p out
cp *.cpo out/
  • provide staging directives (<stageInOut>).
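The variants of the <processes> group described above (fixed process count, opportunistic mode, explicit topology) can be sketched in Python. This is only an illustration under assumptions: the function build_processes and its parameters are hypothetical and not part of QCG, and the element order simply copies the fragments shown on this page rather than being validated against the QCG job description schema.

```python
import xml.etree.ElementTree as ET


def build_processes(kernels, host, count=None, nodes=None,
                    slots_per_node=None, procs_per_node=None,
                    reservation=None, master=False):
    """Build one <processes> group shaped like the examples on this page.

    Hypothetical helper: give either `count` (N cores anywhere on the host)
    or `nodes` plus `slots_per_node` (an explicit topology).
    """
    if (count is None) == (nodes is None):
        raise ValueError("give exactly one of count or nodes")
    if nodes is not None:
        if slots_per_node is None:
            raise ValueError("slots_per_node is required with nodes")
        # The page describes <processesMap> only together with NO_RESERVATION.
        if reservation != "NO_RESERVATION":
            raise ValueError("a topology requires reservation='NO_RESERVATION'")

    attrs = {}
    if master:
        attrs["masterGroup"] = "true"
    # Kernel names are joined with ":" in the processesId attribute.
    attrs["processesId"] = ":".join(kernels)
    group = ET.Element("processes", attrs)

    if count is not None:
        pc = ET.SubElement(group, "processesCount")
        ET.SubElement(pc, "value").text = str(count)
    else:
        per_node = slots_per_node if procs_per_node is None else procs_per_node
        pm = ET.SubElement(group, "processesMap",
                           {"slotsPerNode": str(slots_per_node)})
        # One <processesPerNode> entry per requested machine.
        for _ in range(nodes):
            ET.SubElement(pm, "processesPerNode").text = str(per_node)

    hosts = ET.SubElement(group, "candidateHosts")
    ET.SubElement(hosts, "hostName").text = host

    if reservation is not None:
        # Either NO_RESERVATION or the id of an administrator-made reservation.
        res = ET.SubElement(group, "reservation", {"type": "LOCAL"})
        res.text = reservation
    return group


# 5 machines, 24 cores and 24 processes per machine, no advance reservation.
example = build_processes(["equil", "dupEquil"], "zeus.cyfronet.pl",
                          nodes=5, slots_per_node=24,
                          reservation="NO_RESERVATION")
print(ET.tostring(example, encoding="unicode"))
```

The generated fragment would still have to be placed inside the <topology> element of a complete job description before submitting it.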
  • Example session:
#submit
qcg-sub fusion.xml QCG

https://elder7.man.poznan.pl:8443/qcg/services/
/C=PL/O=GRID/O=PSNC/CN=qcg-broker/qcg-broker.man.poznan.pl
UserDN = /C=PL/O=GRID/O=PSNC/CN=Mariusz Mamonski
ProxyLifetime = 23 Days 18 Hours 12 Minutes 0 Seconds
jobId = J1353272598993_MAPPER_0845

#get info
qcg-info J1353272598993_MAPPER_0845

https://elder7.man.poznan.pl:8443/qcg/services/
/C=PL/O=GRID/O=PSNC/CN=qcg-broker/qcg-broker.man.poznan.pl
UserDN = /C=PL/O=GRID/O=PSNC/CN=Mariusz Mamonski
ProxyLifetime = 23 Days 18 Hours 11 Minutes 50 Seconds

Note:
TaskType: MAPPER
SubmissionTime: Sun Nov 18 22:03:20 CET 2012
FinishTime:
ProxyLifetime: P23DT18H11M47S
Status: QUEUED
StatusDesc:
ReservedTimeSlot: Sun Nov 18 22:04:00 CET 2012 - Sun Nov 18 22:35:00 CET 2012
StartTime:

Allocation:
HostName: inula.man.poznan.pl
ProcessesCount: 1
ProcessesGroupId: init:transp:dupCorep:turb
Status: UNCOMMITTED
StatusDescription:
SubmissionTime: Sun Nov 18 22:03:24 CET 2012
FinishTime:
LocalSubmissionTime:
LocalStartTime:
LocalFinishTime:


#peek output of running job
qcg-peek J1353272598993_MAPPER_0845
https://elder7.man.poznan.pl:8443/qcg/services/
/C=PL/O=GRID/O=PSNC/CN=qcg-broker/qcg-broker.man.poznan.pl
UserDN = /C=PL/O=GRID/O=PSNC/CN=Mariusz Mamonski
ProxyLifetime = 23 Days 18 Hours 9 Minutes 43 Seconds

openmpi/openmpi-open64_4.5.2-1.4.5-2 load complete.
Fusion/Turbulence load complete.
openmpi/openmpi-open64_4.5.2-1.4.5-2 load complete.
Running both MUSCLE2 Simulation Manager and the Simulation
### Running MUSCLE2 Simulation Manager
Executing: java -server -Xms20m -Xmx100m -classpath /home/plgrid-groups/plggmuscle/2.0/devel-debug/share/muscle/java/muscle.jar:/home/plgrid-groups/plggmuscle/2.0/devel-debug/share/muscle/java/thirdparty/oncrpc.jar:/home/plgrid-groups/plggmuscle/2.0/devel-debug/share/muscle/java/thirdparty/platform.jar:/home/plgrid-groups/plggmuscle/2.0/devel-debug/share/muscle/java/thirdparty/JadeLeap.jar:/home/plgrid-groups/plggmuscle/2.0/devel-debug/share/muscle/java/thirdparty/jmml-util-0.1.jar:/home/plgrid-groups/plggmuscle/2.0/devel-debug/share/muscle/java/thirdparty/jna.jar:/home/plgrid-groups/plggmuscle/2.0/devel-debug/share/muscle/java/thirdparty/junit-4.10.jar:/home/plgrid-groups/plggmuscle/2.0/devel-debug/share/muscle/java/thirdparty/msgpack-0.6.6.jar:/home/plgrid-groups/plggmuscle/2.0/devel-debug/share/muscle/java/thirdparty/jsr-275-1.0-beta-2.jar:/home/plgrid-groups/plggmuscle/2.0/devel-debug/share/muscle/java/thirdparty/javassist-3.15.0-GA.jar:/home/plgrid-groups/plggmuscle/2.0/devel-debug/share/muscle/java/thirdparty/json_simple-1.1.jar:/home/plgrid-groups/plggmuscle/2.0/devel-debug/share/muscle/java/thirdparty/jcommander-1.17.jar -Dpl.psnc.muscle.socket.factory=muscle.net.CrossSocketFactory -Dpl.psnc.mapper.muscle.mto.address=192.168.11.102 -Dpl.psnc.map

#when the job is finished
[qcg] /home/plgrid/plgmamonski/reef/MAPPER > qcg-info J1353272598993_MAPPER_0845
https://elder7.man.poznan.pl:8443/qcg/services/
/C=PL/O=GRID/O=PSNC/CN=qcg-broker/qcg-broker.man.poznan.pl
UserDN = /C=PL/O=GRID/O=PSNC/CN=Mariusz Mamonski
ProxyLifetime = 23 Days 18 Hours 8 Minutes 57 Seconds

Note:
TaskType: MAPPER
SubmissionTime: Sun Nov 18 22:03:20 CET 2012
FinishTime: Sun Nov 18 22:04:37 CET 2012
ProxyLifetime: P23DT18H8M54S
Status: FINISHED
StatusDesc:
ReservedTimeSlot: Sun Nov 18 22:04:00 CET 2012 - Sun Nov 18 22:35:00 CET 2012
StartTime: Sun Nov 18 22:03:32 CET 2012

Allocation:
HostName: inula.man.poznan.pl
ProcessesCount: 1
ProcessesGroupId: init:transp:dupCorep:turb
Status: FINISHED


#we can see the results:
 ls -l J1353272598993_MAPPER_0845.out/
total 60608
-rw-r--r-- 1 plgmamonski plgrid-users 2503411 Nov 18 22:04 bdseq_equilibrium_0000.cpo
-rw-r--r-- 1 plgmamonski plgrid-users 2503411 Nov 18 22:04 bdseq_equilibrium_0001.cpo
-rw-r--r-- 1 plgmamonski plgrid-users 2503411 Nov 18 22:04 bdseq_equilibrium_0002.cpo
-rw-r--r-- 1 plgmamonski plgrid-users 2503411 Nov 18 22:04 bdseq_equilibrium_0003.cpo
-rw-r--r-- 1 plgmamonski plgrid-users 2503411 Nov 18 22:04 bdseq_equilibrium_0004.cpo
...
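When scripting such a session, the job identifier and the task status can be extracted from the command output. The following is a minimal Python sketch that assumes the output looks exactly like the qcg-sub and qcg-info listings above; the format is inferred from this sample session only, and the function names parse_job_id and parse_task_status are hypothetical.

```python
import re


def parse_job_id(submit_output):
    """Return the job identifier from qcg-sub output, or None if absent.

    Assumes a line of the form "jobId = J..." as in the session above.
    """
    match = re.search(r"^jobId\s*=\s*(\S+)\s*$", submit_output, re.MULTILINE)
    return match.group(1) if match else None


def parse_task_status(info_output):
    """Return the task-level Status from qcg-info output, or None.

    Only lines before the first "Allocation:" section are inspected,
    because each allocation carries its own "Status:" line.
    """
    for line in info_output.splitlines():
        if line.startswith("Allocation:"):
            break
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Status":
            return value.strip()
    return None


# Sample text copied from the session above (shortened).
submit_text = "ProxyLifetime = 23 Days 18 Hours 12 Minutes 0 Seconds\njobId = J1353272598993_MAPPER_0845\n"
info_text = "Note:\nTaskType: MAPPER\nStatus: QUEUED\nStatusDesc:\n\nAllocation:\nHostName: inula.man.poznan.pl\nStatus: UNCOMMITTED\n"

print(parse_job_id(submit_text))
print(parse_task_status(info_text))
```

A polling loop could call qcg-info periodically and stop once the parsed status is FINISHED; other terminal states are not shown on this page, so they would need to be checked against the QCG documentation.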