@nickhrdy
Contributor

Add a flux plugin to the mpibind repo. The plugin is currently built by wrapping the mpibind.so file with the code that interfaces with flux. To use the plugin, set initrc on the command line (e.g. flux mini run -o initrc=mpibind.lua -N2 -n2 -c8 -g1 /bin/true). Note that the directory containing the plugin's .so file needs to be in LD_LIBRARY_PATH.
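
A minimal sketch of the setup described above. The directory in PLUGIN_DIR and the helper names prepend_ld_library_path and run_with_mpibind are assumptions for illustration and are not part of this PR; the sketch also assumes the command is run from the directory that contains mpibind.lua.

```shell
#!/bin/sh
# Assumption: the plugin's .so lives in this directory; adjust to your build.
PLUGIN_DIR="${PLUGIN_DIR:-$HOME/mpibind/flux}"

# Prepend a directory to LD_LIBRARY_PATH unless it is already listed.
# Avoids a stray trailing colon when LD_LIBRARY_PATH starts out empty.
prepend_ld_library_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;  # already present, nothing to do
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Make the plugin visible to the loader, then launch a job with the
# flags from the PR description. Arguments are the command to run.
run_with_mpibind() {
    prepend_ld_library_path "$PLUGIN_DIR"
    flux mini run -o initrc=mpibind.lua -N2 -n2 -c8 -g1 "$@"
}

# Usage: run_with_mpibind /bin/true
```

Keeping the path handling in its own function makes it safe to source the file more than once without growing LD_LIBRARY_PATH.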

To change the mpibind input parameters, you can either pass additional -o flags on the command line
(e.g. flux mini run -o initrc=mpibind.lua -o mpibind.smt=2 -N2 -n2 -c8 -g1 /bin/true) or set the conf variable in the lua script (e.g. conf = {smt = 1, greedy = 0, gpu_optim = 1}).

The currently exposed parameters are smt, greedy, gpu_optim, and disable. The ntasks, restrict_ids, and restrict_type parameters aren't exposed because they are set from information provided by flux at runtime.
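
As a rough sketch, the command-line form can be generated from key=value pairs. The helper name mpibind_opts is hypothetical, and the sketch assumes that the dotted form shown above for smt (-o mpibind.smt=2) applies the same way to greedy, gpu_optim, and disable, which the description does not show explicitly.

```shell
#!/bin/sh
# Turn key=value pairs into "-o mpibind.key=value" flags.
# Only the parameters listed as exposed are accepted; anything else
# (e.g. ntasks or nthreads) is rejected with a non-zero status.
mpibind_opts() {
    mpibind_out=""
    for mpibind_pair in "$@"; do
        case "$mpibind_pair" in
            smt=*|greedy=*|gpu_optim=*|disable=*)
                mpibind_out="${mpibind_out:+$mpibind_out }-o mpibind.$mpibind_pair"
                ;;
            *)
                echo "unsupported mpibind parameter: $mpibind_pair" >&2
                return 1
                ;;
        esac
    done
    printf '%s\n' "$mpibind_out"
}

# Usage (word splitting on the result is intentional):
#   flux mini run -o initrc=mpibind.lua $(mpibind_opts smt=2 greedy=0) -N2 -n2 -c8 -g1 /bin/true
```

Rejecting unknown keys up front gives a clearer message than letting the job fail inside the shell plugin.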

Things of note:

  • the parameter nthreads cannot be set through the plugin.
  • flux might be able to specify the number of hardware threads per core; if this happens, we'll have to rediscuss how the smt parameter is set (i.e. exposing it to the user or getting it from flux at runtime).

TODOs:

  • Once Test Suite and Autotools #1 is merged in, update this branch to build the plugin.so with Autotools
    • Also update the documentation to reflect this change
  • Check on whether in_nthreads can be set in the plugin without conflicting with other parts of flux's scheduler

Waiting on #1
Closes #4

@nickhrdy nickhrdy changed the title WIP: Flux plugin Flux plugin Aug 12, 2020
Collaborator

@grondo grondo left a comment

@nickhrdy, I just did a quick first pass. Everything is looking pretty good to me!

I will try to do some actual testing with the plugin in the near future, but my time may be limited the next few days.

Nice work!

@@ -0,0 +1,4 @@
This file contains example programs that may serve useful when trying to
Suggested change
- This file contains example programs that may serve useful when trying to
+ This directory contains example programs that may serve useful when trying to

@grondo
Collaborator

grondo commented Aug 13, 2020

FYI: I can't get the mpibind plugin to work. It seems like parsing the mpibind shell option always fails.
E.g.:

ƒ(s=1,d=0) fluxuser@69598212adb2:~/mpibind/flux$ flux mini run -o verbose=2 -o initrc=mpibind.lua -o mpibind=smt:2,gpu_optim=1,verbose=1 /bin/true
0.031s: flux-shell[0]: DEBUG: 0: task_count=1 slot_count=1 cores_per_slot=1 slots_per_node=-1
0.031s: flux-shell[0]: DEBUG: 0: tasks [0] on cores 0
0.031s: flux-shell[0]: DEBUG: Loading mpibind.lua
0.033s: flux-shell[0]: DEBUG: output: batch timeout = 0.500s
0.037s: flux-shell[0]: FATAL: mpibind: Unknown mpibind parameters
0.038s: job.exception type=exec severity=0 Unknown mpibind parameters
flux-job: task(s) exited with exit code 1
ƒ(s=1,d=0) fluxuser@69598212adb2:~/mpibind/flux$ flux mini run -o verbose=2 -o initrc=mpibind.lua -o mpibind='{"smt":2,"gpu_optim":1,"verbose":1}' /bin/true
0.036s: flux-shell[0]: DEBUG: 0: task_count=1 slot_count=1 cores_per_slot=1 slots_per_node=-1
0.036s: flux-shell[0]: DEBUG: 0: tasks [0] on cores 0
0.036s: flux-shell[0]: DEBUG: Loading mpibind.lua
0.040s: flux-shell[0]: DEBUG: output: batch timeout = 0.500s
0.044s: flux-shell[0]: FATAL: mpibind: Unknown mpibind parameters
0.046s: job.exception type=exec severity=0 Unknown mpibind parameters
flux-job: task(s) exited with exit code 1
ƒ(s=1,d=0) fluxuser@69598212adb2:~/mpibind/flux$ flux mini run -o verbose=2 -n2 -o initrc=mpibind.lua -o verbose=2 -o mpibind hostname
0.034s: flux-shell[0]: DEBUG: 0: task_count=2 slot_count=2 cores_per_slot=1 slots_per_node=-1
0.034s: flux-shell[0]: DEBUG: 0: tasks [0-1] on cores 0-1
0.034s: flux-shell[0]: DEBUG: Loading mpibind.lua
0.036s: flux-shell[0]: DEBUG: output: batch timeout = 0.500s
0.041s: flux-shell[0]: FATAL: mpibind: Unknown mpibind parameters
0.043s: job.exception type=exec severity=0 Unknown mpibind parameters
flux-job: task(s) exited with exit code 1

@grondo
Collaborator

grondo commented Aug 13, 2020

BTW, I'm running under the flux-core ubuntu 20.04 docker image, which may be convenient for testing.

Eventually you could set up a CI (Travis or GitHub workflow) which builds and tests mpibind including this flux plugin by running the tests via the flux-core docker image.

 grondo@asp:~/git/flux-core$ docker run -ti fluxrm/flux-core:focal
sudo: setrlimit(RLIMIT_CORE): Operation not permitted
ƒ(s=1,d=0) fluxuser@78829de41944:~$ git clone https://github.com/nickhrdy/mpibind
Cloning into 'mpibind'...
remote: Enumerating objects: 287, done.
remote: Counting objects: 100% (287/287), done.
remote: Compressing objects: 100% (167/167), done.
remote: Total 287 (delta 153), reused 242 (delta 113), pack-reused 0
Receiving objects: 100% (287/287), 186.51 KiB | 1.62 MiB/s, done.
Resolving deltas: 100% (153/153), done.
ƒ(s=1,d=0) fluxuser@78829de41944:~$ cd mpibind
ƒ(s=1,d=0) fluxuser@78829de41944:~/mpibind$ git checkout flux_plugin
Branch 'flux_plugin' set up to track remote branch 'flux_plugin' from 'origin'.
Switched to a new branch 'flux_plugin'

@grondo
Collaborator

grondo commented Aug 14, 2020

@nickhrdy, let me know if you need any other help with Flux interfaces. I realize the docs are spotty for now so I'm willing to help in any way I can!

eleon added a commit that referenced this pull request Aug 22, 2020
codes to the hardware in Flux. My thanks to @grondo and @nickhrdy for
their contributions. This resolves #5.
@eleon eleon closed this Aug 22, 2020