Add support for GPU parameters at ReReco spec level #10388
Comments
Hi,
As a complement to Alan's comment, I would like to say that
Hi @justinasr, it's good to make sure we are talking about the same things.
and inject it into ReqMgr2. On the ReqMgr2 side, we would have to load that string (JSON object) as a python dictionary, so:
and then the whole validation would happen on top of this python dictionary.
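The decoding step described above could be sketched as follows. This is only an illustration, assuming the request arguments arrive as a plain dictionary with GPUParams still JSON encoded; the helper name decode_gpu_params and the sample values are hypothetical and are not taken from WMCore.

```python
import json


def decode_gpu_params(request_args):
    """Return GPUParams as a python dictionary (hypothetical helper)."""
    raw = request_args.get("GPUParams", "")
    if not raw:
        # A missing or empty string means no GPU parameters were provided
        return {}
    params = json.loads(raw)
    if not isinstance(params, dict):
        raise ValueError("GPUParams must encode a JSON object")
    return params


# Illustrative request arguments, as they might be injected into ReqMgr2
request_args = {
    "RequiresGPU": "required",
    "GPUParams": '{"GPUMemory": 8000, "CUDACapabilities": ["7.5", "8.0"], "CUDARuntime": "11.2"}',
}
gpu_params = decode_gpu_params(request_args)
print(gpu_params["CUDACapabilities"])
```

Any further validation would then operate on gpu_params rather than on the raw string.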
That's correct. I have fixed the data structure example in the initial description. I also added an example with the full set of parameters.
Thanks Ceyhun, I have also fixed the CUDACapability vs CUDACapabilities inconsistency in the initial description.
And here is a ticket to address the job creation and submission to HTCondor: #10393
@amaltaro are these parameters
the ones that will be published by the nodes, or the ones that will be requested by the jobs?
Hi Alan - OK, thanks, then the list looks correct, as far as I can tell. By the way, I've posted a brief description of the fields here: cms-sw/cmssw#33057 (comment).
Thanks for providing a description for all those parameters, Andrea. I have linked them in the top description of this GH issue, but it might be better to expand those in a wiki under this project. Thanks again!
As an outcome of the GPU discussion we had today, I have just updated the initial description of this PR, mostly concerned with the
The PR for runTheMatrix is updated. Here is how it looks for a step/task that runs on GPU.
In addition, there is a comment on Since it matches with node's
I don't think we should change it:
@srimanob the Unfortunately we have not yet started working on this issue on the WMCore side, so we still do not have anything for you to test on. However, if your CMSSW pull requests can wait for a week more (maybe two...), we might be able to get the basic changes in ReqMgr2 such that you could test in testbed.
@justinasr @fwyzard @srimanob @mrceyhun Hi everyone, I just wanted to let you know that I'm starting to work on this issue, based on the initial specification in the first comment of this GH issue. In short, new workflow/spec parameters are going to be:
Please let us know in case you see any inconsistency, or if there are new updates relevant to GPU support.
Thanks @amaltaro
@amaltaro GPUMemoryMB is fine for me. I've added the PR at cms-sw/cmssw#35263. Could you please have a look? Thanks.
Impact of the new feature
ReqMgr2
Is your feature request related to a problem? Please describe.
It's a new project to support GPU processing in central production workflows, where the WM system will be the bridge between Offline/CMSSW and the grid resources made available to CMS (through the glideinWMS layer).
There will be a series of such tickets, so that we can break the work down into even parts.
Describe the solution you'd like
In short, we need to support new workflow spec attributes for the ReReco/DataProcessing spec modules.
Now expanding on the important details:
- RequiresGPU, with one of 3 possible string values:
  - forbidden: must not use GPUs (default)
  - optional: can use GPUs, if possible
  - required: must use GPUs
- GPUParams, which will contain all the necessary information to match/leverage GPU resources. It's meant to be a JSON encoded python dictionary (thus of string type). There are 3 required and 3 optional parameters. The required parameters are:
  - GPUMemory: integer with the amount of memory, in megabytes (MB). Validate as > 0. E.g.: 8000
  - CUDACapabilities: a list of short strings (could be real though). Validation should ensure at least one item in the list, with each item matching this regex: r"^\d+\.\d$". E.g.: ["7.5", "8.0"]
  - CUDARuntime: a short string with the runtime version (could be real though). Validated against this regex: r"^\d+\.\d+$". E.g.: "11.2"
  The optional parameters are:
  - GPUName: a string with the GPU name. Validate against <= 100 chars. E.g.: "Tesla T4", "Quadro RTX 6000"
  - CUDADriverVersion: a string with the CUDA driver version. Validated against this regex: r"^\d+\.\d+\.\d+$". E.g.: "460.32.03"
  - CUDARuntimeVersion: a string with the CUDA runtime version. Validated against this regex: r"^\d+\.\d+\.\d+$". E.g.: "11.2.152"
- If RequiresGPU is optional or required, then we should enforce GPUParams to contain the 3 required parameters with correct values;
- If RequiresGPU is forbidden, then GPUParams should be an empty string (FIXME: not sure we should enforce it(?))
Just as an example, here is how the GPUParams required parameters would look:
and this is an example with the full set of parameters:
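A minimal sketch of both payloads, assembled only from the example values listed in the parameter descriptions above. The particular combination of values and the request_args dictionary are illustrative assumptions, not a real request.

```python
import json

# Required parameters only (values taken from the "E.g." entries above)
required_params = {
    "GPUMemory": 8000,
    "CUDACapabilities": ["7.5", "8.0"],
    "CUDARuntime": "11.2",
}

# Full set: the required parameters plus the three optional ones
full_params = dict(required_params)
full_params.update({
    "GPUName": "Tesla T4",
    "CUDADriverVersion": "460.32.03",
    "CUDARuntimeVersion": "11.2.152",
})

# GPUParams travels in the request as a JSON encoded string
request_args = {
    "RequiresGPU": "required",
    "GPUParams": json.dumps(full_params),
}
print(json.dumps(required_params))
print(request_args["GPUParams"])
```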
Describe alternatives you've considered
Not to validate the optional parameters.
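For comparison, here is a rough sketch of what validating both the required and the optional parameters could look like. The function name validate_gpu_settings is hypothetical, and this is not the WMCore implementation. The regexes follow the description, with the literal dots escaped so that the listed example values match. Whether a non-empty GPUParams should be rejected when RequiresGPU is forbidden is left open, as in the description.

```python
import json
import re

CAPABILITY_RE = re.compile(r"^\d+\.\d$")
RUNTIME_RE = re.compile(r"^\d+\.\d+$")
VERSION_RE = re.compile(r"^\d+\.\d+\.\d+$")
REQUIRED_KEYS = ("GPUMemory", "CUDACapabilities", "CUDARuntime")


def _is_match(regex, value):
    """True only for strings matching the given compiled regex."""
    return isinstance(value, str) and regex.match(value) is not None


def validate_gpu_settings(requires_gpu, gpu_params):
    """Hypothetical validator: return True, or raise ValueError."""
    if requires_gpu not in ("forbidden", "optional", "required"):
        raise ValueError("Invalid RequiresGPU value: %r" % requires_gpu)
    if requires_gpu == "forbidden":
        # Open question in the description: GPUParams is not enforced here
        return True

    # json.loads raises a ValueError subclass on malformed or empty input
    params = json.loads(gpu_params)
    if not isinstance(params, dict):
        raise ValueError("GPUParams must encode a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in params]
    if missing:
        raise ValueError("Missing required GPU parameters: %s" % missing)

    memory = params["GPUMemory"]
    if isinstance(memory, bool) or not isinstance(memory, int) or memory <= 0:
        raise ValueError("GPUMemory must be an integer > 0")
    capabilities = params["CUDACapabilities"]
    if not isinstance(capabilities, list) or not capabilities:
        raise ValueError("CUDACapabilities must be a non-empty list")
    if not all(_is_match(CAPABILITY_RE, item) for item in capabilities):
        raise ValueError("Invalid CUDACapabilities: %r" % capabilities)
    if not _is_match(RUNTIME_RE, params["CUDARuntime"]):
        raise ValueError("Invalid CUDARuntime: %r" % params["CUDARuntime"])

    # Optional parameters are checked only when present
    name = params.get("GPUName")
    if name is not None and (not isinstance(name, str) or len(name) > 100):
        raise ValueError("GPUName must be a string of at most 100 chars")
    for key in ("CUDADriverVersion", "CUDARuntimeVersion"):
        if key in params and not _is_match(VERSION_RE, params[key]):
            raise ValueError("Invalid %s: %r" % (key, params[key]))
    return True


example = json.dumps({
    "GPUMemory": 8000,
    "CUDACapabilities": ["7.5", "8.0"],
    "CUDARuntime": "11.2",
    "CUDADriverVersion": "460.32.03",
})
print(validate_gpu_settings("required", example))
```

Dropping the final block of checks would correspond to the alternative of not validating the optional parameters.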
Additional context
Major discussion happened here: cms-sw/cmssw#33057
and description of these parameters: cms-sw/cmssw#33057 (comment)