Automatic Optimizer for CMS Reco [Not to merge] #4
This PR proposes a new `optimize_reco.py` script, modelled on `optimize.py`, designed to make the MOPSO work with a generic `cms-sw` reconstruction config as input. Given a list of modules we want to tune, a target we want to validate on, and the parameters to tune, it automatically builds a `cmsRun` config, derived from the input one, that is ready to be run and "tuned" by the MOPSO.

It works like this. Let's say we have a `step3_pixel.py` config that runs the pixel tracking RECO (+VALIDATION), for example one generated with `cmsDriver.py`. Then one can launch `optimize_reco.py` on it. Going through the options:
- `-t`/`--tune`: the name of the module we want to tune (this could become a list, but for the moment it is implemented only for a single module);
- `-v`/`--validate`: the module that produces the object on which we want to validate, given as input to the validation;
- `--pars`: the list of parameters we want to tune with the MOPSO, given either directly as a list or as an input text file with the parameters separated by commas (`,`);
- `-f`: the list of input files;
- `-p`/`--num_particles`: the number of agents;
- `-i`/`--num_iterations`: the number of iterations;
- `-b`/`--bounds`: the upper and lower bounds for the parameters. These can be given in two ways:
  - `-b m M`, where the first value defines the lower bounds as `default_values / m` and the second the upper ones as `default_values * M`;
  - a `.json` file containing a dictionary for the bounds, e.g. `{"z0Cuts": 12, "phiCuts": [400, 400, 400, 400, 400]}` for this case.

  Note that the two may be mixed, e.g. `-b 3 max.json`
Launching this, `optimize_reco.py` will run the following steps:

1. it loads the `process` defined in the input config, adding to it the `DependencyGraph` `Service` and setting it to run with no source (`EmptySource`) and zero events; the resulting `process_zero` is then run just to get the graph of the modules used in the config;
2. given the graph, it collects all the modules needed to go from the `tune` module(s) to the `validate` module;
3. it defines the upper and lower bounds (`ub`/`lb`) that will be passed to the MOPSO object;
4. it writes a new `process_to_run.py` config, modified to use the results of the previous steps and to read the needed parameters from a CSV file (basically the output of the MOPSO). This is done by prepending `header.py` and appending `footer.py`. The new config also takes as input the number of threads (`--num_threads`), the number of events (`--num_events`), and the input files.
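Step 2 above (selecting the modules between `tune` and `validate` from the dependency graph) can be illustrated with a small stand-alone sketch; here the graph is a plain dict mapping each module to the modules it consumes, and all module names are made up, not taken from the actual script:

```python
from collections import deque

def reachable(deps, start):
    """All modules `start` depends on, transitively (including itself)."""
    seen, queue = set(), deque([start])
    while queue:
        mod = queue.popleft()
        if mod not in seen:
            seen.add(mod)
            queue.extend(deps.get(mod, ()))
    return seen

def modules_between(deps, tune, validate):
    """Modules needed to go from `tune` to `validate`: the ancestors of
    `validate` that themselves (transitively) depend on `tune`."""
    return {m for m in reachable(deps, validate) if tune in reachable(deps, m)}

# Hypothetical graph: validation <- tracks <- tracksSoA <- clusters
deps = {
    "pixelValidation": ["pixelTracks"],
    "pixelTracks": ["pixelTracksSoA"],
    "pixelTracksSoA": ["siPixelClusters"],
    "siPixelClusters": [],
    "unrelated": [],
}
```

With this graph, `modules_between(deps, "pixelTracksSoA", "pixelValidation")` keeps only the chain `{"pixelTracksSoA", "pixelTracks", "pixelValidation"}`: modules upstream of the tuned one (like `siPixelClusters`) do not need to be rerun per particle.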
`process_to_run.py` is the config actually run by the MOPSO. It uses the results from `optimize_reco.py` to build `num_particles` different chains going from the (i-th) `tune` module(s) to the (i-th) `validate` module, taking care of the task definitions, rewriting all the inputs, and defining the final validation step (removing any possible output steps).

All of this happens in an ad-hoc folder, and one may continue a previous run by specifying in which folder (`--dir`) the script should look for the previous end state, and for how many extra iterations to run (`--continuing`).
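The chain-cloning idea (one copy of the `tune` to `validate` chain per particle, with parameters taken from the i-th CSV row and inputs rewired to the same particle's clones) can be sketched in plain Python; the real script operates on `cmsRun` modules, and every name below is hypothetical:

```python
import csv
import io

def build_particle_chains(modules, tuned, csv_text, par_names):
    """One clone of the tune -> validate chain per CSV row (= per particle).

    `modules` maps a module name to {"inputs": [...], "params": {...}};
    `tuned` is the module whose parameters are overridden with the i-th
    CSV row (the MOPSO output). Inputs are rewritten so that each clone
    consumes the clones belonging to the same particle.
    """
    chains = []
    for i, row in enumerate(csv.reader(io.StringIO(csv_text))):
        particle_params = dict(zip(par_names, map(float, row)))
        clone = {}
        for name, cfg in modules.items():
            new_cfg = {
                # rewire the inputs towards this particle's clones
                "inputs": [f"{inp}_{i}" for inp in cfg["inputs"]],
                "params": dict(cfg["params"]),
            }
            if name == tuned:
                new_cfg["params"].update(particle_params)
            clone[f"{name}_{i}"] = new_cfg
        chains.append(clone)
    return chains
```

For two particles with rows `4.0` and `8.0` for a single parameter `z0Cut`, this yields two independent module chains, `*_0` and `*_1`, that can run side by side in one process.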
For the moment I'm opening it here to make it available, but I would like to include this in the `The-Optimizer` examples folder.

TODOs for the future (will go in an issue):

- `hgcal` option to be added;
- `--timing` option to add throughput/timing calculations for each run.