Common framework for FCC-related analyses. This framework allows one to write a full analysis, taking EDM4hep input ROOT files and producing the plots.
As usual, if you aim at contributing to the repository, please fork it, develop your feature/analysis and submit a pull request.
To have access to the FCC samples, you need to be subscribed to one of the following e-groups (with owner approval): fcc-eos-read-xx, with xx=ee,hh,eh. For the time being, the configuration files are accessible on helsens' public AFS area. This is not optimal and will be changed in the future, so you are also kindly asked to contact clement.helsens@cern.ch and request access to /afs/cern.ch/work/h/helsens/public/FCCDicts/.
Detailed code documentation can be found here.
Using ROOT's RDataFrame provides a modern, high-level interface and fast processing, as it natively supports multithreading. This README explains everything from reading EDM4hep files on EOS and producing flat n-tuples to running a final selection and plotting the results.
ROOT dataframe documentation is available here.
To use the FCC analysers within RDataFrame, a dictionary needs to be built and added to LD_LIBRARY_PATH (this is done in setup.sh). The following steps are needed when running local code and for developers:
source ./setup.sh
mkdir build install
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=../install
make install
cd ..
Each time changes are made in the C++ code, for example in analyzers/dataframe/, please do not forget to recompile :)
Analyses in the FCCAnalyses framework usually follow a standardized workflow, which consists of multiple files inside a single directory. Individual files denote steps in the analysis and have the following meaning:
- analysis.py or analysis_stage<num>.py: In this file (or files) a class of type RDFanalysis is used to define the list of analysers and filters to run (analysers function) as well as the output variables (output function). It also contains the configuration parameters processList, prodTag, outputDir, inputDir, nCPUS and runBatch. Users can define multiple stages of analysis.py. The first stage will most likely run on centrally produced EDM4hep events, hence the usage of prodTag. When running a second analysis stage, the user points to the directory where the samples are located using inputDir.
- analysis_final.py: This analysis file contains the final selections and runs over the locally produced n-tuples from the various stages of analysis.py. It contains a link to the procDict.json so that the samples can be properly normalised using centrally produced cross sections (this might be removed later to include everything in the yaml, closer to the sample). It also contains the list of processes (matching the standard names), the number of CPUs, the cut list, and the variables, which will be written both to a TTree and as TH1 histograms properly normalised to an integrated luminosity of 1 pb-1.
- analysis_plots.py: This analysis file is used to select which of the final selections from running analysis_final.py to plot. It usually contains information about how to merge processes, write some extra text, normalise to a given integrated luminosity, etc. For the moment it is only possible to plot one signal at a time, but several backgrounds.
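To make the structure of a pre-selection stage more concrete, here is a minimal sketch of what an analysis_stage1.py along the lines described above might look like. The parameter names (processList, prodTag, outputDir, nCPUS, runBatch) and the RDFanalysis class with its analysers and output functions come from the description; the values, the prodTag string and the column n_recoparticles are illustrative assumptions, so please check the example analysis for the analysers actually available. The analysers function receives an RDataFrame node and chains Define/Filter calls on it, which is why the file itself needs no ROOT import.

```python
# analysis_stage1.py: minimal sketch, all values are illustrative assumptions

# Samples to process; keys must match sample names in the production database
processList = {
    "p8_ee_ZH_ecm240": {"fraction": 0.2, "chunks": 2},
}

# Production tag of the centrally produced EDM4hep samples (assumed value)
prodTag = "FCCee/spring2021/IDEA/"

# Where the flat n-tuples of this stage are written (assumed path)
outputDir = "outputs/stage1"

# Number of threads used by RDataFrame, and whether to submit to the batch system
nCPUS = 4
runBatch = False


class RDFanalysis:
    """Defines the analysers/filters and the output branches of this stage."""

    @staticmethod
    def analysers(df):
        # Each call returns a new node of the RDataFrame computation graph
        df2 = (
            df
            # Hypothetical column: number of reconstructed particles per event
            .Define("n_recoparticles", "ReconstructedParticles.size()")
            # Keep only events with at least one reconstructed particle
            .Filter("n_recoparticles > 0")
        )
        return df2

    @staticmethod
    def output():
        # Branches written to the output n-tuple
        return ["n_recoparticles"]
```

Returning a new node from every Define/Filter call keeps the event loop lazy: nothing is computed until the framework writes the branches listed by output().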
To better explain the FCCAnalyses workflow, let's run our example analysis. The analysis is located at examples/FCCee/higgs/mH-recoil/mumu/.
The pre-selection runs over already existing and properly registered FCCSW EDM4hep events. The dataset names with the corresponding statistics can be found here for the IDEA spring 2021 campaign. The processList is a dictionary of processes, each process having its own dictionary of parameters. For example:
'p8_ee_ZH_ecm240':{'fraction':0.2, 'chunks':2, 'output':'p8_ee_ZH_ecm240_out'}
where p8_ee_ZH_ecm240 should match an existing sample in the database, fraction is the fraction of the sample you want to run on (default is 1), chunks is the number of jobs to run (you will get the corresponding number of output files), and output lets you change the name of the output file (please note that the sample will then not be matched in the database for the histogram normalisation in analysis_final.py). The other parameters are explained in the example file.
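The meaning of the processList keys can be illustrated with a small helper that fills in the defaults and flags renamed outputs. expand_process_list is hypothetical and not part of FCCAnalyses; the default of 1 for fraction is documented above, while a default of one chunk and an output named after the sample are assumptions made for this sketch.

```python
def expand_process_list(process_list):
    """Hypothetical helper: make every processList parameter explicit."""
    expanded = {}
    for sample, params in process_list.items():
        fraction = params.get("fraction", 1)   # documented default: whole sample
        chunks = params.get("chunks", 1)       # assumption: one job if unset
        output = params.get("output", sample)  # assumption: output named after the sample
        if not 0 < fraction <= 1:
            raise ValueError(f"{sample}: fraction must be in (0, 1], got {fraction}")
        if not isinstance(chunks, int) or chunks < 1:
            raise ValueError(f"{sample}: chunks must be a positive integer, got {chunks}")
        expanded[sample] = {
            "fraction": fraction,
            "chunks": chunks,
            "output": output,
            # One output file per chunk
            "n_output_files": chunks,
            # A renamed output is no longer matched against the sample database
            "matched_in_database": output == sample,
        }
    return expanded


processList = {
    "p8_ee_ZH_ecm240": {"fraction": 0.2, "chunks": 2, "output": "p8_ee_ZH_ecm240_out"},
    "p8_ee_ZZ_ecm240": {},
}

for name, info in expand_process_list(processList).items():
    print(name, info)
```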
To run the pre-selection stage of the example analysis, run:
fccanalysis run examples/FCCee/higgs/mH-recoil/mumu/analysis_stage1.py
This will create the output files in the ZH_mumu_recoil/stage1 subdirectory of the output directory specified with the parameter outputDir in the file.
You can also bypass the samples specified in the processList variable by using the command line parameters --output and --files-list, like so:
fccanalysis run examples/FCCee/higgs/mH-recoil/mumu/analysis_stage1.py \
--output <myoutput.root> \
--files-list <file.root or file1.root file2.root or file*.root>
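If you need to run the same command over many file sets, it can be scripted from Python. The sketch below only builds the argument list for fccanalysis run using the two options shown above; build_run_command and expand_patterns are hypothetical helpers. A wildcard such as file*.root is normally expanded by your shell, so when launching without a shell this sketch expands it itself with glob rather than assuming fccanalysis does so.

```python
import glob
import shlex


def build_run_command(analysis_file, output, input_files):
    """Hypothetical helper: argv for `fccanalysis run` that bypasses processList."""
    if not input_files:
        raise ValueError("at least one input file is required")
    return [
        "fccanalysis", "run", analysis_file,
        "--output", output,
        "--files-list", *input_files,
    ]


def expand_patterns(patterns):
    """Expand shell-style wildcards (e.g. file*.root) as an interactive shell would."""
    files = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        # Like bash without nullglob, keep an unmatched pattern unchanged
        files.extend(matches if matches else [pattern])
    return files


cmd = build_run_command(
    "examples/FCCee/higgs/mH-recoil/mumu/analysis_stage1.py",
    "myoutput.root",
    expand_patterns(["file1.root", "file2.root"]),
)
print(shlex.join(cmd))
# To actually launch it (requires fccanalysis in PATH): subprocess.run(cmd, check=True)
```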
The example analysis consists of two pre-selection stages; to run the second one, slightly alter the previous command:
fccanalysis run examples/FCCee/higgs/mH-recoil/mumu/analysis_stage2.py
It is also possible to run the pre-selection step on the batch system. For that, the runBatch parameter needs to be set to True. Please make sure you select a long enough batchQueue and that your computing group is properly set with compGroup (you might not have the right to use the default one, group_u_FCC.local_gen, as it requires being a member of the FCC computing e-group fcc-experiments-comp). When running on batch, you should use the chunks parameter for each sample in your processList so that you benefit from high parallelisation.
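A sketch of the batch-related settings in a stage file, together with a small check for samples that would not be split into several jobs. The parameter names runBatch, batchQueue, compGroup and chunks come from the text above; "longlunch" is one of the CERN HTCondor job flavours and is only an assumed choice, the compGroup value is the default quoted above and should be replaced by a group you belong to, and samples_without_chunking is a hypothetical helper.

```python
# Batch-related settings in an analysis stage file (sketch, values are assumptions)
runBatch = True
# CERN HTCondor job flavour; choose one long enough for your jobs
batchQueue = "longlunch"
# Replace with a computing group you are allowed to use
compGroup = "group_u_FCC.local_gen"

processList = {
    "p8_ee_ZZ_ecm240": {"chunks": 20},
    "p8_ee_ZH_ecm240": {"chunks": 1},
}


def samples_without_chunking(process_list):
    """Hypothetical check: samples that would run as a single batch job."""
    return [
        name for name, params in process_list.items()
        if params.get("chunks", 1) <= 1
    ]


if runBatch:
    for name in samples_without_chunking(processList):
        print(f"warning: {name} runs as a single job; consider setting 'chunks'")
```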
The final selection runs on the files produced in the pre-selection step. The configuration file analysis_final.py defines the cuts to apply and the final variables to be stored both in a TTree and in histograms. This is why the variables need extra fields such as the title, the number of bins and the range for the histogram creation. In the example analysis it can be run like this:
fccanalysis final examples/FCCee/higgs/mH-recoil/mumu/analysis_final.py
This will create 2 files per selection: SAMPLENAME_SELECTIONNAME.root for the TTree and SAMPLENAME_SELECTIONNAME_histo.root for the histograms. SAMPLENAME and SELECTIONNAME correspond to the name of the sample and of the selection, respectively, in the configuration file.
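As an illustration of the final-stage configuration, here is a sketch modelled on the description above: a cut list mapping selection names to filter expressions, and a histogram list whose entries carry the extra title, binning and range fields. The key names follow the pattern of the example analysis but should be checked against it; the paths, the procDict file name, the column zed_leptonic_m and the cut values are assumptions. The helper expected_outputs is hypothetical and only reproduces the file-naming scheme described above.

```python
# analysis_final.py sketch: key names as in the example, values are illustrative
inputDir = "outputs/stage2"    # n-tuples from the last pre-selection stage (assumed path)
outputDir = "outputs/final"    # assumed path
procDict = "FCCee_procDict_spring2021_IDEA.json"  # cross sections for normalisation (assumed name)
nCPUS = 2
processList = {"p8_ee_ZZ_ecm240": {}, "p8_ee_ZH_ecm240": {}}

# Selection name -> filter expression on the n-tuple columns (hypothetical cuts)
cutList = {
    "sel0": "zed_leptonic_m.size() == 1",
    "sel1": "zed_leptonic_m.size() == 1 && zed_leptonic_m[0] > 80 && zed_leptonic_m[0] < 100",
}

# Histogram name -> column plus the extra fields needed to book a TH1
histoList = {
    "mz": {"name": "zed_leptonic_m", "title": "m_{Z} [GeV]", "bin": 125, "xmin": 0, "xmax": 250},
}


def expected_outputs(process_list, cut_list):
    """Hypothetical helper: files written per sample and selection."""
    files = []
    for sample in process_list:
        for selection in cut_list:
            files.append(f"{sample}_{selection}.root")        # TTree
            files.append(f"{sample}_{selection}_histo.root")  # histograms
    return files


print(expected_outputs(processList, cutList))
```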
The plotting analysis file analysis_plots.py contains not only details for the rendering of the plots but also ways of combining samples for plotting. In the example analysis it can be run in the following manner:
fccanalysis plots examples/FCCee/higgs/mH-recoil/mumu/analysis_plots.py
The resulting plots will be located in the outdir defined in the analysis file.
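The final stage stores histograms normalised to 1 pb-1, and the plotting step rescales them to a chosen integrated luminosity. The arithmetic behind this is the standard per-event weight w = σ · L / N, where σ is the cross section in pb, L the integrated luminosity in pb-1 and N the number of events the sample represents. The sketch below uses hypothetical function names and purely illustrative numbers (not real sample values); it does not reproduce the framework's own code.

```python
AB_INV_IN_PB_INV = 1e6  # 1 ab^-1 = 10^6 pb^-1


def weight_for_1pb(cross_section_pb, n_events):
    """Per-event weight normalising a sample to an integrated luminosity of 1 pb^-1."""
    if n_events <= 0:
        raise ValueError("n_events must be positive")
    return cross_section_pb * 1.0 / n_events


def rescale_to_lumi(yield_at_1pb, int_lumi_pb):
    """Scale a yield normalised to 1 pb^-1 to int_lumi_pb (given in pb^-1)."""
    return yield_at_1pb * int_lumi_pb


# Illustrative inputs only: sigma = 0.2 pb, 1e6 events, 5000 events pass the selection
w = weight_for_1pb(0.2, 1_000_000)
yield_1pb = 5000 * w
yield_5ab = rescale_to_lumi(yield_1pb, 5 * AB_INV_IN_PB_INV)
print(f"weight={w:.3g}, yield at 1 pb^-1={yield_1pb:.3g}, at 5 ab^-1={yield_5ab:.1f}")
```

Keeping the stored histograms at 1 pb-1 means the same output can be plotted for any luminosity scenario with a single multiplication.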