Make and run the condor job on all MC and data:
./running/runBatchJobs.py -m TreeAnalyzer/background_estimation/predMacros/makeBETrees.C -b -i procdatasets.conf -j jobs/4_30_beTreees -o out
Compile the outputs into a single file per process:
./running/compile.sh jobs/4_30_beTreees/ compiled/
You will likely have to update the script to match your actual datasets. Also, make the "compiled" directory before running compile.sh.
On your computer, run the following script to set up your area with the complete directory structure, download your files from lxplus, and skim the trees for use. You will have to update some hardcoded links in the script. This should be run in a clean directory that will be where you run all of the later scripts.
. background_estimation/predMacros/getAreaReady.sh /uscms/home/nmccoll/nobackup/2011-04-15-susyra2/rel_HbbWW/work/analyzer_running/compiled/ trees/ background_estimation/predMacros/skimTree.C
Parameters
- Directory where the compiled trees are
- Where the trees will go. By default this should be "trees/"
- Skimming script; the location you give the script must be accessible
This and all future steps assume that "background_estimation" is linked in your work directory (where you ran getAreaReady.sh).
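If it is not already linked, a symlink along these lines will do (the TreeAnalyzer path here is a placeholder; point it at your own checkout):
ln -s /path/to/TreeAnalyzer/background_estimation background_estimation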
. background_estimation/predMacros/runInputs.sh . background_estimation/predMacros/
Parameters
- Where to run
- Where the macros are
It is that easy! Wait a few minutes and you will have all of your background and signal templates, and you can make your datacards. It assumes that you have a few cores, as there are a lot of parallel processes running. Depending on what you want to do, you will probably want to turn some off. Here is the breakdown by block:
This macro makes all of the signal inputs (see the sketch after this parameter list).
Parameters
- Step of the estimation
- 0 Make the histograms to be fit and the yields
- 1 Fit the histograms and make templates
- 4 Empty, just compile
- Signal type enum
- Where to find the trees
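As a sketch, a single step might be invoked like the following, assuming the macro is makeSignalInputs.C in predMacros and takes the arguments in the order listed above (the name and signature here are assumptions; runInputs.sh has the exact call):
rr -b -q 'background_estimation/predMacros/makeSignalInputs.C+(0,0,"trees/")'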
Make the ttbar normalization scale factor. There is a single scale factor for all regions (control and search). A sketch of the invocation follows the parameter list.
Parameters
- Step of the estimation
- 0 Make the histograms to be fit and then fit them
- 1 Test the scale factors
- Where to find the trees
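The sketch, with the same caveat that the macro name (getTTBarNorm.C) is a placeholder and the exact call is in runInputs.sh:
rr -b -q 'background_estimation/predMacros/getTTBarNorm.C+(0,"trees/")'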
Make the background search region templates.
Parameters
- Enum of the background to run
- Where to find the trees
For each background, the macro goes step by step to make the templates. In general, the steps are:
- Preparation (like the QCD ratio)
The last entry is to make the data distributions. A parameter of (4) is used for pseudo-data, while (-1) is used for the real data. At every step of the way there are a bunch of checks that can be run to verify that everything is working.
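As a sketch, one step might look like the following, assuming makeBKGInputs.C (referenced below) takes (step, background enum, tree location) in that order; check runInputs.sh for the exact signature:
rr -b -q 'background_estimation/predMacros/makeBKGInputs.C+(0,0,"trees/")'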
Every step of the template production can be tested. This is done across a few different files. Each takes the command-line arguments below:
- Step of the test (the tests follow the order of the makeBKGInputs steps, but the numbering is not 1-to-1)
- Background or signal to test
- For the background tests only: which region to test (0: SR, 1: tt CR, 2: q/g CR)
- Directory name for the outputs (the third argument for signal tests)
The last one is important. While the scripts will display the plots on your screen, if you give the script the name of a directory (one that already exists), the pdfs and the root files of the tests will be stored there.
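For example, to store the outputs of the commands below under baseline/, create the directory first:
mkdir baseline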
rr 'background_estimation/plotting/plotSignalTests.C+(4,0,false,"baseline")'
rr 'background_estimation/plotting/plotNonResBkgTests.C+(0,true,0,"baseline")'
rr 'background_estimation/plotting/plotResBkgTests.C+(0,true,0,"baseline")'
Tests the signal. One thing to point out: the signal yield tests also output the yields saved in the supplementary material.
Tests the q/g and lost t/w backgrounds.
Tests the mt and mW backgrounds. Step 6 is used to test all four backgrounds combined.
Make the cards by running the following in the same directory as where you did the template building:
. background_estimation/predMacros/makeCards.sh nonCond_ePTRatio0p4_wQCD_sepQGemu 0 0 background_estimation/predMacros/
Parameters
- Card label, should be something unique so you can keep track of things.
- What region to run in (0 SR, 1 tt CR, 2 q/g CR).
- What type of signal you are running.
- Where the macros are located.
The macro will also automatically send the cards to Fermilab; you may want to change where it sends them or turn that off. This macro calls makeCard.C. This file is where the backgrounds and systematics are defined, and also where you choose whether to look at real or pseudo-data.
After running this you should have some datacards in limits/CARDLABEL. In particular, combinedCard.txt and combined.root are the files corresponding to the card for all combined regions. The plots directory in there is where all of the plots go when you test these cards. Most of the tests are done with plotDataTests.C. We pretty much assume that you have a complete card at this point, but it will also use information from the inputs directories.
rr 'background_estimation/plotting/plotDataTests.C+(0,0,"limits/c_baseline2_data")'
Parameters
- Step of the test
- 0 Run the post-fit so you can make post-fit plots (takes some time)
- 1 Test the pre-fit model
- 2 Test the post-fit model, as used in the AN
- 3 Do the saturated GOF test for projections of the model
- 4 Make summary plots; this is the global GOF test and the systematic pulls. This requires that you have run the GoodnessOfFit and FitDiagnostics steps of the statistical tests.
- 5 Plot the bias test results. Requires that you run the bias tests.
- 6 Post-fit CR plots for the paper
- 7 Post-fit SR plots for the paper
- 8 Supplemental post-fit SR plots
- 9 More Supplemental post-fit SR plots
- What region you are running
- Data card directory
This macro makes the search region variable plots for the paper. You can turn off blinding in the search region if you want. It runs in two stages: step 0 makes the histograms, and steps 1 and 2 make the plots. The second argument is the region you are running on. Steps 1 and 2 have to be done after all of the step 0 jobs complete.
rr -b -q 'background_estimation/plotting/plotSRVariables.C+(0,0,"trees/betrees_mc.root","mc")' &
rr -b -q 'background_estimation/plotting/plotSRVariables.C+(0,0,"trees/betrees_data.root","data")' &
rr -b -q 'background_estimation/plotting/plotSRVariables.C+(0,0,"trees/out_radion_hh_bbinc_m1000_0.root","m1000")' &
rr -b -q 'background_estimation/plotting/plotSRVariables.C+(0,0,"trees/out_radion_hh_bbinc_m2500_0.root","m2500")' &
rr 'background_estimation/plotting/plotSRVariables.C+(1,0,"","")'
rr 'background_estimation/plotting/plotSRVariables.C+(2,0,"","")'
nohup combine -m 800 -M AsymptoticLimits --run expected --rMax 0.5 -v2 --rAbsAcc=0.00001 combinedCard.txt &
Parameters
- nohup and &: good for running in the background
- -m: mass of the signal you want to test
- --run expected: only run the expected limits...turn this off if you want the observed too
- --rMax 0.5: maximum signal strength value...this should be tuned for the tested mass
- -v 2: decent verbosity
- combinedCard.txt: your datacard
The macro limit_plotting/doLXPLimits.py is used to run the limits as batch jobs at lxplus.
combine -M Significance -m 2300 ./combined.root
The full toy-based (HybridNew) limits take some time, so we use crab and the combine tool to do it:
combineTool.py -d combinedCard.root -m 2500 -M HybridNew --LHCmode LHC-limits -v 2 --singlePoint 0.002:0.05:0.001 -T 100 --clsAcc 0 --iterations 2 --fork 2 --saveToys --saveHybridResult --job-mode crab3 --task-name grid-test --custom-crab cuscrab.py
You care about --singlePoint 0.002:0.05:0.001 -T 100 --clsAcc 0 --iterations 2 --fork 2, which is how the job is defined. The first argument is the scan of signal strength (min:max:step). -T 100 says do 100 toys. --iterations 2 says do two batches of 100 toys. --fork 2 says fork this into two processes; you need some forking to get ROOT to clean up after each iteration. The problem is that there are memory leaks after each toy, so you have to balance between startup costs and the memory leak. You need --custom-crab cuscrab.py to tell the tool how to run on crab. The file should be:
cuscrab.py contents:
def custom_crab(config):
    config.Site.storageSite = 'T3_US_FNALLPC'
Then, after everything runs, combine all of the outputs:
cd crab_grid-test/results/
for a in `ls -1 *.tar`; do tar -xvf $a; done
cd ../..
hadd higgsCombine.Test.HybridNew.mH2500.123456.root crab_grid-test/results/higgsCombine*root
Then get the final result:
combine -d combinedCard.root -m 2500 -M HybridNew --LHCmode LHC-limits -v 2 --readHybridResults --grid=higgsCombine.Test.HybridNew.mH2500.123456.root --expectedFromGrid=0.5
You can also do the same thing...but interactively. This time you need to break up your iterations and give each job a random seed. An example:
combine -M HybridNew -m 2300 ../combined.root --LHCmode LHC-significance --saveToys --fullBToys --saveHybridResult -T 10 -i 5 -s -1 &
combine -M HybridNew -m 2300 ../combined.root --LHCmode LHC-significance --saveToys --fullBToys --saveHybridResult -T 10 -i 5 -s -1 &
combine -M HybridNew -m 2300 ../combined.root --LHCmode LHC-significance --saveToys --fullBToys --saveHybridResult -T 10 -i 5 -s -1 &
combine -M HybridNew -m 2300 ../combined.root --LHCmode LHC-significance --saveToys --fullBToys --saveHybridResult -T 10 -i 5 -s -1 &
combine -M HybridNew -m 2300 ../combined.root --LHCmode LHC-significance --saveToys --fullBToys --saveHybridResult -T 10 -i 5 -s -1 &
combine -M HybridNew -m 2300 ../combined.root --LHCmode LHC-significance --saveToys --fullBToys --saveHybridResult -T 10 -i 5 -s -1 &
combine -M HybridNew -m 2300 ../combined.root --LHCmode LHC-significance --saveToys --fullBToys --saveHybridResult -T 10 -i 5 -s -1 &
hadd higgsCombineTest.HybridNew.mH2300.allToys.root higgsCombineTest.HybridNew.mH2300.*
combine -M HybridNew -m 2300 ../combined.root --LHCmode LHC-significance --readHybridResult --toysFile=higgsCombineTest.HybridNew.mH2300.allToys.root
These are a part of the "standard tests." You will need to make sure that you have your card in root file form:
text2workspace.py combinedCard.txt -o combined.root
Do a fit to the data with some mass hypothesis:
combine -M FitDiagnostics -m 1000 combined.root
Get the fit diagnostics (e.g. NP pulls):
python ~/pathToTreeAnalyzer/TreeAnalyzer/framework/HiggsAnalysis/CombinedLimit/test/diffNuisances.py -g plots.root fitDiagnostics.root
Run first, to get the observed value:
combine -M GoodnessOfFit combinedCard.txt --algo saturated --fixedSignalStrength 0 -m 1000 --toysFreq
Then run your toys:
combine -M GoodnessOfFit combinedCard.txt --algo saturated --fixedSignalStrength 0 -m 1000 --toysFreq -t 100 &
combine -M GoodnessOfFit combinedCard.txt --algo saturated --fixedSignalStrength 0 -m 1000 --toysFreq -t 100 -s 1 &
combine -M GoodnessOfFit combinedCard.txt --algo saturated --fixedSignalStrength 0 -m 1000 --toysFreq -t 100 -s 2 &
combine -M GoodnessOfFit combinedCard.txt --algo saturated --fixedSignalStrength 0 -m 1000 --toysFreq -t 100 -s 3 &
combine -M GoodnessOfFit combinedCard.txt --algo saturated --fixedSignalStrength 0 -m 1000 --toysFreq -t 100 -s 4 &
combine -M GoodnessOfFit combinedCard.txt --algo saturated --fixedSignalStrength 0 -m 1000 --toysFreq -t 100 -s 5 &
combine -M GoodnessOfFit combinedCard.txt --algo saturated --fixedSignalStrength 0 -m 1000 --toysFreq -t 100 -s 6 &
combine -M GoodnessOfFit combinedCard.txt --algo saturated --fixedSignalStrength 0 -m 1000 --toysFreq -t 100 -s 7 &
combine -M GoodnessOfFit combinedCard.txt --algo saturated --fixedSignalStrength 0 -m 1000 --toysFreq -t 100 -s 8 &
hadd higgsCombineTest.GoodnessOfFit.mH1000.toys.root higgsCombineTest.GoodnessOfFit.mH1000.*.root
Start by getting your input root file:
text2workspace.py combinedCard.txt -m 1000 -o forImpact.root
Step one is to do the initial fit; it depends on the mass and the test that you are doing.
combineTool.py -M Impacts -d forImpact.root --doInitialFit --robustFit 1 -m 1000 --rMax 0.42 --rMin 0.047 --toysFrequentist -t -1 --expectSignal=0.14
Do a 1 TeV signal with bounds on the fitted signal strength (important if you want it to converge). --toysFrequentist -t -1 says use the Asimov dataset. --expectSignal=0.14 says inject signal with a certain strength. You can change this to 0.
combineTool.py -M Impacts -d forImpact.root --doInitialFit --robustFit 1 -m 1000 --rMax 0.2
This is very similar, but this time you are fitting to data. Then you have to do the second step...actually computing the impacts. How you do the fit and the toys has to be the same as in step one. If you wanted to run interactively, for the previous two tests you would:
combineTool.py -M Impacts -d forImpact.root --doFits --robustFit 1 --parallel 20 -m 1000 --rMax 0.42 --rMin 0.047 --toysFrequentist -t -1 --expectSignal=0.14
combineTool.py -M Impacts -d forImpact.root --doFits --robustFit 1 --parallel 20 -m 1000 --rMax 0.2
If you wanted to run with crab instead (cuscrab.py is the same as for the observation with toys):
combineTool.py -M Impacts -d forImpact.root --doFits --robustFit 1 --parallel 20 -m 1000 --rMax 0.42 --rMin 0.047 --toysFrequentist -t -1 --expectSignal=0.14 --job-mode crab3 --task-name grid-test --custom-crab cuscrab.py
combineTool.py -M Impacts -d forImpact.root --doFits --robustFit 1 --parallel 20 -m 1000 --rMax 0.2 --job-mode crab3 --task-name grid-test --custom-crab cuscrab.py
Once all these jobs have been completed, collect the results and write them to a json:
combineTool.py -M Impacts -d forImpact.root -m 1000 -o impacts.json
Finally, you can then make your pretty plots:
plotImpacts.py -i impacts.json -o impacts