This is a CMSSW code for creating small ROOT ntuples from CMS data/MC samples. Check out this twiki for more details: CMS/UserCodeNWUntupleProducer
- Set up the environment
setenv SCRAM_ARCH slc5_amd64_gcc462
setenv CVSROOT :ext:<cern-user-account>@lxplus.cern.ch:/afs/cern.ch/user/c/cvscmssw/public/CMSSW
cmsrel CMSSW_5_3_13_patch3
cd CMSSW_5_3_13_patch3/src
cmsenv
Replace the <cern-user-account>
with your CERN account.
Since the new recipe for CVS connection is done through ssh to lxplus you will have to type your CERN password
every time when checkout from CVS. It is inconveniet, but we have to live with it for now.
- Met recipes, according to workbook and met-recipe:
git cms-addpkg PhysicsTools/PatAlgos
git cms-merge-topic 1472
git cms-merge-topic -u TaiSakuma:53X-met-131120-01
scram b -j 9
- Met filters according to MissingETOptionalFilters:
cvs co -r V00-00-13-01 RecoMET/METFilters
cvs co -r V00-03-23 CommonTools/RecoAlgos
cvs co -r V01-00-11-01 DPGAnalysis/Skims
cvs co -r V00-11-17 DPGAnalysis/SiStripTools
cvs co -r V00-00-08 DataFormats/TrackerCommon
cvs co -r V01-09-05 RecoLocalTracker/SubCollectionProducers
scram b -j 9
- Ecal tools so we can get more photon variables for photon MVA:
git cms-addpkg RecoEcal/EgammaCoreTools
- Tools for recommended Electron Iso EgammaPFBasedIsolation
git-cms-addpkg CommonTools/ParticleFlow
- Egamma tools from MVA Electron ID:
cvs co -r V00-00-09 EgammaAnalysis/ElectronTools
cvs co -r V09-00-01 RecoEgamma/EgammaTools
cd EgammaAnalysis/ElectronTools/data
cat download.url | xargs wget
cd ../../../
scram b -j 9
- MVA MET Code (Just for PU Jet ID) Jet PU ID:
cvs co -r METPU_5_3_X_v4 RecoJets/JetProducers
cvs up -r HEAD RecoJets/JetProducers/data/
cvs up -r HEAD RecoJets/JetProducers/python/PileupJetIDCutParams_cfi.py
cvs up -r HEAD RecoJets/JetProducers/python/PileupJetIDParams_cfi.py
cvs up -r HEAD RecoJets/JetProducers/python/PileupJetID_cfi.py
cvs co -r V05-00-16 DataFormats/JetReco
scram b -j 9
- Extra code (for boosted Z->ee isolation), (-d option not working since new CVS directory) following boostedZ and heep twikies:
mkdir TSWilliams
mkdir SHarper
cvs co -r V00-02-03 UserCode/TSWilliams/BstdZee/BstdZeeTools
cvs co -r V00-09-03 UserCode/SHarper/HEEPAnalyzer
mv UserCode/TSWilliams/BstdZee/BstdZeeTools TSWilliams/.
mv UserCode/SHarper/HEEPAnalyzer SHarper/.
rm -r UserCode
- PF footprint removal Supercluster footprint removal twiki:
git clone https://github.com/peruzzim/SCFootprintRemoval.git PFIsolation/SuperClusterFootprintRemoval
cd PFIsolation/SuperClusterFootprintRemoval
git checkout V01-06
cd ../..
scram b -j 9
- Now check out the ntuple producer code and then the specific tag/branch of the code that is known to work
git clone https://github.com/NWUHEP/ntupleProducer NWU/ntupleProducer
cd NWU/ntupleProducer
git checkout v9.9
cd ../..
scram b -j 9
Once compiled, we are ready to run it
cd NWU/ntupleProducer/test
cmsRun ntupleProducer_cfg.py
it assumes you are running over an MC sample. If you want to run on data, do:
cmsRun ntupleProducer_cfg.py isRealData=1
that will set up an appropriate global tag etc.
NB
By defualt, the ntuples require that there be at least one muon(electron) with pT > 3(5) GeV in order for an event to be saved.
In the case that this is not desired (for instance, in jet or photon based studies),
you should switch off the skimLeptons
option in ntupleProducer_cfg.py
In addition to this, there are various flags the configuration file, ntupleProducer_cfg.py, that allow to save/not save certain objects (muons, jets, etc). All are saved by default.
For running over individual datasets, it's best to use standard crab. The configuration files for MC and data are crabNtuples_MC.cfg
and crabNtuples_Data.cfg
. Submission goes as follows,
crab -create -cfg crabNtuples_<type>.cfg
crab -submit -c <ui_working_dir>
To check the status of your jobs,
crab -status -c <ui_working_dir>
and to get the log files,
crab -get -c <ui_working_dir>
More information can be found in the CMS SW guide chapter on CRAB.
For submission of ntuple production of multiple datasets at once, the multicrab framework can be used. It is described very briefly here. The important feature is that you can use most (?) of the standard crab commands for submission and checking on jobs status by replacing the crab
command with multicrab
. For instance when jobs are submitted in multicrab you can do the following,
multicrab -create -cfg <cfg_file> -submit
and to check their status
multicrab -status -c <ui_working_dir>
which also works for crab
. You can also check the status of individual datasets using standard crab
commands. The main difference is in the format of the configuration files. For multicrab, there is a crab.cfg file with a set of global configuration parameters and a multicrab.cfg file where each dataset is given its own specific configuration. As for the case of normal crab, two configuration files have been prepared for data and MC, multicrab_data.cfg
and multicrab_mc.cfg
.
After CRAB claims that your jobs are finished with exit codes 0 0, you will want to double check because it lies and large jobs tend to have a few extra or missing files.
Run the following command:
./find_goodfiles.py -c Path/To/CrabDir -q
This will check that all the jobs listed in the crab xml files are actually in your output area, and that your output area contains no extra or duplicate files. If it does, the script will tell you what needs to be rerun or what needs to be deleted.
- First, make sure you are on master branch and have the latest code:
git checkout master
git pull
- Then create a new branch and swich to it:
git branch dev-username
git checkout dev-username
- Now you can make any changes you want. Once you are done, commit it and push your branch.
git commit -a
git push origin dev-username
- When you are satisfied with you new code, merge it with master branch. For that:
git checkout master
git merge dev-username
git push
If the changes do not conflict, you are done.
If there are conflicts, markers will be left in the problematic files showing the conflict; git diff
will show this.
Once you have edited the files to resolve the conflicts, git commit -a
.
At any time you can tag your code, and push your tags to remote:
git tag -a test1 -m "my tag"
git push origin --tags
You can use any tags you want, later those can be deleted.
For the global production though, we should stick with a tagging convention. Tags should be vX.Y and I am starting them with v6.1. Such that the tag corresponds to the nutuple_v6 name of ntuple production. If the new code significantly changes the format of the ntuples (substantial changes to class definitions etc.) then the first number of a tag should be incremented (to v7.1 etc.) and the ntuple production path-name should be changed correspondingly. Otherwise, incremental changes should be reflected in changes to the second digit.