Tutorial

jdkio edited this page Sep 24, 2024 · 21 revisions

Installation

This tutorial assumes that you have a Fermilab computing account (see DUNE's atwork for more info) and have set up Kerberos authentication.

Connecting to the GPVMs

Run kinit yourusername@FNAL.GOV

There are 16 GPVMs, named dunegpvm01.fnal.gov through dunegpvm16.fnal.gov, with no load balancing. Connect with ssh dunegpvmXY.fnal.gov. Users generally have a favourite one they connect to, but to avoid stepping on anybody's toes please check the activity on the GPVM you log into. GPVMs should not be used for sustained heavy workloads; contact the DUNE computing team to learn how to submit jobs to the grid.

Getting the code

The code lives on GitHub at DUNE/dune-tms.

cd /dune/app/users/${USER}
mkdir some_project_name # optional, but may help keep things tidy
cd some_project_name
git clone git@github.com:DUNE/dune-tms.git # if this fails, run ssh-keygen to make an ssh key
cd dune-tms
source setup_FNAL.sh

The current directory now contains the source code for the detector simulation and reconstruction (located in src/), and various analysis scripts in scripts/ for high and low level performance analysis, a 2D event display, a 3D geometry viewer, and much more!

Building

All dependencies should be provided (using spack) when you source setup_FNAL.sh, and the only step remaining is to run

make

which builds the code and installs it into the current directory. For running the code, see the later section.

Setup

This needs to be done every time you start a new terminal.

cd your_working_directory/dune-tms
source setup_FNAL.sh

Running in SL7

Start an SL7 container using:

/cvmfs/oasis.opensciencegrid.org/mis/apptainer/current/bin/apptainer shell --shell=/bin/bash -B /cvmfs,/exp,/nashome,/pnfs/dune,/opt,/run/user,/etc/hostname,/etc/hosts,/etc/krb5.conf --ipc --pid /cvmfs/singularity.opensciencegrid.org/fermilab/fnal-dev-sl7:latest

Then use setup.sh instead of setup_FNAL.sh. Everything else is the same. Before compiling in SL7, run make clean to remove any build products left over from AL9.

Running the detector simulation and reconstruction

For most studies, you can use existing tmsreco.root files provided by the production. However, that may not always be the case. The command to run both detector simulation and reconstruction is ConvertToTMSTree.exe. It is controlled by config files in dune-tms/config.

Warning: turn off the time slicer for the plotting scripts in scripts/Reco to run properly. 
Open config/TMS_Default_Config.toml and set RunTimeSlicer = false
Currently there is a bug where some truth information (in particular, the true muon information) 
doesn't get passed on if the time slicer is run.
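
If you switch the time slicer on and off often, the edit can be scripted. The following is a minimal sketch, not part of dune-tms: the helper name set_run_time_slicer is hypothetical, and it assumes the setting appears on its own line as RunTimeSlicer = true or RunTimeSlicer = false, as the warning above suggests. It does a plain text substitution rather than parsing the TOML, so comments and layout are left untouched.

```python
import re
from pathlib import Path


def set_run_time_slicer(config_path, enabled):
    """Rewrite the RunTimeSlicer line in a config file.

    Returns True if the file content changed, False if it already had
    the requested value. Hypothetical helper, not part of dune-tms.
    """
    path = Path(config_path)
    text = path.read_text()
    value = "true" if enabled else "false"
    # Match "RunTimeSlicer = true|false" at the start of a line,
    # keeping any indentation and spacing exactly as written.
    pattern = re.compile(r"^([ \t]*RunTimeSlicer[ \t]*=[ \t]*)(true|false)",
                         re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + value, text)
    if count == 0:
        raise KeyError("RunTimeSlicer not found in " + str(path))
    path.write_text(new_text)
    return new_text != text
```

For example, set_run_time_slicer("config/TMS_Default_Config.toml", False) would turn the time slicer off before running the plotting scripts.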

Here is an example of running on a single edep file.

ConvertToTMSTree.exe /pnfs/dune/persistent/users/kleykamp/example_edep_file_with_single_event_per_spill.root

To run on more files, you want to run on the grid in parallel. The official way of doing this is through the newer production code, which I don't have experience with. Large samples would likely need to go through the ND production group, currently led by Alex Booth. However, for medium-sized samples, you can use the ProcessND.py script. This script is becoming outdated but can still be used. Instructions for using it are in its README and available through --help. This script can also run the genie and edep-sim stages, but not the LAr or CAF writer stages.

Plotting

Scripts are in dune-tms/scripts. The .cpp scripts are out of date but are being updated by Sushil and Xiaoyan. Assuming you have a target root file, running the .cpp scripts works like this:

root -l -b -q -x 'muonke.cpp("/dune/data/users/kleykamp/2023-09-15_fix_muon_ke.tmsreco.root")'

If your file is in /pnfs, then you need to find its xrootd path using pnfs2xrootd. See the xrootd section below. If you have many individual files, see the "Combining root files" section below.

More about the muon ke script and how it works

Basics

On a basic level, the plotting code is taking the output of the simulation and making plots for it. It does this by looping through the provided tmsreco file, and event by event adding information to the histogram. The same tmsreco file can be reused, even when cuts are adjusted.

Glossary

  • Hit One scintillator that's lit up in the simulation.
  • Cluster The reconstruction algorithm can group up nearby hits to create clusters. These are groups of hits that don't look track-like. They usually correspond to hits from neutrons or maybe electrons.
  • Track A reconstructed line of hits found by the reco algorithms. For the TMS, we're often trying to find the muons. The muons will usually make long tracks.
  • Occupancy is the ratio of total energy on a track / total energy in the event. A track with a low occupancy is usually an indication that most of the energy is somewhere else in the event. So either it's an event with a lot of random energy, or there's a track that contains more of the visible energy.
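
The occupancy definition above can be written out as a short function. This is only an illustrative sketch: the function name and the example numbers are made up, and the real scripts read the energies from the tmsreco file.

```python
def occupancy(track_energy, event_energy):
    """Return the fraction of the event's total energy that lies on one track."""
    if event_energy <= 0:
        # No visible energy in the event: define the occupancy as zero.
        return 0.0
    return track_energy / event_energy


# Hypothetical event: 100 units of energy in total, split over two tracks
# (80 and 15 units) plus 5 units of stray hits.
event_energy = 100.0
track_energies = [80.0, 15.0]
occupancies = [occupancy(e, event_energy) for e in track_energies]
print(occupancies)
```

Here the first track carries most of the visible energy, so it has the higher occupancy and is the more likely muon candidate.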

Pseudocode

Looking at a pseudocode breakdown of the code might help with understanding how it works:

open file and get Truth_Info and Line_Candidates TTrees
turn off all root branches, and then turn only the useful ones back on. This sometimes speeds things up
make histograms
for i in range(n events)
   load event in Line_Candidates 
   load same event in Truth_Info
   make sure the two trees are synchronized 
   print status every
   
   # Cut section
   check that there's a muon, if not skip this event 
   we're only using true CCmu events, check that there's a true primary muon (as opposed to muons that were created after the initial neutrino interaction). If not, skip the event
   make sure there's at least one reconstructed line (aka track), which might be the muon. If not, skip the event.
   
   # Adjustable cuts section
   check that n lines <= nLinesCut, otherwise skip event
   check that n clusters <= nClustersCut, otherwise skip event
   check that total cluster energy <= ClusterEnergyCut, otherwise skip event
    
   # Now find the best track
   The "Find the best track" section finds the track with the highest occupancy. The highest occupancy track is most likely the muon that we're interested in.
   
   # Also check the track with the longest track length
   This finds the longest track. Muons make the longest tracks compared to other particles. So, if there's a reconstructed muon, it's most likely the longest track. 

   # And also check longest track
   This finds the longest track of the event by using x and z coordinates. The previous section found the longest track by density.

   Now it tracks how often the longest track by density (lon_trklen) is not the longest track by distance (longtrack).
   Makes the reco track we use, the longest track by density.
   So now longtrack is the index of the track we're using for our plots.

   # Additional cuts
   check if the true muon died inside the detector based on y position. If not, skip event
   check that longtrack stops and starts inside the detector. It first looks at the z position. There are two options. AllDet=true uses the whole detector (starting at z = 11362+55*2), while AllDet=false uses only up to z=13600. The front of the detector has thinner steel, so the energy resolution is better when looking only at this region.
    It also checks that the last hit of the track is towards the end (above z = 18294-80*2). Muons make long tracks so tracks shorter than this are unlikely to be muons.
    Finally it checks the x positions to make sure the first and last hits of the track are at least 20cm inside the TMS from the sides. This is to decrease the chances of the muon leaving.

    Now it fills the occupancy hist
    Now it applies the occupancy cut. If the track fails the cut, skip event
    Now it fills the KE and KEest plots.

do line fit to reco ke vs true ke plot
plot histograms in pdf
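
The order of the cuts in the pseudocode above can also be sketched in Python. This is a simplified illustration, not the actual muonke.cpp logic: the event dictionaries and their keys are hypothetical, the track "length" stands in for the track length by density, and the containment checks on the true muon and on the track start and end positions are left out.

```python
def select_events(events, n_lines_cut, n_clusters_cut,
                  cluster_energy_cut, occupancy_cut):
    """Return (event id, track index, occupancy) for events passing all cuts."""
    selected = []
    for event in events:
        # Required cuts: a true primary muon and at least one reco track.
        if not event["has_true_primary_muon"]:
            continue
        tracks = event["tracks"]
        if len(tracks) == 0:
            continue

        # Adjustable cuts on event cleanliness.
        if len(tracks) > n_lines_cut:
            continue
        if len(event["cluster_energies"]) > n_clusters_cut:
            continue
        if sum(event["cluster_energies"]) > cluster_energy_cut:
            continue

        # Use the longest track as the muon candidate.
        best = max(range(len(tracks)), key=lambda i: tracks[i]["length"])

        # Occupancy cut on the chosen track.
        total = event["total_energy"]
        occ = tracks[best]["energy"] / total if total > 0 else 0.0
        if occ < occupancy_cut:
            continue

        selected.append((event["id"], best, occ))
    return selected
```

Because the cuts only read the event and never modify it, the same list of events can be passed through again with different cut values, which mirrors how the same tmsreco file can be reused when cuts are adjusted.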

Cuts

  • AtLeastOneLine Require that we reconstructed at least one line. Otherwise there's nothing to plot.
  • CCmuOnly Look at only true CC muons, as opposed to all possible muons.
  • AllDet Use whole detector or only the front region where the steel is thinner.
  • nLinesCut An event with a single reco muon is going to be cleaner than an event with many tracks. So the energy resolution for events that allow only 1 reconstructed track maximum might be better at the cost of fewer muons plotted.
  • nClustersCut The maximum number of clusters. More clusters usually mean a dirtier event, so the energy resolution is likely worse.
  • ClusterEnergyCut The maximum energy in clusters. More energy in clusters usually means a dirtier event, so the energy resolution is likely worse.
  • OccupancyCut The minimum occupancy the reconstructed track needs to have to be plotted in the KE plots. Higher occupancy tracks are more likely to be very pure muon events with little energy lost in other processes. So this usually gives you a better true vs reco energy, at the cost of plotting fewer muons. So it's a tradeoff.
  • There are also non-adjustable cuts within the for loop. Clearly Clarence thought they were needed, but their motivation should be understood.
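
To get a feel for the statistics side of the OccupancyCut tradeoff, one can scan the cut value and count how many candidate tracks survive. The sketch below uses hypothetical function names and made-up occupancy values. It only shows the loss of muons; judging the gain in energy resolution requires the true and reconstructed energies from a real file.

```python
def passing_fraction(occupancies, occupancy_cut):
    """Fraction of candidate tracks with occupancy at or above the cut."""
    if not occupancies:
        return 0.0
    n_pass = sum(1 for occ in occupancies if occ >= occupancy_cut)
    return n_pass / len(occupancies)


def scan_occupancy_cut(occupancies, cut_values):
    """Return (cut, passing fraction) pairs for each cut value."""
    return [(cut, passing_fraction(occupancies, cut)) for cut in cut_values]


# Made-up occupancies of four candidate tracks.
example = [0.2, 0.5, 0.8, 0.9]
for cut, fraction in scan_occupancy_cut(example, [0.0, 0.5, 0.85]):
    print(cut, fraction)
```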

Plotting in python

Python scripts are also being developed:

python make_hists.py --help
optional arguments:
  -h, --help            show this help message and exit
  --outdir OUTDIR       The output dir. Will be made if it doesn't exist.
  --name NAME           The name of the output files.
  --indir INDIR         The location of the input tmsreco files
  --inlist INLIST       The input filelist
  --filename FILENAME, -f FILENAME
                        The input file, if you have a single file
  --nevents NEVENTS, -n NEVENTS
                        The maximum number of events to loop over
  --allow_overwrite, --no-allow_overwrite
                        Allow the output file to overwrite
  --preview, --no-preview
                        Save preview images of the histograms

Similarly, see python make_plots.py --help

Example: python make_hists.py --f /dune/data/users/kleykamp/tms_testing_files/2023-10-16_fixing_reco_everything_off_fixed_time_slicer_off_all_events.tmsreco.root --name my_file.root --allow_overwrite --preview

Default output will be in /dune/data/users/$USER/dune-tms_hists
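
The help text above looks like the output of Python's argparse module. As a hedged sketch of how such a command line could be declared (this is a reconstruction from the help text, not the actual make_hists.py source; the int type for --nevents and the False defaults are assumptions), the paired --allow_overwrite / --no-allow_overwrite flags match what argparse.BooleanOptionalAction (Python 3.9+) produces:

```python
import argparse


def build_parser():
    """Build a parser that mirrors the make_hists.py help text shown above."""
    parser = argparse.ArgumentParser(prog="make_hists.py")
    parser.add_argument("--outdir",
                        help="The output dir. Will be made if it doesn't exist.")
    parser.add_argument("--name", help="The name of the output files.")
    parser.add_argument("--indir",
                        help="The location of the input tmsreco files")
    parser.add_argument("--inlist", help="The input filelist")
    parser.add_argument("--filename", "-f",
                        help="The input file, if you have a single file")
    # Assumption: the event limit is parsed as an integer.
    parser.add_argument("--nevents", "-n", type=int,
                        help="The maximum number of events to loop over")
    # BooleanOptionalAction adds the matching --no-... flag automatically.
    parser.add_argument("--allow_overwrite", default=False,
                        action=argparse.BooleanOptionalAction,
                        help="Allow the output file to overwrite")
    parser.add_argument("--preview", default=False,
                        action=argparse.BooleanOptionalAction,
                        help="Save preview images of the histograms")
    return parser


args = build_parser().parse_args(
    ["-f", "input.tmsreco.root", "--name", "my_file.root",
     "--allow_overwrite", "--preview"])
print(args.filename, args.name, args.allow_overwrite, args.preview)
```

With a parser like this, argparse also accepts any unambiguous prefix of a long option, so --f would be read as --filename.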

Drawing spills

Another thing one might do is to draw some event displays. One way to do that is with Reco/draw_spill.py (not TimeSlicer/draw_spill.py).

Example usage: python draw_spill.py --input_filename /dune/data/users/kleykamp/tms_testing_files/2023-10-16_fixing_reco_everything_off_fixed_time_slicer_off.tmsreco.root --outdir 2023-10-16_fixing_reco/everything_off_only_true_tms_muons --only_true_tms_muons

Combining root files

We can combine root files to use more than one input file in muonke.cpp. Here's an example assuming 2023-09-15_fix_muon_ke.tmsreco.root is your output file and you're merging all files in /pnfs/dune/persistent/users/kleykamp/nd_production_output/2023-09-15_fix_muon_ke/tmsreco/FHC/00m/00/. The code uses the hadd utility to add the TTrees inside each root file. Hadding TTree files can speed things up because there's less overhead. The setup command sets up pnfs2xrootd, which isn't set up by default in the regular setup script. It's needed because you shouldn't read root files directly from /pnfs; see the xrootd section below.

setup -j duneutil v09_78_03d01 -q e20:prof
hadd -f /exp/dune/data/users/kleykamp/2023-09-15_fix_muon_ke.tmsreco.root $(find /pnfs/dune/persistent/users/kleykamp/nd_production_output/2023-09-15_fix_muon_ke/tmsreco/FHC/00m/00/ -name "*root" -exec pnfs2xrootd {} \;)
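
The same find-and-merge step can be assembled from Python if you prefer to inspect the command before running it. This is a sketch under assumptions: build_hadd_command is a hypothetical helper, and the to_xrootd argument is a placeholder for whatever converts a /pnfs path to an xrootd URL (for example, a wrapper around pnfs2xrootd). It only builds the argument list and does not run hadd.

```python
from pathlib import Path


def build_hadd_command(output_file, input_dir, to_xrootd=str):
    """Return the hadd argument list for merging every *root file under input_dir.

    to_xrootd is a caller-supplied function that maps a file path to the
    string passed to hadd; by default the path is used unchanged.
    """
    # Like find -name "*root", search the directory recursively.
    inputs = sorted(p for p in Path(input_dir).rglob("*root") if p.is_file())
    if not inputs:
        raise FileNotFoundError("no root files found under " + str(input_dir))
    # -f lets hadd overwrite an existing output file.
    return ["hadd", "-f", str(output_file)] + [to_xrootd(str(p)) for p in inputs]
```

The returned list can be printed for checking or handed to subprocess.run once the environment with hadd is set up.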

Authentication for

kx509
voms-proxy-init --noregen -rfc -voms dune:/dune/Role=Analysis

xrootd

Using xrootd is super important. It prevents overloading of the servers. You can use pnfs2xrootd <filename>. First set up pnfs2xrootd by loading duneutil, which isn't set up by default.

setup -j duneutil v09_78_03d01 -q e20:prof 
pnfs2xrootd /pnfs/dune/persistent/users/kleykamp/nd_production_output/2023-09-15_fix_muon_ke/tmsreco/FHC/00m/00/neutrino.0_1671124115.tmsreco.root 
# output
root://fndca1.fnal.gov:1094//pnfs/fnal.gov/usr/dune/persistent/users/kleykamp/nd_production_output/2023-09-15_fix_muon_ke/tmsreco/FHC/00m/00/neutrino.0_1671124115.tmsreco.root
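
To make the mapping in this example explicit, here is a small Python sketch that reproduces it. It is based only on the single input and output pair shown above, so treat the prefix as an assumption; pnfs2xrootd remains the authoritative tool and may handle cases this sketch does not.

```python
# Prefix taken from the example output above (assumed to apply to /pnfs paths).
XROOTD_PREFIX = "root://fndca1.fnal.gov:1094//pnfs/fnal.gov/usr/"


def pnfs_to_xrootd(path):
    """Convert a /pnfs/... path to an xrootd URL, following the example above."""
    if not path.startswith("/pnfs/"):
        raise ValueError("expected a path starting with /pnfs/: " + path)
    # Drop the leading "/pnfs/" and attach the remainder to the xrootd prefix.
    return XROOTD_PREFIX + path[len("/pnfs/"):]


print(pnfs_to_xrootd(
    "/pnfs/dune/persistent/users/kleykamp/example_edep_file_with_overlay.root"))
```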

There's also samweb2xrootd if you have a file in sam. See this issue for more information.

Example edep files to run with ConvertToTMSTree

Simple example edep files that are inputs to ConvertToTMSTree (these cannot be used for the plotting scripts above, which expect tmsreco.root files). The first has 1 event per spill. The second has an overlay simulation without rock samples.

/pnfs/dune/persistent/users/kleykamp/example_edep_file_with_single_event_per_spill.root

/pnfs/dune/persistent/users/kleykamp/example_edep_file_with_overlay.root

To run more, you should probably ask production. There is this script to do it manually if it's a medium amount. Here are some files to run over.

/pnfs/dune/persistent/users/kleykamp/nd_production_output

This one has decent statistics with 1 event per spill.

/pnfs/dune/persistent/users/kleykamp/nd_production_output/2022-12-15_simple_spill/edep

Also,

/pnfs/dune/persistent/users/marshalc/LArTMSProductionJun23withLArCV/edep/FHC/00m/00