Merge pull request #2 from columnflow/sync_overleaf
Sync overleaf
Showing 1 changed file with 41 additions and 2 deletions.
\section{Writing a Producer}\label{sec:producer}

The \CCSPStlye{Producer} class is used to calculate higher-level variables and define new array columns to be written to disk.
The corresponding task is called \CCSPStlye{cf.ProduceColumns} (see Ref.~\cite{cf_repo} for detailed information).
Naturally, we only want to compute these new variables for the events that are relevant to our analysis.
Thus, the producers are executed after the selection step in the task graph.

In this part of the tutorial, we will write a producer which calculates the four-lepton invariant mass.
The \code{H4L} analysis includes three exemplary \CCSPStlye{Producer}s in \code{h4l/production/example.py}.
You will notice that the script starts by importing all relevant modules, including CMS-specific ones.
\columnflow provides \CCSPStlye{Producer}s to compute commonly used event information, such as MC, PDF, or pileup weights, which can be found under \code{columnflow.production.cms}.
We also load both \code{numpy} and \code{awkward} with the \code{maybe\_import} mechanism to account for differences between the software environments used in different parts of the task graph.
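
The idea behind such an optional-import mechanism can be sketched in plain Python. The helper below is a hypothetical stand-in named \code{optional\_import}, not the actual \code{maybe\_import} implementation: it assumes that a missing module should only cause an error once it is actually used, so that a script can be loaded in an environment where the module is not installed.

```python
import importlib


class MissingModule:
    """Placeholder that defers the ImportError until the module is actually used."""

    def __init__(self, name):
        self._name = name

    def __getattr__(self, attr):
        # Only reached for attributes that do not exist on the placeholder itself.
        raise ImportError(f"module '{self._name}' is not available in this environment")


def optional_import(name):
    """Return the module called `name`, or a MissingModule placeholder if the import fails."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return MissingModule(name)


# An available module behaves as usual.
math_mod = optional_import("math")

# A missing module (hypothetical name) only raises when one of its attributes is accessed.
missing = optional_import("surely_not_installed_module_xyz")
```

With this pattern, the top of a production script can be executed everywhere, while the error only appears in tasks that really need the missing package.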

We start by defining a new \CCSPStlye{Producer} class named \code{features}.
This class requires the jet transverse momentum \code{Jet.pt}, which must be added to its \code{uses} set.
Additionally, it produces two new array columns, the total (scalar-summed) jet transverse momentum \code{ht} and the number of jets in an event \code{n\_jet}, which are both added to its \code{produces} set.
Each of these new variables is computed and then added to the \code{events} array with the \code{set\_ak\_column} function.
This is necessary to make these variables available outside of the \CCSPStlye{Producer}, e.g.\ for writing the information to disk.
Note that for \code{n\_jet}, we specify that the column elements must be \code{int} values.
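
The two quantities computed by \code{features} can be illustrated with a minimal sketch that uses nested Python lists as a stand-in for the jagged \code{events.Jet.pt} array. The toy values are invented for illustration, and the sketch assumes that \code{ht} is the scalar sum of the jet transverse momenta; in the real \CCSPStlye{Producer} the same reductions would be done with \code{awkward} and stored via \code{set\_ak\_column}.

```python
# Toy stand-in for events.Jet.pt: one list of jet pT values (in GeV) per event.
jet_pt = [
    [50.0, 30.0, 20.0],  # event with three jets
    [],                  # event without jets
    [80.0],              # event with one jet
]

# ht: scalar sum of the jet pT per event
# (awkward equivalent: ak.sum(events.Jet.pt, axis=1)).
ht = [sum(pts, 0.0) for pts in jet_pt]

# n_jet: number of jets per event, stored as integers
# (awkward equivalent: ak.num(events.Jet.pt, axis=1)).
n_jet = [len(pts) for pts in jet_pt]
```

Both results have exactly one entry per event, which is what allows them to be attached to \code{events} as new columns.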

The second \CCSPStlye{Producer} class, \code{cutflow\_features}, allows us to define and store features to be used for cutflow plots.
Here, in addition to \code{Jet.pt}, we also require \code{mc\_weight} and \code{category\_ids} to be added to the \code{uses} set.
Note that both of these are \CCSPStlye{Producer}s themselves, which you can find by following the import paths at the beginning of the script.

The \CCSPStlye{Producer} class \code{mc\_weight} reads in the \code{genWeight} column and, if it exists, the \code{LHEWeight} column, both stored in \code{events}.
Since these columns are needed, they are both added to the \code{uses} set of \code{mc\_weight}.
By extension, when we add \code{mc\_weight} to our own \code{uses} set, we implicitly request these columns as well.
The \code{mc\_weight} class simply decides which one of these weights to use and saves the result as a new column, also named \code{mc\_weight}, which is included in its \code{produces} set.
At this point, we also have the option to add the \code{mc\_weight} class to our own \code{produces} set.
In this way, the new column also gets created and saved to disk.
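
How a \CCSPStlye{Producer} in a \code{uses} set pulls in its own required columns can be sketched with a toy dependency resolver. The dictionary \code{USES} and the function \code{resolve\_columns} are hypothetical and only illustrate the principle; they are not part of \columnflow, and the \code{uses} set of \code{cutflow\_features} is reduced to two entries for brevity.

```python
# Hypothetical registry: each producer name maps to the entries of its uses set.
# Entries that are themselves keys are producers; all other entries are plain columns.
USES = {
    "mc_weight": {"genWeight", "LHEWeight"},
    "cutflow_features": {"Jet.pt", "mc_weight"},
}


def resolve_columns(name, registry):
    """Recursively expand the uses set of producer `name` into plain column names."""
    columns = set()
    for entry in registry[name]:
        if entry in registry:
            # The entry is a producer: inherit everything it requires.
            columns |= resolve_columns(entry, registry)
        else:
            # The entry is a plain column.
            columns.add(entry)
    return columns


# Columns that must be read from disk before cutflow_features can run.
required = resolve_columns("cutflow_features", USES)
```

In this toy setup, \code{required} contains \code{Jet.pt} together with the two weight columns inherited from \code{mc\_weight}.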

Meanwhile, the \code{category\_ids} class assigns each event an array of category ids, which it stores as a new column also named \code{category\_ids}.
Thus, we must also add this class to our \code{produces} set.
The topic of defining categories is discussed in detail in Section~\ref{sec:categories}.
Now that we have access to both of these \CCSPStlye{Producer} classes, the \code{cutflow\_features} class can use them to attach MC weights (if the dataset passed to it is tagged as an MC dataset) and category ids to \code{events}.
It then creates a new column in the updated \code{events} object named \code{cutflow.jet1\_pt}, which stores the transverse momentum of the leading jet in each event, read from \code{Jet.pt\text{[:,0]}}.
If the event does not contain any jets, it instead stores an \code{EMPTY\_FLOAT} value.
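
The fallback for events without jets can be sketched as follows, again with nested lists in place of the jagged \code{Jet.pt} array. The numeric value assigned to \code{EMPTY\_FLOAT} here is an assumption chosen for illustration; the analysis code should use the constant that \columnflow provides.

```python
# Placeholder for missing values; the numeric value is an assumption for illustration.
EMPTY_FLOAT = -99999.0

# Toy stand-in for events.Jet.pt (in GeV), ordered by decreasing pT within each event.
jet_pt = [
    [50.0, 30.0],  # leading jet has 50 GeV
    [],            # no jets: the placeholder is stored instead
    [80.0],        # leading jet has 80 GeV
]

# Take the first (leading) jet pT if the event has jets, otherwise the placeholder.
jet1_pt = [pts[0] if pts else EMPTY_FLOAT for pts in jet_pt]
```

Using a fixed placeholder keeps the column flat, with exactly one value per event, so that it can be histogrammed directly in cutflow plots.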

The last \CCSPStlye{Producer} class defined is \code{example}, which follows the same structure as the two previously explained \CCSPStlye{Producer}s.
It starts by creating the \code{ht} and \code{n\_jet} columns with the \CCSPStlye{Producer} class \code{features}, called via \code{\text{events=self[features](events, **kwargs)}}.
It then applies category ids and deterministic seeds to the updated \code{events} object.
Lastly, two additional modules are called in this example.
First, the \code{normalization\_weights} producer is used to normalize simulated events to the cross section values provided in the metadata database (in this case \code{cmsdb}).
Additionally, the \code{muon\_weights} producer applies scale factors for muons provided by the CMS Muon POG to improve the agreement between data and simulation.
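
The effect of a normalization weight can be illustrated with the commonly used convention in which each simulated event is scaled by the cross section times the integrated luminosity, divided by the sum of generator weights of the sample. This convention, the function name, and the toy numbers are assumptions made for this sketch and are not taken from the \code{normalization\_weights} source code.

```python
def normalization_weight(mc_weight, xsec_pb, lumi_pb, sum_mc_weights):
    """Per-event weight that scales a simulated sample to xsec * lumi expected events.

    mc_weight:      generator weight of the event
    xsec_pb:        process cross section in pb
    lumi_pb:        integrated luminosity in 1/pb
    sum_mc_weights: sum of generator weights over the full simulated sample
    """
    return mc_weight * xsec_pb * lumi_pb / sum_mc_weights


# Toy numbers: 2 pb * 1000 / pb = 2000 expected events, spread over a weight sum of 4000.
w = normalization_weight(mc_weight=1.0, xsec_pb=2.0, lumi_pb=1000.0, sum_mc_weights=4000.0)
```

Summing this weight over all simulated events of the sample then reproduces the expected event yield, independent of how many events were generated.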

\begin{exercise}{Understanding some basic Producers}
Familiarize yourself with the \CCSPStlye{Producer} classes mentioned above.
\end{exercise}

In this \code{H4L} analysis, we want to calculate the four-lepton invariant mass, which should exhibit a peak near the Higgs boson mass for signal events.
Note that to perform four-vector calculations, you need to import \code{attach\_coffea\_behavior} from \code{columnflow.production.util}.
You will need to use kinematic information from both the \code{events.Electron} and \code{events.Muon} collections and create a new column which stores your calculated invariant mass.
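
As a reminder of the underlying kinematics, the invariant mass of a set of particles follows from the summed four-vector, $m^2 = E^2 - |\vec{p}|^2$. The plain-Python sketch below computes it for a single toy event from $(p_T, \eta, \phi, m)$ tuples; the function names and the lepton values are invented for illustration. In the actual \CCSPStlye{Producer}, the four-vector behavior attached to the lepton collections would presumably replace these manual conversions and operate on all events at once.

```python
import math


def four_vector(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) into Cartesian components (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px**2 + py**2 + pz**2 + mass**2)
    return energy, px, py, pz


def invariant_mass(leptons):
    """Invariant mass of the summed four-vectors of all given leptons."""
    e_sum = px_sum = py_sum = pz_sum = 0.0
    for lepton in leptons:
        energy, px, py, pz = four_vector(*lepton)
        e_sum += energy
        px_sum += px
        py_sum += py
        pz_sum += pz
    m2 = e_sum**2 - px_sum**2 - py_sum**2 - pz_sum**2
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(m2, 0.0))


# Toy event: four massless leptons in the transverse plane, pairwise back to back.
leptons = [
    (30.0, 0.0, 0.0, 0.0),
    (30.0, 0.0, math.pi, 0.0),
    (20.0, 0.0, math.pi / 2, 0.0),
    (20.0, 0.0, -math.pi / 2, 0.0),
]
m4l = invariant_mass(leptons)
```

Because the momenta of the toy leptons cancel, the invariant mass equals the sum of their energies, which makes the example easy to verify by hand.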

\begin{exercise}{Writing a Producer}{h4l/production/invariant_mass.py}
Write a \CCSPStlye{Producer} class which computes the four-lepton invariant mass.
\end{exercise}