Merge pull request #144 from sofia-calgaro/main

some new docu

sofia-calgaro authored Mar 11, 2024
2 parents 3697c48 + d4e594d commit 98f6ec6
Showing 6 changed files with 111 additions and 142 deletions.
2 changes: 1 addition & 1 deletion docs/source/index.rst
@@ -6,7 +6,7 @@ In particular, this tool helps:

* set up dataframe objects containing the channel map and status for given subsystems (pulser, geds, spms)
* get data for parameters of interest (from raw/dsp/hit tiers or user-defined ones) based on a given dataset
* inspect parameters by providing either a time interval, runs or keys to inspect
* inspect parameters by providing either a time interval, a list of run(s), or key(s)
* plot status maps (e.g., ON/OFF/...) for each channel, spotting problematic ones that exceed or fall below given thresholds

Getting started
8 changes: 8 additions & 0 deletions docs/source/manuals/avail_pars.rst
@@ -44,6 +44,14 @@ Available parameters
- you can pick only ``phy`` or ``all`` entries
- you can flag special events, like ``pulser``, ``pulser01ana``, ``FCbsln`` or ``muon`` events
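
For instance, a config snippet flagging pulser events for a geds parameter might look as follows (a minimal sketch: the ``event_type`` key and surrounding structure follow the plot-config format described in the next manual page; the block name and parameter are illustrative):

.. code-block:: json

   {
     "subsystems": {
       "geds": {
         "Baselines in pulser events": {
           "parameters": "baseline",
           "event_type": "pulser"
         }
       }
     }
   }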

.. warning::

It has been found that no muon signals were recorded in the auxiliary channel MUON01 for periods p08 and p09 (up to r003 included).
This means the present code is not able to flag the germanium events during which a muon crossed the experiment.
In other words, the dataframe associated with the ``muon`` events will be empty here.
Moreover, if you select ``phy`` entries, these will still contain muon events, since the muon cut cannot be applied.


.. important::

Special parameters are typically saved under ``settings/special-parameters.json`` and carefully handled when loading data.
36 changes: 19 additions & 17 deletions docs/source/manuals/get_plots.rst
@@ -7,9 +7,9 @@ After the installation, an executable is available at ``~/.local/bin``.
To automatically generate plots, two different methods are available.
Both methods rely on a config file specifying the output folder (``output``)
where results are stored, the ``dataset`` you want to inspect, and the ``subsystems`` (pulser, geds, spms)
you want to study and for which you want to load data.
you want to study and for which you want to load data. See next section for more details.

You can either run it by importing the ``legend-data-monitor`` module:
You can either run the code by importing the ``legend-data-monitor`` module:

.. code-block:: python
@@ -23,11 +23,21 @@ Or run it by passing the path to the config file to the executable:
$ legend-data-monitor user_prod --config path_to_config.json
If you want to inspect data in bunches (useful to avoid the process being killed
when loading many heavy files), you can use

.. code-block:: bash
$ legend-data-monitor user_bunch --config path_to_config.json --n_files N
where ``N`` specifies how many files you want to inspect together at each iteration, e.g. ``N=40``
(one run is usually made up of ca. 160 files).
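
For example, to inspect one run (ca. 160 files) in four bunches of 40 files each:

.. code-block:: bash

   $ legend-data-monitor user_bunch --config path_to_config.json --n_files 40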


.. warning::

Use the ``user_prod`` command line interface for generating your own plots.
``auto_prod`` was designed to be used during automatic data production, for generating monitoring plots on the fly when processing data. For the moment, no documentation will be provided.
``user_rsync_prod`` was designed to be used by a user for personal automatic plot generation, using rsync to synchronize with lh5 files automatically produced.
``auto_prod`` and ``user_rsync_prod`` were designed to be used during automatic data production, for generating monitoring plots on the fly for newly processed data. For the moment, no documentation is provided.


Configuration file
@@ -40,12 +50,12 @@ Example config
.. code-block:: json
{
"output": "<some_path>/out", // output folder
"output": "<output_path>", // output folder
"dataset": {
"experiment": "L200",
"period": "p02",
"version": "v06.00",
"path": "/data1/users/marshall/prod-ref",
"period": "p09",
"version": "tmp-auto",
"path": "/data2/public/prodenv/prod-blind/",
"type": "phy",// data type (either cal, phy, or ["cal", "phy"])
"start": "2023-02-07 02:00:00", // time cut (here based on start+end)
"end": "2023-02-07 03:30:00"
@@ -86,16 +96,8 @@ In particular, ``dataset`` settings are:
- ``'window': '1d 2h 0m'`` (time window in the past from the current time point) in format ``Xd Xh Xm`` for days, hours, minutes;
- ``'runs': 1`` (one run) or ``'runs': [1, 2, 3]`` (list of runs) in integer format.
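
For instance, a ``dataset`` block using the ``window`` selection could look like this (a minimal sketch; pick exactly one of the time-selection formats listed above):

.. code-block:: json

   {
     "dataset": {
       "experiment": "L200",
       "period": "p09",
       "version": "tmp-auto",
       "path": "/data2/public/prodenv/prod-blind/",
       "type": "phy",
       "window": "1d 2h 0m"
     }
   }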

..
Note: currently taking range between earliest and latest i.e. also including the ones in between that are not listed, will be modified to either
1. require only two timestamps as start and end, or
2. get only specified timestamps (strange though, because would have gaps in the plot)

The same happens with run selection.


Then, ``subsystems`` can either be ``pulser``, ``geds`` or ``spms`` (note, 2023-03-07: spms plots are not implemented yet, but DataLoader can load the respective data if needed).
Then, ``subsystems`` can either be ``pulser``, ``geds`` or ``spms`` (note: spms plots are not implemented yet, but DataLoader can load the respective data if needed).

For each subsystem to be plotted, specify

95 changes: 57 additions & 38 deletions docs/source/manuals/get_sc_plots.rst
@@ -7,49 +7,84 @@ How to load SC data
A number of parameters related to the LEGEND hardware configuration and status are recorded in the Slow Control (SC) database.
The latter is a PostgreSQL database residing on the ``legend-sc.lngs.infn.it`` host, part of the LNGS network.
To access the SC database, follow the `Confluence (Python Software Stack) <https://legend-exp.atlassian.net/wiki/spaces/LEGEND/pages/494764033/Python+Software+Stack>`_ instructions.
Data are loaded following the ``pylegendmeta`` tutorial , which shows how to inspect the database.
Data are loaded following the `pylegendmeta <https://github.com/legend-exp/pylegendmeta>`_ tutorial, which shows how to retrieve info from the SC database.
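
For reference, a minimal connection sketch with pylegendmeta might look like the following (check the pylegendmeta documentation for the exact ``connect()`` signature and its defaults):

.. code-block:: python

   from legendmeta import LegendSlowControlDB

   # connect to the SC database; the password is on Confluence (see below)
   scdb = LegendSlowControlDB()
   scdb.connect(password="...")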


... put here some text on how to specify the plotting of a SC parameter in the config file (no ideas for the moment)...
Available SC parameters
-----------------------

Available parameters at the moment include:

* ``PT114``, ``PT115``, ``PT118`` (cryostat pressures)
* ``PT202``, ``PT205``, ``PT208`` (cryostat vacuum)
* ``LT01`` (water loop fine fill level)
* ``RREiT`` (injected air temperature clean room), ``RRNTe`` (clean room temperature north), ``RRSTe`` (clean room temperature south), ``ZUL_T_RR`` (supply air temperature clean room)
* ``DaqLeft-Temp1``, ``DaqLeft-Temp2``, ``DaqRight-Temp1``, ``DaqRight-Temp2`` (rack present temperatures)
* if you want more, contact us!

These can be easily accessed for any time range of interest by giving a ``my_config.json`` file as input to the command line in the following way:

.. code-block::
legend-data-monitor user_scdb --config my_config --port N --pswd ThePassword
.. note::

- ``N`` is any number in the range 1024-65535. Setting a personal port different from the default one (5432) is safer: if a port is already in use by another user, you'll receive an error indicating that the port is already taken and you will not be able to access the SC database;
- ``ThePassword`` can be found on Confluence at `this page <https://legend-exp.atlassian.net/wiki/spaces/LEGEND/pages/494764033/Python+Software+Stack#Metadata-access>`_.
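
For example, with a personal port and a placeholder password:

.. code-block:: bash

   $ legend-data-monitor user_scdb --config my_config.json --port 15432 --pswd <password_from_confluence>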

Files are collected in the output folder specified in the ``output`` config entry:
An example of a config.json file is the following:

.. code-block:: json
{
"output": "<some_path>/out",
// ...
"output": "/data1/users/<your_username>/prod-ref-v2",
"dataset": {
"experiment": "L200",
"period": "p09",
"version": "tmp-auto",
"path": "/data2/public/prodenv/prod-blind/",
"type": "phy",
"time_selection": ...
},
"saving": "overwrite",
"slow_control": {
"parameters": ["DaqLeft-Temp1", "ZUL_T_RR"]
}
}
In principle, for plotting the SC data you would need just the start and the end of a time interval of interest. This means that SC data does not depend on any dataset info (``experiment``, ``period``, ``version``, ``type``) but ``time_selection``.
However, there are cases where we want to inspect a given run or time period made of keys, as we usually do with germanium.
The meaning of each entry is explained below:

In the first case, we end up saving data in the following folder:
* ``output``: folder where to store output files;
* ``dataset``:

.. code-block::
* ``experiment``: either *L60* (to be checked) or *L200*
* ``period``: period to inspect
* ``version``: prodenv version (e.g. *tmp-auto* or *ref-v1.0.0*)
* ``path``: global path to prod-blind prodenv folder
* ``type``: type of data to inspect (either *cal* or *phy*)
* ``time_selection``: list of either ``runs`` or ``timestamps`` (use the format *YMDTHMSZ*), or entries ``start`` and ``end`` with format *Y-M-D H:M:S* (see below for more details)

* ``saving``: either *overwrite* (overwrites any already present file) or *append* (takes the previous file and appends new data, e.g. for a newly inspected time range)
* ``slow_control``: field for specifying SC parameters

* ``parameters``: list of parameters to inspect (choose among the available ones listed above)

<some_path>/out/
└── generated
└── plt
└── SC
└── <time_selection>
├── SC-<time_selection>.pdf
├── SC-<time_selection>.log
└── SC-<time_selection>.{dat,bak,dir}

Otherwise, we store the SC data/plots as usual:
In principle, for plotting the SC data you would need just the start and the end of a time interval of interest. This means that SC data does not depend on any dataset info (i.e. on entries ``experiment``, ``period``, ``version``, ``type``).
However, these entries are needed to retrieve the channel map for the given time range of interest.

We store SC data in the following way:

.. code-block::
<some_path>/out/
<output>
└── generated
└── plt
└── <type>
└── <period>
└── SC
└── <time_selection>
├── SC-<time_selection>.pdf
├── SC-<time_selection>.log
├── SC-<time_selection>.hdf
└── SC-<time_selection>.{dat,bak,dir}
@@ -62,19 +97,3 @@ Otherwise, we store the SC data/plots as usual:
- if ``{'timestamps': ['20230207T103123Z', '20230207T141123Z', '20230207T083323Z']}`` (multiple keys), then <time_selection> = ``20230207T083323Z_20230207T141123Z`` (min/max timestamp interval)
- if ``{'runs': 1}`` (one run), then <time_selection> = ``r001``;
- if ``{'runs': [1, 2, 3]}`` (multiple runs), then <time_selection> = ``r001_r002_r003``.
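
As a sketch, the mapping above can be reproduced with a few lines of Python (a hypothetical helper, not part of the package):

.. code-block:: python

   def time_selection_string(sel: dict) -> str:
       """Build the <time_selection> folder label from a dataset time selection."""
       if "start" in sel and "end" in sel:
           # '2023-02-07 02:00:00' -> '20230207T020000Z'
           def fmt(s):
               return s.replace("-", "").replace(" ", "T").replace(":", "") + "Z"
           return f"{fmt(sel['start'])}_{fmt(sel['end'])}"
       if "timestamps" in sel:
           keys = sel["timestamps"]
           keys = [keys] if isinstance(keys, str) else sorted(keys)
           # single key, or min/max timestamp interval
           return keys[0] if len(keys) == 1 else f"{keys[0]}_{keys[-1]}"
       if "runs" in sel:
           runs = sel["runs"]
           runs = [runs] if isinstance(runs, int) else runs
           return "_".join(f"r{r:03d}" for r in runs)
       raise ValueError("unsupported time selection")

   print(time_selection_string({"runs": [1, 2, 3]}))  # r001_r002_r003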

Shelve output objects
~~~~~~~~~~~~~~~~~~~~~
*Under construction...*


Available SC parameters
-----------------------

Available parameters include:

- ``PT114``, ``PT115``, ``PT118`` (cryostat pressures)
- ``PT202``, ``PT205``, ``PT208`` (cryostat vacuum)
- ``LT01`` (water loop fine fill level)
- ``RREiT`` (injected air temperature clean room), ``RRNTe`` (clean room temperature north), ``RRSTe`` (clean room temperature south), ``ZUL_T_RR`` (supply air temperature clean room)
- ``DaqLeft-Temp1``, ``DaqLeft-Temp2``, ``DaqRight-Temp1``, ``DaqRight-Temp2`` (rack present temperatures)
1 change: 1 addition & 0 deletions docs/source/manuals/index.rst
@@ -6,4 +6,5 @@ User Manual

avail_pars
get_plots
get_sc_plots
inspect_plots
111 changes: 25 additions & 86 deletions docs/source/manuals/inspect_plots.rst
@@ -4,24 +4,30 @@ How to inspect plots
Output files
------------

After the code has run, shelve object files containing the data and plots generated for the inspected parameters/subsystems
After the code has run, HDF files containing the data and plots generated for the inspected parameters/subsystems
are produced, together with a pdf file containing all the generated plots and a log file containing running information. In particular,
the last two files are created for each inspected subsystem (pulser, geds, spms).

.. warning::

Shelve files are produced as output as well; this was the first format chosen for the output.
The code still has to be fixed to remove these files from the routines.
At the moment, they are needed when using the ``"saving": "append"`` option, so do not remove them if you are going to use it!

Files are usually collected in the output folder specified in the ``output`` config entry:

.. code-block:: json
{
"output": "<some_path>/out",
"output": "<output_path>",
// ...
Then, depending on the chosen dataset (``experiment``, ``period``, ``version``, ``type``, time selection),
different output folders can be created. In general, the output folder is structured as follows:
.. code-block::
<some_path>/out/
<output_path>
└── prod-ref
└── <version>
└── generated
@@ -32,6 +38,7 @@
├── <experiment>-<period>-<time_selection>-<type>-<subsystem>.pdf
├── <experiment>-<period>-<time_selection>-<type>-<subsystem>.log
├── <experiment>-<period>-<time_selection>-<type>.{dat,bak,dir}
└── <experiment>-<period>-<time_selection>-<type>.hdf
Files are usually saved using the following format ``exp-period-datatype-time_interval``:
@@ -52,95 +52,27 @@
- if ``{'runs': [1, 2, 3]}`` (multiple runs), then <time_selection> = ``r001_r002_r003``.


Shelve output objects
~~~~~~~~~~~~~~~~~~~~~
*Under construction... (structure might change over time, but content should remain the same)*
Output .hdf files
-----------------

The output object ``<experiment>-<period>-<time_selection>-<type>.{dat,bak,dir}`` has the following structure:
Output hdf files for ``geds`` have the following key structure, where ``<param>`` is the name of one of the inspected parameters and ``<flag>`` is the event type, e.g. *IsPulser* or *IsBsln*:

.. code-block::
- ``<flag>_<param>_info`` = some useful info
- ``<flag>_<param>`` = absolute values
- ``<flag>_<param>_mean`` = average of ``<flag>_<param>`` over the first 10% of data (within the selected time window)
- ``<flag>_<param>_var`` = % variations of ``<flag>_<param>`` wrt ``<flag>_<param>_mean``
- ``<flag>_<param>_pulser01anaRatio`` = ratio of the absolute values ``<flag>_<param>`` to the PULS01ANA absolute values
- ``<flag>_<param>_pulser01anaRatio_mean`` = average of ``<flag>_<param>_pulser01anaRatio`` over the first 10% of data (within the selected time window)
- ``<flag>_<param>_pulser01anaRatio_var`` = % variations of ``<flag>_<param>_pulser01anaRatio`` wrt ``<flag>_<param>_pulser01anaRatio_mean``
- ``<flag>_<param>_pulser01anaDiff`` = difference between the absolute values ``<flag>_<param>`` and the PULS01ANA absolute values
- ``<flag>_<param>_pulser01anaDiff_mean`` = average of ``<flag>_<param>_pulser01anaDiff`` over the first 10% of data (within the selected time window)
- ``<flag>_<param>_pulser01anaDiff_var`` = % variations of ``<flag>_<param>_pulser01anaDiff`` wrt ``<flag>_<param>_pulser01anaDiff_mean``
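
As a sketch, such a file can be inspected with pandas (the key names below are illustrative, e.g. the *IsPulser* flag combined with the ``cuspEmax_ctc_cal`` parameter):

.. code-block:: python

   import pandas as pd

   f = "<experiment>-<period>-<time_selection>-<type>.hdf"

   # list all stored keys
   with pd.HDFStore(f, mode="r") as store:
       print(store.keys())

   # load the % variations of a parameter for pulser events
   df = pd.read_hdf(f, key="IsPulser_cuspEmax_ctc_cal_var")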

<experiment>-<period>-<time_selection>-<type>
└── monitoring
├── pulser // event type
│ └── cuspEmax_ctc_cal // parameter
│ ├── 4 // this is the channel FC id
│ │ ├── values // these are y plot-values shown
│ │ │ ├── all // every timestamp entry
│ │ │ └── resampled // after the resampling
│ │ ├── timestamp // these are plot-x values shown
│ │ │ ├── all
│ │ │ └── resampled
│ │ ├── mean // mean over the first 10% of data within the range inspected by the user
│ │ └── plot_info // some useful plot-info: ['title', 'subsystem', 'locname', 'unit', 'plot_style', 'parameter', 'label', 'unit_label', 'time_window', 'limits']
│ ├── ...other channels...
│ ├── df_geds // dataframe containing all geds channels for a given parameter
│ ├── <figure> // Figure object
│ └── map_geds // geds status map (if present)
├─all
│ └── baseline
│ ├── ...channels data/info...
│ └── ...other summary objects (df/status map/figures)...
│ └── wf_max
│ └── ...
└──phy
└── ...
One way to open it and inspect the saved objects for a given channel, eg. ID='4', is to do

.. code-block:: python
import shelve
with shelve.open("<experiment>-<period>-<time_selection>-<type>") as file:
# get y values
all_data_ch4 = file['monitoring']['pulser']['baseline']['4']['values']['all']
resampled_data_ch4 = file['monitoring']['pulser']['baseline']['4']['values']['resampled']
# get info for plotting data
plot_info_ch4 = file['monitoring']['pulser']['baseline']['4']['plot_info']
To get the corresponding dataframe (containing all channels with map/status info and loaded parameters), you can use

.. code-block:: python
import shelve
with shelve.open("<experiment>-<period>-<time_selection>-<type>") as file:
df_geds = file['monitoring']['pulser']['baseline']['df_geds'].data
To open the saved figure for a given parameter, one way to do it is through

.. code-block:: python
import io
from PIL import Image
with io.BytesIO(shelf['monitoring']['pulser']['baseline']['<figure>']) as obj:
# create a PIL Image object from the bytes
pil_image = Image.open(obj)
# convert the image to RGB color space (to enable PDF saving)
pil_image = pil_image.convert('RGB')
# save image to disk
pil_image.save('figure.pdf', bbox_inches="tight")
.. important::

The key name ``<figure>`` changes depending on the used ``plot_style`` for producing that plot. In particular,

- if you use ``"plot_style": "per channel"``, then ``<figure> = figure_plot_string_<string_no>``, where ``string_no`` is the number of one of the available strings;
- if you use ``"plot_style": "per cc4"`` or ``"per string"`` or ``"array"``, then ``<figure> = figure_plot``;
- if you use ``"plot_style": "per barrel"``, then ``<figure> = figure_plot_<location>_<position>``, where ``<location>`` is either "IB" or "OB, while ``<position>`` is either "top" or "bottom".

.. note::

There is no need to create one shelve object for each inspected subsystem.
Indeed, one way to separate among pulser, geds and spms is to look at channel IDs.
In any case, the subsystem info is saved under ``["monitoring"][<event_type>][<parameter>]["plot_info"]["subsystem"]``.


Inspect plots
-------------

*Under construction*

- Near future: `Dashboard <https://legend-exp.atlassian.net/wiki/spaces/LEGEND/pages/637861889/Monitoring+Dashboard+Manual>`_ tool
- Future: notebook to interactively inspect plots (with buttons?)
- Some standard plots to monitor the detectors' response can be found online on the `Dashboard <https://legend-exp.atlassian.net/wiki/spaces/LEGEND/pages/637861889/Monitoring+Dashboard+Manual>`_
- Some notebooks to interactively inspect plots can be found under the ``notebook`` folder
