Merge pull request #144 from sofia-calgaro/main

some new docu

sofia-calgaro authored Mar 11, 2024
2 parents 3697c48 + d4e594d commit 98f6ec6
Showing 6 changed files with 111 additions and 142 deletions.
2 changes: 1 addition & 1 deletion docs/source/index.rst
@@ -6,7 +6,7 @@ In particular, this tool helps:

* set up dataframe objects containing the channel map and status for given subsystems (pulser, geds, spms)
* get data for parameters of interest (from raw/dsp/hit tiers or user-defined ones) based on a given dataset
* inspect parameters by providing either a time interval, runs or keys to inspect
* inspect parameters by providing either a time interval, a list of run(s), or key(s)
* plot status maps (e.g., ON/OFF/...) for each channel, spotting problematic ones that exceed or fall below given thresholds

Getting started
8 changes: 8 additions & 0 deletions docs/source/manuals/avail_pars.rst
@@ -44,6 +44,14 @@ Available parameters
- you can pick only ``phy`` or ``all`` entries
- you can flag special events, like ``pulser``, ``pulser01ana``, ``FCbsln`` or ``muon`` events
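
For instance, a config snippet flagging pulser events for a geds parameter might look as follows (a minimal sketch: the ``event_type`` key and surrounding structure follow the plot-config format described in the next manual page; the block name and parameter are illustrative):

.. code-block:: json

   {
     "subsystems": {
       "geds": {
         "Baselines in pulser events": {
           "parameters": "baseline",
           "event_type": "pulser"
         }
       }
     }
   }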

.. warning::

It has been found that no muon signals were recorded in the auxiliary channel MUON01 for periods p08 and p09 (up to r003 included).
This means the present code is not able to flag the germanium events during which a muon crossed the experiment.
In other words, the dataframe associated with the ``muon`` events will be empty here.
Moreover, if you select ``phy`` entries, these will still contain muon events, since the muon cut cannot be applied.


.. important::

Special parameters are typically saved under ``settings/special-parameters.json`` and carefully handled when loading data.
36 changes: 19 additions & 17 deletions docs/source/manuals/get_plots.rst
@@ -7,9 +7,9 @@ After the installation, an executable is available at ``~/.local/bin``.
To automatically generate plots, two different methods are available.
Both methods rely on a config file specifying the output folder (``output``)
where results are stored, the ``dataset`` you want to inspect, and the ``subsystems`` (pulser, geds, spms)
you want to study and for which you want to load data.
you want to study and for which you want to load data. See next section for more details.

You can either run it by importing the ``legend-data-monitor`` module:
You can either run the code by importing the ``legend-data-monitor`` module:

.. code-block:: python
@@ -23,11 +23,21 @@ Or run it by passing the path to the config file to the executable:
$ legend-data-monitor user_prod --config path_to_config.json
If you want to inspect data in bunches (useful to avoid the process being killed
when loading many heavy files), you can use

.. code-block:: bash
$ legend-data-monitor user_bunch --config path_to_config.json --n_files N
where ``N`` specifies how many files you want to inspect together at each iteration, e.g. ``N=40``
(one run is usually made up of ca. 160 files).
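
For example, to inspect one run (ca. 160 files) in four bunches of 40 files each:

.. code-block:: bash

   $ legend-data-monitor user_bunch --config path_to_config.json --n_files 40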


.. warning::

Use the ``user_prod`` command line interface for generating your own plots.
``auto_prod`` was designed to be used during automatic data production, for generating monitoring plots on the fly when processing data. For the moment, no documentation will be provided.
``user_rsync_prod`` was designed to be used by a user for personal automatic plot generation, using rsync to synchronize with lh5 files automatically produced.
``auto_prod`` and ``user_rsync_prod`` were designed to be used during automatic data production, for generating monitoring plots on the fly for newly processed data. For the moment, no documentation is provided.


Configuration file
@@ -40,12 +50,12 @@ Example config
.. code-block:: json
{
"output": "<some_path>/out", // output folder
"output": "<output_path>", // output folder
"dataset": {
"experiment": "L200",
"period": "p02",
"version": "v06.00",
"path": "/data1/users/marshall/prod-ref",
"period": "p09",
"version": "tmp-auto",
"path": "/data2/public/prodenv/prod-blind/",
"type": "phy",// data type (either cal, phy, or ["cal", "phy"])
"start": "2023-02-07 02:00:00", // time cut (here based on start+end)
"end": "2023-02-07 03:30:00"
@@ -86,16 +96,8 @@ In particular, ``dataset`` settings are:
- ``'window': '1d 2h 0m'`` (time window in the past from the current time point) in format ``Xd Xh Xm`` for days, hours, minutes;
- ``'runs': 1`` (one run) or ``'runs': [1, 2, 3]`` (list of runs) in integer format.
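
For instance, a ``dataset`` block using the ``window`` selection could look like this (a minimal sketch; pick exactly one of the time-selection formats listed above):

.. code-block:: json

   {
     "dataset": {
       "experiment": "L200",
       "period": "p09",
       "version": "tmp-auto",
       "path": "/data2/public/prodenv/prod-blind/",
       "type": "phy",
       "window": "1d 2h 0m"
     }
   }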

..
Note: currently taking range between earliest and latest i.e. also including the ones in between that are not listed, will be modified to either
1. require only two timestamps as start and end, or
2. get only specified timestamps (strange though, because would have gaps in the plot)

The same happens with run selection.


Then, ``subsystems`` can either be ``pulser``, ``geds`` or ``spms`` (note, 2023-03-07: spms plots are not implemented yet, but DataLoader can load the respective data if needed).
Then, ``subsystems`` can either be ``pulser``, ``geds`` or ``spms`` (note: spms plots are not implemented yet, but DataLoader can load the respective data if needed).

For each subsystem to be plotted, specify

95 changes: 57 additions & 38 deletions docs/source/manuals/get_sc_plots.rst
@@ -7,49 +7,84 @@ How to load SC data
A number of parameters related to the LEGEND hardware configuration and status are recorded in the Slow Control (SC) database.
The latter is a PostgreSQL database residing on the ``legend-sc.lngs.infn.it`` host, part of the LNGS network.
To access the SC database, follow the `Confluence (Python Software Stack) <https://legend-exp.atlassian.net/wiki/spaces/LEGEND/pages/494764033/Python+Software+Stack>`_ instructions.
Data are loaded following the ``pylegendmeta`` tutorial , which shows how to inspect the database.
Data are loaded following the `pylegendmeta <https://github.com/legend-exp/pylegendmeta>`_ tutorial, which shows how to retrieve info from the SC database.
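
For reference, a minimal connection sketch with pylegendmeta might look like the following (check the pylegendmeta documentation for the exact ``connect()`` signature and its defaults):

.. code-block:: python

   from legendmeta import LegendSlowControlDB

   # connect to the SC database; the password is on Confluence (see below)
   scdb = LegendSlowControlDB()
   scdb.connect(password="...")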


... put here some text on how to specify the plotting of a SC parameter in the config file (no ideas for the moment)...
Available SC parameters
-----------------------

Available parameters at the moment include:

* ``PT114``, ``PT115``, ``PT118`` (cryostat pressures)
* ``PT202``, ``PT205``, ``PT208`` (cryostat vacuum)
* ``LT01`` (water loop fine fill level)
* ``RREiT`` (injected air temperature clean room), ``RRNTe`` (clean room temperature north), ``RRSTe`` (clean room temperature south), ``ZUL_T_RR`` (supply air temperature clean room)
* ``DaqLeft-Temp1``, ``DaqLeft-Temp2``, ``DaqRight-Temp1``, ``DaqRight-Temp2`` (rack present temperatures)
* if you want more, contact us!

These can be easily accessed for any time range of interest by giving a ``my_config.json`` file as input to the command line in the following way:

.. code-block::
legend-data-monitor user_scdb --config my_config --port N --pswd ThePassword
.. note::

- ``N`` is any number in the range 1024-65535. Setting a personal port different from the default one (5432) is safer: if a port is already in use by another user, you'll receive an error indicating that the port is already taken and you will not be able to access the SC database;
- ``ThePassword`` can be found on Confluence at `this page <https://legend-exp.atlassian.net/wiki/spaces/LEGEND/pages/494764033/Python+Software+Stack#Metadata-access>`_.
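
For example, with a personal port and a placeholder password:

.. code-block:: bash

   $ legend-data-monitor user_scdb --config my_config.json --port 15432 --pswd <password_from_confluence>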

Files are collected in the output folder specified in the ``output`` config entry:
An example of a config.json file is the following:

.. code-block:: json
{
"output": "<some_path>/out",
// ...
"output": "/data1/users/<your_username>/prod-ref-v2",
"dataset": {
"experiment": "L200",
"period": "p09",
"version": "tmp-auto",
"path": "/data2/public/prodenv/prod-blind/",
"type": "phy",
"time_selection": ...
},
"saving": "overwrite",
"slow_control": {
"parameters": ["DaqLeft-Temp1", "ZUL_T_RR"]
}
}
In principle, for plotting the SC data you would need just the start and the end of a time interval of interest. This means that SC data does not depend on any dataset info (``experiment``, ``period``, ``version``, ``type``) but ``time_selection``.
However, there are cases where we want to inspect a given run or time period made of keys, as we usually do with germanium.
The meaning of each entry is explained below:

In the first case, we end up saving data in the following folder:
* ``output``: folder where to store output files;
* ``dataset``:

.. code-block::
* ``experiment``: either *L60* (to be checked) or *L200*
* ``period``: period to inspect
* ``version``: prodenv version (e.g. *tmp-auto* or *ref-v1.0.0*)
* ``path``: global path to prod-blind prodenv folder
* ``type``: type of data to inspect (either *cal* or *phy*)
* ``time_selection``: list of either ``runs`` or ``timestamps`` (use the format *YMDTHMSZ*), or entries ``start`` and ``end`` with format *Y-M-D H:M:S* (see below for more details)

* ``saving``: either *overwrite* (overwrites any already present file) or *append* (takes the previous file and appends new data, e.g. for a newly inspected time range)
* ``slow_control``: field for specifying SC parameters

* ``parameters``: list of parameters to inspect (choose among the available ones listed above)

<some_path>/out/
└── generated
└── plt
└── SC
└── <time_selection>
├── SC-<time_selection>.pdf
├── SC-<time_selection>.log
└── SC-<time_selection>.{dat,bak,dir}

Otherwise, we store the SC data/plots as usual:
In principle, for plotting the SC data you would need just the start and the end of a time interval of interest. This means that SC data does not depend on any dataset info (i.e. on entries ``experiment``, ``period``, ``version``, ``type``).
However, these entries are needed to retrieve the channel map for the given time range of interest.

We store SC data in the following way:

.. code-block::
<some_path>/out/
<output>
└── generated
└── plt
└── <type>
└── <period>
└── SC
└── <time_selection>
├── SC-<time_selection>.pdf
├── SC-<time_selection>.log
├── SC-<time_selection>.hdf
└── SC-<time_selection>.{dat,bak,dir}
@@ -62,19 +97,3 @@ Otherwise, we store the SC data/plots as usual:
- if ``{'timestamps': ['20230207T103123Z', '20230207T141123Z', '20230207T083323Z']}`` (multiple keys), then <time_selection> = ``20230207T083323Z_20230207T141123Z`` (min/max timestamp interval)
- if ``{'runs': 1}`` (one run), then <time_selection> = ``r001``;
- if ``{'runs': [1, 2, 3]}`` (multiple runs), then <time_selection> = ``r001_r002_r003``.
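
As a sketch, the mapping above can be reproduced with a few lines of Python (a hypothetical helper, not part of the package):

.. code-block:: python

   def time_selection_string(sel: dict) -> str:
       """Build the <time_selection> folder label from a dataset time selection."""
       if "start" in sel and "end" in sel:
           # '2023-02-07 02:00:00' -> '20230207T020000Z'
           def fmt(s):
               return s.replace("-", "").replace(" ", "T").replace(":", "") + "Z"
           return f"{fmt(sel['start'])}_{fmt(sel['end'])}"
       if "timestamps" in sel:
           keys = sel["timestamps"]
           keys = [keys] if isinstance(keys, str) else sorted(keys)
           # single key, or min/max timestamp interval
           return keys[0] if len(keys) == 1 else f"{keys[0]}_{keys[-1]}"
       if "runs" in sel:
           runs = sel["runs"]
           runs = [runs] if isinstance(runs, int) else runs
           return "_".join(f"r{r:03d}" for r in runs)
       raise ValueError("unsupported time selection")

   print(time_selection_string({"runs": [1, 2, 3]}))  # r001_r002_r003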

Shelve output objects
~~~~~~~~~~~~~~~~~~~~~
*Under construction...*


Available SC parameters
-----------------------

Available parameters include:

- ``PT114``, ``PT115``, ``PT118`` (cryostat pressures)
- ``PT202``, ``PT205``, ``PT208`` (cryostat vacuum)
- ``LT01`` (water loop fine fill level)
- ``RREiT`` (injected air temperature clean room), ``RRNTe`` (clean room temperature north), ``RRSTe`` (clean room temperature south), ``ZUL_T_RR`` (supply air temperature clean room)
- ``DaqLeft-Temp1``, ``DaqLeft-Temp2``, ``DaqRight-Temp1``, ``DaqRight-Temp2`` (rack present temperatures)
1 change: 1 addition & 0 deletions docs/source/manuals/index.rst
@@ -6,4 +6,5 @@ User Manual

avail_pars
get_plots
get_sc_plots
inspect_plots
111 changes: 25 additions & 86 deletions docs/source/manuals/inspect_plots.rst
@@ -4,24 +4,30 @@ How to inspect plots
Output files
------------

After the code has run, shelve object files containing the data and plots generated for the inspected parameters/subsystems
After the code has run, HDF files containing the data and plots generated for the inspected parameters/subsystems
are produced, together with a pdf file containing all the generated plots and a log file containing running information. In particular,
the last two files are created for each inspected subsystem (pulser, geds, spms).

.. warning::

Shelve files are produced as output as well; this was the first format chosen for the output.
The code still has to be fixed to remove these files from the routines.
At the moment, they are needed when using the ``"saving": "append"`` option, so do not remove them if you are going to use it!

Files are usually collected in the output folder specified in the ``output`` config entry:

.. code-block:: json
{
"output": "<some_path>/out",
"output": "<output_path>",
// ...
Then, depending on the chosen dataset (``experiment``, ``period``, ``version``, ``type``, time selection),
different output folders can be created. In general, the output folder is structured as follows:
.. code-block::
<some_path>/out/
<output_path>
└── prod-ref
└── <version>
└── generated
@@ -32,6 +38,7 @@
├── <experiment>-<period>-<time_selection>-<type>-<subsystem>.pdf
├── <experiment>-<period>-<time_selection>-<type>-<subsystem>.log
├── <experiment>-<period>-<time_selection>-<type>.{dat,bak,dir}
└── <experiment>-<period>-<time_selection>-<type>.hdf
Files are usually saved using the following format ``exp-period-datatype-time_interval``:
@@ -52,95 +52,27 @@
- if ``{'runs': [1, 2, 3]}`` (multiple runs), then <time_selection> = ``r001_r002_r003``.


Shelve output objects
~~~~~~~~~~~~~~~~~~~~~
*Under construction... (structure might change over time, but content should remain the same)*
Output .hdf files
-----------------

The output object ``<experiment>-<period>-<time_selection>-<type>.{dat,bak,dir}`` has the following structure:
Output hdf files for ``geds`` have the following key structure, where ``<param>`` is the name of one of the inspected parameters and ``<flag>`` is the event type, e.g. *IsPulser* or *IsBsln*:

.. code-block::
- ``<flag>_<param>_info`` = some useful info
- ``<flag>_<param>`` = absolute values
- ``<flag>_<param>_mean`` = average of ``<flag>_<param>`` over the first 10% of data (within the selected time window)
- ``<flag>_<param>_var`` = % variations of ``<flag>_<param>`` wrt ``<flag>_<param>_mean``
- ``<flag>_<param>_pulser01anaRatio`` = ratio of the absolute values ``<flag>_<param>`` to the PULS01ANA absolute values
- ``<flag>_<param>_pulser01anaRatio_mean`` = average of ``<flag>_<param>_pulser01anaRatio`` over the first 10% of data (within the selected time window)
- ``<flag>_<param>_pulser01anaRatio_var`` = % variations of ``<flag>_<param>_pulser01anaRatio`` wrt ``<flag>_<param>_pulser01anaRatio_mean``
- ``<flag>_<param>_pulser01anaDiff`` = difference between the absolute values ``<flag>_<param>`` and the PULS01ANA absolute values
- ``<flag>_<param>_pulser01anaDiff_mean`` = average of ``<flag>_<param>_pulser01anaDiff`` over the first 10% of data (within the selected time window)
- ``<flag>_<param>_pulser01anaDiff_var`` = % variations of ``<flag>_<param>_pulser01anaDiff`` wrt ``<flag>_<param>_pulser01anaDiff_mean``
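
As a sketch, such a file can be inspected with pandas (the key names below are illustrative, e.g. the *IsPulser* flag combined with the ``cuspEmax_ctc_cal`` parameter):

.. code-block:: python

   import pandas as pd

   f = "<experiment>-<period>-<time_selection>-<type>.hdf"

   # list all stored keys
   with pd.HDFStore(f, mode="r") as store:
       print(store.keys())

   # load the % variations of a parameter for pulser events
   df = pd.read_hdf(f, key="IsPulser_cuspEmax_ctc_cal_var")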

<experiment>-<period>-<time_selection>-<type>
└── monitoring
├── pulser // event type
│ └── cuspEmax_ctc_cal // parameter
│ ├── 4 // this is the channel FC id
│ │ ├── values // these are y plot-values shown
│ │ │ ├── all // every timestamp entry
│ │ │ └── resampled // after the resampling
│ │ ├── timestamp // these are plot-x values shown
│ │ │ ├── all
│ │ │ └── resampled
│ │ ├── mean // mean over the first 10% of data within the range inspected by the user
│ │ └── plot_info // some useful plot-info: ['title', 'subsystem', 'locname', 'unit', 'plot_style', 'parameter', 'label', 'unit_label', 'time_window', 'limits']
│ ├── ...other channels...
│ ├── df_geds // dataframe containing all geds channels for a given parameter
│ ├── <figure> // Figure object
│ └── map_geds // geds status map (if present)
├─all
│ └── baseline
│ ├── ...channels data/info...
│ └── ...other summary objects (df/status map/figures)...
│ └── wf_max
│ └── ...
└──phy
└── ...
One way to open it and inspect the saved objects for a given channel, eg. ID='4', is to do

.. code-block:: python
import shelve
with shelve.open("<experiment>-<period>-<time_selection>-<type>") as file:
# get y values
all_data_ch4 = file['monitoring']['pulser']['baseline']['4']['values']['all']
resampled_data_ch4 = file['monitoring']['pulser']['baseline']['4']['values']['resampled']
# get info for plotting data
plot_info_ch4 = file['monitoring']['pulser']['baseline']['4']['plot_info']
To get the corresponding dataframe (containing all channels with map/status info and loaded parameters), you can use

.. code-block:: python
import shelve
with shelve.open("<experiment>-<period>-<time_selection>-<type>") as file:
df_geds = file['monitoring']['pulser']['baseline']['df_geds'].data
To open the saved figure for a given parameter, one way to do it is through

.. code-block:: python
import io
from PIL import Image
with io.BytesIO(shelf['monitoring']['pulser']['baseline']['<figure>']) as obj:
# create a PIL Image object from the bytes
pil_image = Image.open(obj)
# convert the image to RGB color space (to enable PDF saving)
pil_image = pil_image.convert('RGB')
# save image to disk
pil_image.save('figure.pdf', bbox_inches="tight")
.. important::

The key name ``<figure>`` changes depending on the used ``plot_style`` for producing that plot. In particular,

- if you use ``"plot_style": "per channel"``, then ``<figure> = figure_plot_string_<string_no>``, where ``string_no`` is the number of one of the available strings;
- if you use ``"plot_style": "per cc4"`` or ``"per string"`` or ``"array"``, then ``<figure> = figure_plot``;
- if you use ``"plot_style": "per barrel"``, then ``<figure> = figure_plot_<location>_<position>``, where ``<location>`` is either "IB" or "OB, while ``<position>`` is either "top" or "bottom".

.. note::

There is no need to create one shelve object for each inspected subsystem.
Indeed, one way to separate among pulser, geds and spms is to look at channel IDs.
In any case, the subsystem info is saved under ``["monitoring"][<event_type>][<parameter>]["plot_info"]["subsystem"]``.


Inspect plots
-------------

*Under construction*

- Near future: `Dashboard <https://legend-exp.atlassian.net/wiki/spaces/LEGEND/pages/637861889/Monitoring+Dashboard+Manual>`_ tool
- Future: notebook to interactively inspect plots (with buttons?)
- Some standard plots to monitor the detectors' response can be found online on the `Dashboard <https://legend-exp.atlassian.net/wiki/spaces/LEGEND/pages/637861889/Monitoring+Dashboard+Manual>`_
- Some notebooks to interactively inspect plots can be found under the ``notebook`` folder
