How to produce plots¶
How to run legend-data-monitor¶
After the installation, a executable is available at ~/.local/bin.
To automatically generate plots, two different methods are available.
All methods rely on the existence of a config file containing the output folder (output)
where to store results, the dataset you want to inspect, and the subsystems (pulser, geds, spms)
you want to study and for which you want to load data. See next section for more details.
You can either run the code by importing the legend-data-monitor module:
import legend-data-monitor as ldm
user_config = "path_to_config.yaml"
ldm.control_plots(user_config)
Or run it by parsing to the executable the path to the config file:
$ legend-data-monitor user_prod --config path_to_config.yaml
If you want to inspect bunches of data (e.g. useful to avoid the process to get killed when loading lots of files), you can use
$ legend-data-monitor user_bunch --config path_to_config.yaml --n_files N
where N specifies how many files you want to inspect together at each iteration e.g. N=40.
You can also run a full cal/phy inspection for a given period and run and produce some output plots via the following line:
- $ usage: legend-data-monitor auto_run [-h] [–cluster CLUSTER] [–ref_version REF_VERSION] [–output_folder OUTPUT_FOLDER] [–data_type DATA_TYPE]
[–partition PARTITION] [–pswd PSWD] [–sc SC] [–port PORT] [–pswd_email PSWD_EMAIL] [–chunk_size CHUNK_SIZE] [–p P] [–r R] [–escale ESCALE] [–pdf PDF]
All options can be inspected via $ legend-data-monitor auto_run --h.
Warning
All available subcommands can be listed with:
$ legend-data-monitor --h usage: legend-data-monitor [-h] [--version] {user_scdb,user_prod,user_bunch,user_rsync_prod, auto_prod,auto_run,get_exposure,get_runinfo} ...
Configuration file¶
In the following, we describe the structure of the configuration file in detail.
Example config¶
output: "<output_path>" # output folder
dataset:
experiment: "L200"
period: "p09"
version: "tmp-auto"
path: "/data2/public/prodenv/prod-blind/"
type: "phy" # data type (either cal, phy, or ["cal", "phy"])
start: "2023-02-07 02:00:00" # time cut (here based on start+end)
end: "2023-02-07 03:30:00"
saving: "overwrite"
subsystems:
geds: # type of subsystem to plot (geds, spms, pulser)
"Baselines in pulser events":
parameters: "baseline"
event_type: "pulser"
plot_structure: "per channel"
plot_style: "vs time"
variation: true
resampled: "also"
time_window: "1H"
status: true
The argument output is the path where plots and inspected data will be saved.
In particular, dataset settings are:
experiment: experiment name (e.g.L60orL200);period: formatpXX. Note that this entry is not mandatory when using DataLoader as it finds the period of interest automatically based on the provided time selection (see below); the period is useful for output filenames;version: version of production cycle, formatvXX.YYorref-vXX.YY.ZZortmp-auto;path: path toprod-reffolder, e.g./data2/public/prodenv/prod-blind/on LNGS cluster or/global/cfs/cdirs/m2676/data/lngs/l200/public/prodenv/prod-blind/on NERSC cluster;type: type of data, either physics (phy), calibration (cal), or both (["phy", "cal"]);selection: time window to select data.
Note
Time selection is based on:
'start': '2023-02-07 02:00:00', 'end': '2023-02-07 03:30:00'(start + end) in formatYYYY-MM-DD hh:mm:ss;'timestamps': ['20230207T103123Z', '20230207T141123Z', ...](list of keys) in formatYYYYMMDDThhmmssZ;'window': '1d 2h 0m'(time window in the past from current time point) in formatXd Xh Xmfor days, hours, minutes;'runs': 1(one run) or'runs': [1, 2, 3](list of runs) in integer format.
A saving option is available to either "overwrite" any already present output file (or create a new one if not present) or "append" new data to the previously obtained output files.
Then, subsystems can either be pulser, geds or spms (note: SiPMs plots are not implemented yet, but DataLoader can load the respective data if needed).
For each subsystem to be plotted, specify
"<some title>": the title of the plot you want to generate. eg. “Baselines in pulser events”parameters: one or multiple parameters of interest to be plotted for this subsystem. In addition to any parameter present inlh5files, the following special parameters were implemented and are available for plotting (see provided examples below for more details on how to select these parameters):"K_lines": events whose energy is contained within 1430 and 1575 keV (40K and 42K regions)"FWHM": FWHM values for each channel"wf_max_rel": relative difference betweenwf_maxand baseline"event_rate": event rate calculated in windows specified in the field"sampling"underplotting.parameters
"event_type": which events to plot. Choose amongpulser(events flagged as pulser based on AUX channel),FCbsln(events flagged as FlashCam triggered baseline, i.e. flat events),muon(events flagged as in coincidence with a muon),phy(physical, i.e. non-pulser events),all(pulser + physical events),K_events(physical events with energies in [1430; 1575] keV)"plot_structure": plot arrangement. Choose amongper channel(geds): group plots by channel, i.e. each channel has its canvasper cc4(geds): group plots by CC4, i.e. all channels belonging to the same CC4 are in the same canvasper string(geds): group plots by string, i.e. all channels belonging to the same string are in the same canvasarray(geds): group all channels in the same canvasper fiber(spms): group channels separating them into the inner barrel (IB) and outer barrel (OB), and put top/bottom channels of a given fiber together to look for correlations within the fiber and among neighbouring fibersper barrel(spms): group channels separating them into top/bottom IB/OB
"plot_style": plot style. Choose amongvs time: plot parameter VS time (all timestamps), as well as resampled values in a time window specified in plot settings (seetime_window)vs ch: plot parameter VS channel IDhistogram: plot distribution of given parameterscatter: plot all entries of a parameter with pointsheatmap: plot 2d histos, with time on x axis
"variation": set it toTrueif you want % variation instead of absolute values for your parameter. Percentage variations are evaluated as:(param/mean - 1)*100, wheremeanis the mean of the parameter under study evaluated over the first 10% of the time interval you specified in thedatasetentry"resampled": set it to"also"if you want to plot resampled values for the parameter under study. Resampling is done using the"time_window"you specify. Possible values are:"none": do not plot resampled values, i.e. plot only events for each saved timestamps"only": plot only resampled values, i.e. averaged parameter values after computing an average in a time window equal to"time_window""also": plot both resampled and not resampled values
"time_window": resampling time (``T``=minutes, ``H``=hours, ``D``=days) used to print resampled values (useful to spot trends over time)"status": set it toTrueif you want to generate a GEDs status map for the subsystem and parameter under study. Before using this option, you first need to specify the limits you want to set as a low/high threshold for the parameter under study by adding the % or absolute threshold for the subsystem of interest insrc/legend-data-monitor/settings/par-setting.yaml.
Warning
There is no event type selection ready for calibration data.
This means you always have to use "event_type": "all" as long as the different event selections are not properly implemented for calibration data too.
More than one subsystem can be entered at once, for instance:
subsystems:
pulser:
"Pulser event rate":
parameters: "event_rate"
event_type: "pulser"
plot_structure: "per channel"
plot_style: "vs time"
variation: false
time_window: "1H"
"AUX channel waveform maximum":
parameters: "wf_max"
event_type: "all"
plot_structure: "per channel"
plot_style: "histogram"
variation: false
geds:
"Baselines in pulser events":
parameters: "baseline"
event_type: "pulser"
plot_structure: "per channel"
plot_style: "vs time"
variation: true
time_window: "1H"
Quality cuts¶
Different methods were implemented to either apply or retrieve quality cuts (QC).
Apply QC¶
If you are loading a parameter for GEDs channels and you want to apply one or multiple QC flags, you just specify it in the subsystem plot entry:
"subsystems": {
"geds": {
"Baselines in pulser events": {
"parameters": "baseline",
"event_type": "pulser",
"plot_structure": "per channel",
"plot_style": "vs time",
"variation": true,
"time_window": "1H",
"cuts": ["is_valid_bl_slope"]
}
}
In the above example, you are loading the baseline for pulser events and applying the is_valid_bl_slope QC as well to remove events for which the baseline slope is not valid.
Any bitmask entry is automatically converted into a boolean entry based on the information stored in legend-metadata.
Retrieve QC¶
QC are not parameters like the baseline, energy, etc. so there is no purpose in plotting them as they are.
However, QC rates are of fundamental importance as well as distribution of QC classifiers.
Below, we show a way to retrieve all available QC flags and/or classifiers by selecting "parameters": "quality_cuts":
subsystems:
geds:
"Baselines in pulser events":
parameters: "baseline"
event_type: "pulser"
plot_structure: "per channel"
plot_style: "vs time"
variation: true
time_window: "1H"
cuts:
- "is_valid_bl_slope"
This will create a unique table with QC flags/classifiers as columns, with an entry for each hit in each GEDs detector. Any bitmask entry is automatically converted into a boolean entry based on the information stored in legend-metadata.
Warning
At the moment, there is no differentiation based on the detector type for the available QC flags/classifiers.
In other words, to load all QC info we read at the flags/classifiers listed under a path of type <path_to_prod_blind>/ref-v2.1.5/inputs/dataprod/config/tier_hit/l200-p01-r%-T%-ICPC-hit_config.yaml.
If any of these listed flags/classifiers is not present for a given detector type (eg COAX), then all entries of the flag/classifier are set to False by default.
Any difference will be better handled in the future.
Removal of pulser effects¶
When plotting GEDs events that coincide with an injected pulser trace, you might want to remove any effects related to the pulser system (e.g. noise). To do this, you can configure the config file to adjust a parameter by subtracting or dividing it by the corresponding auxiliary pulser parameter. In the configuration file, you can specify whether to compute either a ratio or a difference relative to the original parameter param:
In the config file, you just need to set either the key AUX_ratio or ``AUX_diff” to true (note: it’s not possible to select both options at the same time):
subsystems:
geds:
"Baselines in pulser events":
parameters: "baseline"
event_type: "pulser"
plot_structure: "per channel"
plot_style: "vs time"
variation: true
AUX_ratio: true
time_window: "1H"
cuts:
- "is_valid_bl_slope"
Note
The AUX channel selected for retrieving \(\text{param}_\text{AUX}\) is always PULSER01ANA=1027203 (when available).
Special parameters¶
More attention must be paid to the following special parameters, for which a particular subsystem entry is required.
K lines¶
To plot events having energies within 1430 and 1575 keV (ie, around the 40K and 42K regions), grouping channels by string and selecting physical (=not-pulser) events, use
subsystems:
geds:
"K events":
parameters: "K_events"
event_type: "K_lines"
plot_structure: "per string"
plot_style: "scatter"
FWHM¶
To plot FWHM values for each channel, grouping them by strings, selecting only pulser events, use
subsystems:
geds:
"FWHM in pulser events":
parameters: "FWHM"
event_type: "pulser"
plot_structure: "array"
plot_style: "vs ch"
Relative maximum of the waveform¶
To plot the relative difference between wf_max and baseline, use
subsystems:
pulser:
"Relative wf_max":
parameters: "wf_max_rel"
event_type: "pulser" # or phy, all, ...
plot_structure: "per channel"
plot_style: "vs time"
variation: true # optional
time_window: "5T"
Event rate¶
To plot the event rate, by sampling over a period of time equal to <time_window> (T=minutes, H=hours, D=days), use:
subsystems:
geds:
"Event rate":
parameters: "event_rate"
event_type: "pulser"
plot_structure: "per channel"
plot_style: "vs time"
resampled: "no"
variation: false
time_window: "5T"