legend_data_monitor package

Subpackages

Submodules

legend_data_monitor.analysis_data module

class legend_data_monitor.analysis_data.AnalysisData(sub_data: DataFrame, **kwargs)

Bases: object

Object holding data subselected from Subsystem data according to the given criteria.

sub_data [DataFrame]: subsystem data

Available kwargs:
selection=
dict with the following contents:
  • ‘parameters’ [str or list of str]: parameter(s) of interest e.g. ‘baseline’

  • ‘event_type’ [str]: event type, options: pulser/phy/all

  • ‘cuts’ [str or list of str]: [optional] cuts to apply to data (will be loaded but not applied immediately)

  • ‘variation’ [bool]: [optional] keep absolute value of parameter (False) or calculate % variation from mean (True).

    Default: False

  • ‘time_window’ [str]: [optional] time window in which to calculate event rate, in case that’s the parameter of interest.

    Format: time_window=’NA’, where N is integer, and A is M for months, D for days, T for minutes, and S for seconds. Default: None

aux_info=
str describing the pulser operation to apply (difference or ratio with respect to geds (spms?) data). Available options are:
  • “pulser01anaRatio”

  • “pulser01anaDiff”

Alternatively, pass the kwargs directly: parameters=, event_type=, cuts=, variation=, time_window=
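As an illustration, the two ways of supplying the selection could look like the sketch below. The key names come from the list above; the values and the cut names (`is_example_cut`, `is_other_cut`) are hypothetical, and the commented-out constructor calls assume a `sub_data` DataFrame already exists.

```python
# Hypothetical selection; keys follow the kwargs documented above.
selection = {
    "parameters": "baseline",                      # str or list of str
    "event_type": "pulser",                        # pulser / phy / all
    "cuts": ["is_example_cut", "~is_other_cut"],   # made-up cut names
    "variation": True,                             # % variation instead of absolute values
    "time_window": None,                           # only needed for event rate
}

# The same content, ready to be unpacked as direct keyword arguments.
direct_kwargs = dict(selection)

# Assumed usage (not executed here, since it needs real subsystem data):
# AnalysisData(sub_data, selection=selection)
# AnalysisData(sub_data, **direct_kwargs)
```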

add_channel_mean_column()

Add a column to self.data with the per-channel mean of a time-cut DataFrame.

Parameters:

self (AnalysisData object) – An AnalysisData object that has data as a column.

Returns:

self.data – The original data with an additional column for the per-channel mean.

Return type:

DataFrame

apply_all_cuts()
apply_cut(cut: str)

Apply given boolean cut.

Format: cut name as in lh5 files (“is_*”) to apply given cut, or cut name preceded by “~” to apply a “not” cut.
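The cut-string convention can be sketched as follows. This is not the package's implementation: it works on a plain list of dicts rather than a DataFrame, and the helper names and the `is_example` flag are hypothetical.

```python
def parse_cut(cut):
    """Split a cut string into (flag name, value a row must have to be kept)."""
    if cut.startswith("~"):
        return cut[1:], False  # "not" cut: keep rows where the flag is False
    return cut, True           # plain cut: keep rows where the flag is True


def apply_cut_sketch(rows, cut):
    """Keep only the rows whose boolean flag matches the requested cut."""
    name, keep = parse_cut(cut)
    return [row for row in rows if bool(row[name]) is keep]


rows = [
    {"event": 0, "is_example": True},
    {"event": 1, "is_example": False},
    {"event": 2, "is_example": True},
]
passing = apply_cut_sketch(rows, "is_example")
failing = apply_cut_sketch(rows, "~is_example")
```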

calculate_variation()

Add a new column containing the percentage variation of a given parameter.

The new column is called ‘<parameter>_var’. There is still the <parameter> column containing absolute values. There is only the <parameter> column if variation is set to False.
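A minimal sketch of a percentage variation, assuming it is defined as (value - reference) / reference * 100. The function name is hypothetical, and the choice of reference (here the mean of all values unless one is passed in) is an assumption; plain lists stand in for DataFrame columns.

```python
from statistics import fmean


def percent_variation(values, reference=None):
    """Return the % deviation of each value from a reference (default: the mean)."""
    if reference is None:
        reference = fmean(values)
    return [(v - reference) / reference * 100.0 for v in values]


baseline = [100.0, 102.0, 98.0, 100.0]
# Would play the role of the '<parameter>_var' column next to the absolute values.
baseline_var = percent_variation(baseline)
```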

channel_mean()

Get mean value of each parameter of interest in each channel in the first 10% of the dataset.

Ignore in case of SiPMs, as each entry is a list of values, not a single value.

convert_bitmasks()

Convert float64 bitmask columns into boolean columns based on the conditions saved in metadata.

get_subsys() str

Return ‘pulser’, ‘pulser01ana’, ‘FCbsln’, ‘muon’, ‘geds’ or ‘spms’ depending on the subsystem type.

Return type:

str

is_aux() bool

Return True if the system is an AUX channel.

Return type:

bool

is_fc_bsln() bool

Return True if the system is the FC baseline channel.

Return type:

bool

is_geds() bool

Return True if ‘location’ (=string) and ‘position’ are NOT strings.

Return type:

bool

is_muon() bool

Return True if the system is the muon channel.

Return type:

bool

is_pulser() bool

Return True if the system is the pulser channel.

Return type:

bool

is_pulser01ana() bool

Return True if the system is the pulser01ana channel.

Return type:

bool

is_spms() bool

Return True if ‘location’ (=fiber) and ‘position’ (=top, bottom) are strings.

Return type:

bool

select_events()
special_parameter()
legend_data_monitor.analysis_data.concat_channel_mean(self, channel_mean) DataFrame

Add a new column containing the mean values of the inspected parameter.

Return type:

DataFrame

legend_data_monitor.analysis_data.cut_dataframe(df: DataFrame, fraction: float = 0.1) DataFrame

Get mean value of the parameters under study over the first ‘fraction’ of data present in the selected time range of the input dataframe.

Return type:

DataFrame
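One possible reading of "the first fraction of data in the time range" is sketched below: select the samples whose timestamp falls in the first `fraction` of the covered time span, then average them. Whether the package cuts on time or on row count is not stated here, so the time-based cut is an assumption, as is the helper name.

```python
def first_fraction_mean(times, values, fraction=0.1):
    """Mean of the values recorded in the first `fraction` of the time span."""
    t_start, t_end = min(times), max(times)
    t_cut = t_start + fraction * (t_end - t_start)
    selected = [v for t, v in zip(times, values) if t <= t_cut]
    return sum(selected) / len(selected)


times = [0, 5, 20, 40, 60, 80, 100]              # seconds since start
values = [10.0, 12.0, 30.0, 30.0, 30.0, 30.0, 30.0]
early_mean = first_fraction_mean(times, values)  # uses only t = 0 and t = 5
```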

legend_data_monitor.analysis_data.get_aux_df(df: DataFrame, parameter: list, plot_settings: dict, aux_ch: str) DataFrame

Get dataframes containing auxiliary (PULS01ANA) data, storing absolute/diff&ratio/mean/% variations values.

Return type:

DataFrame

legend_data_monitor.analysis_data.get_aux_info(df: DataFrame, chmap: dict, aux_ch: str) DataFrame

Return a DataFrame with correct pulser AUX info.

Return type:

DataFrame

legend_data_monitor.analysis_data.get_saved_df_hdf(self, subsys: str, param: str, old_df: DataFrame) DataFrame

Get the previously saved DataFrame for a given parameter `param` from the existing output hdf file. In particular, it re-evaluates the mean over the new 10% of data in the new, larger time window.

Return type:

DataFrame

legend_data_monitor.analysis_data.get_seconds(time_window: str)

Convert sampling format used for DataFrame.resample() to int representing seconds.

Needed for event rate calculation.

>>> get_seconds('30T')
1800
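A conversion of this kind might be sketched as below. The function name is hypothetical, and treating "M" (months) as 30 days is an assumption of this sketch, not something the documentation states.

```python
import re

# Seconds per unit letter; "M" is approximated as 30 days (assumption).
_UNIT_SECONDS = {"S": 1, "T": 60, "D": 86400, "M": 30 * 86400}


def get_seconds_sketch(time_window):
    """Convert a window such as '30T' into an integer number of seconds."""
    match = re.fullmatch(r"(\d+)([STDM])", time_window)
    if match is None:
        raise ValueError(f"unsupported time window: {time_window!r}")
    count, unit = match.groups()
    return int(count) * _UNIT_SECONDS[unit]


half_hour = get_seconds_sketch("30T")  # 30 minutes -> 1800 seconds
```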
legend_data_monitor.analysis_data.load_subsystem_data(subsystem: Subsystem, dataset: dict, plots: dict, plt_path: str, saving=None)

legend_data_monitor.automatic_run module

legend_data_monitor.automatic_run.auto_run(cluster, ref_version, output_folder, partition, pswd, get_sc, port, pswd_email, chunk_size, input_period, input_run, save_pdf, escale_val, data_type)

Inspect LEGEND HDF5 (LH5) processed data (and Slow Control data from lngs-login cluster) for a specific period and run (if specified; otherwise the latest being processed are used); plots and summary files are saved; automatic alert emails are sent.

legend_data_monitor.automatic_run.check_calib(auto_dir_path: str, output_folder: str, period: str, current_run: str, pswd_email: str, data_type: str = 'phy', partition: bool = False, save_pdf: bool = False)

Check calibration stability in calibration runs and create monitoring summary file.

Parameters:
  • auto_dir_path (str) – Path to tmp-auto public data files (eg /data2/public/prodenv/prod-blind/tmp-auto).

  • output_folder (str) – Path to output folder.

  • period (str) – Period to inspect.

  • current_run (str) – Run under inspection.

  • pswd_email (str) – Password to access the legend.data.monitoring@gmail.com account for sending alert messages.

  • data_type (str) – Data type to load; default: ‘phy’.

  • partition (bool) – False if not partition data; default: False.

  • save_pdf (bool) – True if you want to save pdf files too; default: False.

legend_data_monitor.automatic_run.qc_avg_series(auto_dir_path: str, output_folder: str, start_key: str, period: str, current_run: str, save_pdf: bool = False)

Plot quality cuts average values across the array and trends in time.

Parameters:
  • auto_dir_path (str) – Path to tmp-auto public data files (eg /data2/public/prodenv/prod-blind/tmp-auto).

  • output_folder (str) – Path to output folder.

  • start_key (str) – First timestamp of the inspected range.

  • period (str) – Period to inspect.

  • current_run (str) – Run under inspection.

  • save_pdf (bool) – True if you want to save pdf files too; default: False.

legend_data_monitor.automatic_run.summary_plots(auto_dir_path: str, phy_mtg_data: str, output_folder: str, start_key: str, period: str, current_run: str, runs: list, pswd_email: str, last_checked: str, data_type: str = 'phy', partition: bool = False, escale_val: float = 2039.0, save_pdf: bool = False, zoom: bool = False, quadratic: bool = False)

Run function for creating summary plots.

Parameters:
  • auto_dir_path (str) – Path to tmp-auto public data files (eg /data2/public/prodenv/prod-blind/tmp-auto).

  • phy_mtg_data (str) – Path to generated monitoring hdf files.

  • output_folder (str) – Path to output folder.

  • start_key (str) – First timestamp of the inspected range.

  • period (str) – Period to inspect.

  • current_run (str) – Run under inspection.

  • runs (list) – Available runs to inspect for a given period.

  • pswd_email (str) – Password to access the legend.data.monitoring@gmail.com account for sending alert messages.

  • last_checked (str) – Timestamp of the last check.

  • data_type (str) – Data type to load; default: ‘phy’.

  • partition (bool) – False if not partition data; default: False.

  • escale_val (float) – Energy scale at which to evaluate the gain differences; default: 2039 keV (76Ge Qbb).

  • save_pdf (bool) – True if you want to save pdf files too; default: False.

  • zoom (bool) – True to zoom over y axis; default: False.

  • quadratic (bool) – True if you want to plot the quadratic resolution too; default: False.

legend_data_monitor.calibration module

legend_data_monitor.calibration.check_calibration(tmp_auto_dir: str, output_folder: str, period: str, run: str, first_run: bool, det_info: dict, save_pdf=False)

Check calibration stability for a given run and update monitoring summary YAML file.

Parameters:
  • tmp_auto_dir (str) – Path to tmp-auto public data files (eg /data2/public/prodenv/prod-blind/tmp-auto).

  • output_folder (str) – Path to output folder where the output summary YAML and plots will be stored.

  • period (str) – Period to inspect.

  • run (str) – Run to inspect.

  • first_run (bool) – Flag indicating whether this is the first run of the period.

  • det_info (dict) – Dictionary containing detector metadata.

  • save_pdf (bool) – True if you want to save pdf files too; default: False.

legend_data_monitor.calibration.check_calibration_lac_ssc(tmp_auto_dir: str, output_folder: str, period: str, run: str, run_to_apply: str, first_run: bool, det_info: dict, data_type='cal', save_pdf=False)

Check calibration stability for a given run and update monitoring summary YAML file in special LAC or SSC data.

Parameters:
  • tmp_auto_dir (str) – Path to tmp-auto public data files (eg /data2/public/prodenv/prod-blind/tmp-auto).

  • output_folder (str) – Path to output folder where the output summary YAML and plots will be stored.

  • period (str) – Period to inspect.

  • run (str) – Run to inspect.

  • run_to_apply (str) – Calibration run to apply to these data.

  • first_run (bool) – Flag indicating whether this is the first run of the period.

  • det_info (dict) – Dictionary containing detector metadata.

  • save_pdf (bool) – True if you want to save pdf files too; default: False.

legend_data_monitor.calibration.check_escale(auto_dir_path: str, cal_path: str, output_folder: str, period: str, current_run: str, det_info: dict, save_pdf: bool) None

Run energy-scale calibration checks and generate detector plots.

Parameters:
  • auto_dir_path (str) – Path to tmp-auto public data files (eg /data2/public/prodenv/prod-blind/tmp-auto).

  • cal_path (str) – Path to the directory containing calibration runs (eg /data2/public/prodenv/prod-blind/tmp-auto/generated/par/<tier>/cal/<period>).

  • output_folder (str) – Path to output folder where the summary plots will be stored.

  • period (str) – Period to inspect.

  • current_run (str) – Run to inspect.

  • det_info (dict) – Dictionary containing detector metadata.

  • save_pdf (bool) – True if you want to save pdf files too; default: False.

legend_data_monitor.calibration.check_psd(auto_dir_path: str, cal_path: str, pars_files_list: list, output_dir: str, period: str, current_run: str, det_info: dict, save_pdf: bool)

Evaluate the PSD usability for a set of detectors based on calibration results; save results in a YAML summary file; plot per-detector PSD stability data and store them as shelve file (and pdf if wanted).

Parameters:
  • auto_dir_path (str) – Path to tmp-auto public data files (eg /data2/public/prodenv/prod-blind/tmp-auto).

  • cal_path (str) – Path to the directory containing calibration runs (eg /data2/public/prodenv/prod-blind/tmp-auto/generated/par/<tier>/cal/<period>).

  • pars_files_list (list) – List of YAML/JSON files containing results for each calibration run.

  • output_dir (str) – Path to output folder where the output summary YAML and plots will be stored.

  • period (str) – Period to inspect.

  • current_run (str) – Run to inspect.

  • det_info (dict) – Dictionary containing detector metadata.

  • save_pdf (bool) – True if you want to save pdf files too; default: False.

legend_data_monitor.calibration.evaluate_psd_performance(mean_vals: list, sigma_vals: list, run_labels: list, current_run: str, det_name: str)

Evaluate PSD performance metrics: slow shifts and sudden shifts and return a dict with evaluation results.

legend_data_monitor.calibration.evaluate_psd_usability_and_plot(period: str, current_run: str, fit_results_cal: dict, det_name: str, location, output_dir: str, psd_data: dict, save_pdf: bool)

Plot PSD stability results across runs, evaluate performance, and save both plot and evaluation summary.

legend_data_monitor.calibration.fep_gain_variation(period: str, run: str, pars: dict, chmap: dict, timestamps: ndarray, values: ndarray, output_dir: str, save_pdf: bool, shelf: Shelf)

Compute and plot FEP gain variation for a single detector; optional pdf saving; store a serialized plot in a shelve object.

Parameters:
  • period (str) – Period to inspect.

  • run (str) – Run to inspect.

  • pars (dict) – Calibration results dictionary for a given detector.

  • chmap (dict) – Dictionary with detector info, must include ‘name’, ‘string’, ‘position’.

  • timestamps (np.ndarray) – Array of timestamps for a given detector.

  • values (np.ndarray) – Array of energies for a given detector.

  • output_dir (str) – Path to output folder where plots will be stored.

  • save_pdf (bool) – If True, save a PDF of the plot.

  • shelf (shelve.Shelf) – Open shelve object where serialized plots will be stored.

legend_data_monitor.calibration.get_partitions_params(ge_keys: list, detector_status: dict, run_dict: dict, hit_map: dict, dsp_map: dict) dict

Build per-detector calibration and analysis parameters across runs.

Returns a nested dictionary: det -> parameter -> peak -> run_key -> value

Parameters:
  • ge_keys (list of str) – Detector names.

  • detector_status (dict) – Detector status per period-run: detector_status[det][period-run][‘processable’/’usability’].

  • run_dict (dict) – Mapping period to list of runs.

  • hit_map (dict) – Mapping (period, run) to hit file path.

  • dsp_map (dict) – Mapping (period, run) to dsp file path.

Return type:

dict

legend_data_monitor.calibration.load_fit_pars_from_yaml(pars_files_list: list, detectors_list: list, detectors_name: list, avail_runs: list)

Load detector data from YAML files and return directly as a dict.

Parameters:
  • pars_files_list (list) – List of file paths to YAML parameter files.

  • detectors_list (list) – List of detector raw IDs (eg. ‘ch1104000’) to extract data for.

  • detectors_name (list) – List of detector names (eg. ‘V11925A’) to extract data for.

  • avail_runs (list or None) – Available runs to inspect (e.g. [4, 5, 6]); if None, keep all.

Returns:

{
    "V11925A": {
        "r004": {"mean": …, "mean_err": …, "sigma": …, "sigma_err": …},
        "r005": {…},
        …
    },
    "V11925B": {
        "r004": {…},
        …
    }
}

Return type:

dict

legend_data_monitor.core module

legend_data_monitor.core.auto_control_plots(config: str, file_keys: str, prod_path: str, prod_config: str, n_files=None)

Set the configuration file and the output paths when a config file is provided during automatic plot production.

legend_data_monitor.core.control_plots(user_config_path: str, n_files=None)

Set the configuration file and the output paths when a user config file is provided. The function to generate plots is then automatically called.

legend_data_monitor.core.generate_plots(config: dict, plt_path: str, n_files=None)

Generate plots once the config file is set and the path and name under which to store the results are provided. If n_files is not specified, the entire time window is inspected; otherwise the time window is subdivided into smaller datasets, each composed of n_files files.

legend_data_monitor.core.make_plots(config: dict, plt_path: str, saving: str)
legend_data_monitor.core.retrieve_exposure(period: str, runs: str | list[str], runinfo_path: str, path: str, version: str)
legend_data_monitor.core.retrieve_scdb(config: str, port: int, pswd: str)

Set the configuration file and the output paths when a user config file is provided. The function to retrieve Slow Control data from database is then automatically called.

legend_data_monitor.monitoring module

legend_data_monitor.monitoring.add_calibration_runs(period: str | list, run_list: list | dict) list

Add special calibration runs to the run list for a given period.

Parameters:
  • period (str | list) – Either a string or list of periods

  • run_list (list | dict) – Either a list of runs or a dictionary with period keys

Return type:

list

legend_data_monitor.monitoring.box_summary_plot(period: str, run: str, pars: dict, det_info: dict, results: dict, info: dict, output_dir: str, data_type: str, save_pdf: bool, run_to_apply=None)

Box plot summary for FEP gain variations for multiple detectors.

Parameters:
  • period (str) – Period to inspect.

  • run (str) – Run to inspect.

  • pars (dict) – Calibration results for each detector.

  • det_info (dict) – Dictionary with channel names, IDs, and mapping to string and position.

  • results (dict) – Dictionary with arrays of values (per detector); None if invalid.

  • info (dict) – Dictionary containing info on a parameter basis (eg label name, file title, colours, limits, …).

  • output_dir (str) – Output folder for saving plots and shelve data.

  • data_type (str) – Type of data, either ‘cal’ or ‘phy’.

  • save_pdf (bool) – If True, save the summary plot as a PDF.

  • run_to_apply – Run to apply (eg see ssc data).

legend_data_monitor.monitoring.build_new_files(generated_path: str, period: str, run: str, data_type='phy')

Generate and store resampled HDF files for a given data run and extract summary info.

This function:

  • loads the original .hdf file for the specified period and run

  • extracts available keys from the HDF file

  • resamples all applicable time series data into multiple time intervals (10min, 60min)

  • stores each resampled dataset into a separate HDF file

  • extracts metadata from the ‘info’ key and saves it as a .yaml file

Parameters:
  • generated_path (str) – Root directory where the data is stored and where new files will be written.

  • period (str) – Period (e.g. ‘p03’) used to construct paths.

  • run (str) – Run (e.g. ‘r001’) used to construct paths.

  • data_type (str) – Data type to load; default: ‘phy’.

legend_data_monitor.monitoring.compute_dead_time(df, window_ms=10)

Compute dead time percentage based on discharge windows.

Parameters:
  • df (pd.DataFrame) – Timestamps and boolean detector columns with is_discharge entries.

  • window_ms (float) – Dead time window after each discharge; default: 10 ms.
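The idea of a dead-time percentage built from fixed windows after each discharge can be sketched for a single detector as follows. This is an assumption-laden illustration: the function name, the use of plain timestamps in seconds, the explicit run start and end, and the merging of overlapping windows are all choices of this sketch, not confirmed behaviour of the package.

```python
def dead_time_percent(discharge_times_s, run_start_s, run_end_s, window_ms=10):
    """Percentage of the run covered by a `window_ms` window after each discharge."""
    window_s = window_ms / 1000.0
    dead = 0.0
    covered_until = run_start_s
    for t in sorted(discharge_times_s):
        # Count only the part of the window not already covered by a previous one.
        start = max(t, covered_until)
        stop = min(t + window_s, run_end_s)
        if stop > start:
            dead += stop - start
            covered_until = stop
    return 100.0 * dead / (run_end_s - run_start_s)


# Two overlapping discharges and one isolated discharge in a 10 s run.
pct = dead_time_percent([1.0, 1.005, 5.0], run_start_s=0.0, run_end_s=10.0)
```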

legend_data_monitor.monitoring.compute_diff(values: ndarray, initial_value: float | int, scale: float | int) ndarray

Compute relative differences with respect to an initial value. If the initial value is zero, returns an array of nan values.

Parameters:
  • values (np.ndarray) – Array of values to compute the differences for.

  • initial_value (float) – Reference value for computing relative differences.

  • scale (float) – Scaling factor.

Return type:

ndarray
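A sketch of such a relative difference, assuming the form (values - initial_value) / initial_value * scale. The exact formula is an assumption modelled on the (series - reference)/reference expression given for compute_diff_and_rescaling; plain lists replace the NumPy arrays to keep the example dependency-free, and the function name is hypothetical.

```python
import math


def compute_diff_sketch(values, initial_value, scale):
    """Relative differences w.r.t. initial_value, multiplied by scale."""
    if initial_value == 0:
        # No meaningful relative difference: return nan for every entry.
        return [math.nan] * len(values)
    return [(v - initial_value) / initial_value * scale for v in values]


diffs = compute_diff_sketch([2.0, 3.0], initial_value=2.0, scale=100.0)
undefined = compute_diff_sketch([2.0, 3.0], initial_value=0, scale=100.0)
```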

legend_data_monitor.monitoring.compute_diff_and_rescaling(series: Series, reference: float, escale: float, variations: bool)

Compute relative differences (if ‘variations’ is True) and rescale values by ‘escale’.

Parameters:
  • series (pd.Series) – Input time series of numerical values.

  • reference (float) – Reference value used to compute relative differences.

  • escale (float) – Scaling factor, eg 2039 keV.

  • variations (bool) – If true, compute relative difference (series - reference)/reference.

legend_data_monitor.monitoring.evaluate_fep_cal(pars_dict: dict, channel: str, fep_peak_pos: float, fep_peak_pos_err: float)

Return calibrated FEP position (fep_cal) and error (fep_cal_err).

Parameters:
  • pars_dict (dict) – Dictionary containing calibration outputs.

  • channel (str) – Channel name or ID.

  • fep_peak_pos (float) – Uncalibrated FEP position.

  • fep_peak_pos_err (float) – Uncalibrated FEP position error.

legend_data_monitor.monitoring.extract_fep_peak(pars_dict: dict, channel: str)

Return fep_peak_pos, fep_peak_pos_err, fep_gain, fep_gain_err.

Parameters:
  • pars_dict (dict) – Dictionary containing calibration outputs.

  • channel (str) – Channel name or ID.

legend_data_monitor.monitoring.extract_resolution_at_q_bb(pars_dict: dict, channel: str, key_result: str, fit: str = 'linear')

Return Qbb_fwhm (linear resolution) and Qbb_fwhm_quad (quadratic resolution).

Parameters:
  • pars_dict (dict) – Dictionary containing calibration outputs.

  • channel (str) – Channel name or ID (eg ch10000).

  • key_result (str) – Key name used to extract the resolution results from the parsed file.

  • fit (str) – Fitting method used for energy resolution, either ‘linear’ or ‘quadratic’.

legend_data_monitor.monitoring.filter_by_period(series: Series, period: str | list) Series

Return a series filtered by ignore keys for the given period(s).

Parameters:
  • series (pd.Series) – Input time series (indexed by timestamps) to filter.

  • period (str or list) – Period (or list of periods) to inspect.

Return type:

Series

legend_data_monitor.monitoring.filter_series_by_ignore_keys(series_to_filter: Series, skip_keys: dict, period: str)

Remove data from a time-indexed pandas Series that falls within time ranges specified by start and stop timestamps for a given period.

Parameters:
  • series_to_filter (pd.Series) – The time-indexed pandas Series to be filtered.

  • skip_keys (dict) – Dictionary mapping periods to sub-dictionaries containing ‘start_keys’ and ‘stop_keys’ lists with timestamp strings in the format ‘%Y%m%dT%H%M%S%z’.

  • period (str) – The period to check for keys to ignore. If not present, the series is returned unmodified.
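The filtering described above can be sketched with the standard library alone. A list of (timestamp, value) pairs stands in for the time-indexed Series; the function name, the example keys, and the half-open treatment of each window (start included, stop excluded) are assumptions of this sketch.

```python
from datetime import datetime

KEY_FORMAT = "%Y%m%dT%H%M%S%z"  # format quoted in the skip_keys description


def filter_by_ignore_keys_sketch(samples, skip_keys, period):
    """Drop (timestamp, value) pairs that fall inside any ignored window."""
    if period not in skip_keys:
        return list(samples)  # nothing to ignore for this period
    starts = [datetime.strptime(k, KEY_FORMAT) for k in skip_keys[period]["start_keys"]]
    stops = [datetime.strptime(k, KEY_FORMAT) for k in skip_keys[period]["stop_keys"]]
    windows = list(zip(starts, stops))
    return [
        (t, v)
        for t, v in samples
        if not any(start <= t < stop for start, stop in windows)
    ]


# Hypothetical ignore window of one hour in period p03.
skip_keys = {
    "p03": {"start_keys": ["20230101T010000Z"], "stop_keys": ["20230101T020000Z"]}
}
stamps = ["20230101T003000Z", "20230101T013000Z", "20230101T023000Z"]
samples = [(datetime.strptime(s, KEY_FORMAT), i) for i, s in enumerate(stamps)]

kept = filter_by_ignore_keys_sketch(samples, skip_keys, "p03")
untouched = filter_by_ignore_keys_sketch(samples, skip_keys, "p04")
```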

legend_data_monitor.monitoring.find_hdf_file(directory: str, include: list[str], exclude: list[str] | None = None) str | None

Find the original HDF monitoring file in a given directory, matching inclusion/exclusion filters.

Parameters:
  • directory (str) – Path to the folder containing the HDF monitoring files.

  • include (list[str]) – List of words that the HDF monitoring file to retrieve must contain.

  • exclude (list[str] = None) – List of words that the HDF monitoring file to retrieve must NOT contain.

Return type:

str | None
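The include/exclude matching might work along these lines. This is a sketch under assumptions: the function name, the `.hdf` extension check, the alphabetical search order, and the file names created in the temporary directory are all hypothetical.

```python
import os
import tempfile


def find_hdf_file_sketch(directory, include, exclude=None):
    """Return the first .hdf file whose name has all `include` and no `exclude` words."""
    exclude = exclude or []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".hdf"):
            continue
        has_all = all(word in name for word in include)
        has_none = not any(word in name for word in exclude)
        if has_all and has_none:
            return os.path.join(directory, name)
    return None


with tempfile.TemporaryDirectory() as tmp:
    # Made-up file names: one "original" file and one resampled copy.
    for name in ("l200-p03-r001-phy-geds.hdf", "l200-p03-r001-phy-geds-res_10min.hdf"):
        open(os.path.join(tmp, name), "w").close()
    found = find_hdf_file_sketch(tmp, include=["p03", "r001"], exclude=["min"])
    found_name = os.path.basename(found)
    missing = find_hdf_file_sketch(tmp, include=["p99"])
```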

legend_data_monitor.monitoring.get_calib_data_dict(calib_data: dict, channel_info: list, tiers: list, pars: list, period: str, run: str, tier: str, key_result: str, fit: str, data_type: str)

Extract calibration information for a given run and append it to the provided dictionary.

This function loads calibration parameters for a specific detector channel and run, parses energy calibration results and resolution information, and evaluates derived values such as gain and calibration constants. It appends the extracted data to the provided calib_data dictionary, which is expected to contain keys like “fep”, “fep_err”, “cal_const”, “cal_const_err”, “run_start”, “run_end”, “res”, and “res_quad”.

Parameters:
  • calib_data (dict) – Dictionary that accumulates calibration results across runs.

  • channel_info (list) – List of [channel ID, channel name].

  • tiers (list of str) – Paths to tier data folders based on the inspected processed version.

  • pars (list of str) – Paths to parameter .yaml/.json files.

  • period (str) – Period to inspect.

  • run (str) – Run to inspect.

  • tier (str) – Tier level for the analysis (‘hit’, ‘phy’, etc.).

  • key_result (str) – Key name used to extract the resolution results from the parsed file.

  • fit (str) – Fitting method used for energy resolution, either ‘linear’ or ‘quadratic’.

  • data_type (str)

legend_data_monitor.monitoring.get_calib_pars(path: str, period: str | list, run_list: list, channel_info: list, partition: bool, data_type: str, escale: float, fit='linear') dict

Retrieve and process calibration parameters across a list of runs for a given channel.

This function loads calibration data from JSON/YAML files for each specified run, computes gain and calibration constant evolution over time, and returns a dictionary of relevant quantities, including their relative changes with respect to the initial values. It optionally appends special calibration runs at the end of a period, if available.

Parameters:
  • path (str) – Base directory containing the tier and parameter folders.

  • period (str or list) – Period to inspect. Can be a list if multiple periods are inspected.

  • run_list (list) – List of runs to inspect, or a dictionary mapping periods to lists of runs.

  • channel_info (list) – List containing [channel ID, channel name].

  • partition (bool) – True if you want to retrieve partition calibration results.

  • escale (float) – Scaling factor used to compute relative differences in gain and calibration constant.

  • fit (str, optional) – Fit method used for energy resolution (“linear” or “quadratic”), by default “linear”.

Return type:

dict

legend_data_monitor.monitoring.get_calibration_file(folder_par: str) dict

Return the content of the JSON/YAML calibration file in folder_par.

Parameters:

folder_par (str) – Path to the folder containing calibration summary files.

Return type:

dict

legend_data_monitor.monitoring.get_dfs(phy_mtg_data: str, period: str, run_list: list, parameter: str)

Load and concatenate monitoring data from HDF files for a given period and list of runs.

Parameters:
  • phy_mtg_data (str) – Path to the base directory containing monitoring HDF5 files (typically ending in /mtg/phy).

  • period (str) – Period to inspect.

  • run_list (list) – List of available runs.

  • parameter (str) – Parameter name used to construct the HDF key for loading specific datasets (e.g., ‘TrapemaxCtcCal’ looks for ‘IsPulser_TrapemaxCtcCal’).

legend_data_monitor.monitoring.get_energy_key(ecal_results: dict) dict

Retrieve the energy calibration results from a given dictionary.

This function searches for specific keys (‘cuspEmax_ctc_runcal’ or ‘cuspEmax_ctc_cal’) in the input ecal_results dictionary. It returns a sub-dictionary if one of the keys is found, otherwise an empty dictionary is returned.

Parameters:

ecal_results (dict) – Dictionary containing energy calibration results.

Return type:

dict
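The lookup described above can be sketched as a simple key search. The function name is hypothetical, and checking 'cuspEmax_ctc_runcal' before 'cuspEmax_ctc_cal' is an assumption based only on the order in which the keys are listed; the numbers in the example are made up.

```python
def get_energy_key_sketch(ecal_results):
    """Return the sub-dictionary under the first matching energy key, else {}."""
    for key in ("cuspEmax_ctc_runcal", "cuspEmax_ctc_cal"):
        if key in ecal_results:
            return ecal_results[key]
    return {}


# Made-up content, only to show the three possible outcomes.
with_runcal = get_energy_key_sketch({"cuspEmax_ctc_runcal": {"a": 1}, "cuspEmax_ctc_cal": {"a": 2}})
with_cal = get_energy_key_sketch({"cuspEmax_ctc_cal": {"a": 2}})
without = get_energy_key_sketch({"other_key": {"a": 3}})
```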

legend_data_monitor.monitoring.get_pulser_data(resampling_time: str, period: str | list, dfs: list, channel: str, escale: float, variations=False) dict

Return a dictionary of geds and pulser filtered dataframes for which a time resampling is performed.

Parameters:
  • resampling_time (str) – Resampling time, eg ‘1H’ or ‘10T’.

  • period (str | list) – Period or list of periods to inspect.

  • dfs (list) – List of dataframes for geds and pulser events.

  • channel (str) – Channel to inspect.

  • escale (float) – Scaling factor used to compute relative differences in gain and calibration constant.

  • variations (bool) – True if you want to retrieve % variations (default: False).

Return type:

dict

legend_data_monitor.monitoring.get_run_start_end_times(sto, tiers: list, period: str, run: str, tier: str)

Determine the start and end timestamps for a given run, including the special case for additional final calibration runs.

Parameters:
  • sto – Store object to read timestamps from LH5 files.

  • tiers (list of str) – Paths to tier data folders based on the inspected processed version.

  • period (str) – Period to inspect.

  • run (str) – Run to inspect.

  • tier (str) – Tier level for the analysis (‘hit’, ‘phy’, etc.).

legend_data_monitor.monitoring.get_tier_keyresult(tiers: list)

Retrieve the proper tier name (pht or hit) and key_result (partition_ecal or ecal), depending on whether partition data exist.

Parameters:

tiers (list) – Base directory containing the tier and parameter folders.

legend_data_monitor.monitoring.get_traptmax_tp0est(phy_mtg_data: str, period: str, run_list: list)

Load and concatenate trapTmax and tp0est data from HDF files for a given period and list of runs.

Parameters:
  • phy_mtg_data (str) – Path to the base directory containing monitoring HDF5 files (typically ending in /mtg/phy).

  • period (str) – Period to inspect.

  • run_list (list) – List of available runs.

legend_data_monitor.monitoring.mhz_to_percent(mhz, avg_total_forced_mhz)
legend_data_monitor.monitoring.percent_to_mhz(pct, avg_total_forced_mhz)
legend_data_monitor.monitoring.plot_time_series(auto_dir_path: str, phy_mtg_data: str, output_folder: str, data_type: str, period: str, runs: list, current_run: str, det_info: dict, save_pdf: bool, escale_val: float, last_checked: float | None, partition: bool, quadratic: bool, zoom: bool)

Generate and save time-series plots of calibration and monitoring data for germanium detectors across multiple runs.

This function collects physics and calibration data from HDF5 monitoring files and visualizes stability over time. Channels with no pulser entries are automatically skipped. Corrections are applied to the gain if pulser data is available (‘GED corrected’), otherwise uncorrected data is plotted. The plots are saved as pickled objects for later retrieval (eg. in the online Dashboard) and optionally as PDFs:

  • plots saved in shelve database files under <output_folder>/<period>/mtg/l200-<period>-phy-monitoring;

  • if save_pdf=True, PDF copies saved under <output_folder>/<period>/mtg/pdf/st<string>/.

Parameters:
  • auto_dir_path (str) – Path to tmp-auto public data files (eg /data2/public/prodenv/prod-blind/tmp-auto).

  • phy_mtg_data (str) – Path to generated monitoring hdf files.

  • output_folder (str) – Path to output folder.

  • period (str) – Period to inspect.

  • runs (list) – Available runs to inspect for a given period.

  • current_run (str) – Run under inspection.

  • det_info (dict) – Dictionary containing detector metadata.

  • save_pdf (bool) – True if you want to save pdf files too; default: False.

  • escale_val (float) – Energy scale at which to evaluate the gain differences; default: 2039 keV (76Ge Qbb).

  • last_checked (float | None) – Timestamp of the last check.

  • partition (bool) – False if not partition data; default: False.

  • quadratic (bool) – True if you want to plot the quadratic resolution too; default: False.

  • zoom (bool) – True to zoom over y axis; default: False.

legend_data_monitor.monitoring.qc_and_evt_summary_plots(auto_dir_path: str, phy_mtg_data: str, output_folder: str, start_key: str, period: str, run: str, det_info: dict, save_pdf: bool)
legend_data_monitor.monitoring.qc_average(auto_dir_path: str, output_folder: str, det_info: dict, period: str, run: str, save_pdf: bool, pars_to_inspect: list | None = None)

Evaluate the average rate of passing quality cuts for a given run and period across the whole array for different QC flags.

Parameters:
  • auto_dir_path (str) – Path to tmp-auto public data files (eg /data2/public/prodenv/prod-blind/tmp-auto).

  • output_folder (str) – Path to generated monitoring hdf files.

  • det_info (dict) – Dictionary with channel names, IDs, and mapping to string and position.

  • period (str) – Period to inspect.

  • run (str) – Run under inspection.

  • save_pdf (bool) – True if you want to save pdf files too; default: False.

  • pars_to_inspect (list) – List of parameters (boolean flags) to inspect.

legend_data_monitor.monitoring.qc_distributions(auto_dir_path: str, phy_mtg_data: str, output_folder: str, start_key: str, period: str, run: str, det_info: dict, save_pdf: bool)
legend_data_monitor.monitoring.qc_ft_failure_rates(auto_dir_path: str, phy_mtg_data: str, output_folder: str, start_key: str, period: str, run: str, det_info: dict, save_pdf: bool)
legend_data_monitor.monitoring.qc_time_series(auto_dir_path: str, output_folder: str, det_info: dict, period: str, run: str, save_pdf: bool, pars_to_inspect: list | None = None)

Evaluate rate over time of passing quality cuts for a given run and period across the whole array for different QC flags.

Parameters:
  • auto_dir_path (str) – Path to tmp-auto public data files (eg /data2/public/prodenv/prod-blind/tmp-auto).

  • output_folder (str) – Path to generated monitoring hdf files.

  • det_info (dict) – Dictionary with channel names, IDs, and mapping to string and position.

  • period (str) – Period to inspect.

  • run (str) – Run under inspection.

  • save_pdf (bool) – True if you want to save pdf files too; default: False.

  • pars_to_inspect (list) – List of parameters (boolean flags) to inspect.

legend_data_monitor.monitoring.read_if_key_exists(hdf_path: str, key: str) DataFrame | None

Read an HDF dataset if the key exists, otherwise return None; handle the case where the parameter is saved under either ‘/key’ or ‘key’.

Parameters:
  • hdf_path (str) – Path to the HDF file.

  • key (str) – Key to inspect.

Return type:

DataFrame | None

legend_data_monitor.monitoring.resample_series(series: Series, resampling_time: str, mask: Series)

Calculate mean/std for resampled time ranges to which a mask is then applied. The function already adds UTC timezones to the series.

Parameters:
  • series (pd.Series) – Input time series of numerical values.

  • resampling_time (str) – Resampling frequency, eg ‘1h’.

  • mask (pd.Series) – Boolean mask aligned to the datetime index; False values mark timestamps that should be excluded, i.e. set to NaN.

legend_data_monitor.plot_styles module

legend_data_monitor.plot_styles.par_vs_ch(data_channel: DataFrame, fig: Figure, ax: Axes, plot_info: dict, color=None, map_dict=None)
legend_data_monitor.plot_styles.plot_heatmap(data_channel: DataFrame, fig: Figure, ax: Axes, plot_info: dict, color=None, map_dict=None)
legend_data_monitor.plot_styles.plot_histo(data_channel: DataFrame, fig: Figure, ax: Axes, plot_info: dict, color=None, map_dict=None)
legend_data_monitor.plot_styles.plot_par_vs_par(data_channel: DataFrame, fig: Figure, ax: Axes, plot_info: dict, color=None, map_dict=None)
legend_data_monitor.plot_styles.plot_scatter(data_channel: DataFrame, fig: Figure, ax: Axes, plot_info: dict, color=None, map_dict=None)
legend_data_monitor.plot_styles.plot_vs_time(data_channel: DataFrame, fig: Figure, ax: Axes, plot_info: dict, color=None, map_dict=None)

legend_data_monitor.plotting module

legend_data_monitor.plotting.align_to_keys(all_keys: list, keys: list, values: list, categorical=False)

Align values to a reference list of keys.

Creates an array matching all_keys and fills in values where keys match. Missing entries are filled with NaN (numeric) or None (categorical). Returns array of values aligned to all_keys.

Parameters:
  • all_keys (list) – Reference list of keys defining the output order.

  • keys (list) – Keys corresponding to provided values.

  • values (list) – Values to align.

  • categorical (bool, optional) – If True, output array is object dtype with None for missing values. Otherwise (default), uses float dtype with NaN for missing values.
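
The alignment described above can be sketched in pure Python. This is only an illustration under stated assumptions: `align_to_keys_sketch` is a hypothetical name, it uses a dictionary lookup, and it returns a plain list rather than the array the package function returns. The run keys are invented.

```python
import math


def align_to_keys_sketch(all_keys, keys, values, categorical=False):
    """Return values reordered to follow all_keys, padding missing keys."""
    lookup = dict(zip(keys, values))
    if categorical:
        # Categorical case: missing entries become None.
        return [lookup.get(k) for k in all_keys]
    # Numeric case: cast to float and pad missing entries with NaN.
    return [float(lookup[k]) if k in lookup else math.nan for k in all_keys]


runs = ["p03-r000", "p03-r001", "p03-r002"]  # hypothetical run keys
aligned = align_to_keys_sketch(runs, ["p03-r002", "p03-r000"], [2.5, 1.0])
print(aligned)  # [1.0, nan, 2.5]
```

The output always has the length and order of all_keys, so several quantities aligned this way can share one x-axis.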

legend_data_monitor.plotting.apply_cal_to_following_run(mu_vals: ndarray, cal_vals: ndarray)

Apply calibration parameters from each run to the following run’s ADC values.

Returns a list of calibrated peak positions in keV for each following run.

mu_vals and cal_vals must have the same length; otherwise an error is raised. The function shifts the arrays so that each calibration is applied to the subsequent run:

  • drops the first element of mu_vals

  • drops the last element of cal_vals

Each ADC value is converted to keV using a polynomial calibration.

Parameters:
  • mu_vals (np.ndarray) – Sequence of ADC peak positions (one per run).

  • cal_vals (np.ndarray) – Sequence of calibration polynomial coefficients (one per run).
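
The shift-and-calibrate step can be illustrated with a short sketch. It is not the package's implementation: `apply_cal_to_following_run_sketch` is a hypothetical name, plain lists stand in for arrays, and the coefficient ordering (constant term first) is an assumption that may differ from the real function.

```python
def apply_cal_to_following_run_sketch(mu_vals, cal_vals):
    """Calibrate each run's ADC peak with the previous run's coefficients."""
    if len(mu_vals) != len(cal_vals):
        raise ValueError("mu_vals and cal_vals must have the same length")
    calibrated = []
    # Dropping the first ADC value and the last calibration pairs
    # run i+1's peak with run i's coefficients.
    for adc, coeffs in zip(mu_vals[1:], cal_vals[:-1]):
        # ASSUMPTION: coefficients are ordered from the constant term up,
        # i.e. E = c0 + c1*adc + c2*adc**2 + ...
        energy = sum(c * adc**power for power, c in enumerate(coeffs))
        calibrated.append(energy)
    return calibrated


mu = [1000.0, 1010.0, 990.0]                # hypothetical ADC peak positions
cal = [[0.0, 2.0], [1.0, 2.0], [0.5, 2.0]]  # hypothetical [offset, slope] per run
print(apply_cal_to_following_run_sketch(mu, cal))  # [2020.0, 1981.0]
```

With N input runs the result has N-1 entries, because the first run has no preceding calibration to apply.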

legend_data_monitor.plotting.filter_period(keys: list, vals: list, *periods)

Filter key-value pairs by matching key prefixes (e.g. ‘p18’); only entries where the key starts with any of the provided period prefixes (e.g. ‘p18’, ‘p19’) are retained.

Returns filtered (keys, values), otherwise empty lists if no matches are found.

Parameters:
  • keys (list) – List of keys

  • vals (list) – Values corresponding to keys.

  • *periods (ntuple of str) – Variable number of prefix strings to filter by.
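
A minimal sketch of the prefix filtering described above, assuming string keys; `filter_period_sketch` is a hypothetical name and the keys are invented for illustration.

```python
def filter_period_sketch(keys, vals, *periods):
    """Keep only the (key, value) pairs whose key starts with a given prefix."""
    # str.startswith accepts a tuple, so all prefixes are tested at once.
    kept = [(k, v) for k, v in zip(keys, vals) if k.startswith(periods)]
    if not kept:
        return [], []
    filtered_keys, filtered_vals = zip(*kept)
    return list(filtered_keys), list(filtered_vals)


keys = ["p18-r000", "p19-r001", "p20-r000"]  # hypothetical period-run keys
print(filter_period_sketch(keys, [1, 2, 3], "p18", "p19"))
# (['p18-r000', 'p19-r001'], [1, 2])
```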

legend_data_monitor.plotting.get_fwhm_for_fixed_ch(data_channel: DataFrame, parameter: str) float

Calculate the FWHM of a given parameter for a given channel.

Return type:

float

legend_data_monitor.plotting.make_subsystem_plots(subsystem: Subsystem, plots: dict, dataset_info: dict, plt_path: str, saving=None)
legend_data_monitor.plotting.plot_all_detector_info(det_name: str, det_info: dict, partitions_params: dict, detector_status: dict, period: str, current_run: str, output_folder: str, save_pdf=False, exclude_period=None)

Generate a comprehensive multi-panel summary plot of detector performance.

Produces a grid of subplots showing key quantities such as:

  • Slow control voltage

  • Energy resolution (FWHM)

  • Peak positions and residuals

  • Baseline properties

  • Pulse shape parameters

  • Calibration stability metrics

Internally extracts, aligns, and plots multiple variables using plot_variable.

Parameters:
  • det_name (str) – Detector identifier.

  • partitions_params (dict) – Dictionary containing per-detector analysis results and calibration data.

  • detector_status (dict) – Dictionary with detector usability and slow control information.

  • period (str) – Period to inspect.

  • current_run (str) – Run to inspect.

  • output_folder (str) – Output folder where to save plots.

  • save_pdf (bool, optional) – True if you want to save pdf files too; default: False.

  • exclude_period (list of str, optional) – Period prefixes to exclude from plotting.

legend_data_monitor.plotting.plot_array(data_analysis: DataFrame, plot_info: dict, pdf: PdfPages)
legend_data_monitor.plotting.plot_det_status(det_name: str, ax: Axes, detector_status: dict, keys: list)

Overlay detector usability status as shaded regions on a plot: a grey shaded region for ‘ac’ and a red shaded region for ‘off’.

Parameters:
  • det_name (str) – Detector identifier.

  • ax (Axes) – Axis object to draw on.

  • detector_status (dict) – Nested dictionary containing detector status information, with ‘processable’ and ‘usability’ keys, per detector.

  • keys (list) – Ordered run keys corresponding to x-axis positions.

legend_data_monitor.plotting.plot_limits(ax: Axes, params: list, limits: list | dict)

Plot limits (if present) on the plot. The multi-params case is carefully handled.

legend_data_monitor.plotting.plot_per_barrel_and_position(data_analysis: DataFrame, plot_info: dict, pdf: PdfPages)
legend_data_monitor.plotting.plot_per_cc4(data_analysis: DataFrame, plot_info: dict, pdf: PdfPages)
legend_data_monitor.plotting.plot_per_ch(data_analysis: DataFrame, plot_info: dict, pdf: PdfPages)
legend_data_monitor.plotting.plot_per_fiber_and_barrel(data_analysis: DataFrame, plot_info: dict, pdf: PdfPages)
legend_data_monitor.plotting.plot_per_string(data_analysis: DataFrame, plot_info: dict, pdf: PdfPages)
legend_data_monitor.plotting.plot_variable(det_name: str, ax: Axes, all_keys: ndarray, keys: list, vals: list, det_status: dict, periods: list | str, current_run: str, errs=None, title='', units='keV', alpha=1, fixed_thr=None, err_thr=None, plot_det_stat=False, plot_mean=True, exclude_period=None, ylabel=None)

Plot a detector variable over runs, grouped by data-taking periods.

Data are aligned to all_keys, split by period prefixes (e.g., ‘p16’), and plotted with optional error bands and threshold lines. Mean values are computed per period using only runs where the detector usability is ‘on’.

Parameters:
  • det_name (str) – Detector identifier.

  • ax (Axes) – Axis to plot on.

  • all_keys (np.ndarray) – Master list of run keys defining x-axis.

  • keys (list) – Keys corresponding to vals.

  • vals (list) – Values to plot.

  • det_status (dict) – Detector status dictionary containing usability information.

  • periods (list | str) – Period to inspect.

  • current_run (str) – Run to inspect.

  • errs (sequence, optional) – Uncertainties corresponding to vals.

  • title (str, optional) – Plot title.

  • units (str, optional) – Units for y-axis label.

  • alpha (float, optional) – Transparency for plotted data.

  • fixed_thr (float, optional) – Fixed threshold to draw around the mean.

  • err_thr (float, optional) – Multiplier for mean error-based thresholds.

  • plot_det_stat (bool, optional) – If True, overlays detector status shading.

  • plot_mean (bool, optional) – If True, plots mean lines per period.

  • exclude_period (list of str, optional) – Period prefixes to exclude.

  • ylabel (str, optional) – Custom y-axis label (overrides default).

legend_data_monitor.plotting.save_pdf(plt, pdf: PdfPages)

Save the plot to a PDF file. The plot is closed after saving.

legend_data_monitor.run module

legend_data_monitor.run.add_auto_prod_parser(subparsers)

Configure core.auto_control_plots() command line interface.

legend_data_monitor.run.add_auto_run_parser(subparsers)

Configure core.auto_run() command line interface.

legend_data_monitor.run.add_get_exposure(subparsers)

Configure core.retrieve_exposure() command line interface.

legend_data_monitor.run.add_get_runinfo(subparsers)

Configure core.build_runinfo() command line interface.

legend_data_monitor.run.add_user_bunch_parser(subparsers)

Configure core.control_plots() command line interface.

legend_data_monitor.run.add_user_config_parser(subparsers)

Configure core.control_plots() command line interface.

legend_data_monitor.run.add_user_rsync_parser(subparsers)

Configure core.auto_control_plots() command line interface.

legend_data_monitor.run.add_user_scdb(subparsers)

Configure core.retrieve_scdb() command line interface.

legend_data_monitor.run.auto_prod_cli(args)

Pass command line arguments to core.auto_control_plots().

legend_data_monitor.run.auto_run_cli(args)

Pass command line arguments to core.auto_run().

legend_data_monitor.run.get_exposure_cli(args)

Pass command line arguments to core.retrieve_exposure().

legend_data_monitor.run.get_runinfo_cli(args)

Pass command line arguments to core.build_runinfo().

legend_data_monitor.run.main()

legend-data-monitor’s starting point.

Here you define the path to the YAML configuration file you want to use when generating the plots. To learn more, have a look at the help section.

legend_data_monitor.run.user_bunch_cli(args)

Pass command line arguments to core.control_plots().

legend_data_monitor.run.user_config_cli(args)

Pass command line arguments to core.control_plots().

legend_data_monitor.run.user_rsync_cli(args)

Pass command line arguments to core.auto_control_plots().

legend_data_monitor.run.user_scdb_cli(args)

Pass command line arguments to core.retrieve_scdb().

legend_data_monitor.save_data module

legend_data_monitor.save_data.append_new_data(param: str, plot_settings: dict, plot_info: dict, old_dict: dict, par_dict_content: dict, plt_path: str) dict
Return type:

dict

legend_data_monitor.save_data.build_dict(plot_settings: list, plot_info: list, par_dict_content: dict, out_dict: dict) dict

Create a dictionary with the correct format for being saved in the final shelve object.

Return type:

dict

legend_data_monitor.save_data.build_out_dict(plot_settings: list, par_dict_content: dict, out_dict: dict)

Build the output dictionary based on the input ‘saving’ option.

Parameters:
  • plot_settings (list) – Dictionary with settings for plotting. It contains the following keys: ‘parameters’, ‘event_type’, ‘plot_structure’, ‘resampled’, ‘plot_style’, ‘variation’, ‘time_window’, ‘range’, ‘saving’, ‘plt_path’

  • par_dict_content (dict) – Dictionary containing, for a given parameter, the dataframe with data and a dictionary with info for plotting (e.g. plot style, title, units, labels, …)

  • out_dict (dict) – Dictionary that is returned, containing the objects that need to be saved.

legend_data_monitor.save_data.check_existence_and_overwrite(file: str)

Check for the existence of a file and, if it exists, remove it.

legend_data_monitor.save_data.check_level0(dataframe: DataFrame) DataFrame

Check if a dataframe contains the ‘level_0’ column. If so, remove it.

Return type:

DataFrame

legend_data_monitor.save_data.get_param_df(parameter: str, df: DataFrame) DataFrame

Subselect from ‘df’ only the dataframe columns that refer to a given parameter. The case of ‘parameter’ being a special parameter is carefully handled.

Return type:

DataFrame

legend_data_monitor.save_data.get_param_info(param: str, plot_info: dict) dict

Subselect from ‘plot_info’ the plotting info for the specified parameter `param`. This is needed for the multi-parameters case.

Return type:

dict

legend_data_monitor.save_data.get_pivot(df: DataFrame, parameter: str, key_name: str, file_path: str, saving: str)

Get pivot: datetimes (first column) vs channels (other columns).

legend_data_monitor.save_data.save_df_and_info(df: DataFrame, plot_info: dict) dict

Return a dictionary containing a dataframe for the parameter(s) under study for a given subsystem. The plotting info is saved too.

Return type:

dict

legend_data_monitor.save_data.save_hdf(saving: str, file_path: str, df, aux_ch: str, aux_analysis, aux_ratio_analysis, aux_diff_analysis, plot_info: dict) dict

Save the input dataframe in an external hdf file, using a different structure (time vs channel, with values in cells). Plot info is saved too.

Return type:

dict

legend_data_monitor.slow_control module

class legend_data_monitor.slow_control.SlowControl(parameter: str, port: int, pswd: str, **kwargs)

Bases: object

Object containing Slow Control database information for data subselected based on given criteria.

parameter [str] : diode_vmon | diode_imon | PT114 | PT115 | PT118 | PT202 | PT205 | PT208 | LT01 | RREiT | RRNTe | RRSTe | ZUL_T_RR | DaqLeft-Temp1 | DaqLeft-Temp2 | DaqRight-Temp1 | DaqRight-Temp2

Options for kwargs

dataset=
dict with the following keys:
  • ‘experiment’ [str]: ‘L60’ or ‘L200’

  • ‘period’ [str]: period format pXX

  • ‘path’ [str]: path to prod-ref folder (before version)

  • ‘version’ [str]: version of pygama data processing format vXX.XX

  • ‘type’ [str]: ‘phy’ or ‘cal’

  • the following key(s) depending on time selection
    1. ‘start’ : <start datetime>, ‘end’: <end datetime> where <datetime> input is of format ‘YYYY-MM-DD hh:mm:ss’

    2. ‘window’ [str]: time window in the past from current time point, format: ‘Xd Xh Xm’ for days, hours, minutes

    3. ‘timestamps’: str or list of str in format ‘YYYYMMDDThhmmssZ’

    4. ‘runs’: int or list of ints for run number(s) e.g. 10 for r010

Or input kwargs separately experiment=, period=, path=, version=, type=; start=&end=, (or window= - ???), or timestamps=, or runs=

get_sc_param()

Load the corresponding table from the SC database for the process of interest and apply the flags for the parameter under study.

legend_data_monitor.slow_control.apply_flags(df: DataFrame, sc_parameters: dict, flags_param: list) DataFrame

Apply the flags read from ‘settings/SC-params.yaml’ to the input dataframe.

Return type:

DataFrame

legend_data_monitor.slow_control.get_plotting_info(parameter: str, sc_parameters: dict, first_tstmp: str, last_tstmp: str, scdb: LegendSlowControlDB) Tuple[str, float, float]

Return units and low/high limits of a given parameter.

Return type:

Tuple[str, float, float]

legend_data_monitor.slow_control.include_more_diode_info(df: DataFrame, scdb: LegendSlowControlDB) DataFrame

Include more diode info, such as the channel name and the string number to which it belongs.

Return type:

DataFrame

legend_data_monitor.string_visualization module

legend_data_monitor.string_visualization.exposure_plot(subsystem, data_analysis: DataFrame, plot_info: dict, pdf: PdfPages)
legend_data_monitor.string_visualization.get_info_from_channel(channel_map: DataFrame, channel: int)

Get info (name, location, position) from a channel number, once the channel map is provided as a DataFrame.

legend_data_monitor.string_visualization.status_plot(subsystem, data_analysis: DataFrame, plot_info: dict, pdf: PdfPages)

legend_data_monitor.subsystem module

class legend_data_monitor.subsystem.Subsystem(sub_type: str, **kwargs)

Bases: object

Object containing information for a given subsystem such as channel map, channels status etc.

sub_type [str]: geds | spms | pulser | pulser01ana | FCbsln | muon

Options for kwargs

dataset=
dict with the following keys:
  • ‘experiment’ [str]: ‘L60’ or ‘L200’

  • ‘period’ [str]: period format pXX

  • ‘path’ [str]: path to prod-ref folder (before version)

  • ‘version’ [str]: version of pygama data processing format vXX.XX

  • ‘type’ [str]: ‘phy’ or ‘cal’

  • the following key(s) depending on time selection
    1. ‘start’ : <start datetime>, ‘end’: <end datetime> where <datetime> input is of format ‘YYYY-MM-DD hh:mm:ss’

    2. ‘window’ [str]: time window in the past from current time point, format: ‘Xd Xh Xm’ for days, hours, minutes

    3. ‘timestamps’: str or list of str in format ‘YYYYMMDDThhmmssZ’

    4. ‘runs’: int or list of ints for run number(s) e.g. 10 for r010

Or input kwargs separately experiment=, period=, path=, version=, type=; start=&end=, or window=, or timestamps=, or runs=

Experiment is needed to know which channel belongs to the pulser Subsystem (and its name): “auxs” ch0 (L60) or “puls” ch1 (L200).

Period is needed to know the channel name (“fcid” or “rawid”).

Selection range is needed for the channel map and status information at that time point, and should be the only information needed. However, pylegendmeta only allows the query .on(timestamp=…) but not .on(run=…); therefore, to be able to get info in case of runs selection, we need to know path, version, and run type to look up the first timestamp of the run. If this changes in the future, the path will only be asked for when data is requested to be loaded with Subsystem.get_data(), but not to just load the channel map and status for a given run.

Might set default “latest” for version, but gotta be careful.

above_period_3_included() bool
Return type:

bool

below_period_3_excluded() bool
Return type:

bool

construct_dataloader_configs(param_tiers, params: list[str], tier_key: str)

Construct DL and DB configs for DataLoader based on parameters and which tiers they belong to.

params: list of parameters to load

flag_fcbsln_events(fc_bsln=None)

Flag FC baseline events, keeping the ones that are in correspondence with a pulser event too. If a FC baseline object was provided, flag FC baseline events in data based on its flag.

flag_fcbsln_only_events(fc_bsln=None)

Flag FC baseline events. If a FC baseline object was provided, flag FC baseline events in data based on its flag.

flag_muon_events(muon=None)

Flag muon events. If a muon object was provided, flag muon events in data based on its flag.

flag_pulser_events(pulser=None)

Flag pulser events. If a pulser object was provided, flag pulser events in data based on its flag.

get_channel_map()

Build channel map for given subsystem with info like name, position, cc4, HV, DAQ, detector type, … for each channel.

setup_info: dict with the keys ‘experiment’ and ‘period’

Later will probably be changed to get channel map by run, if possible. Planning to add:

  • barrel column for SiPMs special case

get_channel_status()

Add status column to channel map with on/off for software status.

setup_info: dict with the keys ‘experiment’ and ‘period’

Later will probably be changed to get channel status by timestamp (or hopefully run, if possible)

get_data(parameters: str | list[str] | tuple[str] = ())

Get data for requested parameters from DataLoader and “prime” it to be ready for analysis.

parameters: single parameter or list of parameters to load.

If empty, only default parameters will be loaded (channel, timestamp; baseline and wfmax for pulser)

get_parameters_for_dataloader(parameters: str | list[str])

Construct list of parameters to query from the DataLoader.

  • parameters that are always loaded (+ pulser special case)

  • parameters that are already in lh5

  • parameters needed for calculation, if special parameter(s) asked (e.g. wf_max_rel)

include_aux(params: str | list, dataset: dict, plot: dict, aux_ch: str)

Include in a new column data coming from PULS01ANA aux channel, to either compute a ratio or a difference with data coming from the inspected subsystem.

remove_timestamps(remove_keys: dict)

Remove timestamps from the dataframes for a given channel.

The time interval in which to remove the channel is provided through an external YAML file.

legend_data_monitor.utils module

legend_data_monitor.utils.add_config_entries(config: dict, file_keys: str, prod_path: str, prod_config: dict) dict

Add missing information (output, dataset) to the configuration file. This function is generally used during automatic data production, where the initial config file has only the ‘subsystem’ entry.

Return type:

dict

legend_data_monitor.utils.build_detector_info(metadata_path, start_key=None)

Build detector information from LEGEND metadata.

Parameters:
  • metadata_path (str) – Path to the metadata file.

  • start_key (optional) – Starting key for channelmap selection.

Returns:

Dictionary with two main entries:

  • “detectors”: mapping from detector name to a dictionary with the following fields:

    • daq_rawid : int

    • channel_str : str (e.g. “ch1234”)

    • string : int

    • position : int

    • processable : bool

    • usability : str

    • mass_in_kg : int

  • “str_chns”: mapping from string to a list of detector names

Return type:

dict
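
To make the returned layout concrete, the sketch below builds a dictionary with the two entries described above. Every detector name and number is invented for illustration, `group_by_string` is a hypothetical helper, and the use of integer string numbers as “str_chns” keys is an assumption.

```python
from collections import defaultdict

# Hypothetical detector entries following the documented field list;
# names and values are invented and do not come from LEGEND metadata.
detectors = {
    "DET_A": {"daq_rawid": 1000001, "channel_str": "ch1000001", "string": 1,
              "position": 1, "processable": True, "usability": "on",
              "mass_in_kg": 2},
    "DET_B": {"daq_rawid": 1000002, "channel_str": "ch1000002", "string": 1,
              "position": 2, "processable": True, "usability": "ac",
              "mass_in_kg": 1},
    "DET_C": {"daq_rawid": 1000003, "channel_str": "ch1000003", "string": 2,
              "position": 1, "processable": False, "usability": "off",
              "mass_in_kg": 3},
}


def group_by_string(detectors):
    """Map each string number to the detector names mounted on it."""
    groups = defaultdict(list)
    for name, info in detectors.items():
        groups[info["string"]].append(name)
    return dict(groups)


det_info = {"detectors": detectors, "str_chns": group_by_string(detectors)}
print(det_info["str_chns"])  # {1: ['DET_A', 'DET_B'], 2: ['DET_C']}
```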

legend_data_monitor.utils.build_detector_info_per_period(auto_dir_path: str, run_dict: dict, period: str)
legend_data_monitor.utils.build_file_map(base_path: str, tier: str = 'hit') dict

Build mapping from (period, run) to calibration file paths.

Returns (period, run) -> file path mapping.

Parameters:
  • base_path (str) – Base directory of auto production data.

  • tier (str) – Data tier (‘hit’ or ‘dsp’).

Return type:

dict

legend_data_monitor.utils.build_runinfo(path: str, version: str, proc_folder: str, output: str | None)

Build dictionary with main run information (start key, phy livetime in seconds) for multiple data types (phy, cal, fft, bkg, pzc, pul, …).

legend_data_monitor.utils.bunch_dataset(config: dict, n_files=None)

Bunch the full datasets into smaller pieces, based on the number of files we want to inspect at each iteration.

It works for “start+end”, “runs” and “timestamps” in “dataset” present in the config file.

legend_data_monitor.utils.check_cal_phy_thresholds(output_folder: str, period: str, run: str, key: str, detectors: list, pswd_email: str | None)

Check detector calibration/physics thresholds for a given run and optionally send an alert mail.

Parameters:
  • output_folder (str) – Path to output folder where the output summary YAML and plots will be stored.

  • period (str) – Period to inspect.

  • run (str) – Run to inspect.

  • key (str) – Data type key to inspect, either ‘cal’ or ‘phy’.

  • detectors (list) – List of detector names.

  • pswd_email (str or None) – Password for the email account used to send alerts; if None, no email is sent.

legend_data_monitor.utils.check_empty_df(df) bool

Check if df (DataFrame | analysis_data.AnalysisData) exists and is not empty.

Return type:

bool

legend_data_monitor.utils.check_key_existence(hdf_path: str, key_to_load: str) bool

Check if a specific key exists in the specified hdf file path.

Return type:

bool

legend_data_monitor.utils.check_plot_settings(conf: dict) bool
Return type:

bool

legend_data_monitor.utils.check_scdb_settings(conf: dict) bool

Validate the ‘slow_control’ entry in the config dictionary by checking if it contains a ‘slow_control’ section with a ‘parameters’ key. It ensures that the ‘parameters’ value is either a string or a list of strings. Always returns True if the configuration passes all checks. Exits the program otherwise.

Parameters:

conf (dict) – SC configuration dictionary.

Return type:

bool
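
The checks described above can be sketched as follows. This is an illustration only: `check_scdb_settings_sketch` is a hypothetical name, the error messages are invented, and the real function may log or exit differently.

```python
import sys


def check_scdb_settings_sketch(conf):
    """Return True for a valid 'slow_control' section, exit otherwise."""
    section = conf.get("slow_control")
    if not isinstance(section, dict) or "parameters" not in section:
        # sys.exit raises SystemExit, which ends the program if uncaught.
        sys.exit("'slow_control' must be a section with a 'parameters' key")
    params = section["parameters"]
    is_str = isinstance(params, str)
    is_list_of_str = isinstance(params, list) and all(
        isinstance(p, str) for p in params
    )
    if not (is_str or is_list_of_str):
        sys.exit("'parameters' must be a string or a list of strings")
    return True


print(check_scdb_settings_sketch({"slow_control": {"parameters": "PT114"}}))  # True
```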

Examples

>>> conf = {
...     'slow_control': {
...         'parameters': ['RREiT', 'ZUL_T_RR']
...     }
... }
>>> check_scdb_settings(conf)
True

legend_data_monitor.utils.check_threshold(data_series: Series, channel_name: str, last_checked: float | None | str, t0: list, threshold: list, parameter: str, output: dict)

Check if a given parameter is over threshold and update the email message list.

Parameters:
  • data_series (pd.Series) – Series of gain differences indexed by timestamp.

  • last_checked (float) – Timestamp (in seconds since epoch) of last check.

  • t0 (list of pd.Timestamp) – List of start times for time windows.

  • threshold (list) – Threshold (int or float).

  • channel_name (str) – Name of the channel.

  • parameter (str) – Parameter name under inspection.

  • output (dict) – Dictionary containing summary cal and phy info.

legend_data_monitor.utils.convert_to_camel_case(string: str, char: str) str

Remove a character from a string and capitalize all initial letters.

Return type:

str
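
One plausible reading of this behaviour is sketched below. `convert_to_camel_case_sketch` is a hypothetical name, and the choice to leave the remaining letters of each piece unchanged is an assumption, not something the documentation states.

```python
def convert_to_camel_case_sketch(string, char):
    """Drop `char` and capitalise the first letter of each piece."""
    # ASSUMPTION: only the first letter of each piece is changed; the rest
    # of the piece keeps its original case. Empty pieces are skipped.
    pieces = [p for p in string.split(char) if p]
    return "".join(p[0].upper() + p[1:] for p in pieces)


print(convert_to_camel_case_sketch("event_rate", "_"))  # EventRate
```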

legend_data_monitor.utils.dataset_validity_check(data_info: dict)

Check that the input dictionary is valid, i.e. that it contains all required fields and that its path entries point to existing paths.

This function is typically used in Subsystem and SlowControl classes to ensure that all necessary metadata for accessing data is present and correct. The function also checks that the provided path and the combined path/version exist on the filesystem.

Parameters:

data_info (dict) –

Dictionary containing dataset metadata. Required keys:

  • ‘experiment’ : str

    Name of the experiment.

  • ‘type’ : str

    Type of dataset.

  • ‘period’ : str

    Period to inspect.

  • ‘path’ : str

    Path to the base dataset directory.

  • ‘version’ : str

    Processing version. Can be empty string if not needed.

Examples

>>> dataset_info = {
...     'experiment': 'L200',
...     'period': 'p03',
...     'type': 'phy',
...     'path': '/global/cfs/cdirs/m2676/data/lngs/l200/public/prodenv/prod-blind/',
...     'version': 'tmp-auto',
...     # ... additional time selection keys
... }
>>> dataset_validity_check(dataset_info)
# No output if all checks pass; errors otherwise

legend_data_monitor.utils.deep_get(d, keys, default=None, verbose=False)
legend_data_monitor.utils.find_over_threshold(data_series: Series, last_checked: float | None | str, t0: list, threshold: list) bool

Return timestamps where values exceed the given thresholds.

Parameters:
  • data_series (pd.Series) – Series of values indexed by datetime.

  • last_checked (float | None | str) – Epoch time (seconds) of the last check; if None/”None”, no cutoff is applied.

  • t0 (list of pd.Timestamp) – Start times where the first entry defines the window start.

  • threshold (list) – Threshold bounds; either can be None.

Return type:

bool

legend_data_monitor.utils.get_all_plot_parameters(subsystem: str, config: dict)

Get list of all parameters needed for all plots for given subsystem.

legend_data_monitor.utils.get_key(dsp_fname: str) str

Extract key from lh5 filename.

Return type:

str

legend_data_monitor.utils.get_last_timestamp(fname: str) str

Read a lh5 file and return the last timestamp saved in the file. This works only in case of a global trigger where the whole array is entirely recorded for a given timestamp.

Return type:

str

legend_data_monitor.utils.get_livetime(tot_livetime: float)

Get the livetime in a human readable format, starting from livetime in seconds.

Parameters:

tot_livetime (float) –

  • If tot_livetime is more than 0.1 yr, convert it to years.

  • If tot_livetime is less than 0.1 yr but more than 1 day, convert it to days.

  • If tot_livetime is less than 1 day but more than 1 hour, convert it to hours.

  • If tot_livetime is less than 1 hour but more than 1 minute, convert it to minutes.
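
The unit selection above can be sketched as a chain of threshold checks. This is a hedged illustration: `get_livetime_sketch` is a hypothetical name, the (value, unit) return shape and unit labels are invented, the year length of 365.25 days is an assumption, and so is the fallback to seconds below one minute.

```python
SECONDS_PER_MINUTE = 60.0
SECONDS_PER_HOUR = 3600.0
SECONDS_PER_DAY = 86400.0
# ASSUMPTION: a year is taken as 365.25 days; the package may use another value.
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def get_livetime_sketch(tot_livetime):
    """Return (value, unit) with the livetime in the largest suitable unit."""
    if tot_livetime > 0.1 * SECONDS_PER_YEAR:
        return tot_livetime / SECONDS_PER_YEAR, "yr"
    if tot_livetime > SECONDS_PER_DAY:
        return tot_livetime / SECONDS_PER_DAY, "days"
    if tot_livetime > SECONDS_PER_HOUR:
        return tot_livetime / SECONDS_PER_HOUR, "hrs"
    if tot_livetime > SECONDS_PER_MINUTE:
        return tot_livetime / SECONDS_PER_MINUTE, "min"
    # ASSUMPTION: anything shorter is left in seconds.
    return tot_livetime, "sec"


print(get_livetime_sketch(7200.0))    # (2.0, 'hrs')
print(get_livetime_sketch(172800.0))  # (2.0, 'days')
```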

legend_data_monitor.utils.get_map_dict(data_analysis: DataFrame)

Map string location and geds position for plotting values vs chs.

Parameters:

data_analysis (DataFrame) – DataFrame with geds data information, in particular ‘location’ and ‘position’

legend_data_monitor.utils.get_multiple_run_id(user_time_range: dict) str
Return type:

str

legend_data_monitor.utils.get_output_path(config: dict)

Get output path provided a ‘dataset’ from the config file. The path will be used to save and store pdfs/hdf/etc files.

legend_data_monitor.utils.get_output_plot_path(plt_path: str, extension: str) str

Given a path to the plt directory, generate a corresponding output path in the tmp/mtg/ directory.

Parameters:
  • plt_path (str) – Original plot path (e.g. from ‘plt/hit/phy/’).

  • extension (str) – Extension of the file to save (e.g. ‘pdf’ or ‘log’).

Return type:

str

legend_data_monitor.utils.get_query_timerange(**kwargs)

Get DataLoader compatible time range.

The function accepts either a dataset dictionary or keyword arguments. Only one type of time selection should be provided at a time. Designed in such a way to accommodate Subsystem init kwargs.

Parameters:

dataset (dict, optional) –

Dictionary specifying the time selection. Choose one of the following (or enter kwargs separately):
  1. ‘start’ : str, ‘end’ : str

    Start and end datetime in the format ‘YYYY-MM-DD hh:mm:ss’.

  2. ‘window’ : str

    Time window relative to the current time, formatted as ‘Xd Xh Xm’ for days, hours, and minutes.

  3. ‘timestamps’ : str or list of str

    Specific timestamps in ‘YYYYMMDDThhmmssZ’ format.

  4. ‘runs’ : int or list of ints

    Run number(s), e.g., 10 corresponds to ‘r010’
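
The format conversions implied by these options can be sketched with the standard library. This is not the package's code: `to_key_sketch` and `query_timerange_sketch` are hypothetical names, and the ‘window’ option is left out because its result depends on the current time.

```python
from datetime import datetime


def to_key_sketch(date_string):
    """Convert 'YYYY-MM-DD hh:mm:ss' to the 'YYYYMMDDThhmmssZ' key format."""
    parsed = datetime.strptime(date_string, "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y%m%dT%H%M%SZ")


def query_timerange_sketch(start=None, end=None, timestamps=None, runs=None):
    """Build a time-range dictionary for three of the four selection types."""
    if start is not None and end is not None:
        return {"timestamp": {"start": to_key_sketch(start),
                              "end": to_key_sketch(end)}}
    if timestamps is not None:
        # A single string is wrapped into a one-element list.
        as_list = [timestamps] if isinstance(timestamps, str) else list(timestamps)
        return {"timestamp": as_list}
    if runs is not None:
        as_list = [runs] if isinstance(runs, int) else list(runs)
        # Run numbers are zero-padded to three digits and prefixed with 'r'.
        return {"run": [f"r{run:03d}" for run in as_list]}
    raise ValueError("provide start and end, timestamps, or runs")


print(query_timerange_sketch(runs=[9, 10]))  # {'run': ['r009', 'r010']}
```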

Examples

>>> get_query_timerange(start='2022-09-28 08:00:00', end='2022-09-28 09:30:00')
{'timestamp': {'start': '20220928T080000Z', 'end': '20220928T093000Z'}}
>>> get_query_timerange(window='1d 5h 0m')
{'timestamp': {'end': '20230220T114337Z', 'start': '20230219T064337Z'}}
>>> get_query_timerange(timestamps=['20220928T080000Z', '20220928T093000Z'])
{'timestamp': ['20220928T080000Z', '20220928T093000Z']}
>>> get_query_timerange(timestamps='20220928T080000Z')
{'timestamp': ['20220928T080000Z']}

>>> get_query_timerange(runs=[9,10])
{'run': ['r009', 'r010']}
>>> get_query_timerange(runs=10)
{'run': ['r010']}

>>> get_query_timerange(dataset={'start': '2022-09-28 08:00:00', 'end':'2022-09-28 09:30:00'})
{'timestamp': {'start': '20220928T080000Z', 'end': '20220928T093000Z'}}

legend_data_monitor.utils.get_query_times(**kwargs)

Get time ranges for DataLoader query from user input, as well as first/last timestamp for channel map / status / SC query.

Parameters:

dataset (dict, optional) –

Dictionary with the following keys (note: can provide the same keys as in dataset but separately, i.e. path=…, version=…, type=…, and one of start=…&end=…, window=…, timestamps=…, or runs=…):

  • ‘path’ : str

    Base path to the dataset.

  • ‘version’ : str

    Dataset version.

  • ‘type’ : str

    Type of dataset. Note: multiple types are not currently supported.

  • Time selection keys (choose one):

    1. ‘start’ : str, ‘end’ : str

      Start and end datetime in the format ‘YYYY-MM-DD hh:mm:ss’.

    2. ‘window’ : str

      Time window from the current time, e.g., ‘1d 2h 30m’ for 1 day, 2 hours, 30 minutes.

    3. ‘timestamps’ : str or list of str

      Timestamps in the format ‘YYYYMMDDThhmmssZ’.

    4. ‘runs’ : int or list of ints

      Run number(s), e.g., 10 corresponds to run ‘r010’.

Notes

  • path, version, and type are required because channel map and status cannot be retrieved by run directly. These are used to determine the first timestamp available in the data path.

  • Designed in such a way to accommodate Subsystem init kwargs.

Examples

>>> get_query_times(..., start='2022-09-28 08:00:00', end='2022-09-28 09:30:00')
{'timestamp': {'start': '20220928T080000Z', 'end': '20220928T093000Z'}}, '20220928T080000Z'

>>> get_query_times(..., runs=27)
({'run': ['r027']}, '20220928T091135Z')

legend_data_monitor.utils.get_run_name(config: dict, user_time_range: dict) str

Get the run ID given start/end timestamps. If the timestamps run over multiple run IDs, a list of runs is retrieved, out of which only the first element is returned.

Return type:

str

legend_data_monitor.utils.get_start_key(auto_dir_path: str, data_type: str, period: str, current_run: str)
legend_data_monitor.utils.get_status_map(path: str, version: str, first_timestamp: str, datatype: str)

Return the correct status map, reading either a .json or a .yaml file.

legend_data_monitor.utils.get_tiers_pars_folders(path: str)

Get the absolute path to different tier and par folders.

Parameters:

path (str) – Absolute path to the processed data for a specific version, e.g. path=‘/global/cfs/cdirs/m2676/data/lngs/l200/public/prodenv/prod-blind/ref-v2.1.5/’.

legend_data_monitor.utils.get_time_name(user_time_range: dict) → str

Get a name for each available time selection.

Parameters:

user_time_range (dict) – Time selection, keyed by ‘timestamp’ or ‘run’ as in the examples below. The folder name is built differently depending on the selected time range.

Return type:

str

Examples

>>> get_time_name({'timestamp': {'start': '20220928T080000Z', 'end': '20220928T093000Z'}})
20220928T080000Z_20220928T093000Z
>>> get_time_name({'timestamp': ['20230207T103123Z']})
20230207T103123Z
>>> get_time_name({'timestamp': ['20230207T103123Z', '20230207T141123Z', '20230207T083323Z']})
20230207T083323Z_20230207T141123Z
>>> get_time_name({'run': ['r010']})
r010
>>> get_time_name({'run': ['r010', 'r014']})
r010_r014
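
The naming pattern in these examples can be sketched as follows. This is an illustration inferred from the examples only, not the library's implementation: `time_name_sketch` is a hypothetical name, and the behaviour for three or more runs (first and last after sorting) is an assumption.

```python
def time_name_sketch(user_time_range):
    """Hypothetical sketch: build a folder-friendly name from a time selection."""
    if "timestamp" in user_time_range:
        selection = user_time_range["timestamp"]
        if isinstance(selection, dict):
            # Explicit start/end interval.
            return f"{selection['start']}_{selection['end']}"
        # Fixed-width timestamps sort chronologically as plain strings.
        values = sorted(selection)
    else:
        # Assumption: runs are named by their first and last entry after sorting.
        values = sorted(user_time_range["run"])

    if len(values) == 1:
        return values[0]
    return f"{values[0]}_{values[-1]}"


print(time_name_sketch({"run": ["r010", "r014"]}))
```
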
legend_data_monitor.utils.get_timestamp(filename: str)

Get the timestamp from a filename. For instance, if filename=‘l200-p04-r000-phy-20230421T055556Z-tier_dsp.lh5’, then it returns ‘20230421T055556Z’.
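
One way to extract such a timestamp is a regular expression, shown here as a minimal sketch. The name `timestamp_from_filename` is hypothetical, and whether the library uses a regular expression or string splitting is not stated in this documentation.

```python
import re

# Matches keys such as '20230421T055556Z' (8 digits, 'T', 6 digits, 'Z').
TIMESTAMP_PATTERN = re.compile(r"\d{8}T\d{6}Z")


def timestamp_from_filename(filename):
    """Hypothetical sketch: return the first timestamp key found in a filename."""
    match = TIMESTAMP_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"no timestamp found in {filename!r}")
    return match.group(0)


print(timestamp_from_filename("l200-p04-r000-phy-20230421T055556Z-tier_dsp.lh5"))
```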

legend_data_monitor.utils.get_timestamp_from_path(path)
legend_data_monitor.utils.get_valid_path(base_path)
legend_data_monitor.utils.is_bad(t, intervals)
legend_data_monitor.utils.load_and_filter(store, key: str, mask=None)

Load a given key from an HDF file and apply a mask.

legend_data_monitor.utils.load_config(config_file: dict | str)

Load a configuration from a dictionary, JSON string, or YAML file.

This function supports three input types:

  • A dictionary, which is returned as-is.

  • A JSON string, which is parsed into a dictionary.

  • A path to a YAML (.yaml/.yml) file, which is read and parsed.

Parameters:

config_file (dict or str) – The configuration input
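
The three input types can be dispatched roughly as in the sketch below. `load_config_sketch` is a hypothetical name, the extension check is an assumption about how a YAML path is told apart from a JSON string, and the YAML branch assumes PyYAML is installed.

```python
import json


def load_config_sketch(config_file):
    """Hypothetical sketch of the dict / JSON string / YAML path dispatch."""
    if isinstance(config_file, dict):
        # Dictionaries are returned as-is.
        return config_file

    if config_file.lower().endswith((".yaml", ".yml")):
        # Assumption: PyYAML provides the parser; imported only when needed.
        import yaml

        with open(config_file) as stream:
            return yaml.safe_load(stream)

    # Anything else is treated as a JSON string.
    return json.loads(config_file)


print(load_config_sketch('{"dataset": {"type": "phy"}}'))
```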

legend_data_monitor.utils.load_tier_config(path: str, version: str, tier_name: str)

Load tier configuration (YAML or JSON) for the given tier name, and search through possible directory structures and file patterns.

Parameters:
  • path (str) – Path to the processing environment, e.g. ‘/data2/public/prodenv/prod-blind’.

  • version (str) – Version of data under inspection, e.g. ‘tmp-auto’.

  • tier_name (str) – Name of the tier under inspection, e.g. ‘hit’.

legend_data_monitor.utils.load_yaml_or_default(path: str, detectors: dict) → dict

Load YAML if it exists, else return a default dict.

Return type:

dict

legend_data_monitor.utils.make_dir(dir_path)

Check if directory exists, and if not, make it.

legend_data_monitor.utils.make_output_paths(config: dict, user_time_range: dict) → str

Get a dict and return a dict. The function defines output paths and creates directories accordingly.

Use this when you want an output structure of the following type: […]/prod-ref/{version}/generated/plt/hit/phy/{period}/{run}. This does not work if you select more than one type (e.g. both cal and phy) or timestamp intervals; only run selections are supported. It can be used for run summary plots, e.g. during stable data taking. Note that monitoring plots are stored under the ‘hit’ subfolder to replicate the structure of the main prodenv.

Return type:

str

legend_data_monitor.utils.none_to_nan(data: list)

Convert None elements into nan values for an input list.
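
A minimal sketch of this conversion, using the hypothetical name `none_to_nan_sketch` and assuming a plain Python list as input:

```python
import math


def none_to_nan_sketch(data):
    """Hypothetical sketch: replace None entries with float nan."""
    return [math.nan if value is None else value for value in data]


converted = none_to_nan_sketch([1.0, None, 3.5])
# nan never compares equal to itself, so check it with math.isnan.
print(converted[0], math.isnan(converted[1]), converted[2])
```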

legend_data_monitor.utils.pulser_from_evt_or_mtg(my_dir, period, run, output, run_info)

Try to load EVT tier; if not found, attempt to update run info from monitoring path.

legend_data_monitor.utils.read_json_or_yaml(file_path: str)

Open a JSON or YAML file; if the file is neither, raise an error and exit.

Parameters:

file_path (str) – Path to the JSON/YAML file to read.

legend_data_monitor.utils.retrieve_json_or_yaml(base_path: str, filename: str)

Return either a YAML or a JSON file for the specified filename, depending on which extension is available.

legend_data_monitor.utils.send_email_alert(app_password: str, recipients: list, text_file_path: str)

Send automatic emails with alert messages.

Parameters:
  • app_password (str) – App password used to send emails from legend.data.monitoring@gmail.com

  • recipients (list) – List of email addresses to send the alert emails to

  • text_file_path (str) – String path to the .txt file containing the message to send via email

legend_data_monitor.utils.unix_timestamp_to_string(unix_timestamp)

Convert a Unix timestamp to a string in the format ‘YYYYMMDDTHHMMSSZ’ with the timezone indicating UTC+00.
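
A minimal sketch of this conversion with the standard library, using the hypothetical name `unix_to_string`. The input is assumed to be seconds since the Unix epoch.

```python
from datetime import datetime, timezone


def unix_to_string(unix_timestamp):
    """Hypothetical sketch: seconds since epoch -> 'YYYYMMDDTHHMMSSZ' in UTC."""
    # Build a timezone-aware UTC datetime so the local timezone plays no role.
    utc_time = datetime.fromtimestamp(unix_timestamp, tz=timezone.utc)
    return utc_time.strftime("%Y%m%dT%H%M%SZ")


print(unix_to_string(0))  # 19700101T000000Z
```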

legend_data_monitor.utils.update_evaluation_in_memory(data: dict, det_name: str, data_type: str, key: str, value: bool | float)

Update the key entry in the in-memory dict. The value is a bool, or nan if not available; data_type is either ‘cal’ or ‘phy’.

Parameters:
  • data (dict) – Dictionary storing summary monitoring results, structured as data[det_name][data_type][key] = False/True/null.

  • det_name (str) – Detector name.

  • data_type (str) – Data type, either ‘cal’ or ‘phy’.

  • key (str) – Parameter’s key name, e.g. ‘fwhm_ok’ or ‘pulser_stab’.

  • value (bool or float) – Value to assign: False/True/null.
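
The nested structure data[det_name][data_type][key] can be updated as in the sketch below. `update_sketch` is a hypothetical name, the detector name is a placeholder, and creating missing intermediate levels on the fly is an assumption; the library may instead expect them to exist already.

```python
import math


def update_sketch(data, det_name, data_type, key, value):
    """Hypothetical sketch: set data[det_name][data_type][key] = value."""
    # Assumption: missing detector / data-type levels are created as empty dicts.
    data.setdefault(det_name, {}).setdefault(data_type, {})[key] = value
    return data


summary = {}
update_sketch(summary, "DET_A", "cal", "fwhm_ok", True)        # placeholder detector name
update_sketch(summary, "DET_A", "phy", "pulser_stab", math.nan)  # value not available
print(summary)
```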

legend_data_monitor.utils.update_runinfo(run_info: dict, period: str, run: str, data_type: str, mtg_files_path: str)

Update run information dict, with livetime in seconds for phy data; it automatically removes cycles that are flagged as unusable via keys stored in settings/ignore-keys.yaml.

Parameters:
  • run_info (dict) – Dictionary containing metadata for runs, separated by period, run, and data type (cal, phy, …).

  • period (str) – Period under inspection.

  • run (str) – Run under inspection.

  • data_type (str) – Data type to process (cal, phy, …).

  • mtg_files_path (str) – Path where the monitoring HDF5 files were stored for a specific period and run.