legend_data_monitor package¶
Subpackages¶
- legend_data_monitor.excel package
- Submodules
- legend_data_monitor.excel.config_io module
- legend_data_monitor.excel.core module
- legend_data_monitor.excel.detector_history module
- legend_data_monitor.excel.make_dashboard module
- legend_data_monitor.excel.read_qcp module
- legend_data_monitor.excel.read_usability module
- legend_data_monitor.excel.sync_to_datasets module
Submodules¶
legend_data_monitor.analysis_data module¶
- class legend_data_monitor.analysis_data.AnalysisData(sub_data: DataFrame, **kwargs)¶
Bases: object
Object containing information for data subselected from Subsystem data based on given criteria.
sub_data [DataFrame]: subsystem data
- Available kwargs:
- selection=
- dict with the following contents:
‘parameters’ [str or list of str]: parameter(s) of interest e.g. ‘baseline’
‘event_type’ [str]: event type, options: pulser/phy/all
‘cuts’ [str or list of str]: [optional] cuts to apply to data (will be loaded but not applied immediately)
- ‘variation’ [bool]: [optional] keep absolute value of parameter (False) or calculate % variation from mean (True).
Default: False
- ‘time_window’ [str]: [optional] time window in which to calculate event rate, in case that’s the parameter of interest.
Format: time_window=’NA’, where N is integer, and A is M for months, D for days, T for minutes, and S for seconds. Default: None
- aux_info=
- str that has info regarding pulser operations (as difference or ratio wrt geds (spms?) data). Available options are:
“pulser01anaRatio”
“pulser01anaDiff”
Or input kwargs directly parameters=, event_type=, cuts=, variation=, time_window=
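As a rough illustration of the `selection` dictionary and the 'NA' `time_window` format described above, the sketch below builds an example selection and parses the window string. The helper names `parse_time_window` and `window_to_timedelta` are hypothetical and not part of legend_data_monitor. Months are rejected in the conversion because they have no fixed length; how the package handles them is not stated here.

```python
import re
from datetime import timedelta

# Example selection using the keys listed above (values are illustrative).
selection = {
    "parameters": "baseline",
    "event_type": "pulser",
    "variation": True,
    "time_window": "10T",
}

# Seconds per unit letter; 'M' (months) is deliberately left out.
_UNIT_SECONDS = {"D": 86400, "T": 60, "S": 1}


def parse_time_window(window):
    """Split an 'NA' string into (N, A), e.g. '10T' gives (10, 'T')."""
    match = re.fullmatch(r"(\d+)([MDTS])", window)
    if match is None:
        raise ValueError(f"invalid time_window: {window!r}")
    return int(match.group(1)), match.group(2)


def window_to_timedelta(window):
    """Convert day/minute/second windows into a timedelta (hypothetical helper)."""
    n, unit = parse_time_window(window)
    if unit == "M":
        raise ValueError("months have no fixed length in this sketch")
    return timedelta(seconds=n * _UNIT_SECONDS[unit])
```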
- add_channel_mean_column()¶
Add a column to self.data with the per-channel mean of a time-cut DataFrame.
- Parameters:
self (AnalysisData object) – An AnalysisData object that has data as a column.
- Returns:
self.data – The original data with an additional column for the per-channel mean.
- Return type:
DataFrame
- apply_all_cuts()¶
- apply_cut(cut: str)¶
Apply given boolean cut.
Format: cut name as in lh5 files (“is_*”) to apply given cut, or cut name preceded by “~” to apply a “not” cut.
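The "~" convention can be pictured with a small stand-alone sketch. It works on a plain list of dictionaries rather than on the DataFrame held by AnalysisData, and `apply_boolean_cut` is a hypothetical name used only for this illustration.

```python
def apply_boolean_cut(rows, cut):
    """Keep rows whose flag is True, or False when the cut name starts with '~'."""
    negate = cut.startswith("~")
    name = cut[1:] if negate else cut
    # A row survives when its flag differs from the 'negate' switch.
    return [row for row in rows if bool(row[name]) != negate]


# Illustrative events with an "is_*" flag as found in lh5 files.
events = [
    {"energy": 1.0, "is_pulser": True},
    {"energy": 2.0, "is_pulser": False},
    {"energy": 3.0, "is_pulser": True},
]

pulser_events = apply_boolean_cut(events, "is_pulser")
non_pulser_events = apply_boolean_cut(events, "~is_pulser")
```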
- calculate_variation()¶
Add a new column containing the percentage variation of a given parameter.
The new column is called ‘<parameter>_var’. There is still the <parameter> column containing absolute values. There is only the <parameter> column if variation is set to False.
- channel_mean()¶
Get mean value of each parameter of interest in each channel in the first 10% of the dataset.
Ignore in case of SiPMs, as each entry is a list of values, not a single value.
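A minimal sketch of the two steps described for calculate_variation and channel_mean: take the mean over the first 10% of the entries, then express every value as a percentage variation. It assumes the variation is computed relative to that early-data mean, which the text above does not state explicitly. Both function names are hypothetical.

```python
def first_fraction_mean(values, fraction=0.1):
    """Mean of the first `fraction` of the values (at least one entry)."""
    if not values:
        raise ValueError("no values to average")
    n = max(1, int(len(values) * fraction))
    head = values[:n]
    return sum(head) / len(head)


def percent_variation(values, reference):
    """Percentage variation of each value with respect to `reference`."""
    return [(v - reference) / reference * 100.0 for v in values]


# 20 entries: the first 10% (2 entries) define the reference mean.
baseline = [100.0, 100.0] + [110.0] * 18
reference = first_fraction_mean(baseline)
baseline_var = percent_variation(baseline, reference)
```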
- convert_bitmasks()¶
Convert float64 bitmask columns into boolean columns based on the conditions saved in metadata.
- get_subsys() str¶
Return ‘pulser’, ‘pulser01ana’, ‘FCbsln’, ‘muon’, ‘geds’ or ‘spms’ depending on the subsystem type.
- Return type:
str
- is_spms() bool¶
Return True if ‘location’ (=fiber) and ‘position’ (=top, bottom) are strings.
- Return type:
bool
- select_events()¶
- special_parameter()¶
- legend_data_monitor.analysis_data.concat_channel_mean(self, channel_mean) DataFrame¶
Add a new column containing the mean values of the inspected parameter.
- Return type:
DataFrame
- legend_data_monitor.analysis_data.cut_dataframe(df: DataFrame, fraction: float = 0.1) DataFrame¶
Get mean value of the parameters under study over the first ‘fraction’ of data present in the selected time range of the input dataframe.
- Return type:
DataFrame
- legend_data_monitor.analysis_data.get_aux_df(df: DataFrame, parameter: list, plot_settings: dict, aux_ch: str) DataFrame¶
Get dataframes containing auxiliary (PULS01ANA) data, storing absolute/diff&ratio/mean/% variations values.
- Return type:
DataFrame
- legend_data_monitor.analysis_data.get_aux_info(df: DataFrame, chmap: dict, aux_ch: str) DataFrame¶
Return a DataFrame with correct pulser AUX info.
- Return type:
DataFrame
- legend_data_monitor.analysis_data.get_saved_df_hdf(self, subsys: str, param: str, old_df: DataFrame) DataFrame¶
Get the previously saved dataframe from the existing output hdf file, for a given parameter `param`. In particular, it re-evaluates the mean over the new 10% of data in the new, larger time window.
- Return type:
DataFrame
legend_data_monitor.automatic_run module¶
- legend_data_monitor.automatic_run.auto_run(cluster, ref_version, output_folder, partition, pswd, get_sc, port, pswd_email, chunk_size, input_period, input_run, save_pdf, escale_val, data_type)¶
Inspect LEGEND HDF5 (LH5) processed data (and Slow Control data from lngs-login cluster) for a specific period and run (if specified; otherwise the latest being processed are used); plots and summary files are saved; automatic alert emails are sent.
- legend_data_monitor.automatic_run.check_calib(auto_dir_path: str, output_folder: str, period: str, current_run: str, pswd_email: str, data_type: str = 'phy', partition: bool = False, save_pdf: bool = False)¶
Check calibration stability in calibration runs and create monitoring summary file.
- Parameters:
auto_dir_path (str) – Path to tmp-auto public data files (eg /data2/public/prodenv/prod-blind/tmp-auto).
output_folder (str) – Path to output folder.
period (str) – Period to inspect.
current_run (str) – Run under inspection.
pswd_email (str) – Password to access the legend.data.monitoring@gmail.com account for sending alert messages.
data_type (str) – Data type to load; default: ‘phy’.
partition (bool) – False if not partition data; default: False.
save_pdf (bool) – True if you want to save pdf files too; default: False.
- legend_data_monitor.automatic_run.qc_avg_series(auto_dir_path: str, output_folder: str, start_key: str, period: str, current_run: str, save_pdf: bool = False)¶
Plot quality cuts average values across the array and trends in time.
- Parameters:
auto_dir_path (str) – Path to tmp-auto public data files (eg /data2/public/prodenv/prod-blind/tmp-auto).
output_folder (str) – Path to output folder.
start_key (str) – First timestamp of the inspected range.
period (str) – Period to inspect.
current_run (str) – Run under inspection.
save_pdf (bool) – True if you want to save pdf files too; default: False.
- legend_data_monitor.automatic_run.summary_plots(auto_dir_path: str, phy_mtg_data: str, output_folder: str, start_key: str, period: str, current_run: str, runs: list, pswd_email: str, last_checked: str, data_type: str = 'phy', partition: bool = False, escale_val: float = 2039.0, save_pdf: bool = False, zoom: bool = False, quadratic: bool = False)¶
Run function for creating summary plots.
- Parameters:
auto_dir_path (str) – Path to tmp-auto public data files (eg /data2/public/prodenv/prod-blind/tmp-auto).
phy_mtg_data (str) – Path to generated monitoring hdf files.
output_folder (str) – Path to output folder.
start_key (str) – First timestamp of the inspected range.
period (str) – Period to inspect.
current_run (str) – Run under inspection.
runs (list) – Available runs to inspect for a given period.
pswd_email (str) – Password to access the legend.data.monitoring@gmail.com account for sending alert messages.
last_checked (str) – Timestamp of the last check.
data_type (str) – Data type to load; default: ‘phy’.
partition (bool) – False if not partition data; default: False.
escale_val (float) – Energy scale at which to evaluate the gain differences; default: 2039 keV (76Ge Qbb).
save_pdf (bool) – True if you want to save pdf files too; default: False.
zoom (bool) – True to zoom over y axis; default: False.
quadratic (bool) – True if you want to plot the quadratic resolution too; default: False.
legend_data_monitor.calibration module¶
- legend_data_monitor.calibration.check_calibration(tmp_auto_dir: str, output_folder: str, period: str, run: str, first_run: bool, det_info: dict, save_pdf=False)¶
Check calibration stability for a given run and update monitoring summary YAML file.
- Parameters:
tmp_auto_dir (str) – Path to tmp-auto public data files (eg /data2/public/prodenv/prod-blind/tmp-auto).
output_folder (str) – Path to output folder where the output summary YAML and plots will be stored.
period (str) – Period to inspect.
run (str) – Run to inspect.
first_run (bool) – Flag indicating whether this is the first run of the period.
det_info (dict) – Dictionary containing detector metadata.
save_pdf (bool) – True if you want to save pdf files too; default: False.
- legend_data_monitor.calibration.check_calibration_lac_ssc(tmp_auto_dir: str, output_folder: str, period: str, run: str, run_to_apply: str, first_run: bool, det_info: dict, data_type='cal', save_pdf=False)¶
Check calibration stability for a given run and update monitoring summary YAML file in special LAC or SSC data.
- Parameters:
tmp_auto_dir (str) – Path to tmp-auto public data files (eg /data2/public/prodenv/prod-blind/tmp-auto).
output_folder (str) – Path to output folder where the output summary YAML and plots will be stored.
period (str) – Period to inspect.
run (str) – Run to inspect.
run_to_apply (str) – Calibration run to apply to these data.
first_run (bool) – Flag indicating whether this is the first run of the period.
det_info (dict) – Dictionary containing detector metadata.
save_pdf (bool) – True if you want to save pdf files too; default: False.
- legend_data_monitor.calibration.check_escale(auto_dir_path: str, cal_path: str, output_folder: str, period: str, current_run: str, det_info: dict, save_pdf: bool) None¶
Run energy-scale calibration checks and generate detector plots.
- Parameters:
auto_dir_path (str) – Path to tmp-auto public data files (eg /data2/public/prodenv/prod-blind/tmp-auto).
cal_path (str) – Path to the directory containing calibration runs (eg /data2/public/prodenv/prod-blind/tmp-auto/generated/par/<tier>/cal/<period>).
output_folder (str) – Path to output folder where the summary plots will be stored.
period (str) – Period to inspect.
current_run (str) – Run to inspect.
det_info (dict) – Dictionary containing detector metadata.
save_pdf (bool) – True if you want to save pdf files too; default: False.
- legend_data_monitor.calibration.check_psd(auto_dir_path: str, cal_path: str, pars_files_list: list, output_dir: str, period: str, current_run: str, det_info: dict, save_pdf: bool)¶
Evaluate the PSD usability for a set of detectors based on calibration results; save results in a YAML summary file; plot per-detector PSD stability data and store them as shelve file (and pdf if wanted).
- Parameters:
auto_dir_path (str) – Path to tmp-auto public data files (eg /data2/public/prodenv/prod-blind/tmp-auto).
cal_path (str) – Path to the directory containing calibration runs (eg /data2/public/prodenv/prod-blind/tmp-auto/generated/par/<tier>/cal/<period>).
pars_files_list (list) – List of YAML/JSON files containing results for each calibration run.
output_dir (str) – Path to output folder where the output summary YAML and plots will be stored.
period (str) – Period to inspect.
current_run (str) – Run to inspect.
det_info (dict) – Dictionary containing detector metadata.
save_pdf (bool) – True if you want to save pdf files too; default: False.
- legend_data_monitor.calibration.evaluate_psd_performance(mean_vals: list, sigma_vals: list, run_labels: list, current_run: str, det_name: str)¶
Evaluate PSD performance metrics: slow shifts and sudden shifts and return a dict with evaluation results.
- legend_data_monitor.calibration.evaluate_psd_usability_and_plot(period: str, current_run: str, fit_results_cal: dict, det_name: str, location, output_dir: str, psd_data: dict, save_pdf: bool)¶
Plot PSD stability results across runs, evaluate performance, and save both plot and evaluation summary.
- legend_data_monitor.calibration.fep_gain_variation(period: str, run: str, pars: dict, chmap: dict, timestamps: ndarray, values: ndarray, output_dir: str, save_pdf: bool, shelf: Shelf)¶
Compute and plot FEP gain variation for a single detector; optional pdf saving; store a serialized plot in a shelve object.
- Parameters:
period (str) – Period to inspect.
run (str) – Run to inspect.
pars (dict) – Calibration results dictionary for a given detector.
chmap (dict) – Dictionary with detector info, must include ‘name’, ‘string’, ‘position’.
timestamps (np.ndarray) – Array of timestamps for a given detector.
values (np.ndarray) – Array of energies for a given detector.
output_dir (str) – Path to output folder where plots will be stored.
save_pdf (bool) – If True, save a PDF of the plot.
shelf (shelve.Shelf) – Open shelve object where serialized plots will be stored.
- legend_data_monitor.calibration.get_partitions_params(ge_keys: list, detector_status: dict, run_dict: dict, hit_map: dict, dsp_map: dict) dict¶
Build per-detector calibration and analysis parameters across runs.
Returns a nested dictionary: det -> parameter -> peak -> run_key -> value
- Parameters:
- Return type:
dict
- legend_data_monitor.calibration.load_fit_pars_from_yaml(pars_files_list: list, detectors_list: list, detectors_name: list, avail_runs: list)¶
Load detector data from YAML files and return directly as a dict.
- Parameters:
pars_files_list (list) – List of file paths to YAML parameter files.
detectors_list (list) – List of detector raw IDs (eg. ‘ch1104000’) to extract data for.
detectors_name (list) – List of detector names (eg. ‘V11925A’) to extract data for.
avail_runs (list or None) – Available runs to inspect (e.g. [4, 5, 6]); if None, keep all.
- Returns:
{
"V11925A": {"r004": {"mean": …, "mean_err": …, "sigma": …, "sigma_err": …}, "r005": {…}, …},
"V11925B": {"r004": {…}, …}
}
- Return type:
legend_data_monitor.core module¶
- legend_data_monitor.core.auto_control_plots(config: str, file_keys: str, prod_path: str, prod_config: str, n_files=None)¶
Set the configuration file and the output paths when a config file is provided during automatic plot production.
- legend_data_monitor.core.control_plots(user_config_path: str, n_files=None)¶
Set the configuration file and the output paths when a user config file is provided. The function to generate plots is then automatically called.
- legend_data_monitor.core.generate_plots(config: dict, plt_path: str, n_files=None)¶
Generate plots once the config file is set and the output path and name for storing results are provided. If n_files is not specified, the entire time window is inspected; otherwise the time window is subdivided into smaller datasets, each composed of n_files files.
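The role of n_files can be sketched as a simple chunking of the list of files. This is only an illustration of the splitting idea under the assumption that files are processed in order; `split_into_chunks` and the file names are hypothetical and say nothing about how generate_plots is implemented.

```python
def split_into_chunks(files, n_files=None):
    """Return one chunk with all files, or consecutive chunks of n_files files."""
    if n_files is None:
        return [list(files)]
    if n_files <= 0:
        raise ValueError("n_files must be a positive integer")
    return [list(files[i:i + n_files]) for i in range(0, len(files), n_files)]


# Illustrative file names only.
files = ["f1.lh5", "f2.lh5", "f3.lh5", "f4.lh5", "f5.lh5"]
whole_window = split_into_chunks(files)
sub_windows = split_into_chunks(files, n_files=2)
```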
legend_data_monitor.monitoring module¶
- legend_data_monitor.monitoring.add_calibration_runs(period: str | list, run_list: list | dict) list¶
Add special calibration runs to the run list for a given period.
- legend_data_monitor.monitoring.box_summary_plot(period: str, run: str, pars: dict, det_info: dict, results: dict, info: dict, output_dir: str, data_type: str, save_pdf: bool, run_to_apply=None)¶
Box plot summary for FEP gain variations for multiple detectors.
- Parameters:
period (str) – Period to inspect.
run (str) – Run to inspect.
pars (dict) – Calibration results for each detector.
det_info (dict) – Dictionary with channel names, IDs, and mapping to string and position.
results (dict) – Dictionary with arrays values (per detector); None if invalid.
info (dict) – Dictionary containing info on a parameter basis (eg label name, file title, colours, limits, …).
output_dir (str) – Output folder for saving plots and shelve data.
data_type (str) – Type of data, either ‘cal’ or ‘phy’.
save_pdf (bool) – If True, save the summary plot as a PDF.
run_to_apply – Run to apply (eg see ssc data).
- legend_data_monitor.monitoring.build_new_files(generated_path: str, period: str, run: str, data_type='phy')¶
Generate and store resampled HDF files for a given data run and extract summary info.
This function:
loads the original .hdf file for the specified period and run
extracts available keys from the HDF file
resamples all applicable time series data into multiple time intervals (10min, 60min)
stores each resampled dataset into a separate HDF file
extracts metadata from the ‘info’ key and saves it as a .yaml file
- legend_data_monitor.monitoring.compute_dead_time(df, window_ms=10)¶
Compute dead time percentage based on discharge windows.
- Parameters:
df (pd.DataFrame) – Timestamps and boolean detector columns with is_discharge entries.
window_ms (float) – Dead time window after each discharge; default: 10 ms.
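One way to picture a dead-time percentage from discharge windows is sketched below for a single detector. It assumes that overlapping windows are counted once and that the dead time is normalised to the full inspected time span; neither assumption is confirmed by the text above. The real function takes a DataFrame, while this hypothetical `dead_time_percent` takes a plain list of discharge times in seconds.

```python
def dead_time_percent(discharge_times_s, t_start_s, t_stop_s, window_ms=10.0):
    """Percentage of [t_start_s, t_stop_s] covered by windows after discharges."""
    total = t_stop_s - t_start_s
    if total <= 0:
        raise ValueError("t_stop_s must be larger than t_start_s")
    window_s = window_ms / 1000.0
    dead = 0.0
    covered_until = t_start_s
    for t in sorted(discharge_times_s):
        # Start where the previous window ended if the two overlap.
        begin = max(t, covered_until)
        end = min(t + window_s, t_stop_s)
        if end > begin:
            dead += end - begin
            covered_until = end
    return 100.0 * dead / total


# Two overlapping discharges and one isolated discharge within 10 s of data.
pct = dead_time_percent([0.0, 0.005, 1.0], t_start_s=0.0, t_stop_s=10.0)
```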
- legend_data_monitor.monitoring.compute_diff(values: ndarray, initial_value: float | int, scale: float | int) ndarray¶
Compute relative differences with respect to an initial value. If the initial value is zero, returns an array of nan values.
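A minimal sketch of this behaviour, assuming the relative difference is (value - initial) / initial multiplied by the scale. The exact formula is an assumption, and `relative_diff` is a hypothetical stand-in that works on lists instead of NumPy arrays.

```python
import math


def relative_diff(values, initial_value, scale):
    """Scaled relative differences; all NaN when the initial value is zero."""
    if initial_value == 0:
        return [math.nan] * len(values)
    return [(v - initial_value) / initial_value * scale for v in values]


# A 1% shift expressed at an energy scale of 2039 keV.
shifts = relative_diff([100.0, 101.0], initial_value=100.0, scale=2039.0)
undefined = relative_diff([1.0, 2.0], initial_value=0, scale=2039.0)
```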
- legend_data_monitor.monitoring.compute_diff_and_rescaling(series: Series, reference: float, escale: float, variations: bool)¶
Compute relative differences (if ‘variations’ is True) and rescale values by ‘escale’.
- legend_data_monitor.monitoring.evaluate_fep_cal(pars_dict: dict, channel: str, fep_peak_pos: float, fep_peak_pos_err: float)¶
Return calibrated FEP position (fep_cal) and error (fep_cal_err).
- legend_data_monitor.monitoring.extract_fep_peak(pars_dict: dict, channel: str)¶
Return fep_peak_pos, fep_peak_pos_err, fep_gain, fep_gain_err.
- legend_data_monitor.monitoring.extract_resolution_at_q_bb(pars_dict: dict, channel: str, key_result: str, fit: str = 'linear')¶
Return Qbb_fwhm (linear resolution) and Qbb_fwhm_quad (quadratic resolution).
- legend_data_monitor.monitoring.filter_by_period(series: Series, period: str | list) Series¶
Return a series filtered by ignore keys for the given period(s).
- legend_data_monitor.monitoring.filter_series_by_ignore_keys(series_to_filter: Series, skip_keys: dict, period: str)¶
Remove data from a time-indexed pandas Series that falls within time ranges specified by start and stop timestamps for a given period.
- Parameters:
series_to_filter (pd.Series) – The time-indexed pandas Series to be filtered.
skip_keys (dict) – Dictionary mapping periods to sub-dictionaries containing ‘start_keys’ and ‘stop_keys’ lists with timestamp strings in the format ‘%Y%m%dT%H%M%S%z’.
period (str) – The period to check for keys to ignore. If not present, the series is returned unmodified.
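The structure of skip_keys and the timestamp format can be illustrated with a stand-alone sketch. It filters a list of (timestamp, value) pairs instead of a pandas Series and assumes that both ends of each ignored range are inclusive. `drop_ignored`, the period names, and the timestamps are hypothetical.

```python
from datetime import datetime

FMT = "%Y%m%dT%H%M%S%z"


def drop_ignored(samples, skip_keys, period):
    """Remove (timestamp, value) pairs that fall inside an ignored range."""
    if period not in skip_keys:
        # Unknown period: return the data unmodified.
        return list(samples)
    starts = [datetime.strptime(k, FMT) for k in skip_keys[period]["start_keys"]]
    stops = [datetime.strptime(k, FMT) for k in skip_keys[period]["stop_keys"]]
    ranges = list(zip(starts, stops))
    return [(t, v) for t, v in samples
            if not any(a <= t <= b for a, b in ranges)]


skip_keys = {
    "p03": {
        "start_keys": ["20230101T010000+0000"],
        "stop_keys": ["20230101T020000+0000"],
    }
}
stamps = ["20230101T003000+0000", "20230101T013000+0000", "20230101T023000+0000"]
samples = [(datetime.strptime(s, FMT), i) for i, s in enumerate(stamps)]
kept = drop_ignored(samples, skip_keys, "p03")
```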
- legend_data_monitor.monitoring.find_hdf_file(directory: str, include: list[str], exclude: list[str] | None = None) str | None¶
Find the original HDF monitoring file in a given directory, matching inclusion/exclusion filters.
- Parameters:
- Return type:
str | None
- legend_data_monitor.monitoring.get_calib_data_dict(calib_data: dict, channel_info: list, tiers: list, pars: list, period: str, run: str, tier: str, key_result: str, fit: str, data_type: str)¶
Extract calibration information for a given run and append it to the provided dictionary.
This function loads calibration parameters for a specific detector channel and run, parses energy calibration results and resolution information, and evaluates derived values such as gain and calibration constants. It appends the extracted data to the provided calib_data dictionary, which is expected to contain keys like “fep”, “fep_err”, “cal_const”, “cal_const_err”, “run_start”, “run_end”, “res”, and “res_quad”.
- Parameters:
calib_data (dict) – Dictionary that accumulates calibration results across runs.
channel_info (list) – List of [channel ID, channel name].
tiers (list of str) – Paths to tier data folders based on the inspected processed version.
period (str) – Period to inspect.
run (str) – Run to inspect.
tier (str) – Tier level for the analysis (‘hit’, ‘phy’, etc.).
key_result (str) – Key name used to extract the resolution results from the parsed file.
fit (str) – Fitting method used for energy resolution, either ‘linear’ or ‘quadratic’.
data_type (str)
- legend_data_monitor.monitoring.get_calib_pars(path: str, period: str | list, run_list: list, channel_info: list, partition: bool, data_type: str, escale: float, fit='linear') dict¶
Retrieve and process calibration parameters across a list of runs for a given channel.
This function loads calibration data from JSON/YAML files for each specified run, computes gain and calibration constant evolution over time, and returns a dictionary of relevant quantities, including their relative changes with respect to the initial values. It optionally appends special calibration runs at the end of a period, if available.
- Parameters:
path (str) – Base directory containing the tier and parameter folders.
period (str or list) – Period to inspect. Can be a list if multiple periods are inspected.
run_list (list) – List of runs to inspect, or a dictionary mapping periods to lists of runs.
channel_info (list) – List containing [channel ID, channel name].
partition (bool) – True if you want to retrieve partition calibration results.
escale (float) – Scaling factor used to compute relative differences in gain and calibration constant.
fit (str, optional) – Fit method used for energy resolution (“linear” or “quadratic”), by default “linear”.
- Return type:
dict
- legend_data_monitor.monitoring.get_calibration_file(folder_par: str) dict¶
Return the content of the JSON/YAML calibration file in folder_par.
- legend_data_monitor.monitoring.get_dfs(phy_mtg_data: str, period: str, run_list: list, parameter: str)¶
Load and concatenate monitoring data from HDF files for a given period and list of runs.
- Parameters:
phy_mtg_data (str) – Path to the base directory containing monitoring HDF5 files (typically ending in /mtg/phy).
period (str) – Period to inspect.
run_list (list) – List of available runs.
parameter (str) – Parameter name used to construct the HDF key for loading specific datasets (e.g., ‘TrapemaxCtcCal’ looks for ‘IsPulser_TrapemaxCtcCal’).
- legend_data_monitor.monitoring.get_energy_key(ecal_results: dict) dict¶
Retrieve the energy calibration results from a given dictionary.
This function searches for specific keys (‘cuspEmax_ctc_runcal’ or ‘cuspEmax_ctc_cal’) in the input ecal_results dictionary. It returns a sub-dictionary if one of the keys is found, otherwise an empty dictionary is returned.
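The lookup described above amounts to a short key search. The sketch below assumes that 'cuspEmax_ctc_runcal' takes priority when both keys are present, which the text does not specify; `pick_energy_results` is a hypothetical name and the sub-dictionary contents are illustrative.

```python
def pick_energy_results(ecal_results):
    """Return the sub-dictionary under the first matching key, else {}."""
    for key in ("cuspEmax_ctc_runcal", "cuspEmax_ctc_cal"):
        if key in ecal_results:
            return ecal_results[key]
    return {}


found = pick_energy_results({"cuspEmax_ctc_cal": {"example_field": 1}})
missing = pick_energy_results({"some_other_key": {}})
```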
- legend_data_monitor.monitoring.get_pulser_data(resampling_time: str, period: str | list, dfs: list, channel: str, escale: float, variations=False) dict¶
Return a dictionary of geds and pulser filtered dataframes for which a time resampling is performed.
- Parameters:
resampling_time (str) – Resampling time, eg ‘1HH’ or ‘10T’.
dfs (list) – List of dataframes for geds and pulser events.
channel (str) – Channel to inspect.
escale (float) – Scaling factor used to compute relative differences in gain and calibration constant.
variations (bool) – True if you want to retrieve % variations (default: False).
- Return type:
dict
- legend_data_monitor.monitoring.get_run_start_end_times(sto, tiers: list, period: str, run: str, tier: str)¶
Determine the start and end timestamps for a given run, including the special case for additional final calibration runs.
- legend_data_monitor.monitoring.get_tier_keyresult(tiers: list)¶
Retrieve the proper tier name (pht or hit) and key_result (partition_ecal or ecal), depending on whether partitioning data exists.
- Parameters:
tiers (list) – Base directory containing the tier and parameter folders.
- legend_data_monitor.monitoring.get_traptmax_tp0est(phy_mtg_data: str, period: str, run_list: list)¶
Load and concatenate trapTmax and tp0est data from HDF files for a given period and list of runs.
- legend_data_monitor.monitoring.mhz_to_percent(mhz, avg_total_forced_mhz)¶
- legend_data_monitor.monitoring.percent_to_mhz(pct, avg_total_forced_mhz)¶
- legend_data_monitor.monitoring.plot_time_series(auto_dir_path: str, phy_mtg_data: str, output_folder: str, data_type: str, period: str, runs: list, current_run: str, det_info: dict, save_pdf: bool, escale_val: float, last_checked: float | None, partition: bool, quadratic: bool, zoom: bool)¶
Generate and save time-series plots of calibration and monitoring data for germanium detectors across multiple runs.
This function collects physics and calibration data from HDF5 monitoring files and visualizes stability over time. Channels with no pulser entries are automatically skipped. Corrections are applied to the gain if pulser data is available (‘GED corrected’), otherwise uncorrected data is plotted. The plots are saved as pickled objects for later retrieval (eg. in the online Dashboard) and optionally as PDFs:
plots are saved in shelve database files under <output_folder>/<period>/mtg/l200-<period>-phy-monitoring;
if save_pdf=True, PDF copies are saved under <output_folder>/<period>/mtg/pdf/st<string>/.
- Parameters:
auto_dir_path (str) – Path to tmp-auto public data files (eg /data2/public/prodenv/prod-blind/tmp-auto).
phy_mtg_data (str) – Path to generated monitoring hdf files.
output_folder (str) – Path to output folder.
period (str) – Period to inspect.
runs (list) – Available runs to inspect for a given period.
current_run (str) – Run under inspection.
det_info (dict) – Dictionary containing detector metadata.
save_pdf (bool) – True if you want to save pdf files too; default: False.
escale_val (float) – Energy scale at which to evaluate the gain differences; default: 2039 keV (76Ge Qbb).
last_checked (float | None) – Timestamp of the last check.
partition (bool) – False if not partition data; default: False.
quadratic (bool) – True if you want to plot the quadratic resolution too; default: False.
zoom (bool) – True to zoom over y axis; default: False.
- legend_data_monitor.monitoring.qc_and_evt_summary_plots(auto_dir_path: str, phy_mtg_data: str, output_folder: str, start_key: str, period: str, run: str, det_info: dict, save_pdf: bool)¶
- legend_data_monitor.monitoring.qc_average(auto_dir_path: str, output_folder: str, det_info: dict, period: str, run: str, save_pdf: bool, pars_to_inspect: list | None = None)¶
Evaluate the average rate of passing quality cuts for a given run and period across the whole array for different QC flags.
- Parameters:
auto_dir_path (str) – Path to tmp-auto public data files (eg /data2/public/prodenv/prod-blind/tmp-auto).
output_folder (str) – Path to generated monitoring hdf files.
det_info (dict) – Dictionary with channel names, IDs, and mapping to string and position.
period (str) – Period to inspect.
run (str) – Run under inspection.
save_pdf (bool) – True if you want to save pdf files too; default: False.
pars_to_inspect (list) – List of parameters (boolean flags) to inspect.
- legend_data_monitor.monitoring.qc_distributions(auto_dir_path: str, phy_mtg_data: str, output_folder: str, start_key: str, period: str, run: str, det_info: dict, save_pdf: bool)¶
- legend_data_monitor.monitoring.qc_ft_failure_rates(auto_dir_path: str, phy_mtg_data: str, output_folder: str, start_key: str, period: str, run: str, det_info: dict, save_pdf: bool)¶
- legend_data_monitor.monitoring.qc_time_series(auto_dir_path: str, output_folder: str, det_info: dict, period: str, run: str, save_pdf: bool, pars_to_inspect: list | None = None)¶
Evaluate rate over time of passing quality cuts for a given run and period across the whole array for different QC flags.
- Parameters:
auto_dir_path (str) – Path to tmp-auto public data files (eg /data2/public/prodenv/prod-blind/tmp-auto).
output_folder (str) – Path to generated monitoring hdf files.
det_info (dict) – Dictionary with channel names, IDs, and mapping to string and position.
period (str) – Period to inspect.
run (str) – Run under inspection.
save_pdf (bool) – True if you want to save pdf files too; default: False.
pars_to_inspect (list) – List of parameters (boolean flags) to inspect.
- legend_data_monitor.monitoring.read_if_key_exists(hdf_path: str, key: str) DataFrame | None¶
Read an HDF dataset if the key exists, otherwise return None; handle the case where the parameter is saved under either ‘/key’ or ‘key’.
- legend_data_monitor.monitoring.resample_series(series: Series, resampling_time: str, mask: Series)¶
Calculate mean/std for resampled time ranges to which a mask is then applied. The function already adds UTC timezones to the series.
- Parameters:
series (pd.Series) – Input time series of numerical values.
resampling_time (str) – Resampling frequency, eg ‘1h’.
mask (pd.Series) – Boolean mask aligned to the datetime index; false values mark timestamps that should be excluded, ie set to nan value.
legend_data_monitor.plot_styles module¶
- legend_data_monitor.plot_styles.par_vs_ch(data_channel: DataFrame, fig: Figure, ax: Axes, plot_info: dict, color=None, map_dict=None)¶
- legend_data_monitor.plot_styles.plot_heatmap(data_channel: DataFrame, fig: Figure, ax: Axes, plot_info: dict, color=None, map_dict=None)¶
- legend_data_monitor.plot_styles.plot_histo(data_channel: DataFrame, fig: Figure, ax: Axes, plot_info: dict, color=None, map_dict=None)¶
- legend_data_monitor.plot_styles.plot_par_vs_par(data_channel: DataFrame, fig: Figure, ax: Axes, plot_info: dict, color=None, map_dict=None)¶
legend_data_monitor.plotting module¶
- legend_data_monitor.plotting.align_to_keys(all_keys: list, keys: list, values: list, categorical=False)¶
Align values to a reference list of keys.
Creates an array matching all_keys and fills in values where keys match. Missing entries are filled with NaN (numeric) or None (categorical). Returns array of values aligned to all_keys.
- Parameters:
all_keys (list) – Reference list of keys defining the output order.
keys (list) – Keys corresponding to provided values.
values (list) – Values to align.
categorical (bool, optional) – If True, output array is object dtype with None for missing values. Otherwise (default), uses float dtype with NaN for missing values.
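The alignment can be sketched in a few lines. This hypothetical `align` returns a plain list rather than an array, and the run keys are illustrative; the fill values follow the description above.

```python
import math


def align(all_keys, keys, values, categorical=False):
    """Place values at the positions of their keys in all_keys, filling gaps."""
    fill = None if categorical else math.nan
    lookup = dict(zip(keys, values))
    return [lookup.get(k, fill) for k in all_keys]


numeric = align(["r001", "r002", "r003"], ["r003", "r001"], [3.0, 1.0])
labels = align(["r001", "r002"], ["r002"], ["on"], categorical=True)
```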
- legend_data_monitor.plotting.apply_cal_to_following_run(mu_vals: ndarray, cal_vals: ndarray)¶
Apply calibration parameters from each run to the following run’s ADC values.
Returns a list of calibrated peak positions in keV for each following run.
Assumes mu_vals and cal_vals have the same length; otherwise an error is raised. The function shifts the arrays so that each calibration is applied to the subsequent run:
- drops the first element of mu_vals
- drops the last element of cal_vals
Each ADC value is converted to keV using a polynomial calibration.
- Parameters:
mu_vals (np.ndarray) – Sequence of ADC peak positions (one per run).
cal_vals (np.ndarray) – Sequence of calibration polynomial coefficients (one per run).
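The shift-and-calibrate step can be sketched as follows. The sketch assumes the polynomial coefficients are ordered from the constant term upwards, which the description does not specify, and it uses plain lists instead of NumPy arrays. `calibrate_following_runs` is a hypothetical name.

```python
def calibrate_following_runs(mu_vals, cal_vals):
    """Apply the calibration of run i to the ADC peak position of run i+1."""
    if len(mu_vals) != len(cal_vals):
        raise ValueError("mu_vals and cal_vals must have the same length")
    shifted_mu = mu_vals[1:]     # drop the first ADC value
    shifted_cal = cal_vals[:-1]  # drop the last calibration
    return [
        sum(c * mu ** i for i, c in enumerate(coeffs))
        for mu, coeffs in zip(shifted_mu, shifted_cal)
    ]


# Three runs with linear calibrations [offset, slope] (illustrative numbers).
mu = [1000.0, 1010.0, 1020.0]
cal = [[0.0, 2.0], [1.0, 2.0], [5.0, 2.0]]
peaks_kev = calibrate_following_runs(mu, cal)
```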
- legend_data_monitor.plotting.filter_period(keys: list, vals: list, *periods)¶
Filter key-value pairs by matching key prefixes (e.g. ‘p18’); only entries where the key starts with any of the provided period prefixes (e.g. ‘p18’, ‘p19’) are retained.
Returns filtered (keys, values), otherwise empty lists if no matches are found.
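A minimal sketch of the prefix filter, relying on the fact that `str.startswith` accepts a tuple of prefixes. `filter_by_prefix` is a hypothetical name and the key format shown is only illustrative.

```python
def filter_by_prefix(keys, vals, *periods):
    """Keep (key, value) pairs whose key starts with any given period prefix."""
    kept = [(k, v) for k, v in zip(keys, vals) if k.startswith(periods)]
    if not kept:
        return [], []
    out_keys, out_vals = zip(*kept)
    return list(out_keys), list(out_vals)


keys = ["p18-r001", "p19-r000", "p20-r002"]
vals = [1, 2, 3]
selected = filter_by_prefix(keys, vals, "p18", "p19")
nothing = filter_by_prefix(keys, vals, "p99")
```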
- legend_data_monitor.plotting.get_fwhm_for_fixed_ch(data_channel: DataFrame, parameter: str) float¶
Calculate the FWHM of a given parameter for a given channel.
- Return type:
float
- legend_data_monitor.plotting.make_subsystem_plots(subsystem: Subsystem, plots: dict, dataset_info: dict, plt_path: str, saving=None)¶
- legend_data_monitor.plotting.plot_all_detector_info(det_name: str, det_info: dict, partitions_params: dict, detector_status: dict, period: str, current_run: str, output_folder: str, save_pdf=False, exclude_period=None)¶
Generate a comprehensive multi-panel summary plot of detector performance.
Produces a grid of subplots showing key quantities such as:
- Slow control voltage
- Energy resolution (FWHM)
- Peak positions and residuals
- Baseline properties
- Pulse shape parameters
- Calibration stability metrics
Internally extracts, aligns, and plots multiple variables using plot_variable.
- Parameters:
det_name (str) – Detector identifier.
partitions_params (dict) – Dictionary containing per-detector analysis results and calibration data.
detector_status (dict) – Dictionary with detector usability and slow control information.
period (str) – Period to inspect.
current_run (str) – Run to inspect.
output_folder (str) – Output folder where to save plots.
save_pdf (bool, optional) – True if you want to save pdf files too; default: False.
exclude_period (list of str, optional) – Period prefixes to exclude from plotting.
- legend_data_monitor.plotting.plot_det_status(det_name: str, ax: Axes, detector_status: dict, keys: list)¶
Overlay detector usability status as shaded regions on a plot: ‘ac’ periods are shaded in grey and ‘off’ periods in red.
- legend_data_monitor.plotting.plot_limits(ax: Axes, params: list, limits: list | dict)¶
Plot limits (if present) on the plot. The multi-params case is carefully handled.
- legend_data_monitor.plotting.plot_per_barrel_and_position(data_analysis: DataFrame, plot_info: dict, pdf: PdfPages)¶
- legend_data_monitor.plotting.plot_per_cc4(data_analysis: DataFrame, plot_info: dict, pdf: PdfPages)¶
- legend_data_monitor.plotting.plot_per_fiber_and_barrel(data_analysis: DataFrame, plot_info: dict, pdf: PdfPages)¶
- legend_data_monitor.plotting.plot_per_string(data_analysis: DataFrame, plot_info: dict, pdf: PdfPages)¶
- legend_data_monitor.plotting.plot_variable(det_name: str, ax: Axes, all_keys: ndarray, keys: list, vals: list, det_status: dict, periods: list | str, current_run: str, errs=None, title='', units='keV', alpha=1, fixed_thr=None, err_thr=None, plot_det_stat=False, plot_mean=True, exclude_period=None, ylabel=None)¶
Plot a detector variable over runs, grouped by data-taking periods.
Data are aligned to all_keys, split by period prefixes (e.g., ‘p16’), and plotted with optional error bands and threshold lines. Mean values are computed per period using only runs where the detector usability is ‘on’.
- Parameters:
det_name (str) – Detector identifier.
ax (Axes) – Axis to plot on.
all_keys (np.ndarray) – Master list of run keys defining x-axis.
keys (list) – Keys corresponding to vals.
vals (list) – Values to plot.
det_status (dict) – Detector status dictionary containing usability information.
current_run (str) – Run to inspect.
errs (sequence, optional) – Uncertainties corresponding to vals.
title (str, optional) – Plot title.
units (str, optional) – Units for y-axis label.
alpha (float, optional) – Transparency for plotted data.
fixed_thr (float, optional) – Fixed threshold to draw around the mean.
err_thr (float, optional) – Multiplier for mean error-based thresholds.
plot_det_stat (bool, optional) – If True, overlays detector status shading.
plot_mean (bool, optional) – If True, plots mean lines per period.
exclude_period (list of str, optional) – Period prefixes to exclude.
ylabel (str, optional) – Custom y-axis label (overrides default).
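The per-period mean described above (computed only from runs where the detector usability is ‘on’) can be illustrated with a short sketch. The helper name, the three-character period prefix, and the flat usability mapping from run key to status are assumptions; the actual layout of det_status is not documented here.

```python
from collections import defaultdict
from statistics import mean


def mean_per_period(keys, vals, usability):
    """Group values by period prefix and average only the runs flagged 'on'."""
    grouped = defaultdict(list)
    for key, val in zip(keys, vals):
        if usability.get(key) == "on":
            # Assumption: keys start with a period prefix such as 'p16'.
            grouped[key[:3]].append(val)
    return {period: mean(values) for period, values in grouped.items()}


keys = ["p16-r001", "p16-r002", "p17-r001"]
vals = [2.0, 4.0, 3.0]
usability = {"p16-r001": "on", "p16-r002": "on", "p17-r001": "off"}
print(mean_per_period(keys, vals, usability))  # {'p16': 3.0}
```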
legend_data_monitor.run module¶
- legend_data_monitor.run.add_auto_prod_parser(subparsers)¶
Configure the core.auto_control_plots() command line interface.
- legend_data_monitor.run.add_auto_run_parser(subparsers)¶
Configure the core.auto_run() command line interface.
- legend_data_monitor.run.add_get_exposure(subparsers)¶
Configure the core.retrieve_exposure() command line interface.
- legend_data_monitor.run.add_get_runinfo(subparsers)¶
Configure the core.build_runinfo() command line interface.
- legend_data_monitor.run.add_user_bunch_parser(subparsers)¶
Configure the core.control_plots() command line interface.
- legend_data_monitor.run.add_user_config_parser(subparsers)¶
Configure the core.control_plots() command line interface.
- legend_data_monitor.run.add_user_rsync_parser(subparsers)¶
Configure the core.auto_control_plots() command line interface.
- legend_data_monitor.run.add_user_scdb(subparsers)¶
Configure the core.control_plots() command line interface.
- legend_data_monitor.run.auto_prod_cli(args)¶
Pass command line arguments to core.auto_control_plots().
- legend_data_monitor.run.auto_run_cli(args)¶
Pass command line arguments to core.auto_run().
- legend_data_monitor.run.get_exposure_cli(args)¶
Pass command line arguments to core.retrieve_exposure().
- legend_data_monitor.run.get_runinfo_cli(args)¶
Pass command line arguments to core.build_runinfo().
- legend_data_monitor.run.main()¶
legend-data-monitor’s starting point.
Here you define the path to the YAML configuration file you want to use when generating the plots. To learn more, have a look at the help section.
- legend_data_monitor.run.user_bunch_cli(args)¶
Pass command line arguments to core.control_plots().
- legend_data_monitor.run.user_config_cli(args)¶
Pass command line arguments to core.control_plots().
- legend_data_monitor.run.user_rsync_cli(args)¶
Pass command line arguments to core.auto_control_plots().
- legend_data_monitor.run.user_scdb_cli(args)¶
Pass command line arguments to core.retrieve_scdb().
legend_data_monitor.save_data module¶
- legend_data_monitor.save_data.append_new_data(param: str, plot_settings: dict, plot_info: dict, old_dict: dict, par_dict_content: dict, plt_path: str) dict¶
- Return type:
dict
- legend_data_monitor.save_data.build_dict(plot_settings: list, plot_info: list, par_dict_content: dict, out_dict: dict) dict¶
Create a dictionary with the correct format for being saved in the final shelve object.
- Return type:
dict
- legend_data_monitor.save_data.build_out_dict(plot_settings: list, par_dict_content: dict, out_dict: dict)¶
Build the output dictionary based on the input ‘saving’ option.
- Parameters:
plot_settings (list) – Dictionary with settings for plotting. It contains the following keys: ‘parameters’, ‘event_type’, ‘plot_structure’, ‘resampled’, ‘plot_style’, ‘variation’, ‘time_window’, ‘range’, ‘saving’, ‘plt_path’
par_dict_content (dict) – Dictionary containing, for a given parameter, the dataframe with data and a dictionary with info for plotting (e.g. plot style, title, units, labels, …)
out_dict (dict) – Dictionary that is returned, containing the objects that need to be saved.
- legend_data_monitor.save_data.check_existence_and_overwrite(file: str)¶
Check whether a file exists and, if it does, remove it.
- legend_data_monitor.save_data.check_level0(dataframe: DataFrame) DataFrame¶
Check if a dataframe contains the ‘level_0’ column. If so, remove it.
- Return type:
DataFrame
- legend_data_monitor.save_data.get_param_df(parameter: str, df: DataFrame) DataFrame¶
Subselect from ‘df’ only the dataframe columns that refer to a given parameter. The case of ‘parameter’ being a special parameter is carefully handled.
- Return type:
DataFrame
- legend_data_monitor.save_data.get_param_info(param: str, plot_info: dict) dict¶
Subselect from ‘plot_info’ the plotting info for the specified parameter `param`. This is needed for the multi-parameters case.
- Return type:
dict
- legend_data_monitor.save_data.get_pivot(df: DataFrame, parameter: str, key_name: str, file_path: str, saving: str)¶
Get pivot: datetimes (first column) vs channels (other columns).
- legend_data_monitor.save_data.save_df_and_info(df: DataFrame, plot_info: dict) dict¶
Return a dictionary containing a dataframe for the parameter(s) under study for a given subsystem. The plotting info is saved too.
- Return type:
dict
- legend_data_monitor.save_data.save_hdf(saving: str, file_path: str, df, aux_ch: str, aux_analysis, aux_ratio_analysis, aux_diff_analysis, plot_info: dict) dict¶
Save the input dataframe in an external HDF file, using a different structure (time vs channel, with values in cells). The plotting info is saved too.
- Return type:
dict
legend_data_monitor.slow_control module¶
- class legend_data_monitor.slow_control.SlowControl(parameter: str, port: int, pswd: str, **kwargs)¶
Bases: object
Object containing Slow Control database information for data subselected based on given criteria.
parameter [str] : diode_vmon | diode_imon | PT114 | PT115 | PT118 | PT202 | PT205 | PT208 | LT01 | RREiT | RRNTe | RRSTe | ZUL_T_RR | DaqLeft-Temp1 | DaqLeft-Temp2 | DaqRight-Temp1 | DaqRight-Temp2
Options for kwargs
- dataset=
- dict with the following keys:
‘experiment’ [str]: ‘L60’ or ‘L200’
‘period’ [str]: period format pXX
‘path’ [str]: path to prod-ref folder (before version)
‘version’ [str]: version of pygama data processing format vXX.XX
‘type’ [str]: ‘phy’ or ‘cal’
- the following key(s), depending on the time selection:
1. ‘start’: <start datetime>, ‘end’: <end datetime>, where <datetime> input is of format ‘YYYY-MM-DD hh:mm:ss’
2. ‘window’ [str]: time window in the past from the current time point, format: ‘Xd Xh Xm’ for days, hours, minutes
3. ‘timestamps’: str or list of str in format ‘YYYYMMDDThhmmssZ’
4. ‘runs’: int or list of ints for run number(s), e.g. 10 for r010
Or input kwargs separately experiment=, period=, path=, version=, type=; start=&end=, (or window= - ???), or timestamps=, or runs=
- get_sc_param()¶
Load the corresponding table from the SC database for the process of interest and apply the flags for the parameter under study.
- legend_data_monitor.slow_control.apply_flags(df: DataFrame, sc_parameters: dict, flags_param: list) DataFrame¶
Apply the flags read from ‘settings/SC-params.yaml’ to the input dataframe.
- Return type:
DataFrame
- legend_data_monitor.slow_control.get_plotting_info(parameter: str, sc_parameters: dict, first_tstmp: str, last_tstmp: str, scdb: LegendSlowControlDB) Tuple[str, float, float]¶
Return units and low/high limits of a given parameter.
- legend_data_monitor.slow_control.include_more_diode_info(df: DataFrame, scdb: LegendSlowControlDB) DataFrame¶
Include more diode info, such as the channel name and the string number to which it belongs.
- Return type:
DataFrame
legend_data_monitor.string_visualization module¶
- legend_data_monitor.string_visualization.exposure_plot(subsystem, data_analysis: DataFrame, plot_info: dict, pdf: PdfPages)¶
legend_data_monitor.subsystem module¶
- class legend_data_monitor.subsystem.Subsystem(sub_type: str, **kwargs)¶
Bases: object
Object containing information for a given subsystem such as channel map, channels status etc.
sub_type [str]: geds | spms | pulser | pulser01ana | FCbsln | muon
Options for kwargs
- dataset=
- dict with the following keys:
‘experiment’ [str]: ‘L60’ or ‘L200’
‘period’ [str]: period format pXX
‘path’ [str]: path to prod-ref folder (before version)
‘version’ [str]: version of pygama data processing format vXX.XX
‘type’ [str]: ‘phy’ or ‘cal’
- the following key(s), depending on the time selection:
1. ‘start’: <start datetime>, ‘end’: <end datetime>, where <datetime> input is of format ‘YYYY-MM-DD hh:mm:ss’
2. ‘window’ [str]: time window in the past from the current time point, format: ‘Xd Xh Xm’ for days, hours, minutes
3. ‘timestamps’: str or list of str in format ‘YYYYMMDDThhmmssZ’
4. ‘runs’: int or list of ints for run number(s), e.g. 10 for r010
Or input kwargs separately experiment=, period=, path=, version=, type=; start=&end=, or window=, or timestamps=, or runs=
Experiment is needed to know which channel belongs to the pulser Subsystem (and its name): “auxs” ch0 (L60) or “puls” ch1 (L200).
Period is needed to know the channel name (“fcid” or “rawid”).
Selection range is needed for the channel map and status information at that time point, and should be the only information needed. However, pylegendmeta only allows the query .on(timestamp=…) but not .on(run=…); therefore, to get this information when runs are selected, we need to know the path, version, and run type to look up the first timestamp of the run. If this changes in the future, the path will only be asked for when data is requested to be loaded with Subsystem.get_data(), and not just to load the channel map and status for a given run.
A default of “latest” might be set for version, but this needs care.
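As an illustration of the dataset options above, the sketch below builds a dataset dictionary and checks that exactly one time selection is present. The helper time_selection_mode is hypothetical and not part of the package, and the path and version values are placeholders.

```python
def time_selection_mode(dataset):
    """Return the single time selection present in a dataset dict (hypothetical helper)."""
    modes = [key for key in ("window", "timestamps", "runs") if key in dataset]
    has_start, has_end = "start" in dataset, "end" in dataset
    if has_start != has_end:
        raise ValueError("'start' and 'end' must be provided together")
    if has_start:
        modes.append("start+end")
    if len(modes) != 1:
        raise ValueError(f"expected exactly one time selection, found {modes}")
    return modes[0]


# Placeholder values: the path and version below are illustrative only.
dataset = {
    "experiment": "L200",
    "period": "p03",
    "path": "/path/to/prod-ref",
    "version": "vXX.XX",
    "type": "phy",
    "runs": [9, 10],
}
print(time_selection_mode(dataset))  # runs
```

A dictionary of this shape would be passed as dataset=..., or its entries given as separate keyword arguments.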
- construct_dataloader_configs(param_tiers, params: list[str], tier_key: str)¶
Construct DL and DB configs for DataLoader based on parameters and which tiers they belong to.
params: list of parameters to load
- flag_fcbsln_events(fc_bsln=None)¶
Flag FC baseline events, keeping the ones that are in correspondence with a pulser event too. If a FC baseline object was provided, flag FC baseline events in data based on its flag.
- flag_fcbsln_only_events(fc_bsln=None)¶
Flag FC baseline events. If a FC baseline object was provided, flag FC baseline events in data based on its flag.
- flag_muon_events(muon=None)¶
Flag muon events. If a muon object was provided, flag muon events in data based on its flag.
- flag_pulser_events(pulser=None)¶
Flag pulser events. If a pulser object was provided, flag pulser events in data based on its flag.
- get_channel_map()¶
Build channel map for given subsystem with info like name, position, cc4, HV, DAQ, detector type, … for each channel.
setup_info: dict with the keys ‘experiment’ and ‘period’
Later this will probably be changed to get the channel map by run, if possible.
Planning to add: barrel column for the SiPMs special case.
- get_channel_status()¶
Add status column to channel map with on/off for software status.
setup_info: dict with the keys ‘experiment’ and ‘period’
Later will probably be changed to get channel status by timestamp (or hopefully run, if possible)
- get_data(parameters: str | list[str] | tuple[str] = ())¶
Get data for requested parameters from DataLoader and “prime” it to be ready for analysis.
- parameters: single parameter or list of parameters to load.
If empty, only default parameters will be loaded (channel, timestamp; baseline and wfmax for pulser)
- get_parameters_for_dataloader(parameters: str | list[str])¶
Construct list of parameters to query from the DataLoader.
parameters that are always loaded (+ pulser special case)
parameters that are already in lh5
parameters needed for calculation, if special parameter(s) asked (e.g. wf_max_rel)
legend_data_monitor.utils module¶
- legend_data_monitor.utils.add_config_entries(config: dict, file_keys: str, prod_path: str, prod_config: dict) dict¶
Add missing information (output, dataset) to the configuration file. This function is generally used during automatic data production, where the initial config file has only the ‘subsystem’ entry.
- Return type:
dict
- legend_data_monitor.utils.build_detector_info(metadata_path, start_key=None)¶
Build detector information from LEGEND metadata.
- Parameters:
metadata_path (str) – Path to the metadata file.
start_key (optional) – Starting key for channelmap selection.
- Returns:
Dictionary with two main entries:
- “detectors”: mapping from detector name to the following information:
daq_rawid : int
channel_str : str (e.g. “ch1234”)
string : int
position : int
processable : bool
usability : str
mass_in_kg : int
- “str_chns”: mapping from string to a list of detector names
- Return type:
- legend_data_monitor.utils.build_detector_info_per_period(auto_dir_path: str, run_dict: dict, period: str)¶
- legend_data_monitor.utils.build_file_map(base_path: str, tier: str = 'hit') dict¶
Build mapping from (period, run) to calibration file paths.
Returns (period, run) -> file path mapping.
- legend_data_monitor.utils.build_runinfo(path: str, version: str, proc_folder: str, output: str | None)¶
Build dictionary with main run information (start key, phy livetime in seconds) for multiple data types (phy, cal, fft, bkg, pzc, pul, …).
- legend_data_monitor.utils.bunch_dataset(config: dict, n_files=None)¶
Bunch the full datasets into smaller pieces, based on the number of files we want to inspect at each iteration.
It works for “start+end”, “runs” and “timestamps” in “dataset” present in the config file.
- legend_data_monitor.utils.check_cal_phy_thresholds(output_folder: str, period: str, run: str, key: str, detectors: list, pswd_email: str | None)¶
Check detector calibration/physics thresholds for a given run and optionally send an alert mail.
- Parameters:
output_folder (str) – Path to output folder where the output summary YAML and plots will be stored.
period (str) – Period to inspect.
run (str) – Run to inspect.
key (str) – Data type key to inspect, either ‘cal’ or ‘phy’.
detectors (list) – List of detector names.
pswd_email (str or None) – Password for the email account used to send alerts; if None, no email is sent.
- legend_data_monitor.utils.check_empty_df(df) bool¶
Check if df (DataFrame | analysis_data.AnalysisData) exists and is not empty.
- Return type:
bool
- legend_data_monitor.utils.check_key_existence(hdf_path: str, key_to_load: str) bool¶
Check if a specific key exists in the specified hdf file path.
- Return type:
bool
- legend_data_monitor.utils.check_scdb_settings(conf: dict) bool¶
Validate the ‘slow_control’ entry in the config dictionary by checking if it contains a ‘slow_control’ section with a ‘parameters’ key. It ensures that the ‘parameters’ value is either a string or a list of strings. Always returns True if the configuration passes all checks. Exits the program otherwise.
Examples
>>> conf = {
...     'slow_control': {
...         'parameters': ['RREiT', 'ZUL_T_RR']
...     }
... }
>>> check_scdb_settings(conf)
True
- legend_data_monitor.utils.check_threshold(data_series: Series, channel_name: str, last_checked: float | None | str, t0: list, threshold: list, parameter: str, output: dict)¶
Check if a given parameter is over threshold and update the email message list.
- Parameters:
data_series (pd.Series) – Series of gain differences indexed by timestamp.
last_checked (float) – Timestamp (in seconds since epoch) of last check.
t0 (list of pd.Timestamp) – List of start times for time windows.
threshold (list) – Threshold (int or float).
channel_name (str) – Name of the channel.
parameter (str) – Parameter name under inspection.
output (dict) – Dictionary containing summary cal and phy info.
- legend_data_monitor.utils.convert_to_camel_case(string: str, char: str) str¶
Remove a character from a string and capitalize all initial letters.
- Return type:
str
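A sketch of the behaviour described above, assuming the character acts as a word separator and that only the first letter of each piece is changed. The function name is hypothetical.

```python
def to_camel_case_sketch(string, char):
    """Drop every occurrence of char and capitalize the first letter of each piece."""
    # word[:1] keeps empty pieces (e.g. from doubled separators) from raising.
    return "".join(word[:1].upper() + word[1:] for word in string.split(char))


print(to_camel_case_sketch("event_type", "_"))  # EventType
```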
- legend_data_monitor.utils.dataset_validity_check(data_info: dict)¶
Check the validity of the input dictionary and if it contains all required fields and keys to existing paths.
This function is typically used in Subsystem and SlowControl classes to ensure that all necessary metadata for accessing data is present and correct. The function also checks that the provided path and the combined path/version exist on the filesystem.
- Parameters:
data_info (dict) –
Dictionary containing dataset metadata. Required keys:
- ‘experiment’ (str)
Name of the experiment.
- ‘type’ (str)
Type of dataset.
- ‘period’ (str)
Period to inspect.
- ‘path’ (str)
Path to the base dataset directory.
- ‘version’ (str)
Processing version. Can be an empty string if not needed.
Examples
>>> dataset_info = {
...     'experiment': 'L200',
...     'period': 'p03',
...     'type': 'phy',
...     'path': '/global/cfs/cdirs/m2676/data/lngs/l200/public/prodenv/prod-blind/',
...     'version': 'tmp-auto',
...     # ... additional time selection keys
... }
>>> dataset_validity_check(dataset_info)
# No output if all checks pass; errors otherwise
- legend_data_monitor.utils.deep_get(d, keys, default=None, verbose=False)¶
- legend_data_monitor.utils.find_over_threshold(data_series: Series, last_checked: float | None | str, t0: list, threshold: list) bool¶
Return timestamps where values exceed the given thresholds.
- Parameters:
data_series (pd.Series) – Series of values indexed by datetime.
last_checked (float | None | str) – Epoch time (seconds) of the last check; if None/”None”, no cutoff is applied.
t0 (list of pd.Timestamp) – Start times where the first entry defines the window start.
threshold (list) – Threshold bounds; either can be None.
- Return type:
- legend_data_monitor.utils.get_all_plot_parameters(subsystem: str, config: dict)¶
Get list of all parameters needed for all plots for given subsystem.
- legend_data_monitor.utils.get_last_timestamp(fname: str) str¶
Read a lh5 file and return the last timestamp saved in the file. This works only in case of a global trigger where the whole array is entirely recorded for a given timestamp.
- Return type:
str
- legend_data_monitor.utils.get_livetime(tot_livetime: float)¶
Get the livetime in a human readable format, starting from livetime in seconds.
- Parameters:
tot_livetime (float) –
If tot_livetime is more than 0.1 yr, convert it to years.
If tot_livetime is less than 0.1 yr but more than 1 day, convert it to days.
If tot_livetime is less than 1 day but more than 1 hour, convert it to hours.
If tot_livetime is less than 1 hour but more than 1 minute, convert it to minutes.
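The unit selection listed above can be sketched as follows. The function name, the returned (value, unit) tuple, the unit labels, the 365.25-day year, and the fallback to seconds are all assumptions made for this illustration.

```python
def livetime_sketch(tot_livetime):
    """Pick a human-readable unit for a livetime given in seconds."""
    minute, hour, day = 60.0, 3600.0, 86400.0
    year = 365.25 * day  # assumption: Julian year
    if tot_livetime > 0.1 * year:
        return tot_livetime / year, "yr"
    if tot_livetime > day:
        return tot_livetime / day, "days"
    if tot_livetime > hour:
        return tot_livetime / hour, "hr"
    if tot_livetime > minute:
        return tot_livetime / minute, "min"
    # Assumption: anything shorter is left in seconds.
    return tot_livetime, "sec"


print(livetime_sketch(7200.0))  # (2.0, 'hr')
print(livetime_sketch(90.0))    # (1.5, 'min')
```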
- legend_data_monitor.utils.get_map_dict(data_analysis: DataFrame)¶
Map string location and geds position for plotting values vs chs.
- Parameters:
data_analysis (DataFrame) – DataFrame with geds data information, in particular ‘location’ and ‘position’
- legend_data_monitor.utils.get_output_path(config: dict)¶
Get output path provided a ‘dataset’ from the config file. The path will be used to save and store pdfs/hdf/etc files.
- legend_data_monitor.utils.get_output_plot_path(plt_path: str, extension: str) str¶
Given a path to the plt directory, generate a corresponding output path in the tmp/mtg/ directory.
- legend_data_monitor.utils.get_query_timerange(**kwargs)¶
Get DataLoader compatible time range.
The function accepts either a dataset dictionary or keyword arguments. Only one type of time selection should be provided at a time. Designed in such a way to accommodate Subsystem init kwargs.
- Parameters:
dataset (dict, optional) –
- Dictionary specifying the time selection. Choose one of the following (or enter kwargs separately):
- ‘start’ (str), ‘end’ (str)
Start and end datetime in the format ‘YYYY-MM-DD hh:mm:ss’.
- ‘window’ (str)
Time window relative to the current time, formatted as ‘Xd Xh Xm’ for days, hours, and minutes.
- ‘timestamps’ (str or list of str)
Specific timestamps in ‘YYYYMMDDThhmmssZ’ format.
- ‘runs’ (int or list of ints)
Run number(s), e.g., 10 corresponds to ‘r010’.
Examples
>>> get_query_timerange(start='2022-09-28 08:00:00', end='2022-09-28 09:30:00')
{'timestamp': {'start': '20220928T080000Z', 'end': '20220928T093000Z'}}
>>> get_query_timerange(window='1d 5h 0m')
{'timestamp': {'end': '20230220T114337Z', 'start': '20230219T064337Z'}}
>>> get_query_timerange(timestamps=['20220928T080000Z', '20220928T093000Z'])
{'timestamp': ['20220928T080000Z', '20220928T093000Z']}
>>> get_query_timerange(timestamps='20220928T080000Z')
{'timestamp': ['20220928T080000Z']}
>>> get_query_timerange(runs=[9,10])
{'run': ['r009', 'r010']}
>>> get_query_timerange(runs=10)
{'run': ['r010']}
>>> get_query_timerange(dataset={'start': '2022-09-28 08:00:00', 'end':'2022-09-28 09:30:00'})
{'timestamp': {'start': '20220928T080000Z', 'end': '20220928T093000Z'}}
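The two format conversions visible in these examples (datetime string to timestamp key, run number to run label) can be sketched with the standard library. The helper names to_key and run_labels are hypothetical and are not part of the package.

```python
from datetime import datetime


def to_key(dt_string):
    """Convert 'YYYY-MM-DD hh:mm:ss' to 'YYYYMMDDThhmmssZ'."""
    parsed = datetime.strptime(dt_string, "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y%m%dT%H%M%SZ")


def run_labels(runs):
    """Convert an int or a list of ints to zero-padded run labels such as 'r010'."""
    if isinstance(runs, int):
        runs = [runs]
    return [f"r{run:03d}" for run in runs]


print(to_key("2022-09-28 08:00:00"))  # 20220928T080000Z
print(run_labels([9, 10]))            # ['r009', 'r010']
```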
- legend_data_monitor.utils.get_query_times(**kwargs)¶
Get time ranges for DataLoader query from user input, as well as first/last timestamp for channel map / status / SC query.
- Parameters:
dataset (dict, optional) –
Dictionary with the following keys (note: can provide the same keys as in dataset but separately, i.e. path=…, version=…, type=…, and one of start=…&end=…, window=…, timestamps=…, or runs=…):
- ‘path’ (str)
Base path to the dataset.
- ‘version’ (str)
Dataset version.
- ‘type’ (str)
Type of dataset. Note: multiple types are not currently supported.
Time selection keys (choose one):
- ‘start’ (str), ‘end’ (str)
Start and end datetime in the format ‘YYYY-MM-DD hh:mm:ss’.
- ‘window’ (str)
Time window from the current time, e.g., ‘1d 2h 30m’ for 1 day, 2 hours, 30 minutes.
- ‘timestamps’ (str or list of str)
Timestamps in the format ‘YYYYMMDDThhmmssZ’.
- ‘runs’ (int or list of ints)
Run number(s), e.g., 10 corresponds to run ‘r010’.
Notes
path, version, and type are required because channel map and status cannot be retrieved by run directly. These are used to determine the first timestamp available in the data path.
Designed in such a way to accommodate Subsystem init kwargs.
Examples
>>> get_query_times(..., start='2022-09-28 08:00:00', end='2022-09-28 09:30:00')
({'timestamp': {'start': '20220928T080000Z', 'end': '20220928T093000Z'}}, '20220928T080000Z')
>>> get_query_times(..., runs=27)
({'run': ['r027']}, '20220928T091135Z')
- legend_data_monitor.utils.get_run_name(config: dict, user_time_range: dict) str¶
Get the run ID given start/end timestamps. If the timestamps run over multiple run IDs, a list of runs is retrieved, out of which only the first element is returned.
- Return type:
str
- legend_data_monitor.utils.get_start_key(auto_dir_path: str, data_type: str, period: str, current_run: str)¶
- legend_data_monitor.utils.get_status_map(path: str, version: str, first_timestamp: str, datatype: str)¶
Return the correct status map, either reading a .json or .yaml file.
- legend_data_monitor.utils.get_tiers_pars_folders(path: str)¶
Get the absolute path to different tier and par folders.
- Parameters:
path (str) – Absolute path to the processed data for a specific version, e.g. path=’/global/cfs/cdirs/m2676/data/lngs/l200/public/prodenv/prod-blind/ref-v2.1.5/’.
- legend_data_monitor.utils.get_time_name(user_time_range: dict) str¶
Get a name for each available time selection.
- Parameters:
user_time_range (dict) – Careful handling of folder name depending on the selected time range
- Return type:
str
Examples
>>> get_time_name({'timestamp': {'start': '20220928T080000Z', 'end': '20220928T093000Z'}})
20220928T080000Z_20220928T093000Z
>>> get_time_name({'timestamp': ['20230207T103123Z']})
20230207T103123Z
>>> get_time_name({'timestamp': ['20230207T103123Z', '20230207T141123Z', '20230207T083323Z']})
20230207T083323Z_20230207T141123Z
>>> get_time_name({'run': ['r010']})
r010
>>> get_time_name({'run': ['r010', 'r014']})
r010_r014
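A sketch that reproduces the outputs of the examples above; it is not the package's implementation. The function name is hypothetical, and joining all run labels with underscores is an assumption, since the examples only show one or two runs.

```python
def time_name_sketch(user_time_range):
    """Build a folder-friendly name from a time selection dictionary."""
    if "timestamp" in user_time_range:
        selection = user_time_range["timestamp"]
        if isinstance(selection, dict):
            return f"{selection['start']}_{selection['end']}"
        if len(selection) == 1:
            return selection[0]
        # 'YYYYMMDDThhmmssZ' strings sort chronologically, so min/max give first/last.
        return f"{min(selection)}_{max(selection)}"
    # Assumption: all run labels are joined with underscores.
    return "_".join(user_time_range["run"])


print(time_name_sketch({"timestamp": ["20230207T103123Z", "20230207T141123Z", "20230207T083323Z"]}))
print(time_name_sketch({"run": ["r010", "r014"]}))
```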
- legend_data_monitor.utils.get_timestamp(filename: str)¶
Get the timestamp from a filename. For instance, if filename=’l200-p04-r000-phy-20230421T055556Z-tier_dsp.lh5’, then it returns ‘20230421T055556Z’.
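One way to extract such a timestamp is a regular expression that matches the ‘YYYYMMDDThhmmssZ’ pattern. This is a sketch with a hypothetical function name; the package may parse the filename differently, and returning None when nothing matches is an assumption.

```python
import re


def timestamp_from_filename(filename):
    """Return the first 'YYYYMMDDThhmmssZ' token found in filename, or None."""
    match = re.search(r"\d{8}T\d{6}Z", filename)
    return match.group(0) if match else None


print(timestamp_from_filename("l200-p04-r000-phy-20230421T055556Z-tier_dsp.lh5"))  # 20230421T055556Z
```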
- legend_data_monitor.utils.get_timestamp_from_path(path)¶
- legend_data_monitor.utils.get_valid_path(base_path)¶
- legend_data_monitor.utils.is_bad(t, intervals)¶
- legend_data_monitor.utils.load_and_filter(store, key: str, mask=None)¶
Load a given key from an HDF file and apply a mask.
- legend_data_monitor.utils.load_config(config_file: dict | str)¶
Load a configuration from a dictionary, JSON string, or YAML file.
This function supports three input types:
A dictionary, which is returned as-is.
A JSON string, which is parsed into a dictionary.
A path to a YAML (.yaml/.yml) file, which is read and parsed.
- legend_data_monitor.utils.load_tier_config(path: str, version: str, tier_name: str)¶
Load tier configuration (YAML or JSON) for the given tier name, and search through possible directory structures and file patterns.
- legend_data_monitor.utils.load_yaml_or_default(path: str, detectors: dict) dict¶
Load YAML if it exists, else return a default dict.
- Return type:
dict
- legend_data_monitor.utils.make_dir(dir_path)¶
Check if directory exists, and if not, make it.
- legend_data_monitor.utils.make_output_paths(config: dict, user_time_range: dict) str¶
Get a dict and return a dict. The function defines output paths and creates directories accordingly.
Use it when you want a specific output structure of the following type: […]/prod-ref/{version}/generated/plt/hit/phy/{period}/{run}. This does not work if you select multiple types (e.g. both cal and phy) or timestamp intervals (only runs are supported). It can be used for run summary plots, e.g. during stable data taking. Note that monitoring plots are stored under the ‘hit’ subfolder to replicate the structure of the main prodenv.
- Return type:
- legend_data_monitor.utils.none_to_nan(data: list)¶
Convert None elements into nan values for an input list.
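A minimal sketch of this conversion, with a hypothetical function name:

```python
import math


def none_to_nan_sketch(data):
    """Return a copy of data in which every None is replaced by NaN."""
    return [math.nan if item is None else item for item in data]


print(none_to_nan_sketch([1.0, None, 3.0]))  # [1.0, nan, 3.0]
```

NaN entries let the list be handled by numerical and plotting code that cannot process None.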
- legend_data_monitor.utils.pulser_from_evt_or_mtg(my_dir, period, run, output, run_info)¶
Try to load EVT tier; if not found, attempt to update run info from monitoring path.
- legend_data_monitor.utils.read_json_or_yaml(file_path: str)¶
Open a JSON or YAML file; otherwise, raise an error and exit.
- Parameters:
file_path (str) – Path to the JSON/YAML file to read.
- legend_data_monitor.utils.retrieve_json_or_yaml(base_path: str, filename: str)¶
Return the YAML or JSON file for the specified filename, depending on which extension is available.
- legend_data_monitor.utils.send_email_alert(app_password: str, recipients: list, text_file_path: str)¶
Send automatic emails with alert messages.
- Parameters:
app_password (str) – String password to send mails from legend.data.monitoring@gmail.com
recipients (list) – List of email addresses to send the alert emails
text_file_path (str) – String path to the .txt file containing the message to send via email
- legend_data_monitor.utils.unix_timestamp_to_string(unix_timestamp)¶
Convert a Unix timestamp to a string in the format ‘YYYYMMDDTHHMMSSZ’ with the timezone indicating UTC+00.
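The conversion can be sketched with the standard library as follows; the function name unix_to_key is hypothetical.

```python
from datetime import datetime, timezone


def unix_to_key(unix_timestamp):
    """Format seconds since the epoch as 'YYYYMMDDTHHMMSSZ' in UTC."""
    as_utc = datetime.fromtimestamp(unix_timestamp, tz=timezone.utc)
    return as_utc.strftime("%Y%m%dT%H%M%SZ")


print(unix_to_key(0))  # 19700101T000000Z
```

Passing tz=timezone.utc avoids a dependence on the local timezone of the machine running the code.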
- legend_data_monitor.utils.update_evaluation_in_memory(data: dict, det_name: str, data_type: str, key: str, value: bool | float)¶
Update the key entry in memory dict, where value can be bool or nan if not available; data_type is either ‘cal’ or ‘phy’.
- legend_data_monitor.utils.update_runinfo(run_info: dict, period: str, run: str, data_type: str, mtg_files_path: str)¶
Update run information dict, with livetime in seconds for phy data; it automatically removes cycles that are flagged as unusable via keys stored in settings/ignore-keys.yaml.
- Parameters:
run_info (dict) – Dictionary containing metadata for runs, separated by period, run, and data type (cal, phy, …).
period (str) – Period under inspection.
run (str) – Run under inspection.
data_type (str) – Data type to process (cal, phy, …).
mtg_files_path (str) – Path where the monitoring HDF5 files were stored for a specific period and run.