Histogramming modules
Analysis Objects
This module defines the API of the so-called analysis objects: the objects an analyser interacts with when doing their analysis, namely observables, regions and systematics.
The user interacts with the pythium.histogramming.objects.Observable, pythium.histogramming.objects.Region and
pythium.histogramming.objects.Systematic sub-classes through the configuration file.
- class pythium.histogramming.objects.Observable(var, name, binning, dataset, weights=1.0, label='', samples=None, exclude_samples=None, regions=None, exclude_regions=None, *, obs_build=None)
This class defines an observable that will be retrieved from all samples entering a given region, and constructed with the given binning.
- binning
The chosen binning for this observable
- dataset
The equivalent of a TTree in ROOT files. It is the parent group for the observable in the input file (e.g. nominal tree in a ROOT file)
- Type
- obs_build
An instance of pythium.common.functor.Functor which defines how to build the observable from existing data
- Type
pythium.common.functor.Functor
- __eq__(other)
Return self==value.
- __hash__()
Return hash(self).
- __init__(var, name, binning, dataset, weights=1.0, label='', samples=None, exclude_samples=None, regions=None, exclude_regions=None, *, obs_build=None)
- __weakref__
list of weak references to the object (if defined)
- classmethod fromFunc(var, func, args, *obs_args, **obs_kwargs)
Alternative “constructor” for the pythium.histogramming.objects.Observable class, which takes a function and its arguments instead of var to compute a new observable from existing data
- Parameters
var (str) – The name to be given to the new observable
func (Union[Callable, List[Callable]]) – The function that defines how the variable should be computed
args (Union[List[Union[str, int, float, Dict]], List[List[Union[str, int, float, Dict]]]]) – The arguments to be passed to func to compute the observable
- Return type
TypeVar(TObservable, bound= Observable)
- Returns
pythium.histogramming.objects.Observable class instance with a pythium.histogramming.objects.ObservableBuilder
- classmethod fromStr(name, string_op, *obs_args, **obs_kwargs)
Alternative “constructor” for the pythium.histogramming.objects.Observable class, which takes a string to be parsed instead of var to compute a new observable from existing data
- Parameters
name (str) – The name to be given to the new observable
string_op – The string that should be parsed to compute the new observable
- Return type
TypeVar(TObservable, bound= Observable)
- Returns
pythium.histogramming.objects.Observable class instance with a pythium.histogramming.objects._ObservableBuilder
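The alternative-constructor pattern described for fromFunc can be illustrated with a small stand-in class. This is only a sketch: ToyObservable, from_func, build and the divide helper are hypothetical names, not part of pythium, and the rule that string arguments refer to column names is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ToyObservable:
    """Hypothetical stand-in for an observable; not the pythium class."""
    var: str
    name: str
    func: Optional[Callable[..., List[float]]] = None
    args: List[Any] = field(default_factory=list)

    @classmethod
    def from_func(cls, var: str, func: Callable[..., List[float]],
                  args: List[Any], name: str = "") -> "ToyObservable":
        # Alternative constructor: store how the column is computed
        # instead of pointing at an existing column.
        return cls(var=var, name=name or var, func=func, args=args)

    def build(self, data: Dict[str, List[float]]) -> List[float]:
        if self.func is None:
            # Plain observable: read the column directly.
            return data[self.var]
        # Assumption of this sketch: strings name input columns,
        # everything else is passed through as a constant.
        resolved = [data[a] if isinstance(a, str) else a for a in self.args]
        return self.func(*resolved)


def divide(column: List[float], denominator: float) -> List[float]:
    """Hypothetical helper: divide every entry of a column by a constant."""
    return [x / denominator for x in column]


data = {"jet_pt": [20000.0, 45000.0]}
jet_pt_gev = ToyObservable.from_func("jet_pt_gev", divide, ["jet_pt", 1000.0])
column = jet_pt_gev.build(data)
```

Storing the function and its arguments, rather than the computed values, lets the column be built later, once the input data has been read.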
- class pythium.histogramming.objects.Region(name, selection, title=None, samples=None, exclude_samples=None, observables=None, exclude_observables=None, **kwargs)
This class defines a phase-space region object that the user will need
- name
The name given to the region
- selection
The pythium.common.selection.Selection instance to be evaluated for all samples that enter this region
- samples
The list of pythium.common.samples.Sample instances that should be included in this region
- exclude
The list of pythium.common.samples.Sample instances that should be excluded from this region
- __init__(name, selection, title=None, samples=None, exclude_samples=None, observables=None, exclude_observables=None, **kwargs)
- __weakref__
list of weak references to the object (if defined)
Histogramming Utils
- class pythium.histogramming.binning.RegBin(low, high, nbins, axis=None)
Inherits from pythium.histogramming.objects._Binning but constructs uniform binning between given limits
- low
The lower edge of the histogram
- high
The upper edge of the histogram
- nbins
The number of bins to build between low and high
- __init__(low, high, nbins, axis=None)
- class pythium.histogramming.binning.VarBin(binning, axis=None)
Inherits from pythium.histogramming.objects._Binning
- __init__(binning, axis=None)
- class pythium.histogramming.binning._Binning(binning, axis=None)
The parent binning class
- binning
The np.array that defines the bin edges
- axis
The number of the axis defined by this binning. 0: x-axis, 1: y-axis.
- __init__(binning, axis=None)
- __weakref__
list of weak references to the object (if defined)
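Both binning flavours end up as an array of bin edges. The sketch below shows that idea with plain Python lists; regular_edges and variable_edges are hypothetical helpers written for illustration and make no claim about how RegBin or VarBin are implemented.

```python
from typing import List, Sequence


def regular_edges(low: float, high: float, nbins: int) -> List[float]:
    """Return nbins + 1 equally spaced edges between low and high."""
    if nbins < 1 or high <= low:
        raise ValueError("need nbins >= 1 and high > low")
    width = (high - low) / nbins
    edges = [low + i * width for i in range(nbins)]
    edges.append(high)  # use the exact upper limit to avoid round-off
    return edges


def variable_edges(edges: Sequence[float]) -> List[float]:
    """Validate user-supplied edges: at least two, strictly increasing."""
    checked = [float(e) for e in edges]
    if len(checked) < 2:
        raise ValueError("need at least two bin edges")
    if any(b <= a for a, b in zip(checked, checked[1:])):
        raise ValueError("bin edges must be strictly increasing")
    return checked


uniform = regular_edges(0.0, 100.0, 4)
custom = variable_edges([0, 20, 50, 200])
```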
Configuration
Processing
This module steers the histogramming tasks and handles their book-keeping
- class pythium.histogramming.processor.Processor(config, scheduler)
Class which sorts through the user configuration and makes transactions with the managers in order to run a histogramming chain
- __weakref__
list of weak references to the object (if defined)
- create()
High-level function through which the user can start the construction of the histogramming task-graph. No computation happens at this point
- cross_product(samples, regions, systematics, observables)
Make all cross-products from the user-provided configurations, skipping cross-products that aren’t needed.
- run()
Method to compute the histograms
- save(hists_dict)
Method to save the histograms into output pickle files
- Parameters
hists_dict (Dict) – Mapping of observable -> sample_region_syst_template -> histogram
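The cross-product step can be pictured as a filtered Cartesian product. The following is a rough sketch under stated assumptions: the dictionaries, the allowed and cross_products helpers, and the rule that a missing include list means "no restriction" are all hypothetical and only mirror the samples/exclude_samples and regions/exclude_regions arguments listed for the analysis objects.

```python
from itertools import product
from typing import Dict, Iterator, List, Optional, Tuple


def allowed(name: str, include: Optional[List[str]],
            exclude: Optional[List[str]]) -> bool:
    """Assumption: None means no restriction on that list."""
    if include is not None and name not in include:
        return False
    if exclude is not None and name in exclude:
        return False
    return True


def cross_products(samples: List[str],
                   regions: Dict[str, dict],
                   systematics: List[str],
                   observables: Dict[str, dict],
                   ) -> Iterator[Tuple[str, str, str, str]]:
    for sample, region, syst, obs in product(samples, regions,
                                             systematics, observables):
        reg_cfg, obs_cfg = regions[region], observables[obs]
        # Skip samples the region does not want.
        if not allowed(sample, reg_cfg.get("samples"),
                       reg_cfg.get("exclude_samples")):
            continue
        # Skip regions the observable does not want.
        if not allowed(region, obs_cfg.get("regions"),
                       obs_cfg.get("exclude_regions")):
            continue
        yield (sample, region, syst, obs)


samples = ["ttbar", "data"]
regions = {"SR": {"exclude_samples": ["data"]}, "CR": {}}
systematics = ["nominal"]
observables = {"jet_pt": {"regions": ["SR"]}, "met": {}}

xps = list(cross_products(samples, regions, systematics, observables))
```

Of the eight possible combinations, only four survive the filters here, which is the point of skipping cross-products that are not needed.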
Task Managing
This module contains the managers that handle different parts of the histogramming stage, e.g. pythium.histogramming.managers._InputManager
- class pythium.histogramming.managers._InputManager(xps, cfg)
Class responsible for preparing information about the input files and what we want from them
- __init__(xps, cfg)
_InputManager constructor
- Parameters
xps (List[CrossProduct]) – A list of the cross-products to be evaluated
cfg (dict) – The histogramming configuration dictionary
- __weakref__
list of weak references to the object (if defined)
- required_paths()
Method to summarise all the input files that need to be opened.
The goal is to have the task manager open each file only once and get what’s needed from it. The paths are constructed for each XP assuming the Pythium naming system, where a file is defined by a sample + dataset.
Paths are gathered from (in case of Pythium-like input):
Paths to the nominal file needed for an observable
Path to an alternative sample needed for an NTup systematic
Path to an alternative tree needed for a Tree systematic
TODO:: Support Custom inputs
- required_variables()
Method to determine which variable columns need to be retrieved from input files. Required variables are gathered from:
Variables directly requested with the Observable('variable', 'name') API
Variables required to compute new Observables (either passed as args or inferred from a string)
Weights column
Variables required to apply a region selection
Variables required to compute a weight variation
These variables are encoded in Functor instances for each Observable, Selection and WeightSyst instances, as the attribute req_vars
- Returns
Mapping from CrossProduct instances to list of required variables passed as Observable instances
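Gathering the required variables amounts to taking the union of the req_vars of every piece that contributes to a cross-product. A minimal sketch, assuming each piece exposes a req_vars attribute as stated above; ToyFunctor and collect_required are hypothetical, and plain strings are used here in place of Observable instances.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class ToyFunctor:
    """Hypothetical stand-in that only carries the req_vars attribute."""
    req_vars: Tuple[str, ...] = ()


def collect_required(xp_parts: Dict[str, List[ToyFunctor]]) -> Dict[str, Set[str]]:
    """Map each cross-product name to the union of its pieces' req_vars."""
    required: Dict[str, Set[str]] = {}
    for xp_name, functors in xp_parts.items():
        needed: Set[str] = set()
        for functor in functors:
            needed.update(functor.req_vars)
        required[xp_name] = needed
    return required


xp_parts = {
    "ttbar_SR_nominal_mjj": [
        ToyFunctor(("jet_pt", "jet_eta")),  # new observable built from two columns
        ToyFunctor(("n_jets", "jet_pt")),   # region selection
        ToyFunctor(("weight_mc",)),         # weights column
    ],
}
required = collect_required(xp_parts)
```

Using a set means a column needed by several pieces (jet_pt above) is requested only once.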
- class pythium.histogramming.managers._TaskManager(method, sample_sel)
Class responsible for building a task-graph from a variety of operations on the input data. The manager builds the following workflow into a graph, in order:
Retrieve data from inputs
Create new variables needed
Loop through cross-products
Apply event cuts (on all columns)
Retrieve the relevant observable column
Retrieve the relevant weights (can be columns/floats)
Make and fill histogram with observable and weights
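The workflow above can be sketched end to end on a tiny in-memory dataset. This illustration uses plain lists instead of awkward arrays and runs eagerly instead of building a dask graph; every function name, column name and cut value is a hypothetical choice made for the example.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence

Columns = Dict[str, List[float]]


def create_variables(data: Columns) -> Columns:
    # Create new variables: here pt in GeV derived from pt in MeV.
    out = dict(data)
    out["pt_gev"] = [pt / 1000.0 for pt in data["pt"]]
    return out


def apply_cut(data: Columns, mask: Sequence[bool]) -> Columns:
    # Apply event cuts: the same mask filters every column.
    return {name: [v for v, keep in zip(col, mask) if keep]
            for name, col in data.items()}


def fill_histogram(values: Sequence[float], weights: Sequence[float],
                   edges: Sequence[float]) -> List[float]:
    # Weighted fill; values outside [edges[0], edges[-1]) are dropped.
    counts = [0.0] * (len(edges) - 1)
    for value, weight in zip(values, weights):
        if edges[0] <= value < edges[-1]:
            counts[bisect_right(edges, value) - 1] += weight
    return counts


# Retrieve data from inputs (hard-coded stand-in for a file read).
data: Columns = {
    "pt": [15000.0, 32000.0, 58000.0, 41000.0],
    "n_jets": [2, 4, 5, 3],
    "weight": [1.0, 0.5, 2.0, 1.5],
}
data = create_variables(data)

# One cross-product: a region requiring at least three jets.
mask = [n >= 3 for n in data["n_jets"]]
selected = apply_cut(data, mask)

# Retrieve the observable and weight columns, then fill.
hist = fill_histogram(selected["pt_gev"], selected["weight"],
                      edges=[0.0, 40.0, 80.0])
```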
- __init__(method, sample_sel)
Constructor for _TaskManager
- method
Method name to be used for reading input
- Type
str
- __weakref__
list of weak references to the object (if defined)
- _apply_cut
Apply event selection to all columns. Object-wise selection should be applied in the form of masks.
- Parameters
data (ak.Array) – Data columns retrieved from input path
xps (List[CrossProduct]) – The compute-from-file-first ordered list of XPs for a given path
- Returns
Awkward array with event selection applied to columns
- _build_tree(xp_paths_map, xp_vars_map)
Method to build an optimized task graph of the entire histogramming chain.
- Parameters
xp_paths_map – Mapping from XP to paths needed
xp_vars_map – Mapping from XP to variables needed
- Returns
List of dask tasks to be executed
Corresponding list of XPs (matches the list of jobs)
- _create_variables
Compute and add new columns to the data if needed
- Parameters
data (ak.Array) – Data columns retrieved from input path
xps (List[CrossProduct]) – The compute-from-file-first ordered list of XPs for a given path
- Returns
Awkward array with new columns added
- _get_data
Call the method to retrieve data from one input file
- Parameters
inpath (str) – Path to file which should be opened
observables (List[Observable]) – List of Observable instances of variables to be retrieved from input file for the given path
- Returns
An awkward array of data columns retrieved from input path
- _get_var
Method to retrieve a column from data.
- Parameters
data (ak.Array) – Data columns retrieved from input path
xp (CrossProduct) – The XP being computed
- Returns
Awkward array with the relevant column’s data
- _get_weights
Method to compute event weights from different sources. For example, if weights are given to an observable, as well as to a WeightSystematic, then we need to multiply both
- Parameters
data (ak.Array) – Data columns retrieved from input path
xp (CrossProduct) – The XP being computed
- Returns
Awkward array with the relevant weight column’s data or a float
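Combining weights from two sources, each of which may be a per-event column or a constant, could look roughly like this. combine_weights is a hypothetical helper that uses plain lists where the real code works with awkward arrays.

```python
from typing import List, Union

Weight = Union[float, List[float]]


def combine_weights(first: Weight, second: Weight) -> Weight:
    """Multiply two weights; each is a per-event column or a constant."""
    first_is_col = isinstance(first, list)
    second_is_col = isinstance(second, list)
    if first_is_col and second_is_col:
        if len(first) != len(second):
            raise ValueError("weight columns must have the same length")
        return [a * b for a, b in zip(first, second)]
    if first_is_col:
        return [a * second for a in first]
    if second_is_col:
        return [first * b for b in second]
    # Both are constants, so the result is a single float.
    return first * second


obs_weight = [1.0, 0.5]   # e.g. a weights column attached to an observable
syst_weight = 2.0         # e.g. a constant factor from a weight systematic
total = combine_weights(obs_weight, syst_weight)
```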
- _make_histogram
Method to create and fill a histogram using data from one path that contributes to a given XP histogram.
- Parameters
var_data (ak.Array) – A column/columns of data which should fill the histogram
weights (ak.Array | float) – The event weight to be used to fill the histogram
xp (CrossProduct) – The XP instance holding information on the current histogram
- Returns
A filled Hist object
- classmethod hist_wanted(sample, region, observable, syst, template)
Method to determine whether an XP is needed or not
- Parameters
sample (Sample) – Relevant sample
region (Region) – Relevant region
observable (Observable) – Relevant observable
syst (Systematic) – Relevant systematic
template (str) – Relevant template
- paths_to_xpinfo(xp_to_paths, xp_to_vars)
Method to convert XP -> paths and XP -> required variables maps into path -> xp and path -> required variables maps
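Inverting the two maps is what allows each file to be opened only once. A minimal sketch with string keys standing in for CrossProduct instances; invert_maps and the file names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def invert_maps(xp_to_paths: Dict[str, List[str]],
                xp_to_vars: Dict[str, List[str]],
                ) -> Tuple[Dict[str, List[str]], Dict[str, Set[str]]]:
    """Turn XP -> paths and XP -> variables into path -> XPs and path -> variables."""
    path_to_xps: Dict[str, List[str]] = defaultdict(list)
    path_to_vars: Dict[str, Set[str]] = defaultdict(set)
    for xp, paths in xp_to_paths.items():
        for path in paths:
            path_to_xps[path].append(xp)
            # Every variable needed by this XP must be read from this path.
            path_to_vars[path].update(xp_to_vars.get(xp, []))
    return dict(path_to_xps), dict(path_to_vars)


xp_to_paths = {
    "xp_jet_pt": ["ttbar_nominal.h5"],
    "xp_met": ["ttbar_nominal.h5", "ttbar_alt.h5"],
}
xp_to_vars = {
    "xp_jet_pt": ["jet_pt", "weight"],
    "xp_met": ["met", "weight"],
}
path_to_xps, path_to_vars = invert_maps(xp_to_paths, xp_to_vars)
```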
- sort_xps
Sort the cross-products so that all new variables that do not depend on other new variables are computed first. This makes the creation of the variables less problematic and avoids the need for recursion, which is bad practice in dask.delayed() functions.
- Parameters
xps (List[CrossProduct]) – List of cross-products whose histograms need to be computed for the given input path
- Returns
Ordered list of cross-products such that XPs that use new variables which in-turn require other new variables are computed last.
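The ordering idea can be shown at the level of the new variables themselves. The sketch below is not necessarily how sort_xps is implemented: order_new_variables and the variable names are hypothetical, and the loop is a simple iterative dependency sort chosen because it needs no recursion.

```python
from typing import Dict, List, Set


def order_new_variables(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Order new variables so each one follows the new variables it needs.

    Dependencies that are not keys of depends_on are treated as columns
    already present in the input and are ignored.
    """
    remaining = {var: {d for d in deps if d in depends_on}
                 for var, deps in depends_on.items()}
    ordered: List[str] = []
    while remaining:
        # Variables whose new-variable dependencies are all resolved.
        ready = sorted(v for v, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("circular dependency between new variables")
        for var in ready:
            ordered.append(var)
            del remaining[var]
        for deps in remaining.values():
            deps.difference_update(ready)
    return ordered


depends_on = {
    "mjj_over_ht": {"mjj", "ht"},   # needs two other new variables
    "mjj": {"jet_pt", "jet_eta"},   # needs input columns only
    "ht": {"jet_pt"},               # needs input columns only
}
order = order_new_variables(depends_on)
```

Cross-products could then be sorted by the position of their observable in this list, so that those relying on chained new variables are handled last.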