Histogramming modules

Analysis Objects

This module defines the API for the so-called analysis objects: the objects an analyser interacts with when doing their analysis, namely observables, regions and systematics. The user interacts with pythium.histogramming.objects.Observable, pythium.histogramming.objects.Region and pythium.histogramming.objects.Systematic sub-classes through the configuration file.

class pythium.histogramming.objects.Observable(var, name, binning, dataset, weights=1.0, label='', samples=None, exclude_samples=None, regions=None, exclude_regions=None, *, obs_build=None)

This class defines an observable that will be retrieved from all samples entering a given region, and constructed with the given binning.

name

The name given to the observable

Type

str

var

The name of the observable in the input file

Type

str

binning

The chosen binning for this observable

Type

pythium.histogramming.binning._Binning

dataset

The equivalent of a TTree in ROOT files. It is the parent group for the observable in the input file (e.g. nominal tree in a ROOT file)

Type

str

obs_build

An instance of pythium.common.functor.Functor which defines how to build the observable from existing data

Type

pythium.common.functor.Functor

__eq__(other)

Return self==value.

__hash__()

Return hash(self).

__init__(var, name, binning, dataset, weights=1.0, label='', samples=None, exclude_samples=None, regions=None, exclude_regions=None, *, obs_build=None)
__weakref__

list of weak references to the object (if defined)

classmethod fromFunc(var, func, args, *obs_args, **obs_kwargs)

Alternative “constructor” for the pythium.histogramming.objects.Observable class, which takes a function and its arguments instead of var to compute a new observable from existing data

Parameters
Return type

TypeVar(TObservable, bound= Observable)

Returns

pythium.histogramming.objects.Observable class instance with an pythium.histogramming.objects.ObservableBuilder

classmethod fromStr(name, string_op, *obs_args, **obs_kwargs)

Alternative “constructor” for the pythium.histogramming.objects.Observable class, which takes a string to be parsed instead of var to compute a new observable from existing data

Parameters
  • name (str) – The name to be given to the new observable

  • string_op (str) – The string that should be parsed to compute the new observable

Return type

TypeVar(TObservable, bound= Observable)

Returns

pythium.histogramming.objects.Observable class instance with an pythium.histogramming.objects._ObservableBuilder
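As an illustration of what parsing a string into a new observable involves, the sketch below finds the input columns that a Python-style expression refers to. It is not Pythium's parser: the helper name required_variables and the example column names are hypothetical, and only the standard-library ast module is used.

```python
import ast


def required_variables(expression):
    """Return the set of variable names used in a Python-style expression.

    Names that are only used as called functions (e.g. ``sqrt``) are not
    treated as input columns. Attribute-style calls such as ``np.sqrt(x)``
    are not handled in this sketch.
    """
    tree = ast.parse(expression, mode="eval")
    nodes = list(ast.walk(tree))
    # Names appearing in call position, e.g. the "sqrt" in sqrt(x)
    called = {
        node.func.id
        for node in nodes
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }
    # Every bare name in the expression
    names = {node.id for node in nodes if isinstance(node, ast.Name)}
    return names - called


columns = required_variables("sqrt(jet_pt**2 + lep_pt**2) / 1000")
print(sorted(columns))  # ['jet_pt', 'lep_pt']
```

A real implementation also has to evaluate the expression on the data, which this sketch leaves out.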

class pythium.histogramming.objects.Region(name, selection, title=None, samples=None, exclude_samples=None, observables=None, exclude_observables=None, **kwargs)

This class defines a phase-space region object that the user will need

name

The name given to the region

selection

The pythium.common.selection.Selection instance to be evaluated for all samples that enter this region

samples

The list of pythium.common.samples.Sample instances that should be included in this region

exclude

The list of pythium.common.samples.Sample instances that should be excluded from this region

__init__(name, selection, title=None, samples=None, exclude_samples=None, observables=None, exclude_observables=None, **kwargs)
__weakref__

list of weak references to the object (if defined)

Histogramming Utils

class pythium.histogramming.binning.RegBin(low, high, nbins, axis=None)

Inherits from pythium.histogramming.binning._Binning, but constructs uniform binning between the given limits

low

The lower edge of the histogram

high

The upper edge of the histogram

nbins

The number of bins to build between low and high

__init__(low, high, nbins, axis=None)
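
Uniform binning means nbins equal-width bins, i.e. nbins + 1 edges from low to high. A minimal sketch of that computation follows, assuming plain Python lists rather than the np.array that _Binning stores; the function name uniform_edges is hypothetical. With NumPy, numpy.linspace(low, high, nbins + 1) yields the same edges.

```python
def uniform_edges(low, high, nbins):
    """Return nbins + 1 equally spaced bin edges from low to high."""
    if nbins < 1 or high <= low:
        raise ValueError("need nbins >= 1 and high > low")
    width = (high - low) / nbins
    # Pin the last edge to `high` so floating-point drift cannot move it.
    return [low + i * width for i in range(nbins)] + [high]


edges = uniform_edges(0.0, 100.0, 4)
print(edges)  # [0.0, 25.0, 50.0, 75.0, 100.0]
```
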
class pythium.histogramming.binning.VarBin(binning, axis=None)

Inherits from pythium.histogramming.binning._Binning

__init__(binning, axis=None)
class pythium.histogramming.binning._Binning(binning, axis=None)

The parent binning class

binning

The np.array that defines the bin edges

axis

The number of the axis defined by this binning. 0: x-axis, 1: y-axis.

__init__(binning, axis=None)
__weakref__

list of weak references to the object (if defined)

Configuration

Processing

This is the module that steers the histogramming tasks and handles their book-keeping

class pythium.histogramming.processor.Processor(config, scheduler)

Class which sorts through the user configuration and makes transactions with the managers in order to run a histogramming chain

__init__(config, scheduler)
config

Mapping of the settings from the config file

Type

Dict

scheduler

The scheduler to be used by dask

Type

str

__weakref__

list of weak references to the object (if defined)

create()

High-level function through which the user can start the construction of the histogramming task-graph. No computation happens at this point

cross_product(samples, regions, systematics, observables)

Make all cross-products from the user-provided configurations, skipping cross-products that aren’t needed.
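A minimal sketch of the idea, assuming that a combination is skipped when it fails the samples / exclude_samples, observables / exclude_observables or regions / exclude_regions lists found in the Region and Observable signatures. Plain dicts and names stand in for the real objects, systematics are not filtered, and the helpers allowed and cross_products are hypothetical; the real method may apply further criteria.

```python
from itertools import product


def allowed(name, include=None, exclude=None):
    """True if `name` passes an optional include list and exclude list."""
    if include is not None and name not in include:
        return False
    return not (exclude and name in exclude)


def cross_products(samples, regions, systematics, observables):
    """Yield the (sample, region, systematic, observable) names that are needed."""
    for s, r, syst, o in product(samples, regions, systematics, observables):
        reg, obs = regions[r], observables[o]
        keep = (
            allowed(s, reg.get("samples"), reg.get("exclude_samples"))
            and allowed(o, reg.get("observables"), reg.get("exclude_observables"))
            and allowed(s, obs.get("samples"), obs.get("exclude_samples"))
            and allowed(r, obs.get("regions"), obs.get("exclude_regions"))
        )
        if keep:
            yield (s, r, syst, o)


# Hypothetical configuration: "data" is blinded in SR, "met" is only wanted in CR.
samples = ["ttbar", "data"]
regions = {"SR": {"exclude_samples": ["data"]}, "CR": {}}
systematics = ["nominal"]
observables = {"jet_pt": {}, "met": {"regions": ["CR"]}}

xps = list(cross_products(samples, regions, systematics, observables))
print(len(xps))  # 5 of the 8 possible combinations survive
```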

run()

Method to compute the histograms

save(hists_dict)

Method to save the histograms into output pickle files

Parameters

hists_dict (Dict) – Mapping of observable -> sample_region_syst_template -> histogram
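
The nested mapping can be written out with the standard-library pickle module. The sketch below assumes one output file per observable, named after the observable; that naming scheme, the function save_histograms and the use of plain lists as histograms are all hypothetical and may differ from what Processor.save does.

```python
import pickle
import tempfile
from pathlib import Path


def save_histograms(hists_dict, outdir):
    """Write one pickle file per observable and return the paths written."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    written = []
    for observable, templates in hists_dict.items():
        path = outdir / f"{observable}.pkl"  # hypothetical naming scheme
        with path.open("wb") as handle:
            pickle.dump(templates, handle)
        written.append(path)
    return written


# observable -> sample_region_syst_template -> histogram (here: bin contents)
hists = {"jet_pt": {"ttbar_SR_nominal": [3.0, 5.0, 1.0]}}

with tempfile.TemporaryDirectory() as tmp:
    (path,) = save_histograms(hists, tmp)
    with path.open("rb") as handle:
        restored = pickle.load(handle)

print(restored == hists["jet_pt"])  # True
```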

Task Managing

This is where the managers for the different parts of the histogramming stage are defined, e.g. pythium.histogramming.managers._InputManager

class pythium.histogramming.managers._InputManager(xps, cfg)

Class responsible for preparing information about the input files and what we want from them

__init__(xps, cfg)

_InputManager constructor

Parameters
  • xps (List[CrossProduct]) – A list of the cross-products to be evaluated

  • cfg (dict) – The histogramming configuration dictionary

__weakref__

list of weak references to the object (if defined)

required_paths()

Method to summarise all the input files that need to be opened.

The goal is to have the task manager open each file only once and get what’s needed from it. The paths are constructed for each XP assuming the Pythium naming system, where a file is defined by a sample + dataset.

Paths are gathered from (in the case of Pythium-like input):

  • Paths to the nominal file needed for an observable

  • Path to an alternative sample needed for an NTup systematic

  • Path to an alternative tree needed for a Tree systematic

TODO: Support custom inputs

required_variables()

Method to determine which variable columns need to be retrieved from input files. Required variables are gathered from:

  • Variables directly requested with the Observable('variable', 'name') API

  • Variables required to compute new Observables (either passed as args or inferred from a string)

  • Weights column

  • Variables required to apply a region selection

  • Variables required to compute a weight variation

These variables are encoded in the Functor instances of each Observable, Selection and WeightSyst instance, as the attribute req_vars

Returns

Mapping from CrossProduct instances to list of required variables passed as Observable instances
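
A minimal sketch of the gathering step for a single cross-product, assuming each functor exposes req_vars as described above. The Functor class here is a hypothetical stand-in rather than pythium.common.functor.Functor, and the sketch returns plain column names instead of Observable instances.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Functor:
    """Hypothetical stand-in: only the req_vars attribute matters here."""
    req_vars: tuple = ()


def gather_required(var, obs_build, selection, weights, weight_syst=None):
    """Collect the columns one cross-product needs, without duplicates."""
    needed = []
    if obs_build is None:
        needed.append(var)  # observable read directly from the input
    if isinstance(weights, str):
        needed.append(weights)  # weights given as a column name, not a float
    for functor in (obs_build, selection, weight_syst):
        if functor is not None:
            needed.extend(functor.req_vars)
    # dict.fromkeys removes duplicates while keeping first-seen order
    return list(dict.fromkeys(needed))


columns = gather_required(
    var="jet_pt",
    obs_build=None,
    selection=Functor(("n_jets", "met")),
    weights="weight_mc",
    weight_syst=Functor(("weight_pu_up", "weight_mc")),
)
print(columns)  # ['jet_pt', 'weight_mc', 'n_jets', 'met', 'weight_pu_up']
```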

class pythium.histogramming.managers._TaskManager(method, sample_sel)

Class responsible for building a task-graph from a variety of operations on the input data. In order the manager will build the following workflow into a graph:

  • Retrieve data from inputs

  • Create new variables needed

  • Loop through cross-products

  • Apply event cuts (on all columns)

  • Retrieve the relevant observable column

  • Retrieve the relevant weights (can be columns/floats)

  • Make and fill histogram with observable and weights
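
The per-cross-product part of this workflow (cut, observable column, weights, fill) can be sketched eagerly in plain Python. This is only an illustration under simplifying assumptions: events are a list of dicts rather than awkward arrays, nothing is wrapped in dask, under- and overflow are dropped, and the names fill_histogram and process are hypothetical.

```python
from bisect import bisect_right


def fill_histogram(values, weights, edges):
    """Fill a 1D histogram; `weights` is a per-event list or a single float."""
    counts = [0.0] * (len(edges) - 1)
    if isinstance(weights, (int, float)):
        weights = [float(weights)] * len(values)
    for value, weight in zip(values, weights):
        if edges[0] <= value < edges[-1]:  # drop under/overflow
            counts[bisect_right(edges, value) - 1] += weight
    return counts


def process(data, selection, var, weight, edges):
    """Apply the event cut, pick observable and weight columns, then fill."""
    passed = [event for event in data if selection(event)]
    values = [event[var] for event in passed]
    if isinstance(weight, str):  # weight given as a column name
        weight = [event[weight] for event in passed]
    return fill_histogram(values, weight, edges)


events = [
    {"jet_pt": 30.0, "n_jets": 2, "w": 1.0},
    {"jet_pt": 80.0, "n_jets": 4, "w": 0.5},
    {"jet_pt": 120.0, "n_jets": 5, "w": 2.0},
    {"jet_pt": 60.0, "n_jets": 4, "w": 1.5},
]
hist = process(events, lambda e: e["n_jets"] >= 4, "jet_pt", "w",
               [0.0, 50.0, 100.0, 150.0])
print(hist)  # [0.0, 2.0, 2.0]
```

In the real manager each of these steps becomes a node in the task graph, so nothing is computed until the graph is executed.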

__init__(method, sample_sel)

Constructor for _TaskManager

method

Method name to be used for reading input

Type

str

sample_sel

Should sample selections be applied or not

Type

bool

__weakref__

list of weak references to the object (if defined)

_apply_cut

Apply event selection to all columns. Object-wise selection should be applied in the form of masks.

Parameters
  • data (ak.Array) – Data columns retrieved from input path

  • xps (List[CrossProduct]) – The compute-from-file-first ordered list of XPs for a given path

Returns

Awkward array with event selection applied to columns

_build_tree(xp_paths_map, xp_vars_map)

Method to build an optimized task graph of the entire histogramming chain.

Parameters
  • xp_paths_map – Mapping from XP to paths needed

  • xp_vars_map – Mapping from XP to variables needed

Returns

List of dask tasks to be executed, and the corresponding list of XPs (matching the list of jobs)

_create_variables

Compute and add new columns to the data if needed

Parameters
  • data (ak.Array) – Data columns retrieved from input path

  • xps (List[CrossProduct]) – The compute-from-file-first ordered list of XPs for a given path

Returns

Awkward array with new columns added

_get_data

Call the method to retrieve data from one input file

Parameters
  • inpath (str) – Path to file which should be opened

  • observables (List[Observable]) – List of Observable instances of variables to be retrieved from input file for the given path

Returns

An awkward array of data columns retrieved from input path

_get_var

Method to retrieve a column from data.

Parameters
  • data (ak.Array) – Data columns retrieved from input path

  • xp (CrossProduct) – The XP being computed

Returns

Awkward array with the relevant column’s data

_get_weights

Method to compute event weights from different sources. For example, if weights are given to an observable, as well as to a WeightSystematic, then we need to multiply both

Parameters
  • data (ak.Array) – Data columns retrieved from input path

  • xp (CrossProduct) – The XP being computed

Returns

Awkward array with the relevant weight column’s data or a float
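
A minimal sketch of multiplying weights from several sources, assuming each weight is either a single float or a per-event sequence. Plain lists stand in for awkward arrays, where broadcasting would reduce this to an ordinary product; the function name combine_weights is hypothetical.

```python
def combine_weights(*weights):
    """Multiply weights together; each is a float or a per-event list."""
    total = 1.0
    for weight in weights:
        if isinstance(weight, (int, float)):
            if isinstance(total, list):
                total = [t * weight for t in total]
            else:
                total = total * weight
        elif isinstance(total, list):
            if len(total) != len(weight):
                raise ValueError("per-event weights differ in length")
            total = [t * w for t, w in zip(total, weight)]
        else:
            # First per-event weight seen: promote the scalar to a list
            total = [total * w for w in weight]
    return total


# Observable weight (float) times two per-event weight columns
print(combine_weights(2.0, [1.0, 0.5, 2.0], [0.5, 1.0, 0.25]))  # [1.0, 1.0, 1.0]
```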

_make_histogram

Method to create and fill a histogram using data from one path that contributes to a given XP histogram.

Parameters
  • var_data (ak.Array) – A column/columns of data which should fill the histogram

  • weights (ak.Array | float) – The event weight to be used to fill the histogram

  • xp (CrossProduct) – The XP instance holding information on the current histogram

Returns

A filled Hist object

classmethod hist_wanted(sample, region, observable, syst, template)

Method to determine whether an XP is needed or not

Parameters
  • sample (Sample) – Relevant sample

  • region (Region) – Relevant region

  • observable (Observable) – Relevant observable

  • syst (Systematic) – Relevant systematic

  • template (str) – Relevant template

paths_to_xpinfo(xp_to_paths, xp_to_vars)

Method to convert XP -> paths and XP -> required variables maps into path -> xp and path -> required variables maps
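
Inverting the maps is what lets each input file be opened once and serve every cross-product that needs it. A minimal sketch follows, using strings for cross-products and paths; the function name invert_maps and the example names are hypothetical.

```python
from collections import defaultdict


def invert_maps(xp_to_paths, xp_to_vars):
    """Turn XP -> paths and XP -> variables into path -> XPs and path -> variables."""
    path_to_xps = defaultdict(list)
    path_to_vars = defaultdict(list)
    for xp, paths in xp_to_paths.items():
        for path in paths:
            path_to_xps[path].append(xp)
            for var in xp_to_vars.get(xp, []):
                if var not in path_to_vars[path]:  # keep each column once
                    path_to_vars[path].append(var)
    return dict(path_to_xps), dict(path_to_vars)


xp_to_paths = {"xp1": ["ttbar_nominal"], "xp2": ["ttbar_nominal", "ttbar_alt"]}
xp_to_vars = {"xp1": ["jet_pt", "w"], "xp2": ["met", "w"]}

path_to_xps, path_to_vars = invert_maps(xp_to_paths, xp_to_vars)
print(path_to_xps)   # {'ttbar_nominal': ['xp1', 'xp2'], 'ttbar_alt': ['xp2']}
print(path_to_vars)  # {'ttbar_nominal': ['jet_pt', 'w', 'met'], 'ttbar_alt': ['met', 'w']}
```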

sort_xps

Sort the cross-product order so that all new variables that do not depend on other new variables are computed first. This makes the creation of the variables less problematic and avoids the need for recursion, which is bad practice in dask.delayed() functions.

Parameters

xps (List[CrossProduct]) – List of cross-products whose histograms need to be computed for the given input path

Returns

Ordered list of cross-products such that XPs that use new variables which in turn require other new variables are computed last.
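
A minimal sketch of this two-tier ordering, assuming each cross-product is described by the name of its observable, whether that observable is newly built, and the variables it requires. The dict layout and key names are hypothetical. The sketch only separates "needs no other new variable" from "needs at least one"; longer dependency chains would call for a full topological sort.

```python
def sort_xps(xps):
    """Put XPs whose variables come straight from the file columns first."""
    new_vars = {xp["name"] for xp in xps if xp["builds_new"]}

    def rank(xp):
        # 0: requires no other new variable, 1: requires at least one
        return int(any(v in new_vars and v != xp["name"] for v in xp["req_vars"]))

    # sorted() is stable, so the original order is kept within each tier
    return sorted(xps, key=rank)


xps = [
    {"name": "ht_over_met", "builds_new": True, "req_vars": ["ht", "met"]},
    {"name": "ht", "builds_new": True, "req_vars": ["jet_pt"]},
    {"name": "met", "builds_new": False, "req_vars": ["met"]},
]
ordered = [xp["name"] for xp in sort_xps(xps)]
print(ordered)  # ['ht', 'met', 'ht_over_met']
```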