import_datasets

Documentation for the functions in the utilities.import_datasets module:

utilities.import_datasets.loadvars(file_pi, file_k, tree, vars, flag_column=False, flatten1d=True)[source]

Function that extracts the chosen variables for all events in the two given ROOT files and stores them in numpy arrays.

Parameters:
  • file_pi (str) – Path to the MC root file of background-only processes.

  • file_k (str) – Path to the MC root file of signal-only processes.

  • tree (str) – Tree in which the variables are stored on the root files.

  • vars (list[str] or tuple[str]) – List or tuple containing names of the variables to be loaded.

  • flag_column (bool) – If True, a column of 0s (background events) or 1s (signal events) is appended as the last column of each 2D array.

  • flatten1d (bool) – If True and only one variable is passed as “vars”, the generated arrays are returned as row arrays instead of one-column arrays.

Returns:

Two 2D numpy arrays filled with the events of the two input root files, containing the requested variables plus a flag column if requested.

Return type:

2D numpy.array[float]
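
The effect of flag_column and flatten1d on the output layout can be illustrated without ROOT files. The sketch below is not the module's implementation: mimic_loadvars_output is a hypothetical helper working on synthetic arrays, and it assumes that a "row array" means a 1D array and that flattening is applied only when no flag column is requested.

```python
import numpy as np


def mimic_loadvars_output(bkg, sig, flag_column=False, flatten1d=True):
    """Hypothetical helper: reproduces the documented output layout only.

    bkg and sig are (n_events, n_vars) arrays standing in for the events
    that would be read from file_pi and file_k.
    """
    bkg = np.asarray(bkg, dtype=float)
    sig = np.asarray(sig, dtype=float)
    if flag_column:
        # Append 0 (background) or 1 (signal) as the last column.
        bkg = np.column_stack((bkg, np.zeros(len(bkg))))
        sig = np.column_stack((sig, np.ones(len(sig))))
    elif flatten1d and bkg.shape[1] == 1:
        # Assumption: a single variable without a flag is returned as 1D.
        bkg, sig = bkg.ravel(), sig.ravel()
    return bkg, sig


# Synthetic single-variable datasets: 3 background and 2 signal events.
bkg = np.array([[5.1], [5.3], [5.2]])
sig = np.array([[5.4], [5.5]])

flat_bkg, flat_sig = mimic_loadvars_output(bkg, sig)
flag_bkg, flag_sig = mimic_loadvars_output(bkg, sig, flag_column=True)
```

With these inputs the flattened background array is 1D with three entries, while the flagged arrays keep one row per event and gain a second column holding the flag.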

utilities.import_datasets.overlaid_cornerplot(rootpaths=('/home/docs/checkouts/readthedocs.org/user_builds/cmepda-pikclassifier/checkouts/latest/utilities/../data/root_files/B0PiPi_MC.root', '/home/docs/checkouts/readthedocs.org/user_builds/cmepda-pikclassifier/checkouts/latest/utilities/../data/root_files/B0sKK_MC.root', '/home/docs/checkouts/readthedocs.org/user_builds/cmepda-pikclassifier/checkouts/latest/utilities/../data/root_files/Bhh_data.root'), tree='t_M0pipi;1', vars=('M0_Mpipi', 'M0_MKK', 'M0_MKpi', 'M0_MpiK', 'M0_p', 'M0_pt', 'M0_eta', 'h1_thetaC0', 'h1_thetaC1', 'h1_thetaC2', 'h2_thetaC0', 'h2_thetaC1', 'h2_thetaC2'), figpath='')[source]

Generates and saves cornerplots for two different (multidimensional) arrays on the same canvas.

Parameters:
  • rootpaths (list[str] or tuple[str]) – Two-element list or tuple containing the two root file paths where the events are stored.

  • tree (str) – Name of the tree where the events are stored.

  • vars (list[str] or tuple[str]) – List or tuple of names of the variables to plot.

  • figpath (str) – Path where the figure is saved. The string must not include the file name of the figure, which is assigned automatically.

utilities.import_datasets.include_merged_variables(rootpaths, tree, initial_vars, new_variables)[source]

Function that appends to the existing datasets (numpy arrays) new columns filled with the outputs of the mergevar() function.

Parameters:
  • rootpaths (list[str] or tuple[str]) – Three-element list or tuple of .root file paths. The first should be the root file containing the “background” species (flag=0), the second the “signal” species (flag=1), the third the data mix.

  • tree (str) – Tree in which the variables are stored on the root files.

  • initial_vars (list[str] or tuple[str]) – List or tuple containing names of the variables to be loaded.

  • new_variables (list[tuple[str]] or tuple[tuple[str]]) – List or tuple containing two-element lists or tuples of variables to merge.

Returns:

A list or tuple containing the new numpy arrays for the three datasets, with the new columns filled with the data retrieved by the merge-variables algorithm. For MC datasets the flag column is still the rightmost column.

Return type:

list[2D numpy.array[double]] or tuple[2D numpy.array[double]]
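
The column layout described in the return value can be sketched with plain numpy. The helper insert_before_flag below is hypothetical, and the merged values are placeholders (the mergevar() algorithm is not reproduced here); the sketch only shows one way to add a new column while keeping the flag as the rightmost column.

```python
import numpy as np


def insert_before_flag(mc_array, new_column):
    """Hypothetical helper: place new_column just before the trailing flag column."""
    mc_array = np.asarray(mc_array, dtype=float)
    new_column = np.asarray(new_column, dtype=float)
    # Variables first, then the new column, then the original flag column.
    return np.column_stack((mc_array[:, :-1], new_column, mc_array[:, -1]))


# Two variables plus a flag column (1 = signal), as in an MC dataset.
mc = np.array([[0.1, 0.2, 1.0],
               [0.3, 0.4, 1.0]])
merged = np.array([9.0, 8.0])  # placeholder for merged-variable values
extended = insert_before_flag(mc, merged)
```

For the data-mix array, which carries no flag, the new column would simply be appended at the end.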

utilities.import_datasets.array_generator(rootpaths, tree, vars, n_mc=560000, n_data=50000, new_variables=())[source]

Generates arrays for ML treatment (training and testing). To avoid bias, the training array has an equal number of background and signal events.

Parameters:
  • rootpaths (list[str] or tuple[str]) – Three-element list or tuple of .root file paths. The first should be the root file containing the “background” species (flag=0), the second the “signal” species (flag=1), the third the data mix.

  • tree (str) – Tree in which the variables are stored on the root files.

  • vars (list[str] or tuple[str]) – List or tuple containing names of the variables to be used.

  • n_mc (int) – Number of events to take from the root files for the training set.

  • n_data (int) – Number of events to take from the root file for the testing set.

  • new_variables (list[tuple[str]] or tuple[tuple[str]]) – Optional list or tuple containing two-element lists or tuples of variables to merge.

Returns:

Two-element tuple containing 2D numpy arrays. The first contains the MC datasets’ events (scrambled to avoid position bias) and the flag that identifies each event as background or signal. The second contains the events of the mixed dataset without flags (one column fewer).

Return type:

list[2D numpy.array[double]] or tuple[2D numpy.array[double]]
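
The balancing and scrambling described above can be sketched as follows. This is not the module's code: balanced_training_array is a hypothetical helper, the even split of n_mc between the two species and the seeded numpy Generator are assumptions, and the inputs are synthetic arrays whose last column is the flag.

```python
import numpy as np


def balanced_training_array(bkg, sig, n_mc, seed=0):
    """Hypothetical helper: equal numbers of background and signal rows, shuffled."""
    # Assumption: n_mc is split evenly and capped by the smaller sample.
    n_each = min(n_mc // 2, len(bkg), len(sig))
    train = np.concatenate((bkg[:n_each], sig[:n_each]), axis=0)
    # Shuffle the rows so the two species are not stored in contiguous blocks.
    rng = np.random.default_rng(seed)
    return rng.permutation(train)


# Synthetic MC samples: one variable plus the flag column (0 or 1).
bkg = np.column_stack((np.arange(6.0), np.zeros(6)))
sig = np.column_stack((np.arange(4.0), np.ones(4)))
train = balanced_training_array(bkg, sig, n_mc=6)
```

Here three background and three signal rows end up in the training array, each row still carrying its own flag after the shuffle.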