dtc

Documentation for the machine_learning.dtc.dt_classifier function:

machine_learning.dtc.dt_classifier(source=('root', ('/home/docs/checkouts/readthedocs.org/user_builds/cmepda-pikclassifier/checkouts/latest/utilities/../data/root_files/B0PiPi_MC.root', '/home/docs/checkouts/readthedocs.org/user_builds/cmepda-pikclassifier/checkouts/latest/utilities/../data/root_files/B0sKK_MC.root', '/home/docs/checkouts/readthedocs.org/user_builds/cmepda-pikclassifier/checkouts/latest/utilities/../data/root_files/Bhh_data.root')), root_tree='t_M0pipi;1', vars=('M0_Mpipi', 'M0_MKK', 'M0_MKpi', 'M0_MpiK', 'M0_p', 'M0_pt', 'M0_eta', 'h1_thetaC0', 'h1_thetaC1', 'h1_thetaC2', 'h2_thetaC0', 'h2_thetaC1', 'h2_thetaC2'), n_mc=560000, n_data=50000, test_size=0.3, min_leaf_samp=1, crit='gini', print_tree='printed_dtc', figpath='')

Builds and tests a Decision Tree Classifier (DTC) on multiple variables (features) stored in numpy arrays, evaluates it on a mixed dataset, and applies to the result the algorithm that estimates the fraction of Kaons.

Parameters:
  • source (tuple[{'root','txt'},tuple[str]]) – Two-element tuple containing, in order, the option for how to build the DTC and the corresponding file paths. The first item can be either ‘txt’ or ‘root’. If it is ‘txt’, the second element must be a tuple of two .txt paths, one for the template set and the other for the set to be evaluated. The .txt files must be in a format compatible with numpy’s loadtxt() and savetxt() functions. If it is ‘root’, the second element must be a tuple of three .root file paths: the first for the file containing the “background” species (flag=0), the second for the “signal” species (flag=1), and the third for the mix to be evaluated.

  • root_tree (str) – In case of ‘root’ source, the name of the tree from which to load variables.

  • vars (tuple[str]) – In case of ‘root’ source, tuple containing the names of the variables to load and with which the DTC should be built.

  • n_mc (int) – In case of ‘root’ source, number of events to take from the root files as the MC set.

  • n_data (int) – In case of ‘root’ source, number of events to take from the root file as the data set.

  • test_size (float) – The fraction of events in the MC set to be used as the testing dataset for the DTC.

  • min_leaf_samp (int or float) – The minimum number of samples required at a leaf node. If it is an int, it is the minimum number of samples; if it is a float, it is the minimum fraction of the samples.

  • crit ({'gini','log_loss','entropy'}) – The function to measure the quality of a split. Supported criteria are ‘gini’ for the Gini impurity and ‘log_loss’ and ‘entropy’ both for the Shannon information gain.

  • print_tree (str) – If different from ‘’, writes the tree to a .txt file with the given name.
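To make the crit options more concrete, here is a minimal sketch of the two impurity measures they name, written in plain Python. It assumes the textbook definitions of Gini impurity and Shannon entropy. It is not taken from the package source, and the helper names gini, entropy and impurity_decrease are hypothetical.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity of a node: 1 - sum_k p_k**2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of a node in bits: -sum_k p_k * log2(p_k)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def impurity_decrease(parent, left, right, measure=gini):
    """Impurity of the parent minus the size-weighted impurity of its children."""
    n = len(parent)
    children = (len(left) / n) * measure(left) + (len(right) / n) * measure(right)
    return measure(parent) - children


# Flags follow the convention above: 0 = "background" species, 1 = "signal" species.
node = [0, 0, 1, 1]
perfect_split = impurity_decrease(node, [0, 0], [1, 1], measure=gini)
useless_split = impurity_decrease(node, [0, 1], [0, 1], measure=gini)
```

A split that separates the two species completely removes all of the parent's impurity, while a split that leaves both children as mixed as the parent gains nothing. A tree builder prefers the former under either measure.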

Returns:

Estimated fraction of Kaons (with uncertainties) and parameters of the test algorithm

Return type:

tuple[float], tuple[float]
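
As an illustration of the source argument and the two returned tuples, the following sketch builds both kinds of source tuple, checks their shape, and wraps the call. The file paths, the order of the two .txt paths, and the helpers check_source and run_dt_classifier are assumptions made for this example. Only the import path and the keyword names come from the signature above.

```python
def check_source(source):
    """Check the (kind, paths) layout described for the `source` parameter."""
    kind, paths = source
    expected = {"txt": 2, "root": 3}  # number of paths required for each kind
    if kind not in expected:
        raise ValueError(f"source[0] must be 'txt' or 'root', got {kind!r}")
    if len(paths) != expected[kind]:
        raise ValueError(f"'{kind}' source needs {expected[kind]} paths, got {len(paths)}")
    return kind, tuple(paths)


# Hypothetical paths: background MC (flag=0), signal MC (flag=1), mix to evaluate.
root_source = ("root", ("data/background_MC.root",
                        "data/signal_MC.root",
                        "data/mixed_data.root"))

# Hypothetical paths: template set and set to be evaluated (order assumed here).
txt_source = ("txt", ("data/template.txt", "data/to_evaluate.txt"))


def run_dt_classifier(source, **kwargs):
    """Validate `source`, then call dt_classifier and unpack its two tuples."""
    # Imported lazily so the helpers above can be used without the package installed.
    from machine_learning.dtc import dt_classifier

    check_source(source)
    fraction, test_params = dt_classifier(source=source, **kwargs)
    return fraction, test_params
```

With the package installed, a call such as run_dt_classifier(root_source, test_size=0.3, crit='entropy') would return the estimated fraction tuple and the test-algorithm parameters tuple described above.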