(data-reduction)=
# Exercise 15: Data reduction (working with TTrees)
**(1-2 hours)**

:::{admonition} Beyond n-tuples
:class: note

Up until now, we've considered n-tuples that someone else created for
you. The process by which a file that contains complex data structures
is converted into a relatively simple n-tuple is part of a larger process
called "data reduction." It's a typical step in the overall physics-analysis chain.

As I implied on the first day of this tutorial, perhaps you'll be given
an n-tuple and told to work with it. However, it's possible you'll be
given a file containing the next-to-last step in the analysis chain: a
file of C++ objects with complex data structures. You'd want to extract data
from those structures to create your own n-tuples.[^f143]
:::

Copy the files whose names contain "Example" from my root-class directory:

    > cp ~seligman/root-class/*Example* $PWD

The file {file}`exampleEvents.root` contains a ROOT tree of C++ objects. The
task is to take the event information in those C++ objects and reduce it
to a relatively simple n-tuple.

First, take a look at {file}`ExampleEvent.h`. You're not going to edit this
file. It's the file that someone else used to create the events in the
ROOT tree. If you're given an `ExampleEvent` object, you can use any of
the methods you see to access information in that object; for example:

    ExampleEvent* exampleEvent = nullptr;
    // Assume we assign exampleEvent somehow, e.g., by reading it from the tree.
    Int_t numberLeptons = exampleEvent->GetNumberLeptons();

For this hypothetical analysis, you've been told that the following
information is to be put into the n-tuple you're going to create:

-   the run number;

-   the event number;

-   the total energy of all the particles in the event;

-   the total number of particles in the event;

-   a boolean indicator: does the event have exactly one muon?

-   the total energy of all the muons in the event;

-   the number of muons in the event.

The task is to write the code to read the events in {file}`exampleEvents.root`
and write an n-tuple to a different file, {file}`exampleNtuple.root`.
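Apart from the ROOT I/O, the reduction itself is bookkeeping over the particles in each event. Here is a minimal sketch in plain C++, without ROOT, assuming you've already copied each particle's PDG code and energy out of an `ExampleEvent` into two parallel vectors. The names `NtupleRow` and `ReduceEvent` are hypothetical; the real accessors are whatever {file}`ExampleEvent.h` provides.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical holder for one n-tuple entry. In your macro these would be
// the simple variables whose addresses you give to the output tree's branches.
struct NtupleRow {
  int run = 0;
  int event = 0;
  double totalEnergy = 0.0;  // sum over all particles
  int numberParticles = 0;
  bool oneMuon = false;      // true only if there is exactly one muon
  double muonEnergy = 0.0;   // sum over muons only
  int numberMuons = 0;
};

// Hypothetical helper: pdgCodes[i] and energies[i] describe the same particle.
NtupleRow ReduceEvent(int run, int event,
                      const std::vector<int>& pdgCodes,
                      const std::vector<double>& energies) {
  assert(pdgCodes.size() == energies.size());
  NtupleRow row;
  row.run = run;
  row.event = event;
  row.numberParticles = static_cast<int>(pdgCodes.size());
  for (std::size_t i = 0; i < pdgCodes.size(); ++i) {
    row.totalEnergy += energies[i];
    // Both mu- (PDG 13) and mu+ (PDG -13) count as muons.
    if (std::abs(pdgCodes[i]) == 13) {
      row.muonEnergy += energies[i];
      ++row.numberMuons;
    }
  }
  row.oneMuon = (row.numberMuons == 1);
  return row;
}
```

In the exercise itself, you'd compute these values inside the loop over the input tree's entries, copy them into the variables attached to the output tree's branches, and then call that tree's `Fill()` method.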

:::::{admonition} Dictionary
:class: note

Given what you've done so far, your first inclination may be to open
{file}`exampleEvents.root` directly in ROOT and look at it with the `TBrowser`.
Try it.

It doesn't fail, but you'll get an error message about not being able
to find a dictionary for some portions of the `ExampleEvent`
class.[^f144] That's because the file contains a custom class that's not
a standard part of ROOT. For ROOT I/O to work with such a class,
ROOT needs a custom dictionary for it. Only when such a dictionary is loaded
can the class be fully displayed in the ROOT browser.

Try to see how much of the `ExampleEvent` tree you can see without the
dictionary. Then restart ROOT and type the following ROOT command:

    [] gSystem->Load("libExampleEvent.so");

This causes ROOT to load the code for a dictionary that I've
pre-compiled for you.[^f145] Now you can open {file}`exampleEvents.root`
using a `TFile` object and use the ROOT browser to navigate through the
`ExampleEvent` objects stored in the tree.

As you look at the file, you'll see that there's a hierarchy of objects.
There's only one object in the file, `exampleEventsTree`. Inside that
tree, there is only one "branch", `exampleEventsBranch`.

That's a bit of a clue: a ROOT n-tuple is actually a `TTree` object with
one Branch for every simple variable.

At this point, you could use `MakeSelector()` to create a ROOT macro for
you, but I suggest that you only do this to get some useful code
fragments to copy into your own macro.[^f146]

:::::{hint}
Some additional hints:

-   The first line of your ROOT macro for this exercise is likely to be
    the library load command above.

-   If you're writing a stand-alone program, then instead of loading the
    library you'll have

        #include "ExampleEvent.h"

    and add `libExampleEvent.so` to the command you use to compile and link your code.

-   Look at the examples in the `$ROOTSYS/tutorials/tree` directory, on the `TTree`
    web page, and in the macro you created with `MakeSelector` (if you
    chose to make one).

-   Yes, the ampersands are important!

One more hint:

How do you tell if a lepton is a muon or an electron? I'm not talking
about their track length in the detector, at least not for this example.
I'm talking about what indicator is being used inside this example
`TTree`.

There's a standard identification code used for particles. The Particle
Data Group developed it, so it's called the "PDG code".[^pdg-code] There are
methods in `ExampleEvent` that return this value (e.g., `LeptonPDG`). 
For this exercise, these codes will be sufficient:

:::{table}
:align: center

| Particle   | PDG Code |
| ---------- | -------- |
| $e^{-}$    | 11       |
| $e^{+}$    | -11      |
| $\mu^{-}$  | 13       |
| $\mu^{+}$  | -13      |
:::

If the sign of the PDG codes for leptons seems puzzling to you, recall
that under the usual particle-physics convention, electrons and negative
muons are assigned a lepton number $L=+1$ (positive PDG codes), while
positrons and positive muons are assigned $L=-1$ (negative PDG codes).
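The sign convention in the table translates directly into code. Here is a small sketch that covers only the four codes above and treats anything else as "not a light lepton"; the function names are hypothetical and are not part of `ExampleEvent`.

```cpp
#include <cassert>
#include <cstdlib>

// True for mu- (13) and mu+ (-13); the sign only separates
// particle from antiparticle, so compare the absolute value.
bool IsMuon(int pdg) { return std::abs(pdg) == 13; }

// True for e- (11) and e+ (-11).
bool IsElectron(int pdg) { return std::abs(pdg) == 11; }

// Lepton number L: +1 for e- and mu- (positive codes),
// -1 for e+ and mu+ (negative codes), 0 for codes not in the table.
int LeptonNumber(int pdg) {
  if (!IsMuon(pdg) && !IsElectron(pdg)) return 0;
  return pdg > 0 ? +1 : -1;
}
```

The key point for the exercise is the `std::abs`: counting only code 13 would silently drop every $\mu^{+}$.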
:::::

That's enough to get you started. If you'd like some more:

:::{admonition} Extra challenge
:class: note

Use the `RDataFrame` class to create the output n-tuple, instead of
manually fiddling with `TTree` and branches. One aspect will be a
trifle easier: adding a column with the `Define` method of `RDataFrame`
takes less work than defining a branch.

The harder part will be figuring out how to pass a calculation to
`Define` to compute each column in the n-tuple. I give an
example in the {ref}`RDataFrame portion of the tutorial <writing-functions>`.
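`Define` takes the name of the new column, a callable (a function or a lambda), and the names of the columns that callable reads; `RDataFrame` then calls it once per entry. Below is a plain-C++ sketch of two such callables, assuming hypothetically that an event's PDG codes are available as a `std::vector<int>` column. In this exercise the input would more likely be the `ExampleEvent` object itself, so treat the parameter types as placeholders.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// One input value per entry in, one column value out:
// the shape of callable that Define expects.
auto countMuons = [](const std::vector<int>& pdgCodes) {
  int n = 0;
  for (int pdg : pdgCodes)
    if (std::abs(pdg) == 13) ++n;  // mu- or mu+
  return n;
};

// A column can also be computed from a previously defined column,
// here the muon count, instead of repeating the loop.
auto oneMuon = [](int numberMuons) { return numberMuons == 1; };
```

You might then register the second callable with something like `df.Define("oneMuon", oneMuon, {"numberMuons"})`, where `df` and the column name `numberMuons` are hypothetical.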
:::

:::{admonition} More on ROOT dictionaries
:class: note

For more on the topic of creating and reading complex structures from
ROOT TTrees using both C++ and Python, there's a {ref}`chapter on
that topic <dictionary>` in the appendix.

:::

:::{figure-md} particle_properties-fig
:class: align-center

<img src="https://imgs.xkcd.com/comics/particle_properties.png" alt="xkcd particle_properties" width="50%">

<https://xkcd.com/1862/> by Randall Munroe
:::

[^f143]: If you're trying to get through the advanced exercises using
    Python, you will still have to load the 
    C+\+\-based ROOT dictionary:

        import ROOT
        # Declare the class to the interpreter, then load the compiled dictionary.
        ROOT.gInterpreter.ProcessLine('#include "ExampleEvent.h"')
        ROOT.gSystem.Load("./libExampleEvent.so")

    You can find an example of writing TTree branches within the
    ROOT website's [TTree documentation](https://root.cern.ch/doc/master/classTTree.html).

[^f144]: If you didn't get such a message, then you probably copied my
    entire `root-class` directory to your working directory. That's OK,
    but you might want to temporarily create a new directory, go into
    it, start ROOT, and open the file just so you can see the error
    message. That way you'll know how it looks if you have a
    missing-dictionary problem.

[^f145]: This library may not work if you're on a different kind of
    system than the one on which I created the library. If you get some
    kind of load error, here's what to do:

    Copy the following additional files from my `root-class` directory if
    you haven't already done so:

        LinkDef.h
        ExampleEvent.cxx
        BuildExampleEvent.cxx
        BuildExampleEvent.sh

    Run the UNIX command script with:

        > sh BuildExampleEvent.sh

    This will (re-)create the `libExampleEvent` shared library for your
    system. It will also create the program `BuildExampleEvent`, which I
    used to create the file `exampleEvents.root`.

    If you're running this on a Macintosh, the name of the library will
    be `libExampleEvent.dylib`; that's the name to use in the
    `gSystem->Load()` command in the Mac version of ROOT.

[^f146]: Why don't I want you to use `MakeSelector` here? The answer is
    that some physics experiments only use ROOT to make n-tuples; they
    don't use it for their more complex C++ classes. In that case, you
    won't be able to use `MakeSelector` because you won't have a ROOT
    dictionary. It's likely that such a physics experiment would have
    its own I/O methods that you'd use to read its physics classes, but
    you'd still use a ROOT `TTree` and branches to write your n-tuple.

[^pdg-code]: If you'd like to see them, here's
    a [PDF file with a complete list of codes](http://pdg.lbl.gov/2002/montecarlorpp.pdf).