Exercise 15: Data reduction (1-2 hours)

Note

Up until now, we’ve considered n-tuples that someone else created for you. The process by which a file that contains complex data structures is converted into a relatively simple n-tuple is part of a larger process called “data reduction.” It’s a typical step in the overall physics analysis chain.

As I implied on the first day of this tutorial, perhaps you’ll be given an n-tuple and told to work with it. However, it’s possible you’ll be given a file containing the next-to-last step in the analysis chain: a file of C++ objects with data structures. You’d want to extract data from those structures to create your own n-tuples.[1]

Copy the files whose names contain “Example” from my root-class directory:

> cp ~seligman/root-class/*Example* $PWD

The file exampleEvents.root contains a ROOT tree of C++ objects. The task is to take the event information in those C++ objects and reduce it to a relatively simple n-tuple.

First, take a look at ExampleEvent.h. You’re not going to edit this file. It’s the file that someone else used to create the events in the ROOT tree. If you’re given an ExampleEvent object, you can use any of the methods you see to access information in that object; for example:

ExampleEvent* exampleEvent = 0;
// Assume we assign exampleEvent somehow.
Int_t numberLeptons = exampleEvent->GetNumberLeptons();

For this hypothetical analysis, you’ve been told that the following information is to be put into the n-tuple you’re going to create:

  • the run number;

  • the event number;

  • the total energy of all the particles in the event;

  • the total number of particles in the event;

  • a boolean indicator: does the event have only one muon?

  • the total energy of all the muons in the event;

  • the number of muons in the event.

The task is to write the code to read the events in exampleEvents.root and write an n-tuple to a different file, exampleNtuple.root.
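To make the list above concrete, here is a minimal sketch of the reduction logic in plain C++, with no ROOT calls at all. Everything in it is an assumption made for illustration: ToyParticle, NtupleRow, and reduceEvent are hypothetical names, and the real accessors are whatever you find in ExampleEvent.h. It also assumes that “only one muon” means exactly one muon in the event.

```cpp
#include <vector>

// Hypothetical stand-in for one particle; the real accessors live in ExampleEvent.h.
struct ToyParticle {
    double energy;
    bool isMuon;
};

// One row of the output n-tuple: one simple variable per requested quantity.
struct NtupleRow {
    int run = 0;
    int event = 0;
    double totalEnergy = 0.0;
    int numParticles = 0;
    bool hasSingleMuon = false;  // assumption: "only one muon" means exactly one
    double muonEnergy = 0.0;
    int numMuons = 0;
};

// Reduce one (toy) event to a flat row of simple variables.
NtupleRow reduceEvent(int run, int event, const std::vector<ToyParticle>& particles) {
    NtupleRow row;
    row.run = run;
    row.event = event;
    row.numParticles = static_cast<int>(particles.size());
    for (const ToyParticle& p : particles) {
        row.totalEnergy += p.energy;
        if (p.isMuon) {
            row.muonEnergy += p.energy;
            ++row.numMuons;
        }
    }
    row.hasSingleMuon = (row.numMuons == 1);
    return row;
}
```

In the real exercise, each member of a row like this would become one branch of the output tree, and the loop over particles would call the methods of ExampleEvent instead of reading a toy struct.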

Note

After what you’ve done before, your first inclination may be to open exampleEvents.root directly in ROOT and look at it with the TBrowser. Try it.

It doesn’t fail, but you’ll get an error message about not being able to find a dictionary for some portions of the ExampleEvent class.[2] I noted this earlier: it’s possible to extend ROOT’s list of classes with your own by creating a custom dictionary. Only classes that have a dictionary defined can be fully displayed using the ROOT browser.

Try to see how much of the ExampleEvent tree you can see without the dictionary. Then restart ROOT and type the following ROOT command:

[] gSystem->Load("libExampleEvent.so");

This causes ROOT to load the code for a dictionary that I’ve pre-compiled for you.[3] Now you can open exampleEvents.root using a TFile object and use the ROOT browser to navigate through the ExampleEvent objects stored in the tree.

As you look at the file, you’ll see that there’s a hierarchy of objects. There’s only one object in the file, exampleEventsTree. Inside that tree, there is only one “branch”, exampleEventsBranch.

That’s a bit of a clue: a ROOT n-tuple is actually a TTree object with one Branch for every simple variable.

At this point, you could use MakeSelector() to create a ROOT macro for you, but I suggest that you only do this to get some useful code fragments to copy into your own macro.[4]

Hint

Some additional hints:

  • The first line of your ROOT macro for this exercise is likely to be the library load command above.

  • If you’re writing a stand-alone program, instead of loading the library you’ll have

    #include "ExampleEvent.h"
    

    and include libExampleEvent.so on the line you use to compile your code.

  • Look at the examples in the $ROOTSYS/tutorials/tree directory, on the TTree web page, and in the macro you created with MakeSelector (if you chose to make one).

  • Yes, the ampersands are important!
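Why would the ampersands matter? The following toy class is a sketch of the general idea only. It is not ROOT’s TTree and makes no claim about how TTree is implemented; ToyTree and fillEnergies are hypothetical names. The point it illustrates: you register the address of a variable once, then change the variable’s value and call Fill() for every entry.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a tree of simple variables. This is NOT ROOT's TTree;
// it only mimics the idea of registering a variable's address once.
class ToyTree {
public:
    // Remember where the variable lives, not its current value.
    void Branch(const std::string& name, double* address) {
        names_.push_back(name);
        addresses_.push_back(address);
        columns_.emplace_back();
    }

    // Copy whatever is currently stored at each registered address.
    void Fill() {
        for (std::size_t i = 0; i < addresses_.size(); ++i) {
            columns_[i].push_back(*addresses_[i]);
        }
    }

    const std::vector<double>& Column(std::size_t i) const { return columns_[i]; }

private:
    std::vector<std::string> names_;
    std::vector<double*> addresses_;
    std::vector<std::vector<double>> columns_;
};

// Fill the toy tree with one entry per input value and return the column.
std::vector<double> fillEnergies(const std::vector<double>& energies) {
    ToyTree tree;
    double energy = 0.0;
    tree.Branch("energy", &energy);  // the ampersand passes the address
    for (double e : energies) {
        energy = e;   // update the variable in place...
        tree.Fill();  // ...and the tree reads the new value through the pointer
    }
    return tree.Column(0);
}
```

If Branch took the variable by value instead of by address, every Fill() would store the value the variable had at registration time, which is not what you want.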

One more hint:

How do you tell if a lepton is a muon or an electron? I’m not talking about their track length in the detector, at least not for this example. I’m talking about what indicator is being used inside this example TTree.

There’s a standard identification code used for particles. The Particle Data Group developed it, so it’s called the “PDG code”.[5] There are methods in ExampleEvent that return this value (e.g., LeptonPDG). For this exercise, these codes will be sufficient:

Particle       PDG Code
\(e^{-}\)      11
\(e^{+}\)      -11
\(\mu^{-}\)    13
\(\mu^{+}\)    -13

If the sign of the PDG codes for leptons seems puzzling to you, recall that under the usual particle-physics nomenclature, electrons are assigned a lepton number L of +1, positrons are assigned L=-1, and so on.
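Since a particle and its antiparticle differ only in the sign of the code, a test on the absolute value catches both. Here is a small sketch of that check; the helper names isMuon, isElectron, and countMuons are my own inventions for illustration and do not come from ExampleEvent.h.

```cpp
#include <cstdlib>
#include <vector>

// True for both mu- (13) and mu+ (-13).
bool isMuon(int pdgCode) { return std::abs(pdgCode) == 13; }

// True for both e- (11) and e+ (-11).
bool isElectron(int pdgCode) { return std::abs(pdgCode) == 11; }

// Count the muons in a list of lepton PDG codes.
int countMuons(const std::vector<int>& leptonPdgCodes) {
    int count = 0;
    for (int code : leptonPdgCodes) {
        if (isMuon(code)) {
            ++count;
        }
    }
    return count;
}
```

In the exercise, the codes would come from the PDG accessor methods in ExampleEvent rather than from a ready-made vector.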

Extra challenge

Use the RDataFrame class to create the output n-tuple, instead of manually fiddling with TTree and branches. One aspect will be a trifle easier: Using the .Define method in RDataFrame is easier than defining a branch.

The harder part will be figuring out how to pass a calculation to Define to calculate each column in the n-tuple. You’ll have to learn how to create (in the words of the ROOT web site) a “function, lambda expression, functor class, or any other callable object”, none of which I’ve mentioned so far in this tutorial.
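To give a feel for those three kinds of callable, here is a sketch in standard C++ that does not use ROOT at all. The helper computeColumn is hypothetical: it only imitates the idea of “apply this calculation to every row to produce a column”, and it is not RDataFrame’s Define. Check the RDataFrame documentation for the actual arguments Define expects.

```cpp
#include <algorithm>
#include <functional>
#include <numeric>
#include <vector>

// 1. A plain function.
double sumEnergies(const std::vector<double>& energies) {
    return std::accumulate(energies.begin(), energies.end(), 0.0);
}

// 2. A functor class: an object that can be called like a function
//    and can carry its own settings (here, a threshold).
struct CountAbove {
    double threshold;
    double operator()(const std::vector<double>& energies) const {
        return static_cast<double>(std::count_if(
            energies.begin(), energies.end(),
            [this](double e) { return e > threshold; }));
    }
};

// 3. A lambda expression: an unnamed function written in place.
const auto maxEnergy = [](const std::vector<double>& energies) {
    return energies.empty() ? 0.0
                            : *std::max_element(energies.begin(), energies.end());
};

// Hypothetical helper: apply any callable to each row to build one column.
std::vector<double> computeColumn(
    const std::vector<std::vector<double>>& rows,
    const std::function<double(const std::vector<double>&)>& calc) {
    std::vector<double> column;
    column.reserve(rows.size());
    for (const std::vector<double>& row : rows) {
        column.push_back(calc(row));
    }
    return column;
}
```

All three can be handed to the same helper, for example computeColumn(rows, sumEnergies), computeColumn(rows, CountAbove{1.5}), or computeColumn(rows, maxEnergy). Which form you choose is mostly a matter of how much state the calculation needs and how often you reuse it.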

Get to work![6]

[Image: xkcd comic “Particle Properties”]

Figure 50: https://xkcd.com/1862/ by Randall Munroe


[1]

If you’re trying to get through the advanced exercises using Python, this one may stump you; it certainly stumps me. I know of no simple way of loading a C++-based ROOT dictionary using Python. Something like this may be a start:

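import ROOT

# Make the class declaration known to the interpreter, then load the compiled dictionary.
ROOT.gInterpreter.ProcessLine('#include "ExampleEvent.h"')
ROOT.gSystem.Load("./libExampleEvent.so")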

[2]

If you didn’t get such a message, then you probably copied my entire root-class directory to your working directory. That’s OK, but you might want to temporarily create a new directory, go into it, start ROOT, and open the file just so you can see the error message. That way you’ll know how it looks if you have a missing-dictionary problem.

[3]

This library may not work if you’re on a different kind of system than the one on which I created the library. If you get some kind of load error, here’s what to do:

Copy the following additional files from my root-class directory if you haven’t already done so:

LinkDef.h
ExampleEvent.cxx
BuildExampleEvent.cxx
BuildExampleEvent.sh

Run the UNIX command script with:

> sh BuildExampleEvent.sh

This will (re-)create the libExampleEvent shared library for your system. It will also create the program BuildExampleEvent, which I used to create the file exampleEvents.root.

If you’re running this on a Macintosh, the name of the library will be libExampleEvent.dylib; that’s the name to use in the gSystem->Load() command in the Mac version of ROOT.

[4]

Why don’t I want you to use MakeSelector here? The answer is that some physics experiments only use ROOT to make n-tuples; they don’t use it for their more complex C++ classes. In that case, you won’t be able to use MakeSelector because you won’t have a ROOT dictionary. It’s likely that such a physics experiment would have its own I/O methods that you’d use to read its physics classes, but you’d still use a ROOT TTree and branches to write your n-tuple.

[5]

If you’d like to see them, here’s a PDF file with a complete list of codes.

[6]

In the time since I constructed this exercise in the mid-2000s, a new class has been added to ROOT: TNtuple. It may make the process of writing n-tuples easier for you. Take a look!