Walkthrough: Using Python to analyze a Tree

(10 minutes)


You can spend a lifetime learning all the ins and outs of programming in Python.[1] Fortunately, you only need a small subset of them to perform analysis tasks with pyroot.

In ROOT/C++, there’s a method (MakeSelector) that can create a macro for you from a TTree or n-tuple. In pyroot there’s no direct equivalent. However, the “analysis skeleton” for an n-tuple is much simpler in Python. I’ve got a basic file in my area that you can copy and edit to suit your task.

Copy my example Python script to your directory. Then take a look at it:

%cp ~seligman/root-class/Analyze.py $PWD
%load Analyze.py


The second of the two magic commands will load the contents of Analyze.py into the next notebook cell, ready for you to play with it.

Most analysis tasks have the following steps:

  • Set-up - open files, define variables, create histograms, etc.

  • Loop - for each event in the n-tuple or Tree, perform some tasks: calculate values, apply cuts, fill histograms, etc.

  • Wrap-up - display results, save histograms, etc.
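Before looking at the ROOT version, here is how the three steps might look in plain Python. This is only an illustrative sketch: the list `events`, the variable name `energy`, and the cut value are all made up for this example, and a dictionary of counts stands in for a real histogram.

```python
# Hypothetical stand-in for an n-tuple: a list of events, each with one
# made-up variable called "energy". A real analysis reads these from a file.
events = [{"energy": 1.5}, {"energy": 0.2}, {"energy": 3.1}, {"energy": 2.4}]

### Set-up: define variables and an empty "histogram".
energy_cut = 1.0                     # made-up cut value
counts = {"pass": 0, "fail": 0}      # stand-in for a histogram

### Loop: for each event, apply the cut and fill the "histogram".
for event in events:
    if event["energy"] > energy_cut:
        counts["pass"] += 1
    else:
        counts["fail"] += 1

### Wrap-up: display the results.
print(counts)
```

The ROOT script has the same shape; the list becomes an n-tuple read from a file, and the dictionary becomes a histogram.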

Here’s the Python code from Analyze.py. I’ve marked the places in the code where you’d place your own commands for Set-up, Loop, and Wrap-up.

Listing 2: Python analysis “skeleton” for a ROOT n-tuple.
from ROOT import TFile, gDirectory
# You probably also want to import TH1D and TCanvas
# unless you're not drawing any histograms.
from ROOT import TH1D, TCanvas

# Open the file. Note that the name of your file outside this class
# will probably NOT be experiment.root.
myfile = TFile( 'experiment.root' )

# Retrieve the n-tuple of interest. In this case, the n-tuple's name is
# "tree1". You may have to use the TBrowser to find the name of the
# n-tuple in a file that someone gives you.

mychain = gDirectory.Get( 'tree1' )
entries = mychain.GetEntriesFast()

### The Set-up code goes here.

for jentry in range( entries ):
    # Copy next entry into memory and verify.
    nb = mychain.GetEntry( jentry )

    if nb <= 0:
        # Nothing was read for this entry, so skip to the next one.
        continue

    # Use the values directly from the tree. This is an example using a
    # variable "vertex". This variable does not exist in the example
    # n-tuple experiment.root, to force you to think about what you're
    # doing.

    # myValue = mychain.vertex
    # myHist.Fill(myValue)

    ### The Loop code goes here.

### The Wrap-up code goes here

Compare this with the C++ code in Listing 1.


You’ve probably already guessed that lines beginning with “#” are comments.

In Python, “flow control” (loops, if statements, etc.) is indicated by indenting statements. In C++, any indentation is optional and is for the convenience of humans. In Python the indentation is mandatory and shows the scope of statements like if and for.

Note that Loop and Wrap-up are distinguished by their indentation. This means that when you type in your own Loop and Wrap-up commands, they must have the same indentation as the comments I put in.
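A small example that does not need ROOT may make the role of indentation clearer. The variable names here are made up for illustration.

```python
total = 0
loop_passes = 0

for value in [1, 2, 3]:
    # Indented: these statements belong to the for loop,
    # so they run once for each value in the list.
    total += value
    loop_passes += 1

# Not indented: this statement is outside the loop,
# so it runs only once, after the loop has finished.
summary = "sum = " + str(total)
print(summary)
```

If you indented the last two statements to match the loop body, they would run on every pass through the loop instead of once at the end. That is the difference between Loop code and Wrap-up code in Analyze.py.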

Take a look at the code mychain.vertex, which means “get the current value of variable vertex from the TTree in mychain.” This is an example; there’s no variable vertex in the n-tuple in experiment.root. If you want to know what variables are available, typically you’ll have to examine the n-tuple/TTree in the TBrowser or display its structure with Print as you did before.
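If you'd rather list the variables from within Python, one possible approach is sketched below. It assumes the object you pass in is a TTree or n-tuple, such as mychain in Analyze.py. The function name branch_names is made up for this example; GetListOfBranches, GetName, and Print are standard ROOT methods.

```python
def branch_names(tree):
    """Return the names of the branches (variables) in a TTree."""
    # GetListOfBranches() returns a collection of TBranch objects;
    # GetName() gives the name of each one.
    return [branch.GetName() for branch in tree.GetListOfBranches()]

# Typical use, after the lines in Analyze.py that define mychain:
#   mychain.Print()                 # full description of the n-tuple
#   print(branch_names(mychain))    # just the variable names
```

Any name in that list can then be used after the dot, in the same way that vertex is used in mychain.vertex.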


[1] We’re up to at least four lifetimes, five if you completed The C++ Path, possibly six if you’re learning LaTeX from scratch, maybe even seven if you skipped ahead to Statistics.