The Python approach

Finally! We can escape from all this C++ nonsense and write code in Python!

Sorry, it’s not so simple.

There are two ways to create your own functions to pass to RDataFrame in Python: using ROOT’s C++ interpreter, Cling, and using Numba.

The C++ interpreter

In this approach, we use the %%cpp magic command within a Python notebook to compile the C++ code that defines our function.

Making the magic work

The %%cpp magic command only works after an earlier import ROOT command. Since %%cpp is a cell magic, it must be the first line in a cell. Therefore, the import command must be present in a notebook cell that’s executed prior to the one with %%cpp.

Listing 35: Defining a C++ function within a ROOT python notebook
# First cell:
import ROOT
%jsroot
# ... or whatever


# Next cell:
%%cpp
// Define a function to compute the transverse momentum
// from the x- and y-momentum. 
float pt_func(float xmom, float ymom) {
    float pt = sqrt(xmom*xmom + ymom*ymom);
    return pt;
}

Once you’ve added this new function to ROOT’s C++ interpreter using %%cpp, you can use it in formulas that you pass to RDataFrame:

Listing 36: Using our previously-interpreted function in Define
pt_dataframe = dataframe.Define("pt","pt_func(px,py)")

In Listing 28, we had to supply a list of variables for the function’s arguments. We don’t do that here. The function pt_func has become part of ROOT’s internal layer, and so it’s part of the same “pool” as the names of the columns of the n-tuple. We can just supply the string "pt_func(px,py)" as a formula, and ROOT will be able to interpret it correctly.
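Outside a notebook, where the %%cpp magic isn't available, the same kind of definition can be handed to ROOT's interpreter from ordinary Python. The following is a minimal sketch, assuming a working PyROOT installation. The names PT_FUNC_CPP and declare_pt_func are made up for illustration; ROOT.gInterpreter.Declare is the PyROOT call that passes C++ source to the interpreter.

```python
# C++ source for the transverse-momentum function, held in a Python string.
# (PT_FUNC_CPP is a hypothetical name chosen for this sketch.)
PT_FUNC_CPP = """
#include <cmath>
float pt_func(float xmom, float ymom) {
    return std::sqrt(xmom*xmom + ymom*ymom);
}
"""


def declare_pt_func():
    """Hand the C++ source to ROOT's interpreter (hypothetical helper).

    Call this only once per session: declaring the same function a
    second time is a C++ redefinition error.
    """
    import ROOT  # imported here so this file still loads without ROOT

    # Declare returns True if the interpreter accepted the code.
    return ROOT.gInterpreter.Declare(PT_FUNC_CPP)
```

After one successful call to declare_pt_func(), the string "pt_func(px,py)" should be usable in Define exactly as in Listing 36.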

Using Numba

If you want a purely Python-language approach to writing your own functions, you can try using Numba, a compiler for Python code.

Let’s return to the calculation of pt one last time in this section (I promise!):

Listing 37: Using a pyroot+Numba decorator to define a function for RDataFrame
import ROOT, math
dataframe = ROOT.RDataFrame("tree1","experiment.root")

# Define a function to compute the transverse momentum
# from the x- and y-momentum.
@ROOT.Numba.Declare(["float", "float"], "float")
def pt_func(x,y):
    return math.sqrt(x*x + y*y)

pt_dataframe = dataframe.Define("pt","Numba::pt_func(px,py)")

Notes:

  • The C++ interpreter has an internal environment that includes the definition of math functions like sqrt. Python does not, so we have to import math. [1]

  • The @ tells us that we are using a Python decorator. In this case, we are not using Numba’s own @jit decorator but a pyroot decorator that interfaces Numba to ROOT.

  • For pyroot+Numba to be able to compile the function, it needs the types of the function’s arguments and its return type, just as a C++ function would. That’s the argument to this decorator: (["float", "float"], "float") means that the function takes two float values as input and returns a single float as output.

  • The function itself is written in Python. Yay! At last! Indentation matters again! None of those darned semi-colons!

  • We have to include Numba:: in front of the function name for the ROOT interpreter to recognize the pyroot+Numba-compiled function.
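If decorators that take arguments are new to you, here is a pure-Python sketch of the pattern. It is not how ROOT.Numba.Declare is implemented, and nothing gets compiled. The names declare and REGISTRY are hypothetical. The sketch only shows how a decorator can receive a type signature and record it alongside the function it decorates.

```python
import math

# Hypothetical registry: function name -> (function, argument types, return type).
REGISTRY = {}


def declare(arg_types, return_type):
    """Toy decorator factory that records a function's type signature."""
    def decorator(func):
        REGISTRY[func.__name__] = (func, tuple(arg_types), return_type)
        return func  # the Python function itself is returned unchanged
    return decorator


@declare(["float", "float"], "float")
def pt_func(x, y):
    return math.sqrt(x*x + y*y)


# Look up what the decorator recorded, then call the function.
func, arg_types, return_type = REGISTRY["pt_func"]
print(arg_types, return_type, func(3.0, 4.0))
```

The real decorator does far more work, since it has to compile the function and make it visible to ROOT's interpreter, but the way the signature is passed in is the same.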

We can do pretty much the same for Filter, except that we have to make sure that the function returns a boolean (True or False). For example:

Listing 38: Using a Numba decorator to define a filter for RDataFrame
# Define an energy cut of 145 GeV.
@ROOT.Numba.Declare(["float"], "bool")
def energy_cut(e):
    return e < 145

pz_cut = dataframe.Filter("Numba::energy_cut(pz)")
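Before handing a cut to Filter, it can help to check in plain Python that the function really returns True or False. Here is a minimal sketch that uses the same logic as Listing 38 without the ROOT decorator. The list sample_values is made up for illustration and is not data from the n-tuple.

```python
def energy_cut(e):
    # Same comparison as in Listing 38, minus the pyroot+Numba decorator.
    return e < 145


# Hypothetical values in GeV, chosen to straddle the cut.
sample_values = [120.0, 144.9, 145.0, 150.3]

results = [energy_cut(v) for v in sample_values]

# Every result should be a genuine boolean, which is what Filter needs.
assert all(isinstance(r, bool) for r in results)

# The values that would survive the cut.
kept = [v for v in sample_values if energy_cut(v)]
print(kept)
```

Note that a value of exactly 145 fails the cut, because the comparison is a strict less-than.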

Why all this C++ nonsense?

Using Numba seems relatively straightforward. Why did I take you, a Python programmer, through the C++ material? Why didn’t I just start with Numba?

The answer involves some turgid details. Feel free to skip this note and move on to Exercise 10: A more practical function. However, if you’re curious:

  • Even the Numba documentation admits that Numba can sometimes make code slower. I’m not sure what happens if you call a ROOT method (e.g., ROOT.TMath.ATan2(y,x)) from within a pyroot+Numba decorated function.

  • Numba is not always part of a Python installation. I try to make sure it’s in the Nevis particle-physics Python libraries (but see the next point). If you’re not at Nevis, and your site doesn’t have Numba, maybe something like this will install it for you:

    pip3 install --user --upgrade numba
    

    If you have installed ROOT on your own system, maybe this command will work:

    conda install numba
    

  • Aside from variations in installations and package managers, the reason why I keep using “maybe” in the previous point is that Numba depends on a specific version of numpy, and it’s not always the most recent version. [2]

    You can get what we in the sysadmin business call “dependency hell”: When you use one of the above commands, the package manager may downgrade numpy to accommodate Numba. In turn, that causes other packages (like scipy) to be downgraded as well.

    This spreads through to other packages, and you may finally get a conflict where a package can’t be downgraded, and other packages can’t be upgraded. The result is lots of package-manager errors and a broken Python distribution (see Figure 52). The only cure may be to tell the package manager to remove Numba!

  • This can happen even on Nevis particle-physics systems, since Numba is not the focus of my maintenance of our Python libraries. For example, if the VERITAS group requires the latest version of gammapy, I’m going to install it even if it means that numpy gets updated and Numba ceases to function.

Therefore, I can’t promise you that Numba will be available. That’s why I made you learn what I hope was a tolerable amount of C++.
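If you'd like to find out where your own installation stands, the following sketch reports the installed versions of numpy and Numba and tests whether Numba can actually be imported. It uses only the Python standard library (importlib.metadata needs Python 3.8 or later). The helper name installed_version is made up for this example.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# A package can be installed and still fail to import, e.g. when its
# numpy requirement is not met, so try the import itself.
try:
    import numba  # noqa: F401
    have_numba = True
except Exception:
    # Broad on purpose: assume a mismatch may surface as ImportError
    # or as some other error raised during import.
    have_numba = False

for name in ("numpy", "numba"):
    print(name, installed_version(name))

print("Numba importable:", have_numba)
```

If have_numba comes out False, the C++ approach from earlier in this section is the one to fall back on.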

[Image: xkcd comic “Decorative Constants”]

Figure 48: https://xkcd.com/2566/ by Randall Munroe


1

This is probably obvious to a Python programmer, but remember that I’m a C++ programmer.

2

I don’t see this as a defect in Numba. I see it as a natural consequence of building a language compiler that may require access to a different library over which the Numba development team has no control. Numba-compiled functions are going to call numpy’s internals directly, without a Python language wrapper to act as an intermediary.

If numpy changes, Numba must change with it, but there’s often a delay before that happens. In my experience as a sysadmin, Numba is more frequently out-of-sync with numpy, or packages that depend on numpy, than it is in-sync.