The Python approach
Finally! We can escape from all this C++ nonsense and write code in Python!
Sorry, it’s not so simple.
There are two ways to create your own functions to pass to RDataFrame in Python: using the C++ interpreter cling, and using Numba.
The C++ interpreter
In this approach, we create a text string containing the C++ code that defines our function. Then we pass that string to cling to interpret. Here’s an example:
import ROOT
# Define a (long) text string containing a C++ function definition.
pt_func_def = '''
// Define a function to compute the transverse momentum
// from the x- and y-momentum.
float pt_func(float xmom, float ymom) {
    float pt = sqrt(xmom*xmom + ymom*ymom);
    return pt;
}
'''
# Add the function to ROOT's C++ pool of interpreted code.
ROOT.gInterpreter.Declare(pt_func_def)
Notes:
- pt_func_def is a string, not a function.
- The string that will be passed to the interpreter must be delimited by quotes. Here the delimiter is ''', Python's triple quote, which lets a string span multiple lines; '''gummy bear''' defines the same Python string as 'gummy bear'.
- The name of the function that will be added to ROOT’s pool of interpreted code is pt_func, the function name defined in the C++ code; it is not pt_func_def.
- I emphasize again: The code in this string must be full-fledged C++. You have to obey all the C++ conventions: specifying types, putting ; at the end of statements, curly braces, etc.
Once you’ve added this new function to ROOT’s interpreter, you can use it in formulas that you pass to RDataFrame:
pt_dataframe = dataframe.Define("pt","pt_func(px,py)")
In Listing 31, we had to supply a list of variables for the function’s arguments. We don’t do that here. The function pt_func has become part of ROOT’s internal layer, and so it’s part of the same “pool” as the names of the columns of the n-tuple. We can just supply the string "pt_func(px,py)" as a formula, and ROOT will be able to interpret it correctly.
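Because the C++ lives inside a Python string, Python won't notice a typo in the formula (say, a + where a * belongs). One way to guard against that is to compare the function against a plain-Python reference on a few values you can work out by hand. The sketch below is only an illustration: pt_reference, check_pt, and pt_typo are names I made up for this example, and none of them are part of ROOT.

```python
import math

def pt_reference(xmom, ymom):
    """Plain-Python transverse momentum: sqrt(px^2 + py^2)."""
    return math.sqrt(xmom * xmom + ymom * ymom)

def check_pt(candidate, cases=((3.0, 4.0), (0.0, 0.0), (-6.0, 8.0)), tol=1e-5):
    """Return True if candidate(x, y) agrees with pt_reference on every test case."""
    return all(abs(candidate(x, y) - pt_reference(x, y)) <= tol for x, y in cases)

# A deliberately wrong version (hypothetical): '+' where '*' belongs.
def pt_typo(xmom, ymom):
    return math.sqrt(xmom * xmom + ymom + ymom)

print(check_pt(pt_reference))  # the reference agrees with itself
print(check_pt(pt_typo))       # the typo is caught by the (3, 4) case
```

If ROOT is available and the function has been declared, I'd expect check_pt(ROOT.pt_func) to work the same way, since declared C++ functions can be called from Python through the ROOT module.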
Using Numba
If you want a purely Python-language approach to writing your own functions, you can try using Numba, a compiler for Python code.
Let’s return to the calculation of pt one last time (I promise!):
import ROOT, math
dataframe = ROOT.RDataFrame("tree1","experiment.root")
# Define a function to compute the transverse momentum
# from the x- and y-momentum.
@ROOT.Numba.Declare(["float", "float"], "float")
def pt_func(x,y):
    return math.sqrt(x*x + y*y)
pt_dataframe = dataframe.Define("pt","Numba::pt_func(px,py)")
Notes:
- The C++ interpreter has an internal environment that includes the definition of math functions like sqrt. Python does not, so we have to import math.[1]
- The @ tells us that we are using a Python decorator. In this case, we are not using Numba’s own @jit decorator but a pyroot decorator that interfaces Numba to ROOT.
- For pyroot+Numba to be able to compile the function, it needs the types of the function’s arguments and its return type, just as a C++ function would. That’s the argument to this decorator: (["float", "float"], "float") means that the function takes two float values as input and returns a single float as output.
- The function itself is written in Python. Yay! At last! Indentation matters again! None of those darned semi-colons!
- We have to include Numba:: in front of the function name for the ROOT interpreter to recognize the pyroot+Numba-compiled function.
You can find a more detailed example in the ROOT tutorials.
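If a decorator that takes arguments looks mysterious, here is a toy version in plain Python. To be clear, this is not how pyroot implements its decorator; declare and REGISTRY are hypothetical names invented for this sketch. It only shows the general pattern: the outer function receives the type information, and the inner function receives the function being decorated.

```python
import math

# Hypothetical stand-in for bookkeeping; ROOT maintains its own internally.
REGISTRY = {}

def declare(arg_types, return_type):
    """Toy decorator factory: record a function's declared types under its name."""
    def decorator(func):
        REGISTRY[func.__name__] = (tuple(arg_types), return_type)
        return func  # hand the original function back unchanged
    return decorator

@declare(["float", "float"], "float")
def pt_func(x, y):
    return math.sqrt(x * x + y * y)

# The decorator ran when pt_func was defined, so the types are already recorded.
print(REGISTRY["pt_func"])
print(pt_func(3.0, 4.0))
```

The real decorator presumably does much more with the type list (it has to hand it to a compiler), but the syntax you type is the same shape.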
We can do pretty much the same for Filter, except that we have to make sure that the function returns a boolean (True or False). For example:
# Define an energy cut of 145 GeV.
@ROOT.Numba.Declare(["float"], "bool")
def energy_cut(e):
    return e < 145
pz_cut = dataframe.Filter("Numba::energy_cut(pz)")
Why all this C++ nonsense?
Using Numba seems relatively straightforward. Why did I take you, a Python programmer, through the C++ material? Why didn’t I just start with Numba?
The answer involves some turgid details. Feel free to skip this note and move on to Exercise 10: A more practical function. However, if you’re curious:
Even the Numba documentation admits that it’s possible that Numba might make things slower. I’m not sure what happens if you call a ROOT method (e.g., TMath::ATan2(y,x)) from within a pyroot+Numba decorated function.
Numba is not always part of a Python installation. I try to make sure it’s in the Nevis particle-physics Python libraries (but see the next point). If you’re not at Nevis, and your site doesn’t have Numba, maybe something like this will install it for you:
pip3 install --user --upgrade numba
If you have installed ROOT on your own system, maybe this command will work:
conda install numba
Aside from variations in installations and package managers, the reason why I keep using “maybe” in the previous point is that Numba depends on a specific version of numpy, and it’s not always the most recent version.[2]
You can get what we in the sysadmin business call “dependency hell”: When you use one of the above commands, the package manager may downgrade numpy to accommodate Numba. In turn, that causes other packages (like scipy) to be downgraded as well.
This spreads through to other packages, and you may finally get a conflict where a package can’t be downgraded, and other packages can’t be upgraded. The result is lots of package-manager errors and a broken Python distribution (see Figure 35). The only cure may be to tell the package manager to remove Numba!
This can happen even on Nevis particle-physics systems, since Numba is not the focus of my maintenance of our Python libraries. For example, if the VERITAS group requires the latest version of gammapy, I’m going to install it even if it means that numpy gets updated and Numba ceases to function.
Therefore, I can’t promise you that Numba will be available. That’s why I made you learn what I hope was a tolerable amount of C++.
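Before you commit to Numba, it may be worth checking what your Python installation actually contains. Here's a minimal sketch that uses only the standard library (importlib.metadata, available in Python 3.8 and later). The helper names installed_version and report are my own inventions for this example. Note that it only reports which versions are installed; it doesn't tell you whether those versions are compatible with each other.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def report(packages=("numba", "numpy", "scipy")):
    """Build one line per package describing what is installed."""
    lines = []
    for name in packages:
        version = installed_version(name)
        lines.append(f"{name}: {version if version is not None else 'not installed'}")
    return lines

for line in report():
    print(line)
```

If numba shows up as "not installed", the C++ interpreter approach is still there for you.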

Figure 60: https://xkcd.com/2566/ by Randall Munroe
[1] This is probably obvious to a Python programmer, but remember that I’m a C++ programmer.
[2] I don’t see this as a defect in Numba. I see it as a natural consequence of building a language compiler that may require access to a different library over which the Numba development team has no control. Numba-compiled functions are going to call numpy’s internals directly, without a Python language wrapper to act as an intermediary.
If numpy changes, Numba must change with it. But often there’s a delay to make this happen. In my experience as a sysadmin, Numba is more frequently out-of-sync with numpy, or packages that depend on numpy, than it is in-sync.