RDataframe or write the code?
RDataFrame is a ROOT class that does just one thing, but does it really,
really well: It goes through the entries in a ROOT file and does… well…
something with each entry.
Using RDataFrame, each of the main Exercises in this tutorial (2
through 9) can be written in a few lines of code. You don’t need to create
event loops with macros or analysis skeletons; the RDataFrame class
and its associated methods handle all of that.
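To give a feel for what "a few lines of code" means, here is a minimal sketch in Python. It is an illustration under assumptions, not the tutorial's official solution: the file name experiment.root, the column name ebeam, and the function name plot_column are all placeholders I picked for the example.

```python
def plot_column(filename="experiment.root", treename="tree1", column="ebeam"):
    """Histogram one column of an n-tuple without writing an event loop.

    The default file and column names are assumptions for illustration.
    """
    import ROOT  # imported here so the function can be defined without ROOT

    # Point the dataframe at the n-tuple.
    dataframe = ROOT.RDataFrame(treename, filename)

    # Book a histogram of the column. This is "lazy": no entries are read yet.
    hist = dataframe.Histo1D(column)

    # Drawing asks for the result, so RDataFrame runs its loop here.
    canvas = ROOT.TCanvas()
    hist.Draw()
    canvas.Draw()

    # Return both objects so they are not deleted while the plot is on screen.
    return canvas, hist
```

In a notebook you would call something like plot_column() and keep the returned values in variables. Notice that there is no explicit loop over entries anywhere in the function.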
If you want to just click through to The RDataframe Path and get started, go right ahead. However, my conscience demands that I offer you some reasons to take The Python Path or The C++ Path instead:
As I said, RDataFrame is designed to loop through every entry in an n-tuple. That describes a large portion of typical physics analysis tasks. The whole raison d’être of this tutorial is to teach you exactly that. However, that’s not the only analysis task you may be asked to do this summer. In fact, none of the Advanced Exercises or Expert Exercises can be done in this way. So you may want to go through the coding portions of this tutorial, in order to prepare for more challenging tasks.
The RDataFrame class is a relatively new addition to ROOT.[1] It’s possible your supervisor has never heard of it.
It’s easy to use RDataFrame… at first. There’s a point at which you hit a “wall”: You suddenly have to understand C++ functions and lambdas, even if you’re doing your work in Python. To give you an idea how complex using RDataFrame can get, consider these advanced examples in C++ or Python; they are from the ROOT dataframe tutorials and demonstrate Higgs-boson reconstruction. If you just glance at those examples, you’ll confirm for yourself that RDataFrame doesn’t keep you from learning how to code.
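Here is a small taste of that “wall”, as a hedged sketch. Even in Python, a custom calculation for RDataFrame is typically written as a piece of C++ and handed to ROOT's interpreter. The helper name transverse_momentum, the new column name pt, the file name experiment.root, and the assumption that the n-tuple has columns px and py are all mine, chosen only for illustration.

```python
# A C++ helper function, written as a Python string.
# The name transverse_momentum is hypothetical.
CPP_HELPER = """
#include <cmath>
double transverse_momentum(double px, double py) {
    return std::sqrt(px * px + py * py);
}
"""


def mean_pt(filename="experiment.root", treename="tree1"):
    """Return the mean of a derived column computed by a C++ helper.

    Assumes the n-tuple has columns named px and py.
    """
    import ROOT  # imported here so the function can be defined without ROOT

    # Hand the C++ source to ROOT's interpreter.
    # Declare the helper only once per session.
    ROOT.gInterpreter.Declare(CPP_HELPER)

    dataframe = ROOT.RDataFrame(treename, filename)

    # The second argument is itself a C++ expression, evaluated per entry.
    with_pt = dataframe.Define("pt", "transverse_momentum(px, py)")

    # Asking for the value makes RDataFrame run its loop over the entries.
    return with_pt.Mean("pt").GetValue()
```

The Python part is only a thin wrapper. The actual physics lives in the C++ string, which is why some C++ knowledge becomes unavoidable.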
Now that I’ve scared you, let’s look at the reasons to use RDataFrame
for this tutorial:
Some students have had difficulty getting through The Python Path or The C++ Path portions of the tutorial in the time we have available. If you do The RDataframe Path, you’re almost guaranteed to complete the whole thing.
Although I only show examples using the n-tuple tree1, you can also use other file formats as input to dataframes; e.g., TTrees and CSV files.
It’s easy to make RDataFrame multi-threaded, which can greatly speed up its operations.
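As a rough sketch of the multi-threading point: in ROOT, implicit multi-threading is switched on with a single call made before the dataframe is created. The function name count_entries_mt and the file name experiment.root are placeholders of mine; whether you see a real speed-up depends on the size of the n-tuple and the machine.

```python
def count_entries_mt(filename="experiment.root", treename="tree1", nthreads=0):
    """Count the entries of an n-tuple with multi-threading enabled.

    nthreads=0 lets ROOT decide how many threads to use.
    """
    import ROOT  # imported here so the function can be defined without ROOT

    # This must come before the RDataFrame is constructed.
    ROOT.EnableImplicitMT(nthreads)

    dataframe = ROOT.RDataFrame(treename, filename)

    # Count() is lazy; GetValue() runs the (now multi-threaded) loop.
    return dataframe.Count().GetValue()
```

The rest of the dataframe code does not change, which is what makes this feature easy to adopt.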
If you’re ambitious, you might consider working through
The Python Path or The C++ Path up to Exercise 10, then do
The RDataframe Path. After doing Exercises 2 through 9 using code, doing
the same Exercises using RDataFrame
will take very little time.

Figure 29: https://xkcd.com/2582/ by Randall Munroe
[1] The current RDataFrame class was introduced in ROOT 6.14 (June 2018). From ROOT 6.10 to 6.12, the class was called ROOT::Experimental::TDataFrame. Prior to 6.10, you won’t find dataframes in ROOT at all. Since this is an actively evolving feature of ROOT, you’ll want to check which version of ROOT your collaboration uses. The Nevis notebook server uses the latest stable version of ROOT, but collaborations often stick with a particular older ROOT version.