RDataFrame concepts
(5 minutes)
Let’s start with some definitions. For purposes of this tutorial, an n-tuple, a spreadsheet, and a dataframe are all the same thing.¹,² It’s something that looks like this:

Figure 44: You saw this in the class introduction. These are the first few rows and columns in the n-tuple tree1 in file experiment.root.
Some more equivalences: a row in the spreadsheet can also be called an entry in the n-tuple; a column in the spreadsheet is a branch in the n-tuple.
Note
In ROOT, the individual cells can have full-fledged C++ structures in them. To keep things simple I’m sticking with numeric values (leaves in ROOT’s terminology) for this tutorial.
Since we can think of Figure 44 as a spreadsheet, let’s think of the kinds of physics-analysis tasks we might do with the columns and rows in a program like Microsoft Excel, Google Sheets, or Apple Numbers:
Sum the values in a column. While this comes up a lot in the business world, it’s not common in a physics analysis.
Statistics: Take the mean or standard deviation of a column, or find its minimum or maximum value.
Make a histogram of the values in a column. You’ve already done this if you went through the TreeViewer section.
Add new columns to the spreadsheet, with the new columns derived from formulas based on existing columns.
The idea behind RDataFrame is to provide a simple way to perform tasks like these.
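To make the four tasks above concrete, here is a minimal sketch in plain Python, using only the standard library rather than ROOT. The column names (px, py, pt), the values, and the helper function histogram are all invented for illustration; they are not taken from experiment.root.

```python
import math
import statistics
from collections import Counter

# A tiny stand-in for an n-tuple: each key is a column (branch), and
# position i across all the lists is one row (entry).
# The names and values are made up for this sketch.
table = {
    "px": [3.0, -1.0, 0.0, 6.0],
    "py": [4.0, 0.0, 2.0, 8.0],
}

# 1. Sum the values in a column.
px_sum = sum(table["px"])

# 2. Statistics: mean, (sample) standard deviation, minimum, maximum.
py_mean = statistics.mean(table["py"])
py_std = statistics.stdev(table["py"])
py_min, py_max = min(table["py"]), max(table["py"])


def histogram(values, bin_width):
    """Count values per bin; each key is the lower edge of a bin."""
    return Counter(math.floor(v / bin_width) * bin_width for v in values)


# 4. Add a new column derived from a formula on existing columns.
table["pt"] = [math.hypot(x, y) for x, y in zip(table["px"], table["py"])]

# 3. Make a histogram of the values in a column (here, the new one).
pt_hist = histogram(table["pt"], bin_width=4.0)

print(px_sum, py_mean, py_min, py_max)
print(dict(pt_hist))
```

RDataFrame offers operations of the same kind (for example Sum, Mean, StdDev, Min, Max, Histo1D, and Define), applied to the columns of an n-tuple instead of to Python lists.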
- 1
I frequently switch from one term to another, sometimes in the middle of a sentence.
- 2
Dataframes are also an important part of the Python data analysis package pandas, the R programming language, and the HDF5 file format. The term means pretty much the same thing in all these environments.
If you’re curious why high-energy physics prefers the ROOT file format to HDF5, here’s a 2018 paper comparing the use of different file formats and databases in a typical analysis. The TL;DR version: HDF5 is better at storing large multi-dimensional arrays, which are often found in HPC applications associated with deep learning. ROOT is a better choice for storing complex data structures.