Walkthrough: Defining an RDataFrame

(10 minutes)

Defining an RDataFrame is usually simple. Here’s how to do it in both C++ and Python:

Listing 4: RDataFrame definition (C++)
auto ntupleName = "tree1";          // name of the n-tuple inside the file
auto fileName = "experiment.root";  // ROOT file that contains the n-tuple
auto dataframe = ROOT::RDataFrame(ntupleName, fileName);

Listing 5: RDataFrame definition (Python)
import ROOT

ntupleName = "tree1"
fileName = "experiment.root"
dataframe = ROOT.RDataFrame(ntupleName, fileName)

Note

Actually, unless I’m writing a program that accepts the name of the n-tuple or its file as arguments, I usually don’t define separate variables like ntupleName or fileName the way I do in the above examples. I’m more likely to simply write:

dataframe = ROOT.RDataFrame("tree1", "experiment.root")

I’m doing this the long way so you can get a sense of what the arguments mean.
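
For the case where a program does accept the n-tuple name and file name as arguments, here is a minimal sketch of one way it might look. The option names --tree and --file and the helper functions parse_arguments and make_dataframe are hypothetical, invented for this illustration; only ROOT.RDataFrame comes from ROOT.

```python
import argparse


def parse_arguments(argv=None):
    """Read the n-tuple name and file name from the command line.

    The option names --tree and --file are hypothetical; the defaults
    match the names used in the listings above.
    """
    parser = argparse.ArgumentParser(description="Open an n-tuple as an RDataFrame")
    parser.add_argument("--tree", default="tree1", help="name of the n-tuple")
    parser.add_argument("--file", default="experiment.root", help="ROOT file that holds it")
    return parser.parse_args(argv)


def make_dataframe(args):
    """Build the dataframe from the parsed arguments.

    ROOT is imported inside the function so that the argument parsing
    above can be tried out even where ROOT is not installed.
    """
    import ROOT
    return ROOT.RDataFrame(args.tree, args.file)
```

In a script, the two pieces would be combined as dataframe = make_dataframe(parse_arguments()). With no options given, this opens tree1 in experiment.root, just like the listings above.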

The name dataframe in this example is arbitrary. If you visit the ROOT website’s RDataFrame page, you’ll see they typically use a short name like df to save on typing. Since I know how to use copy-and-paste, I’ve opted to use a longer variable name for clarity.

Note

For now, I’m showing both C++ and Python examples of the code. Eventually, when I think I’ve shown enough examples so you can convert one to the other, I’ll stop showing both in parallel. You’ve probably already noticed how, at least for RDataFrame, the code is very similar.

Note

I assume that you’re working through The RDataFrame Path interactively, probably in a notebook. You only have to define your dataframe once per session. I’m not usually going to include the above commands in the listings below. If you restart ROOT or the notebook kernel, be sure to initialize dataframe again.

If you’d like to see the names of the columns in the dataframe, it’s easy to do interactively:1

Listing 6: RDataFrame description
dataframe.Describe()
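
If you would rather have the column names and types as ordinary Python strings (for example, to loop over them), the following sketch shows one possible approach. It assumes your ROOT version provides the RDataFrame methods GetColumnNames and GetColumnType; the helper name column_summary is hypothetical.

```python
def column_summary(dataframe):
    """Return a list of (column name, column type) pairs as Python strings."""
    summary = []
    for name in dataframe.GetColumnNames():
        name = str(name)  # PyROOT hands back C++ string objects; convert them
        column_type = str(dataframe.GetColumnType(name))
        summary.append((name, column_type))
    return summary
```

A loop such as "for name, column_type in column_summary(dataframe): print(name, column_type)" would then print one line per column.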

If you’d like a peek at the first few values (roughly equivalent to the TTree::Scan() method in The C++ Path or The Python Path):2

Listing 7: RDataFrame - displaying the first few rows (C++)
dataframe.Display()->Print();

Listing 8: RDataFrame - displaying the first few rows (Python)
dataframe.Display().Print()

Give these commands a try to see what they tell you about the n-tuple tree1.
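
Display can also be told which columns to show and how many rows to print. The sketch below assumes the form of Display that takes a regular expression for the column names followed by a row count, and the AsString method of the object it returns; check the RDataFrame documentation for your ROOT version, since the details have changed between releases. The helper name first_rows_as_text is hypothetical.

```python
def first_rows_as_text(dataframe, n_rows=10, column_regexp=""):
    """Return the first n_rows rows as a text table.

    column_regexp selects columns by regular expression; an empty
    string selects every column.
    """
    display = dataframe.Display(column_regexp, n_rows)
    return str(display.AsString())
```

For example, print(first_rows_as_text(dataframe, 3)) would show three rows instead of the default number.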

Figure 45: xkcd “Health Data”, https://xkcd.com/2620/ by Randall Munroe


1

If the Describe or Display methods don’t work for you, don’t panic. These methods are relatively recent additions to ROOT, so an older installation may not have them. While I try to keep the ROOT versions up-to-date for everyone on the Nevis particle-physics systems, sometimes (due to complex reasons that are beyond irrelevant to you, trust me) I can’t offer you the latest-and-greatest.

2

If you’re using C++: You’ll have to observe via my examples when an RDataFrame method returns a pointer; that is, when you have to use -> to access a method. Remember that you had to deal with this back when you were fitting histograms.