Walkthrough: Apply a cut and a count

(15 minutes)

Applying cuts is an important part of any physics analysis. There’ll be some events you want to analyze and others which are not important to your study. A “cut” is a calculation that separates the two categories.

In RDataFrame, the method that applies a cut is Filter. For example, suppose that we’re only interested in events with pz less than 145 GeV. One way to express this for our example n-tuple is:

pzcut = dataframe.Filter("pz < 145")

C++ syntax

Again, the string passed to the Filter method is interpreted as a C++ expression, not a Python expression, even if you’re working in Python. You’ll get an error if you try this:

pzcut = dataframe.Filter("pz lt 145")

You can also apply a cut on any new columns you’ve defined:

ptcut = definept.Filter("pt > 50")

Define vs. Filter

There’s an important operational difference between Define and Filter. Define is a column-wise operation; that is, it operates on columns and adds a new one. Filter is a row-wise operation; it essentially removes rows from the n-tuple that don’t pass its criteria.
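To make the column-wise versus row-wise distinction concrete, here is a toy model in plain Python that does not use ROOT at all. The three-row table, its values, and the helper names define and filter_rows are invented for illustration, and the pt formula is an assumption about how pt was defined; real RDataFrame code would use Define and Filter as shown above.

```python
# Toy n-tuple: each dict is one row (event). Values are invented for illustration.
rows = [
    {"px": 3.0, "py": 4.0, "pz": 140.0},
    {"px": 6.0, "py": 8.0, "pz": 150.0},
    {"px": 0.0, "py": 1.0, "pz": 144.0},
]

def define(table, name, func):
    """Column-wise: same number of rows, one extra column in each row."""
    return [{**row, name: func(row)} for row in table]

def filter_rows(table, passes):
    """Row-wise: same columns, but only rows that pass the cut are kept."""
    return [row for row in table if passes(row)]

# Assumed definition of pt, for illustration only.
with_pt = define(rows, "pt", lambda r: (r["px"] ** 2 + r["py"] ** 2) ** 0.5)
pzcut = filter_rows(with_pt, lambda r: r["pz"] < 145)

print(len(rows), len(with_pt), len(pzcut))   # 3 3 2
print(sorted(pzcut[0]))                      # ['pt', 'px', 'py', 'pz']
```

The define step leaves the row count at three and adds a column; the filter step keeps every column but drops the one row with pz of 150.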

You’ve probably already guessed that you can plot any column from the filtered n-tuple; e.g.,

pzcut_hist = pzcut.Histo1D("ebeam")

The above line would accumulate a histogram of ebeam for those rows with pz less than 145 GeV.

If you just want to know the number of n-tuple rows that pass a cut, the method to use is Count. For example:

pzcut_count = pzcut.Count()

Count syntax

Unlike Define and Filter, Count never takes an argument. However, you can’t omit the parentheses, since Count is a function; it’s always Count() and not Count in program code.
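The same rule applies to any Python method, which you can check without ROOT. In this small sketch the class ToyTable is hypothetical, and its Count returns a plain integer, unlike RDataFrame's Count; the only point is what happens when the parentheses are left off.

```python
class ToyTable:
    """Hypothetical stand-in for a filtered n-tuple; not part of ROOT."""
    def __init__(self, rows):
        self.rows = rows

    def Count(self):
        return len(self.rows)

table = ToyTable([140.0, 144.0])

called = table.Count()      # parentheses: the method runs and returns 2
not_called = table.Count    # no parentheses: just a reference to the method

print(called)                # 2
print(callable(not_called))  # True; nothing has been counted yet
```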

One thing seems a bit counter-intuitive at first: you can’t just print out the value of pzcut_count. That’s because it’s not a plain number; it’s still an RDataFrame variable, in the same sense that histchi2 was earlier. In the case of histchi2, you had to Draw it to see anything. The corresponding method to use with Count is GetValue(); e.g.,

Listing 12: RDataFrame - get the count after a cut (C++)
auto pzcut = dataframe.Filter("pz < 145");
auto pzcount = pzcut.Count();
std::cout << "The number of events with pz < 145 is "
          << pzcount.GetValue() << std::endl;
Listing 13: RDataFrame - get the count after a cut (Python)
pzcut = dataframe.Filter("pz < 145")
pzcount = pzcut.Count()
print("The number of events with pz < 145 is", pzcount.GetValue())

Result

When I run either of the above code examples, I get

The number of events with pz < 145 is 14962

Give it a try. Hopefully you’ll get the same answer.

xkcd parking

Figure 38: https://xkcd.com/562/ by Randall Munroe. This is another way to apply a cut.