Walkthrough: Apply a cut and a count
(15 minutes)
Applying cuts is an important part of any physics analysis. There’ll be some events you want to analyze and others which are not important to your study. A “cut” is a calculation that separates the two categories.
In RDataFrame, the method that applies a cut is Filter. For example, suppose
that we’re only interested in events with pz less than 145 GeV. One way
to express this for our example n-tuple is:
pzcut = dataframe.Filter("pz < 145")
C++ syntax
Again, the string passed on to the Filter method is interpreted as a C++ expression,
not a Python expression, even if you’re working in Python. You’ll get an error
if you try this:
pzcut = dataframe.Filter("pz lt 145")
You can also apply a cut on any new columns you’ve defined:
ptcut = definept.Filter("pt > 50")
Define vs. Filter
There’s an important operational difference between Define and Filter.
Define is a column-wise operation; that is, it operates on columns and adds
a new one. Filter is a row-wise operation; it essentially removes rows
from the n-tuple that don’t pass its criteria.
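To make the column-wise versus row-wise distinction concrete, here’s a toy sketch in plain Python. It does not use RDataFrame at all: the list of dictionaries stands in for an n-tuple, the numbers are made up, and the pt formula (sqrt(px*px + py*py)) is an assumption about how that column was defined.

```python
# Toy stand-in for an n-tuple: each dict is one row. The values are made up.
rows = [
    {"px": 3.0, "py": 4.0, "pz": 140.0},
    {"px": 6.0, "py": 8.0, "pz": 150.0},
    {"px": 0.0, "py": 5.0, "pz": 144.0},
]

# Define-like step: every row is kept, and each one gains a new column, pt.
defined = [
    {**row, "pt": (row["px"] ** 2 + row["py"] ** 2) ** 0.5}
    for row in rows
]

# Filter-like step: the columns are unchanged, but rows that fail are dropped.
filtered = [row for row in defined if row["pz"] < 145]

print(len(rows), len(defined), len(filtered))  # 3 3 2
print(sorted(filtered[0].keys()))              # ['pt', 'px', 'py', 'pz']
```

The Define-like step changes the number of columns but not the number of rows; the Filter-like step does the opposite.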
You’ve probably already guessed that you can plot any column from the filtered n-tuple; e.g.,
pzcut_hist = pzcut.Histo1D("ebeam")
The above line would accumulate a histogram of ebeam for those rows
with pz less than 145 GeV.
If you just want to know the number of n-tuple rows that pass a cut, the method
to use is Count. For example:
pzcut_count = pzcut.Count()
Count syntax
Unlike Define and Filter, Count never takes an argument. However, you can’t
omit the parentheses, since Count is a method; it’s always Count() and not
Count in program code.
This seems a bit counter-intuitive at first: You can’t just print out the value of
pzcut_count. That’s because it isn’t a plain number; it’s still an RDataFrame result object, in the same sense
that histchi2 was earlier. In the case of histchi2,
you had to Draw it to see anything. The corresponding method to use with Count is GetValue(); e.g.,
In C++:
auto pzcut = dataframe.Filter("pz < 145");
auto pzcount = pzcut.Count();
std::cout << "The number of events with pz < 145 is "
          << pzcount.GetValue() << std::endl;
In Python:
pzcut = dataframe.Filter("pz < 145")
pzcount = pzcut.Count()
print("The number of events with pz < 145 is", pzcount.GetValue())
Result
When I run either of the above code examples, I get
The number of events with pz < 145 is 14962
Give it a try. Hopefully you’ll get the same answer.
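If you’re wondering why the count has to be requested with GetValue(), the toy sketch below may help. It’s plain Python, not ROOT code: the LazyCount class is a hypothetical stand-in invented for this illustration, and the pz values are made up. It mimics the general idea of a deferred result, where the loop over the rows doesn’t run until you ask for the number.

```python
class LazyCount:
    """Hypothetical toy stand-in for a deferred result; not ROOT's class."""

    def __init__(self, values, predicate):
        self._values = values
        self._predicate = predicate
        self._result = None  # nothing has been counted yet

    def GetValue(self):
        # The loop over the values runs on the first request only;
        # later calls reuse the cached result.
        if self._result is None:
            self._result = sum(1 for v in self._values if self._predicate(v))
        return self._result


pz_values = [120.0, 150.0, 144.9, 145.0, 99.0]  # made-up numbers
pzcount = LazyCount(pz_values, lambda pz: pz < 145)

print(pzcount)             # an object description, not a number
print(pzcount.GetValue())  # 3
```

Printing the object itself only tells you what kind of object it is; the number appears once GetValue() triggers the counting.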
 
Figure 38: https://xkcd.com/562/ by Randall Munroe. This is another way to apply a cut.