# Exercise 7: Picking a physics cut **(15 minutes)**

Make a histogram of **`chi2`**, then a scatterplot of **`chi2`** vs **`ebeam`**.

:::{hint}
If you've forgotten how to figure out the axis limits for **`ebeam`**, look at {ref}`rdf-make-scatterplots` again for a clue.
:::

:::{note}
The chi2 distribution and the scatterplot hint that something interesting may be going on.

The chi2 histogram looks unusual: there's a peak around 1, but the x-axis extends far beyond that, up to chi2 > 18. Evidently there are some events with a large chi2, but not enough of them to show up on the plot.

On the scatterplot, we can see a dark band that represents the main peak of the chi2 distribution, and a scattering of dots that represents a group of events with anomalously high chi2.

The chi2 represents a confidence level in reconstructing the particle's trajectory. If chi2 is high, the trajectory reconstruction was poor. It would be acceptable to apply a cut of "chi2 < 1.5", but let's see if we can correlate a large chi2 with anything else.
:::

Make a scatterplot of **`chi2`** versus **`theta`**.

:::{note}
Take a careful look at the scatterplot. It looks like all the large-chi2 values are found in the region theta > 0.15 radians. It may be that our trajectory-finding code has a problem with large angles. Let's put in both a theta cut and a chi2 cut to be certain we're looking at a sample of events with good reconstructed trajectories.
:::

Repeat the above plots with a `Filter()` so that your histograms are only filled if chi2 < 1.5 and theta < 0.15. Change the bin limits of your histograms to reflect these cuts; for example, there's no point in putting bins above 1.5 in your chi2 histograms, since you know there won't be any events in those bins after the cuts.

:::{tip}
You may ask which is better:

- `.Filter("chi2 < 1.5").Filter("theta < 0.15")`
- `.Filter("chi2 < 1.5 && theta < 0.15")`

On the relatively small scale of this example, it doesn't make much of a difference.
For a large-scale analysis, the second expression is more efficient, since `RDataFrame` only has to incur the overhead of the `Filter()` method once (including compiling the C++ expression within the quotes) instead of twice.

I must confess: I cheated when I pointed you directly to theta as the cause of the high-chi2 events. I knew this because I wrote the program that created the tree. If you want to look at this program yourself, go to the {ref}`UNIX window ` and type:

> less ~seligman/root-class/CreateTree.C

:::
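
The two `Filter()` forms in the tip above select exactly the same events; only the overhead differs. Here is a minimal sketch of that equivalence in plain Python, without ROOT. The `events` list, its randomly generated values, and the helper names `chained` and `combined` are made up for illustration; they are not the contents of the tutorial's tree, and the list comprehensions only mimic what `RDataFrame` does.

```python
import random

random.seed(7)  # fixed seed so the toy sample is reproducible

# Toy events with made-up values; NOT the contents of the tutorial's tree.
events = [
    {"chi2": random.expovariate(1.0), "theta": random.uniform(0.0, 0.3)}
    for _ in range(1000)
]


def chained(evts):
    """Two successive cuts, mimicking .Filter("chi2 < 1.5").Filter("theta < 0.15")."""
    after_chi2 = [e for e in evts if e["chi2"] < 1.5]
    return [e for e in after_chi2 if e["theta"] < 0.15]


def combined(evts):
    """One cut with both conditions, mimicking .Filter("chi2 < 1.5 && theta < 0.15")."""
    return [e for e in evts if e["chi2"] < 1.5 and e["theta"] < 0.15]


selected_chained = chained(events)
selected_combined = combined(events)

# Both approaches keep the same events, in the same order.
print(len(selected_chained), len(selected_combined))
print(selected_chained == selected_combined)
```

Since both forms keep the same events, the choice between them in your own code comes down to readability and the overhead described in the tip.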