Walkthrough: Making scatterplots
(15 minutes)
Now that we’ve had some practice making one-dimensional histograms, let’s make a two-dimensional histogram. Let’s see if we can take the same approach that we did for Exercise 1. To make a 1-D histogram we used Histo1D; when I look at the RDataFrame web page I see there’s a Histo2D method. So it’s obvious that it should be something like:
C++:
hist2dim = dataframe.Histo2D("ebeam","px");
hist2dim->Draw();
canvas.Draw();
Python:
hist2dim = dataframe.Histo2D("ebeam","px")
hist2dim.Draw()
canvas.Draw()
Give it a try!
Hey, what’s happening? Am I being sneaky again?
Not this time. This is one of those (fortunately rare) cases where ROOT is not uniform in its approach. In order to make a 2D histogram with RDataFrame, you have to supply the same parameters to Histo2D as if you were to create such a histogram “by hand.”
If you look up TH2D, in analogy with TH1D, you’ll see that the arguments to TH2D are something like:
hist2d = TH2D("name","title",nxbins,xlo,xhi,nybins,ylo,yhi)
where:
- "name" is the ROOT name of the histogram;
- "title" is the title of the histogram, which is shown at the top of the plot;
- nxbins is the number of bins on the x-axis;
- xlo is the lower limit of the x-axis of the plot;
- xhi is the upper limit of the x-axis of the plot;
- nybins is the number of bins on the y-axis;
- ylo is the lower limit of the y-axis of the plot;
- yhi is the upper limit of the y-axis of the plot.
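If it helps to see what those numbers do, here’s a small plain-Python sketch (not ROOT code) of how a set of bin parameters carves each axis into equal-width bins and decides which cell a given (x,y) pair lands in. The function names find_bin and find_cell are made up for this illustration. The sketch numbers bins from 0 and simply discards out-of-range values, whereas ROOT numbers its bins from 1 and keeps separate underflow and overflow bins.

```python
def find_bin(value, nbins, lo, hi):
    """Return the 0-based index of the equal-width bin holding value.

    Returns None when value is outside the half-open range [lo, hi).
    (Hypothetical helper for illustration; not part of ROOT.)
    """
    if not (lo <= value < hi):
        return None
    # Fraction of the way along the axis, scaled to the number of bins.
    index = int((value - lo) * nbins / (hi - lo))
    # Guard against floating-point round-off at the upper edge.
    return min(index, nbins - 1)


def find_cell(x, y, nxbins, xlo, xhi, nybins, ylo, yhi):
    """Return the (ix, iy) grid cell for the point (x, y), or None."""
    ix = find_bin(x, nxbins, xlo, xhi)
    iy = find_bin(y, nybins, ylo, yhi)
    if ix is None or iy is None:
        return None
    return (ix, iy)


# 100 bins from 149 to 151 on x, 100 bins from -20 to 20 on y.
print(find_cell(150.0, 0.0, 100, 149, 151, 100, -20, 20))   # (50, 50)
print(find_cell(152.0, 0.0, 100, 149, 151, 100, -20, 20))   # None
```

Every value that falls in the same cell is counted together, which is why the choice of limits and bin counts matters for what the plot can show.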
When using RDataFrame, you have to explicitly supply these values to Histo2D like this:
Histo2D(("name", "title", nxbins, xlo, xhi, nybins, ylo, yhi),"ebeam","px")
Here’s how it looks in the actual code, specifying the TH2D parameters in an initializer list in the respective languages:
C++:
hist2dim = dataframe.Histo2D({"hist2dim", "ebeam vs px", 100, 149, 151, 100, -20, 20},"ebeam","px");
hist2dim->Draw();
canvas.Draw();
Python:
hist2dim = dataframe.Histo2D(("hist2dim", "ebeam vs px", 100, 149, 151, 100, -20, 20),"ebeam","px")
hist2dim.Draw()
canvas.Draw()
Give it a try!
Note
This is a scatterplot, a handy way of observing the correlations between two variables. The Histo2D command interprets its last two arguments as the “x” and “y” variables, in that order, to define which axes to use.
It’s easy to fall into the trap of thinking that each (x,y) point on a scatterplot represents two values in your n-tuple. The scatterplot is a grid; each square in the grid is randomly populated with a density of dots proportional to the number of values in that square.
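The grid idea can be sketched in a few lines of plain Python. This is only an illustration of the description above, not how ROOT actually draws: the function name scatter_dots is invented, and it assumes one dot per entry (the simplest “proportional” choice), with each dot placed at a random position inside its cell.

```python
import random


def scatter_dots(counts, xlo, xhi, ylo, yhi, rng):
    """Turn a grid of bin counts into randomly placed dots.

    counts[ix][iy] is the number of entries in cell (ix, iy).
    Assumption for this sketch: one dot is drawn per entry.
    """
    nx = len(counts)
    ny = len(counts[0])
    dx = (xhi - xlo) / nx   # width of one cell along x
    dy = (yhi - ylo) / ny   # height of one cell along y
    dots = []
    for ix in range(nx):
        for iy in range(ny):
            for _ in range(counts[ix][iy]):
                # Random position inside the cell, not the original (x, y).
                x = xlo + (ix + rng.random()) * dx
                y = ylo + (iy + rng.random()) * dy
                dots.append((x, y))
    return dots


# A made-up 2x2 grid: three entries in one cell, one in another.
grid = [[3, 0],
        [0, 1]]
dots = scatter_dots(grid, 0.0, 2.0, 0.0, 2.0, random.Random(1))
print(len(dots))   # 4
```

The original values are gone by the time the dots are drawn; only the count in each cell survives, so two dots in the same cell tell you nothing about where exactly the underlying values were.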
This leads to the question: How did I know the values for xlo, xhi, ylo, and yhi in the above examples? The answer is that I made 1-D plots for the variables so I knew their range, then used those values for the 2-D axis limits.[1][2]
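Reading the range off a 1-D plot is the approach used here. As an alternative, if you already have the values of a variable in a Python list, you could compute the limits directly. The sketch below is plain Python with invented helper names (axis_limits, histo2d_model) and an arbitrary 5% padding; the sample numbers are made up and are not taken from the n-tuple.

```python
def axis_limits(values, pad_fraction=0.05):
    """Return (lo, hi) covering values, widened by pad_fraction of the range.

    Hypothetical helper; the 5% default padding is an arbitrary choice.
    """
    lo = min(values)
    hi = max(values)
    pad = (hi - lo) * pad_fraction
    if hi == lo:
        pad = 0.5   # arbitrary width so the axis is never zero-width
    return (lo - pad, hi + pad)


def histo2d_model(name, title, xvalues, yvalues, nxbins=100, nybins=100):
    """Build the 8-element parameter tuple in the order TH2D expects."""
    xlo, xhi = axis_limits(xvalues)
    ylo, yhi = axis_limits(yvalues)
    return (name, title, nxbins, xlo, xhi, nybins, ylo, yhi)


# Made-up sample values standing in for two variables.
xs = [149.4, 150.1, 150.6]
ys = [-12.0, 3.5, 15.0]
print(histo2d_model("hist2dim", "x vs y", xs, ys))
```

In Python, a tuple built this way has the same layout as the one passed as the first argument to Histo2D in the example above.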
Now that you have the recipe, try making scatterplots of different pairs of variables. Do you see any correlations?
Note
If you see a shapeless blob on the scatterplot, the variables are likely to be uncorrelated; for example, plot px versus py. If you see a pattern, there may be a correlation; for example, plot pz versus zv. It appears that the higher pz is, the lower zv is, and vice versa. Perhaps the particle loses energy before it is deflected in the target.
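Judging a correlation by eye can be backed up with a number. One common choice (not something this walkthrough itself uses) is the Pearson correlation coefficient, which is near +1 or -1 for a strong linear trend and near 0 for a shapeless blob. Here’s a minimal plain-Python sketch; the function name pearson_r and the sample lists are invented for illustration and are not values from the n-tuple.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented numbers: the second list falls as the first rises.
falling = pearson_r([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0])
# Invented numbers with no linear trend.
blob = pearson_r([1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0])
print(falling, blob)
```

Keep in mind that this coefficient only measures linear trends; two variables can be strongly related in a curved pattern and still give a value near zero, so the scatterplot remains worth looking at.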

Figure 51: https://xkcd.com/552/ by Randall Munroe
- 1
There’s another obvious question: Why is this necessary? The Histo1D method is able to automatically determine the scale of its single x-axis; why can’t Histo2D do the same for its axes?
I hunted for the reason, and finally asked the question on the ROOT Forums. The answer has to do with being able to use RDataFrame with multiple threads, a subject I address in the intermediate topics section. While running with multiple execution threads, the ROOT developers can make automatic scaling of Histo1D work, but they haven’t figured out how to make automatic axis scaling work with Histo2D (or Histo3D, for that matter).
The lesson here: Even though RDataFrame is generally easier to use (yes, really!) than the techniques described in The C++ Path or The Python Path, there are still times when you have to deal with ROOT’s peculiarities.
- 2
You can also explicitly specify the parameters when creating a 1-D histogram; e.g.,
hist1 = dataframe.Histo1D(("h1", "ebeam", 100, 149, 151),"ebeam")
You might want to do this if you want to override the automatic histogram limits, or you want to set the histogram title.