(rdf-make-scatterplots)=
# Walkthrough: Making scatterplots **(15 minutes)**

Now that we've had some practice making one-dimensional histograms, let's make a two-dimensional histogram.

Let's see if we can take the same approach that we did for {ref}`Exercise 1 `. To make a 1-D histogram we used `Histo1D`; when I look at the [RDataFrame web page](https://root.cern/doc/master/classROOT_1_1RDataFrame.html) I see there's a `Histo2D` method. So it's obvious that it should be something like:

:::{code-block} c++
:name: cpp-rdf-try-2d
:caption: RDataFrame - does this make a 2-D histogram (C++)
hist2dim = dataframe.Histo2D("ebeam","px");
hist2dim->Draw();
canvas.Draw();
:::

:::{code-block} python
:name: python-rdf-try-2d
:caption: RDataFrame - does this make a 2-D histogram (Python)
hist2dim = dataframe.Histo2D("ebeam","px")
hist2dim.Draw()
canvas.Draw()
:::

Give it a try!

Hey, what's happening? Am I being sneaky again?

Not this time. This is one of those (fortunately rare) cases where ROOT is not uniform in its approach. In order to make a 2-D histogram with `RDataFrame`, you have to supply the same parameters to `Histo2D` as if you were to create such a histogram "by hand."

If you look up [TH2D](https://root.cern/doc/master/classTH2D.html), in analogy with [TH1D](https://root.cern/doc/master/classTH1D.html), you'll see that the arguments to `TH2D` are something like:

    hist2d = TH2D("name","title",nxbins,xlo,xhi,nybins,ylo,yhi)

where:

- `"name"` is the ROOT name of the histogram;
- `"title"` is the title of the histogram, which is shown at the top of the plot;
- `nxbins` is the number of bins on the x-axis;
- `xlo` is the lower limit of the x-axis of the plot;
- `xhi` is the upper limit of the x-axis of the plot;
- `nybins` is the number of bins on the y-axis;
- `ylo` is the lower limit of the y-axis of the plot;
- `yhi` is the upper limit of the y-axis of the plot.
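
To make these binning parameters concrete, here is a minimal pure-Python sketch (no ROOT needed) of how a uniform grid defined by `nxbins`, `xlo`, `xhi`, `nybins`, `ylo`, and `yhi` assigns an (x, y) pair to a cell. The function name `bin_index_2d` is hypothetical and this is not ROOT's own code. In particular, ROOT numbers its bins starting from 1 and keeps separate underflow and overflow bins, while this sketch uses zero-based indices and simply returns `None` for out-of-range values.

```python
from collections import Counter


def bin_index_2d(x, y, nxbins, xlo, xhi, nybins, ylo, yhi):
    """Return the zero-based (ix, iy) grid cell containing (x, y).

    Hypothetical helper for illustration only. Each axis range is treated as
    lower-edge inclusive and upper-edge exclusive; values outside the range
    give None (ROOT would instead use its underflow/overflow bins).
    """
    if not (xlo <= x < xhi and ylo <= y < yhi):
        return None
    # Fraction of the way along each axis, scaled to the number of bins.
    ix = int((x - xlo) / (xhi - xlo) * nxbins)
    iy = int((y - ylo) / (yhi - ylo) * nybins)
    # Guard against floating-point round-up right at the upper edge.
    return min(ix, nxbins - 1), min(iy, nybins - 1)


# Fill a toy 2-D "histogram" with a few made-up (ebeam, px) pairs.
toy_points = [(150.0, 0.0), (150.01, 0.1), (149.5, -10.0), (152.0, 0.0)]
counts = Counter()
for ebeam, px in toy_points:
    cell = bin_index_2d(ebeam, px, 100, 149, 151, 100, -20, 20)
    if cell is not None:
        counts[cell] += 1
```

With 100 bins between 149 and 151, each x-bin is 0.02 wide; doing this kind of arithmetic first helps when choosing bin counts and limits.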

When using `RDataFrame`, you have to explicitly supply these values to `Histo2D` like this:

    Histo2D(("name", "title", nxbins, xlo, xhi, nybins, ylo, yhi),"ebeam","px")

Here's how it looks in the actual code, specifying the `TH2D` parameters in an initializer list in the respective languages:

:::{code-block} c++
:name: cpp-rdf-2d
:caption: RDataFrame - making a 2-D histogram (C++)
hist2dim = dataframe.Histo2D({"hist2dim", "ebeam vs px", 100, 149, 151, 100, -20, 20},"ebeam","px");
hist2dim->Draw();
canvas.Draw();
:::

:::{code-block} python
:name: python-rdf-2d
:caption: RDataFrame - making a 2-D histogram (Python)
hist2dim = dataframe.Histo2D(("hist2dim", "ebeam vs px", 100, 149, 151, 100, -20, 20),"ebeam","px")
hist2dim.Draw()
canvas.Draw()
:::

Give it a try!

:::{note}
This is a scatterplot, a handy way of observing the correlations between two variables. The `Histo2D` command interprets its last two arguments as "x" and "y", in that order, to decide which variable goes on which axis.

It's easy to fall into the trap of thinking that each (x,y) point on a scatterplot represents two values in your n-tuple. The scatterplot is a grid; each square in the grid is randomly populated with a density of dots proportional to the number of values in that square.
:::

This leads to the question: How did I know the values for `xlo`, `xhi`, `ylo`, and `yhi` in the above examples? The answer is that I made 1-D plots for the variables so I knew their range, then used those values for the 2-D axis limits.[^why2d]$^,$[^histo1d]

[^why2d]: There's another obvious question: Why is this necessary? The `Histo1D` method is able to automatically determine the scale of its single x-axis; why can't `Histo2D` do the same for its axes? I hunted for the reason, and finally asked the question on the [ROOT Forums](https://root-forum.cern.ch/t/using-rdataframe-histo2d/29408). The answer has to do with being able to use `RDataFrame` {ref}`with multiple threads `, a subject I address in the {ref}`intermediate topics ` section.
While running with multiple execution threads, the ROOT developers can make automatic scaling of `Histo1D` work, but they haven't figured out how to make automatic axis scaling work with `Histo2D` (or `Histo3D`, for that matter).

    The lesson here: Even though `RDataFrame` is generally easier to use (yes, really!) than the techniques described in {ref}`cpath` or {ref}`pythonpath`, there are still times when you have to deal with ROOT's peculiarities.

[^histo1d]: You can also explicitly specify the parameters when creating a 1-D histogram; e.g.,

        hist1 = dataframe.Histo1D(("h1", "ebeam", 100, 149, 151),"ebeam")

    You might want to do this if you want to override the automatic histogram limits, or you want to set the histogram title.

Now that you have the recipe, try making scatterplots of different pairs of variables. Do you see any correlations?

:::{note}
If you see a shapeless blob on the scatterplot, the variables are likely to be uncorrelated; for example, plot **`px`** versus **`py`**. If you see a pattern, there may be a correlation; for example, plot **`pz`** versus **`zv`**. It appears that the higher **`pz`** is, the lower **`zv`** is, and vice versa. Perhaps the particle loses energy before it is deflected in the target.
:::

:::{figure-md} correlation-fig
:align: center

xkcd correlation by Randall Munroe
:::
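
Returning to the question of how to choose `xlo`, `xhi`, `ylo`, and `yhi` when you try other pairs of variables: besides reading the range off a 1-D plot, one option is to start from each variable's observed minimum and maximum and widen that range slightly so the extreme points don't sit exactly on the axis edge. The sketch below is only an illustration. The helper `padded_limits` and its 5% default margin are assumptions of mine, not part of ROOT, and the commented-out lines assume the `dataframe` object from earlier in this tutorial.

```python
def padded_limits(lo, hi, margin=0.05):
    """Widen the interval [lo, hi] by `margin` times its width on each side.

    Hypothetical helper for illustration; the 5% default is an arbitrary choice.
    """
    if hi < lo:
        raise ValueError("hi must not be smaller than lo")
    width = hi - lo
    if width == 0:
        # All values identical: fall back to a window one unit wide.
        return lo - 0.5, hi + 0.5
    pad = margin * width
    return lo - pad, hi + pad


# With ROOT available, the observed range can be taken from the dataframe
# using RDataFrame's Min and Max actions, for example:
#   xlo, xhi = padded_limits(dataframe.Min("pz").GetValue(),
#                            dataframe.Max("pz").GetValue())
#   ylo, yhi = padded_limits(dataframe.Min("zv").GetValue(),
#                            dataframe.Max("zv").GetValue())
#   hist2dim = dataframe.Histo2D(
#       ("hist2dim", "pz vs zv", 100, xlo, xhi, 100, ylo, yhi), "pz", "zv")

# Stand-alone check with made-up numbers:
example_lo, example_hi = padded_limits(-20.0, 20.0)
```

Limits picked this way are rarely round numbers, so you may still prefer to adjust them by hand for a tidier axis.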