(transformation)=
# Transformations and Actions

**(15 minutes)**

Let's refine our understanding of what goes on when we perform operations with `RDataFrame`.

In all of the examples I've given you up to now, I've shown each operation on a single line of code; e.g.,

:::{code-block} c++
:name: cpp-one-per-line
:caption: The verbose way of doing things with RDataFrame (C++)

// Define the dataframe from an input ntuple and file.
auto dataframe = ROOT::RDataFrame("tree1","experiment.root");

// Histogram the value of pt for pz<145 GeV, pt<10 GeV.
auto pzcut = dataframe.Filter("pz < 145");
auto ptdefine = pzcut.Define("pt","sqrt(px*px + py*py)");
auto ptcut = ptdefine.Filter("pt < 10");
auto pthist = ptcut.Histo1D("pt");
:::

You can be much less verbose if you don't need to use intermediate modified dataframes for anything:[^continue]

[^continue]: If you're using Python, it might help to note that C++ does not need anything special to continue a program statement on another line (it's the `;` that terminates a statement). In Python, a statement continues onto the next line only if you end the line with a backslash `\` or if the line break falls inside parentheses, brackets, or braces.

:::{code-block} c++
:name: cpp-all-on-one-line
:caption: The concise way of doing things with RDataFrame (C++)

// Define the dataframe from an input ntuple and file.
auto dataframe = ROOT::RDataFrame("tree1","experiment.root");

// Histogram the value of pt for pz<145 GeV, pt<10 GeV.
auto pthist = dataframe.Filter("pz<145").Define("pt","sqrt(px*px+py*py)")
    .Filter("pt<10").Histo1D("pt");
:::

There's an important restriction when you're being more concise: You can have as many {dfn}`transformations` as you like on an `RDataFrame`, but a given sequence of operations can have only one {dfn}`action`, and it has to come at the end of the sequence.

Before I give you a definition of "transformation" or "action", let me show you what led me to make this distinction.
I tried to do something like this:

:::{code-block} python
:name: python-two-actions
:caption: An attempt to use two actions on one line (Python)

# Define the dataframe from an input ntuple and file.
import ROOT
dataframe = ROOT.RDataFrame("tree1","experiment.root")

# Count the number of events with pz<145 GeV and histogram them.
pzhist = dataframe.Filter("pz < 145").Count().Histo1D("pz")
:::

The above line won't work; give it a try to see the error message.[^cpp-snob]

[^cpp-snob]: If you're a C++ snob (which I've been accused of being from time to time), you might foolishly assume that it doesn't work because the code is in Python. Instead of being rude, just slap an `auto` in the front and a `;` at the end and see for yourself.

The reason why the code doesn't work is that both `Count()` and `Histo1D()` are actions. A {dfn}`transformation` like `Define()` or `Filter()` gives you back a modified view of the n-tuple, to which you can apply further operations; an {dfn}`action` accumulates data from the n-tuple and gives you back a result (a count, a histogram, and so on) instead of a dataframe, so there's nothing left to chain another operation onto.

If you go to the [RDataFrame web page](https://root.cern/doc/master/classROOT_1_1RDataFrame.html), you will see lists of which `RDataFrame` operations are transformations and which are actions (and which are {dfn}`queries`, yet another category).

Here's a re-write of the code above so that there's only one action per line.

:::{code-block} python
:name: python-one-action
:caption: For two actions, use two lines (Python)

# Define the dataframe from an input ntuple and file.
import ROOT
dataframe = ROOT.RDataFrame("tree1","experiment.root")

# Count the number of events with pz<145 GeV and histogram them.
pzcut = dataframe.Filter("pz < 145")
pzcount = pzcut.Count()
pzhist = pzcut.Histo1D("pz")
:::

In other words, you can put everything on one line _if you don't need to use the intermediate modified dataframes_. If you want to apply more than one action to the same modified dataframe, then you will have to create intermediate dataframe variables.

:::::{tip}
Some folks may find a diagram helpful for understanding this idea.
Consider the following code:[^curly]

[^curly]: If the stuff in the curly braces `{}` is confusing to you, look at the footnotes in {ref}`rdf-make-scatterplots`.

:::{code-block} c++
:name: cpp-code-for-display
:caption: Several n-tuple operations (C++)

// Define an RDataFrame.
auto dataframe = ROOT::RDataFrame("tree1","experiment.root");

// Create a couple of histograms, before and after a pz cut.
// Make sure the x-axes of the plots will be the same.
auto pzhist = dataframe.Histo1D({"pz","pz before cut",100,130,170},"pz");
auto pzcuthist = dataframe.Filter("pz < 145")
    .Histo1D({"pzcut","pz after cut",100,130,170},"pz");

// Create a new column, pt, and look at chi2 before and after a pt cut.
// Again, make sure the x-axes match on the histograms.
auto ptDefined = dataframe.Define("pt","sqrt(px*px + py*py)");
auto chi2hist = ptDefined
    .Histo1D({"chi2","chi2 before cut",100,0,20},"chi2");
auto chi2cut = ptDefined.Filter("pt < 10");
auto chi2cuthist = chi2cut
    .Histo1D({"chi2cut","chi2 after cut",100,0,20},"chi2");

// How many events passed our pt cut?
auto chi2cutcount = chi2cut.Count();

// The necessary Draw() and GetValue() methods to see any plots or values
// are left as an exercise for the student.
:::

This is a diagram of how `RDataFrame` organizes the chain of operations to be performed on the n-tuple:

:::{figure-md} rdataframe-fig
:align: center

RDataFrame operations diagram

The series of operations that have been assigned to the `tree1` n-tuple based on the above code.
:::

You may want to look at the program listing and match it against the operations indicated in the diagram. For a "path of operations" to take place, that path must end in an action; the actions are the rectangles in the diagram.[^display]
:::::

[^display]: You can generate a diagram like this for your own dataframes, but it can be a lot of additional work. I only recommend it if you find these kinds of diagrams to be useful.

    Here's how I made that diagram using the `SaveGraph` function that comes with `RDataFrame`. After I defined all my n-tuple operations, I executed:

        // C++
        ROOT::RDF::SaveGraph(dataframe, "./dataframe.dot");

        // Python
        ROOT.ROOT.RDF.SaveGraph(dataframe, "./dataframe.dot")

    This will create a file {file}`dataframe.dot` in your current directory. You can look at the file using {program}`less`, but all you'll see is a text representation of the graph.

    To turn a `.dot` file into a diagram, you need to have the [Graphviz](https://graphviz.org/) software installed; this is available on all the systems on the Nevis particle-physics Linux cluster. The Nevis systems also have [ImageMagick](https://imagemagick.org/); all you have to do on a system with both is type this in your {ref}`UNIX window `:

        display dataframe.dot

    If you're not on a Nevis system, see {ref}`installing` and include `graphviz` and `imagemagick` when you install `root`.
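
If it helps to see why a chain can hold many transformations but only one action, here is a toy model in plain Python that runs without ROOT. It is only a sketch under loud assumptions: `ToyFrame`, `ToyResult`, and `Values()` are made-up names for illustration, `Values()` is a stand-in for a histogram-style action, and none of this is how `RDataFrame` is actually implemented. The one point it mirrors is that transformations hand back another frame, while actions hand back a result object that has no dataframe methods.

```python
# Toy model for illustration only; this is NOT RDataFrame's implementation.

class ToyResult:
    """Hypothetical stand-in for the lazy result that an action hands back."""

    def __init__(self, compute):
        self._compute = compute
        self._value = None
        self._done = False

    def GetValue(self):
        # Nothing is computed until the value is first requested.
        if not self._done:
            self._value = self._compute()
            self._done = True
        return self._value


class ToyFrame:
    """Hypothetical stand-in for a dataframe node."""

    def __init__(self, rows):
        # rows is a callable that returns a fresh iterator over dict rows.
        self._rows = rows

    # Transformations: each one returns a new ToyFrame, so they can be chained.
    def Filter(self, predicate):
        return ToyFrame(lambda: (r for r in self._rows() if predicate(r)))

    def Define(self, name, func):
        return ToyFrame(lambda: ({**r, name: func(r)} for r in self._rows()))

    # Actions: each one returns a ToyResult, which has no Filter/Define/Count.
    def Count(self):
        return ToyResult(lambda: sum(1 for _ in self._rows()))

    def Values(self, name):
        return ToyResult(lambda: [r[name] for r in self._rows()])


events = [{"pz": 140.0}, {"pz": 150.0}, {"pz": 144.0}]
frame = ToyFrame(lambda: iter(events))

# Two actions on the same cut need an intermediate variable.
pzcut = frame.Filter(lambda row: row["pz"] < 145)
pzcount = pzcut.Count()
pzvalues = pzcut.Values("pz")
print(pzcount.GetValue())   # 2
print(pzvalues.GetValue())  # [140.0, 144.0]

# Chaining a second action onto the first fails: Count() returned a result,
# not a frame, so there is no Values() method to call.
try:
    frame.Filter(lambda row: row["pz"] < 145).Count().Values("pz")
    chained_ok = True
except AttributeError:
    chained_ok = False
print(chained_ok)           # False
```

The toy re-reads the rows separately for each result, which is a simplification made to keep the sketch short; the part worth matching against the examples above is only which methods return a frame and which return a result.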