{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# A quick look at RDataFrame\n", "\n", "Part of the presentation on speeding up analysis code.\n", "\n", "See `RDataFrameExercises.ipynb` for more examples.\n", "See https://root.cern/doc/master/classROOT_1_1RDataFrame.html for documentation." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import ROOT\n", "\n", "# Just for fun, turn on multi-threaded execution.\n", "# To turn off multi-threading, just comment out\n", "# the line.\n", "ROOT.ROOT.EnableImplicitMT()\n", "\n", "# Read the ntuple into an RDataFrame\n", "fileName = \"~seligman/root-class/experiment.root\"\n", "treeName = \"tree1\"\n", "dataframe = ROOT.RDataFrame(treeName, fileName)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Every statement in the following cell is either a transformation (`Define`, `Filter`) or an action (`Histo1D`). None of them modifies the original dataframe or reads the ROOT file yet. Instead, they are only booked, using a technique called \"lazy evaluation\": no work is done until a concrete result is needed." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Add a new column: pt\n", "ptDataFrame = dataframe.Define(\"pt\",\"sqrt(px*px + py*py)\")\n", "\n", "# Add yet another new column: theta\n", "thetaDataFrame = ptDataFrame.Define(\"theta\",\"atan2(pt,pz)\")\n", "\n", "# Apply a cut:\n", "cutDataFrame = thetaDataFrame.Filter(\"pz < 145\")\n", "\n", "# Make a histogram. Note how it depends on a new column we've defined.\n", "thetaHistogram = thetaDataFrame.Histo1D(\"theta\")\n", "\n", "# Make the same histogram, but with the cut we defined above:\n", "thetaWithCutHistogram = cutDataFrame.Histo1D(\"theta\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Except:* In practice, people rarely build an RDataFrame one named step at a time. They do so only when the analysis is complicated enough that splitting it into separate statements helps them follow what's going on, or when several histograms branch off the same intermediate step. Usually the calls are chained instead:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "derived = dataframe.Define(\"pt\",\"sqrt(px*px + py*py)\") \\\n", " .Define(\"theta\",\"atan2(pt,pz)\") \\\n", " .Define(\"emeas\",\"sqrt(px*px + py*py + pz*pz)\") \\\n", " .Define(\"eloss\",\"ebeam - emeas\")\n", "\n", "thetaHistogram = derived.Histo1D(\"theta\")\n", "thetaWithCutHistogram = derived.Filter(\"pz < 145\").Histo1D(\"theta\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We still haven't read the ntuple! That is about to change, because we're finally going to request a concrete result from the dataframe." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "