(python-approach)=
# The Python approach

Finally! We can escape from all this C++ nonsense and write code in Python!

Sorry, it's not so simple. There are two ways to create your own functions to pass to `RDataFrame` in Python: using the C++ interpreter [cling](https://root.cern/cling/), and using [Numba](https://numba.pydata.org/).

## The C++ interpreter

In this approach, we use the `%%cpp` {ref}`magic command ` within a Python notebook to compile the C++ code that defines our function.

:::{admonition} Making the magic work
:class: note
The `%%cpp` magic command only works *after* an earlier **`import ROOT`** command. Since `%%cpp` is a cell magic, it must be the first line in a cell. Therefore, the **`import`** command must be present in a notebook cell that's executed prior to the one with `%%cpp`.
:::

:::{code-block} cpp
:name: python-define-func
:caption: Defining a C++ function within a ROOT python notebook

# First cell:
import ROOT
%jsroot # ... or whatever

# Next cell:
%%cpp
// Define a function to compute the transverse momentum
// from the x- and y-momentum.
float pt_func(float xmom, float ymom) {
    float pt = sqrt(xmom*xmom + ymom*ymom);
    return pt;
}
:::

Once you've added this new function to ROOT's C++ interpreter using `%%cpp`, you can use it in formulas that you pass to `RDataFrame`:

:::{code-block} python
:name: python-use-func
:caption: Using our previously-interpreted function in `Define`

pt_dataframe = dataframe.Define("pt","pt_func(px,py)")
:::

In {numref}`Listing %s `, we had to supply a list of variables for the function's arguments. We don't do that here. The function `pt_func` has become part of ROOT's internal layer, and so it's part of the same "pool" as the names of the columns of the n-tuple. We can just supply the string `"pt_func(px,py)"` as a formula, and ROOT will be able to interpret it correctly.

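The `%%cpp` magic only exists inside a notebook. In a plain Python script, a similar effect can probably be had by handing the C++ source to ROOT's interpreter with `ROOT.gInterpreter.Declare`. The sketch below is an illustration, not one of this tutorial's listings: the name `pt_func_script` is made up (so that it doesn't clash with `pt_func`), and the `import ROOT` is guarded so that the pure-Python cross-check still runs where ROOT isn't installed.

```python
import math

# C++ source for the interpreter. The name pt_func_script is hypothetical,
# chosen so it does not collide with pt_func from the notebook example.
CPP_SOURCE = """
#include <cmath>
float pt_func_script(float xmom, float ymom) {
    return std::sqrt(xmom*xmom + ymom*ymom);
}
"""

def pt_reference(xmom, ymom):
    """Pure-Python version of the same formula, used as a cross-check."""
    return math.sqrt(xmom*xmom + ymom*ymom)

try:
    import ROOT
except ImportError:
    ROOT = None  # ROOT is not installed; skip the interpreter step

if ROOT is not None:
    # Compile the function into ROOT's C++ interpreter.
    ROOT.gInterpreter.Declare(CPP_SOURCE)
    # The declared function is now also callable from Python.
    assert abs(ROOT.pt_func_script(3.0, 4.0) - pt_reference(3.0, 4.0)) < 1e-6

print(pt_reference(3.0, 4.0))  # 5.0
```

After the `Declare` call, the string `"pt_func_script(px,py)"` should be usable in `Define` in the same way as `"pt_func(px,py)"`.
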
## Using Numba

If you want a purely Python-language approach to writing your own functions, you can try using Numba, a compiler for Python code. Let's return to the calculation of **`pt`** one last time in this section (I promise!):

:::{code-block} python
:name: python-use-numba
:caption: Using a pyroot+Numba decorator to define a function for `RDataFrame`

import ROOT, math

dataframe = ROOT.RDataFrame("tree1","experiment.root")

@ROOT.Numba.Declare(["float", "float"], "float")
# Define a function to compute the transverse momentum
# from the x- and y-momentum.
def pt_func(x,y):
    return math.sqrt(x*x + y*y)

pt_dataframe = dataframe.Define("pt","Numba::pt_func(px,py)")
:::

Notes:

- The C++ interpreter has an internal environment that includes the definition of math functions like `sqrt`. Python does not, so we have to `import math`.[^obvious]

- The `@` tells us that we are using a Python [decorator](https://www.geeksforgeeks.org/decorators-in-python/). In this case, we are *not* using Numba's own [@jit decorator](https://numba.readthedocs.io/en/stable/user/jit.html) but a pyroot decorator that interfaces Numba to ROOT.

- For pyroot+Numba to be able to compile the function, it needs the types of the function's arguments and its return type, just as a C++ function would. That's the argument to this decorator: `(["float", "float"], "float")` means that the function takes two `float` values as input and returns a single `float` as output.

- The function itself is written in Python. Yay! At last! Indentation matters again! None of those darned semi-colons!

- We have to include `Numba::` in front of the function name for the ROOT interpreter to recognize the pyroot+Numba-compiled function.

- You can find [a more detailed example](https://root.cern.ch/doc/v630/pyroot004__NumbaDeclare_8py.html) in the ROOT tutorials.

[^obvious]: This is probably obvious to a Python programmer, but remember that I'm a C++ programmer.

We can do pretty much the same for `Filter`, except that we have to make sure that the function returns a boolean (`True` or `False`). For example:

:::{code-block} python
:name: python-numba-filter
:caption: Using a Numba decorator to define a filter for `RDataFrame`

@ROOT.Numba.Declare(["float"], "bool")
# Define an energy cut of 145 GeV.
def energy_cut(e):
    return e < 145

pz_cut = dataframe.Filter("Numba::energy_cut(pz)")
:::

:::{admonition} Why all this C++ nonsense?
:class: note
Using Numba seems relatively straightforward. Why did I take you, a Python programmer, through the C++ material? Why didn't I just start with Numba?

The answer involves some turgid details. Feel free to skip this note and move on to {ref}`practical-function`. However, if you're curious:

- Even the [Numba documentation](https://numba.readthedocs.io/en/stable/user/5minguide.html) admits that it's possible that Numba will make things slower. I'm not sure what happens if you call a ROOT method (e.g., `ROOT.TMath.ATan2(y,x)`) from within a pyroot+Numba decorated function.

- Numba is not always part of a Python installation. I try to make sure it's in the Nevis particle-physics Python libraries (but see the next point). If you're not at Nevis, and your site doesn't have Numba, maybe something like this will install it for you:

      pip3 install --user --upgrade numba

  If you have {ref}`installed ` ROOT on your own system, maybe this command will work:

      conda install numba

- Aside from variations in installations and package managers, the reason why I keep using "maybe" in the previous point is that Numba depends on a specific version of [numpy](https://numpy.org/install/), and it's not always the most recent version.[^not-defect] You can get what we in the sysadmin business call "dependency hell": When you use one of the above commands, the package manager may downgrade numpy to accommodate Numba. In turn, that causes other packages (like [scipy](https://scipy.org/install/)) to be downgraded as well. This spreads through to other packages, and you may finally get a conflict where a package can't be downgraded, and other packages can't be upgraded. The result is lots of package-manager errors and a broken Python distribution (see {numref}`Figure %s `). The only cure may be to tell the package manager to remove Numba!

- This can happen even on Nevis particle-physics systems, since Numba is not the focus of my maintenance of our Python libraries. For example, if the VERITAS group requires the latest version of [gammapy](https://gammapy.org/), I'm going to install it even if it means that numpy gets updated and Numba ceases to function. Therefore, I can't promise you that Numba will be available.

That's why I made you learn what I hope was a tolerable amount of C++.
:::

[^not-defect]: I don't see this as a defect in Numba. I see it as a natural consequence of building a language compiler that may require access to a different library over which the Numba development team has no control. Numba-compiled functions are going to call numpy's internals directly, without a Python language wrapper to act as an intermediary. If numpy changes, Numba must change with it. But often there's a [delay](https://github.com/numba/numba/issues/8242) to make this happen. In my experience as a sysadmin, Numba is more frequently out-of-sync with numpy, or packages that depend on numpy, than it is in-sync.

:::{figure-md} decorative_constants-fig
:align: center

xkcd decorative_constants by Randall Munroe
:::
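
Given the installation caveats in the note above, a script may want to check whether Numba actually imports before committing to the `ROOT.Numba.Declare` route. The following is a minimal sketch under assumptions: the helper names `numba_usable` and `choose_backend` are hypothetical, and a successful import is only a rough proxy for a working Numba.

```python
def numba_usable():
    """Return True if `import numba` succeeds in this environment."""
    try:
        import numba  # noqa: F401  (imported only to test availability)
    except Exception:
        # Covers a missing package as well as failures raised at import
        # time, e.g. when Numba rejects the installed numpy version.
        return False
    return True

def choose_backend():
    """Pick 'numba' when Numba imports cleanly, otherwise fall back to 'cpp'."""
    return "numba" if numba_usable() else "cpp"

backend = choose_backend()
print("Function backend:", backend)
```

When the result is `cpp`, the function would be defined through the C++ interpreter instead, as in the first half of this section.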