(lambda-expressions)=
# Lambda expressions in C++

I'd love to finish the topic of writing functions for `RDataFrame` in C++ at this point, and move on to torturing the Python programmers with what they have to put up with. Unfortunately, I can't. The [RDataFrame tutorials](https://root.cern.ch/doc/master/group__tutorial__dataframe.html) make heavy use of C++ lambda expressions. In order for you to be able to understand the code in those examples, I feel I have to review this somewhat-esoteric coding practice.

:::{admonition} Not so fast, Python programmers!
:class: warning

I know you want to leap ahead to the next section. I don't blame you. Before you do, there are a couple of things you should know that may influence your choice:

- Even though there is usually an equivalent Python example for every C++ example in the [ROOT tutorials](https://root.cern.ch/doc/master/group__Tutorials.html), sometimes there's only a `.C` file, probably because the Python equivalent is not yet possible. You'll want to be able to interpret the C++ code so you can figure out what it's supposed to do.[^comments]

- If you're a skilled Python programmer, you know that there are lambda expressions in Python. Unfortunately, Python-based lambda expressions can't be used as arguments to `Define` and `Filter`.[^work]

Now that you're informed, I leave it up to you to decide what to read.[^always]
:::

[^comments]: "Can't I tell what the C++ code is supposed to do from the comments?" Oh, how I {ref}`wish ` that were true!

[^work]: At least, not in the current version as of the time of this writing (ROOT 6.26), or not without a whole lot of extra code wrappers and decorators. Frankly, it's not worth the effort.

[^always]: Of course, since I have no way to make you do anything, the decision is always up to you!

A [lambda expression](https://stackoverflow.com/questions/7627098/what-is-a-lambda-expression-in-c11)[^better] is a way of defining a function without giving it a name.
It's normally used for short functions to make the code more convenient.[^long]

[^better]: I think that [this web page](https://www.programiz.com/cpp-programming/lambda-expression) offers a better explanation of lambda expressions. However, it also includes an annoying pop-up ad.

[^long]: I'll confess that I often write multi-statement lambda functions. I do this because when I use the function in `RDataFrame` or an iterative function like [for_each](https://www.geeksforgeeks.org/for_each-loop-c/), I feel it's helpful for the function to be defined right above the C++ statement that's going to use it.

Here's an example of a lambda expression that defines how to calculate our old friend **`pt`**:

:::{code-block} c++
auto pt_func = [](float x, float y) { return std::sqrt(x*x + y*y); };
:::

Compare this with {numref}`Listing %s `:

:::{code-block} c++
float pt_func( float x, float y ) {
    return std::sqrt( x*x + y*y );
}
:::

Your first reaction may be that we're using a fancy syntax to do exactly the same thing. But lambda expressions offer more than that: They allow you to define a function within the body of the code, instead of in a separate declaration before the current routine.

Let's break this down:

- `[]` means "this is a lambda expression."[^brackets]
- `(float x, float y)` - As with any function, we have to declare its arguments.
- `{ return std::sqrt(x*x + y*y); }` - This is the body of the function, the same as any other function.
- `;` - Don't forget the semi-colon at the end! To C++, this line is a statement, not a function definition. It has to end with a `;` so the compiler will recognize it.
- `auto` - You've seen me use `auto` before, to let the compiler do the work of specifying a complicated type.
- `pt_func` - The name I assign to this lambda expression.[^anon]
- If you're sharp of eye and mind, you may have noticed that we didn't have to declare the return type of the function, the way we had to in {numref}`Listing %s `.
That's because the C++ compiler can automatically deduce the return type from the type of the value in the `return` statement.[^not-always]

[^brackets]: It means something to put variable names within those brackets, but this page is long enough, and I need room for the xkcd cartoon. I'll let you learn this topic from doing a web search on "c++ lambda capture".

[^anon]: Wait a second. I just said that lambda expressions are used to define functions without giving them a name ({dfn}`anonymous functions`), and yet here I am giving it a name. I'm giving this lambda expression a name, `pt_func`, for the sake of convenience. Read on; I'll show you a completely anonymous lambda expression in a few paragraphs.

[^not-always]: This is not always true. If you looked at the web pages I linked to above, or you do your own web search on "C++ lambda expressions", you'll see examples in which you do have to declare a lambda expression's return type. However, this is not likely to matter for functions that you'd write for `RDataFrame`.

You can use this definition of `pt_func` in exactly the same way as in {numref}`Listing %s `:

:::{code-block} c++
auto pt_func = [](float x, float y) { return std::sqrt(x*x + y*y); };
auto pt_dataframe = dataframe.Define("pt",pt_func,{"px","py"});
:::

Consider those two lines of code. I define `pt_func`, use it as an argument to `Define`, then never use `pt_func` again. As an alternative, we can use the lambda expression as an argument to `Define` to define an anonymous function:

:::{code-block} c++
:name: c-anon-define
:caption: An example of a completely anonymous lambda expression

auto pt_dataframe = dataframe
     .Define("pt",
             [](float x, float y) { return std::sqrt(x*x + y*y); },
             {"px","py"}
             );
:::

I've inserted some line breaks to separate the arguments to `Define`. This may seem like a bit much.
But it's the sort of expression that you'll see in the [RDataFrame tutorials](https://root.cern.ch/doc/master/group__tutorial__dataframe.html) (though without the convenient line breaks).

Using lambda expressions in this way makes more sense with `Filter`:

:::{code-block} c++
:name: c-anon-filter
:caption: Using a lambda expression with `Filter`

auto ptcut_dataframe = pt_dataframe
     .Filter( [](float e){ return e < 145; }, {"pt"} );
:::

It's not all that different from what we used before:

:::{code-block} c++
auto ptcut_dataframe = pt_dataframe.Filter("pt < 145");
:::

:::::{admonition} I feel a need. The need for speed!
:class: note

There is a reason why the ROOT tutorials use lambda expressions so often: speed.[^slick] As I noted before, compiled code is faster than "jit"ed code. If you're working with a C++ notebook, you can easily see this for yourself: Create two new notebooks and put in the following:

[^slick]: It has nothing to do with the ROOT developers' need to demonstrate how cool they are. Of course not. Now, if you'll excuse me, I've got to go write another smart-aleck footnote or stick another xkcd cartoon somewhere.

:::{code-block} c++
:name: c-notebook-jit
:caption: The C++ notebook for testing the speed of jit-ed code

// In the first cell, copy-and-paste:
ROOT::RDataFrame dataframe("tree1","experiment.root");
TCanvas canvas;
auto pt_hist = dataframe.Define("pt","sqrt(px*px + py*py)").Histo1D("pt");

// In the second cell, copy-and-paste:
%%time
pt_hist->Draw();
:::

:::{code-block} c++
:name: c-notebook-lambda
:caption: The C++ notebook for testing the speed of compiled code

// In the first cell, copy-and-paste:
ROOT::RDataFrame dataframe("tree1","experiment.root");
TCanvas canvas;
auto pt_calc = [](float x,float y) { return std::sqrt(x*x+y*y); };
auto pt_hist = dataframe.Define("pt",pt_calc,{"px","py"}).Histo1D("pt");

// In the second cell, copy-and-paste:
%%time
pt_hist->Draw();
:::

Run both notebooks and compare the times that `%%time` reports for the second cell of each.
Bear in mind that this speed difference would be even greater if {file}`experiment.root` were larger in both number of rows and number of columns.
:::::

:::{figure-md} excel_lambda-fig
:class: align-center

xkcd excel_lambda

by Randall Munroe
:::
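The footnotes on this page leave two lambda features to a web search: putting a variable name inside the `[]` (a "capture") and declaring the return type explicitly. Here's a minimal sketch of both in plain C++, with no ROOT involved. The function name `count_below_cut` and its arguments are hypothetical, made up for this illustration; the standard algorithms `std::transform` and `std::count_if` stand in, very loosely, for `Define` and `Filter`.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of ROOT): count how many (px, py) pairs
// have a transverse momentum below `cut`.
std::ptrdiff_t count_below_cut(const std::vector<float>& px,
                               const std::vector<float>& py,
                               float cut) {
    // Only use as many entries as both vectors have.
    const auto n = static_cast<std::ptrdiff_t>(std::min(px.size(), py.size()));
    std::vector<float> pts(static_cast<std::size_t>(n));

    // Anonymous lambda, no capture: the same body as pt_func on this page.
    std::transform(px.begin(), px.begin() + n, py.begin(), pts.begin(),
                   [](float x, float y) { return std::sqrt(x*x + y*y); });

    // [cut] copies the local variable `cut` into the lambda (a capture);
    // "-> bool" declares the return type instead of letting it be deduced.
    return std::count_if(pts.begin(), pts.end(),
                         [cut](float pt) -> bool { return pt < cut; });
}
```

For example, `count_below_cut({3.0f, 300.0f}, {4.0f, 400.0f}, 145.0f)` computes a `pt` of 5 and of 500, and only the first passes the cut. Without the capture, the lambda would have no way to see `cut`, since a lambda with empty brackets can only use its own arguments.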