Lambda expressions in C++
I’d love to finish the topic of writing functions for RDataFrame in C++ at this point, and move on to torturing the Python programmers with what they have to put up with.
Unfortunately, I can’t. The RDataFrame tutorials make heavy use of C++ lambda expressions. So that you can understand the code in those examples, I feel I have to review this somewhat esoteric coding practice.
Not so fast, Python programmers!
I know you want to leap ahead to the next section. I don’t blame you. Before you do, there are a couple of things you should know that may influence your choice:
Even though there is usually an equivalent Python example for every C++ example in the ROOT tutorials, sometimes there’s only a .C file, probably because the Python equivalent is not yet possible. You’ll want to be able to interpret the C++ code so you can figure out what it’s supposed to do.1
If you’re a skilled Python programmer, you know that there are lambda expressions in Python. Unfortunately, Python-based lambda expressions can’t be used as arguments to Define and Filter.2
Now that you’re informed, I leave it up to you to decide what to read.3
A lambda expression4 is a way of defining a function without giving it a name. It’s normally used for short functions, so that they can be written right where they’re used.5
Here’s an example of a lambda expression that defines how to calculate our old friend pt:
auto pt_func = [](float x, float y) { return std::sqrt(x*x + y*y); };
Compare this with Listing 27:
float pt_func( float x, float y ) {
return std::sqrt( x*x + y*y );
}
Your first reaction may be that we’re using a fancy syntax to do exactly the same thing. But lambda expressions offer more than that: They allow you to define a function within the body of the code, instead of in a separate declaration before the current routine.
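To make the “within the body of the code” point concrete, here’s a small, self-contained sketch in plain C++ (no ROOT needed). The function name compute_pt and the use of std::vector are my own illustrative choices, not anything from RDataFrame. The same pt_func lambda is defined inside the function that uses it, and it isn’t visible anywhere else.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Compute pt for each (px, py) pair.
std::vector<float> compute_pt(const std::vector<float>& px,
                              const std::vector<float>& py) {
    assert(px.size() == py.size());  // one py value for every px value

    // A named lambda defined right here, inside the body of compute_pt.
    // No separate function declaration is needed elsewhere in the file,
    // and pt_func is not visible outside this function.
    auto pt_func = [](float x, float y) { return std::sqrt(x*x + y*y); };

    std::vector<float> result;
    result.reserve(px.size());
    for (std::size_t i = 0; i < px.size(); ++i) {
        result.push_back(pt_func(px[i], py[i]));
    }
    return result;
}
```

Calling compute_pt({3.0f, 6.0f}, {4.0f, 8.0f}) gives a vector holding 5 and 10.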
Let’s break this down:
[] - This means “this is a lambda expression.”6
(float x, float y) - As with any function, we have to declare its arguments.
{ return std::sqrt(x*x + y*y); } - This is the body of the function, the same as for any other function.
; - Don’t forget the semicolon at the end! To C++, this line is a statement, not a function definition. It has to end with a ; so the compiler will recognize it.
auto - You’ve seen me use auto before, to let the compiler do the work of specifying a complicated type. Here it’s more than a convenience: the type of a lambda expression has no name you could write out yourself.
pt_func - The name I assign to this lambda expression.7
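Here’s a sketch of why auto shows up in that breakdown. It’s plain C++ with no ROOT, and the names identical_lambdas_share_a_type and pt_ptr are hypothetical, just for illustration. Every lambda expression gets its own unnamed type, so two lambdas with identical text still have different types. A lambda with an empty [] can also be converted to an ordinary function pointer, if you’d rather spell out a type.

```cpp
#include <cmath>
#include <type_traits>

// Two lambdas with exactly the same text. Each lambda expression creates
// its own unnamed "closure type", so these two types are never the same.
bool identical_lambdas_share_a_type() {
    auto pt_a = [](float x, float y) { return std::sqrt(x*x + y*y); };
    auto pt_b = [](float x, float y) { return std::sqrt(x*x + y*y); };
    return std::is_same<decltype(pt_a), decltype(pt_b)>::value;  // false
}

// A lambda with an empty [] (it captures nothing) converts to a plain
// function pointer, so here the type can be written out instead of auto.
float (*pt_ptr)(float, float) =
    [](float x, float y) { return std::sqrt(x*x + y*y); };
```

In practice you’ll almost always just write auto; the function-pointer form is mainly useful when some older interface insists on one.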
If you’re sharp of eye and mind, you may have noticed that we didn’t have to declare the return type of the function, the way we had to in Listing 27. That’s because the C++ compiler can automatically deduce the return type from the type of the value in the return statement.8
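Here’s a sketch of return-type deduction in action, plus one of the cases where you do have to declare the return type. The lambda safe_sqrt is a hypothetical example of my own. When a lambda has return statements of different types, you write the return type after the argument list, following ->.

```cpp
#include <cmath>
#include <type_traits>

// Deduced return type: x*x + y*y is a float, std::sqrt(float) returns a
// float, so this lambda returns float. The compiler checks that for us:
auto pt_func = [](float x, float y) { return std::sqrt(x*x + y*y); };
static_assert(std::is_same<decltype(pt_func(1.0f, 1.0f)), float>::value,
              "pt_func's return type was deduced as float");

// Two return statements with different types (int and double). Without
// "-> double" the compiler can't pick one return type and rejects the code.
auto safe_sqrt = [](int n) -> double {
    if (n < 0) return 0;    // the int 0 is converted to the double 0.0
    return std::sqrt(n);    // std::sqrt of an int returns a double
};
```

For the short calculations you’d typically hand to Define or Filter, the first form is all you need.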
You can use this definition of pt_func in exactly the same way as in Listing 28:
auto pt_func = [](float x, float y) { return std::sqrt(x*x + y*y); };
auto pt_dataframe = dataframe.Define("pt",pt_func,{"px","py"});
Consider those two lines of code. I define pt_func, use it as an argument to Define, then never use pt_func again. As an alternative, we can put the lambda expression directly in the argument list of Define, making it an anonymous function:
auto pt_dataframe = dataframe
.Define("pt",
[](float x, float y) { return std::sqrt(x*x + y*y); },
{"px","py"} );
I’ve inserted some line breaks to separate the arguments to Define.
This may seem like a bit much. But it’s the sort of expression that you’ll see in the RDataFrame tutorials (though without the convenient line breaks).
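If you’re wondering how Define can accept a named function, a named lambda, or an anonymous lambda interchangeably, here’s a rough sketch of the idea. define_column is a hypothetical stand-in that I made up, and it is not how RDataFrame is actually implemented (the real Define works with column names and does a great deal more). The point is that a C++ function template will accept anything that can be called with the right arguments.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the idea behind Define: apply any callable f
// to each pair of input values and return the resulting new "column".
template <typename F>
std::vector<float> define_column(F f, const std::vector<float>& a,
                                 const std::vector<float>& b) {
    std::vector<float> out;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i)
        out.push_back(f(a[i], b[i]));
    return out;
}

// An ordinary named function, as in Listing 27.
float pt_function(float x, float y) { return std::sqrt(x*x + y*y); }

// The same calculation passed three different ways.
bool three_ways_agree(const std::vector<float>& px,
                      const std::vector<float>& py) {
    auto pt_lambda = [](float x, float y) { return std::sqrt(x*x + y*y); };

    auto from_function = define_column(pt_function, px, py);  // named function
    auto from_named    = define_column(pt_lambda, px, py);    // named lambda
    auto from_anon     = define_column(                       // anonymous lambda
        [](float x, float y) { return std::sqrt(x*x + y*y); }, px, py);

    return from_function == from_named && from_named == from_anon;
}
```

All three calls produce the same values; the anonymous version just saves you from inventing a name you’ll never use again.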
Using lambda expressions in this way makes more sense with Filter:
auto ptcut_dataframe = pt_dataframe
.Filter( [](float e){ return e < 145; }, {"pt"} );
It’s not all that different from what we used before:
auto ptcut_dataframe = pt_dataframe.Filter("pt < 145");
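The C++ standard library uses the same pattern. Here’s a sketch using std::copy_if, which calls a predicate lambda once per element and keeps the elements for which it returns true, much as Filter keeps the rows for which its function returns true. The function name apply_pt_cut is hypothetical, and a std::vector<float> stands in for the pt column; no ROOT is involved.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Keep only the pt values that pass the cut pt < 145.
std::vector<float> apply_pt_cut(const std::vector<float>& pt) {
    std::vector<float> kept;
    // The anonymous lambda is the "filter": it is called once per value,
    // and the value is copied into kept only when the lambda returns true.
    std::copy_if(pt.begin(), pt.end(), std::back_inserter(kept),
                 [](float e) { return e < 145; });
    return kept;
}
```

For example, apply_pt_cut({100.0f, 150.0f, 144.9f, 145.0f}) keeps only 100 and 144.9.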
I feel a need. The need for speed!
There is a reason why the ROOT tutorials use lambda expressions so often: speed.9 As I noted before, compiled code is faster than “jit”ed code. If you’re working with a C++ notebook, you can easily see this for yourself: Create two new notebooks and put in the following:
// ===== Notebook 1: Define with a string expression =====
// In the first cell, copy-and-paste:
ROOT::RDataFrame dataframe("tree1","experiment.root");
TCanvas canvas;
auto pt_hist = dataframe.Define("pt","sqrt(px*px + py*py)").Histo1D("pt");
// In the second cell, copy-and-paste:
%%time
pt_hist->Draw();

// ===== Notebook 2: Define with a lambda expression =====
// In the first cell, copy-and-paste:
ROOT::RDataFrame dataframe("tree1","experiment.root");
TCanvas canvas;
auto pt_calc = [](float x,float y) { return std::sqrt(x*x+y*y); };
auto pt_hist = dataframe.Define("pt",pt_calc,{"px","py"}).Histo1D("pt");
// In the second cell, copy-and-paste:
%%time
pt_hist->Draw();
Run both notebooks and compare the times that %%time reports. Bear in mind that this speed difference would be even greater if experiment.root were larger in both number of rows and number of columns.

Figure 47: https://xkcd.com/2453/ by Randall Munroe
- 1
“Can’t I tell what the C++ code is supposed to do from the comments?” Oh, how I wish that were true!
- 2
At least, not in the current version as of the time of this writing (ROOT 6.26), or not without a whole lot of extra code wrappers and decorators. Frankly, it’s not worth the effort.
- 3
Of course, since I have no way to make you do anything, the decision is always up to you!
- 4
I think that this web page offers a better explanation of lambda expressions. However, it also includes an annoying pop-up ad.
- 5
I’ll confess that I often write multi-statement lambda expressions. I do this because when I use the function in RDataFrame or an iterative function like for_each, I feel it’s helpful for the function to be defined right above the C++ statement that’s going to use it.
- 6
It means something to put variable names within those brackets, but this page is long enough, and I need room for the xkcd cartoon. I’ll let you learn about this topic by doing a web search on “c++ lambda capture”.
- 7
Wait a second. I just said that lambda expressions are used to define functions without giving them a name (anonymous functions), and yet here I am giving it a name.
I’m giving this lambda expression a name, pt_func, for the sake of convenience. Read on; I’ll show you a completely anonymous lambda expression in a few paragraphs.
- 8
This is not always true. If you looked at the web pages I linked to above, or you do your own web search on “C++ lambda expressions”, you’ll see examples in which you do have to declare a lambda expression’s return type. However, this is not likely to matter for functions that you’d write for RDataFrame.
- 9
It has nothing to do with the ROOT developers’ need to demonstrate how cool they are. Of course not.
Now, if you’ll excuse me, I’ve got to go write another smart-aleck footnote or stick another xkcd cartoon somewhere.