(cpp-approach)=
# The C++ approach

:::{admonition} A warning for Python programmers
:class: warning
Don't skip this section. Some of the concepts apply to you
as well.
:::

Here's a function that takes two inputs, the x-momentum and the y-momentum,
and returns one output, the transverse momentum. 

:::{code-block} c++
:name: c-pt-func
:caption: A simple C++ function to be used with RDataFrame.
#include <cmath>   // provides sqrt

float pt_func( float xmom, float ymom ) {
    return sqrt( xmom*xmom + ymom*ymom );
}
:::

Following the usual C++ convention, you'd define this function before
your main routine.[^forward]

[^forward]: You could also define this function *after* your main routine, and just include a
    [forward declaration](https://stackoverflow.com/questions/4757565/what-are-forward-declarations-in-c)
    before the main routine. 
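
As a minimal sketch of the two orderings, the snippet below uses a forward declaration so that the full definition of `pt_func` can come last. The name `run_analysis` is hypothetical; it stands in for your main routine so that the snippet can be dropped into a larger program.

```cpp
#include <cmath>

// Forward declaration: gives the compiler the function's signature only.
float pt_func( float xmom, float ymom );

// Hypothetical stand-in for the main routine. It can call pt_func
// because the declaration above has already been seen.
float run_analysis() {
    return pt_func( 3.0f, 4.0f );
}

// The full definition can now appear after the code that uses it.
float pt_func( float xmom, float ymom ) {
    return std::sqrt( xmom*xmom + ymom*ymom );
}
```

If you define `pt_func` before the routine that calls it, the forward declaration is unnecessary.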

:::{admonition} Too simple?
:class: note
Obviously, this function is so simple that you're not likely to define
it separately just to pass it to `RDataFrame::Define()` (though see
the section on {ref}`lambda expressions <lambda-expressions>` later on).
The point is to start with
something simple as a "skeleton" for you to see how to create more
complex functions of your own.
:::

In order to use this function `pt_func` on a dataframe, you could do:

:::{code-block} c++
:name: define-func-c
:caption: How to apply `pt_func` to each entry in an n-tuple
auto pt_dataframe = dataframe.Define("pt",pt_func,{"px","py"});
:::

Note how this differs from what we've done before:

:::{code-block} c++
:name: define-jit-c
:caption: Our earlier approach to defining a new column in our n-tuple.
auto pt_dataframe = dataframe.Define("pt","sqrt(px*px + py*py)");
:::

In {numref}`Listing %s <define-jit-c>`, we supply the function as a text
string, which ROOT's internal just-in-time (JIT) compiler compiles when the
program runs. In {numref}`Listing %s <define-func-c>`, we let the C++ compiler
compile the function from {numref}`Listing %s <c-pt-func>` ahead of time, and
we pass that function's C++ "programming layer" name to the `Define()` method.

However, that's not enough for `RDataFrame::Define()` to use `pt_func`.
It has to be told which n-tuple columns to supply as arguments to the
function. That's why we also have to provide a list `{"px","py"}` as a
third argument to `Define`.[^short]{sup}`,`[^default]
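
To see why the function's name alone is not enough, here is a toy sketch in plain C++ with no ROOT involved. It models one n-tuple entry as a map from column name to value. The names `Row`, `apply_to_row`, and `example_pt` are hypothetical, and this is not how ROOT stores or reads data. The point is only that a compiled function carries no parameter names, so the caller must say which columns feed which argument positions.

```cpp
#include <cmath>
#include <map>
#include <string>

// Toy model of one n-tuple entry: column name -> value.
// (An assumption for illustration; ROOT does not store entries this way.)
using Row = std::map<std::string, float>;

float pt_func( float xmom, float ymom ) {
    return std::sqrt( xmom*xmom + ymom*ymom );
}

// Toy version of the per-entry work: look up the two named columns and
// pass their values to the function by position. The function pointer
// knows nothing about the names "xmom" and "ymom".
float apply_to_row( float (*func)(float, float), const Row& row,
                    const std::string& first, const std::string& second ) {
    return func( row.at(first), row.at(second) );
}

// Hypothetical usage: one entry with three columns, two of which are used.
float example_pt() {
    Row row{ {"px", 3.0f}, {"py", 4.0f}, {"pz", 12.0f} };
    return apply_to_row( pt_func, row, "px", "py" );
}
```

The strings `"px"` and `"py"` play the role of the column list in `Define`: without them, nothing connects the entry's columns to the function's arguments.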

[^short]: Could we have avoided the need to specify `{"px","py"}`
    to `Define` if we'd used those names in the definition of
    `pt_func`? For example,

    :::{code-block} c++
    float pt_func( float px, float py ) {
        return sqrt( px*px + py*py );
    }
    :::

    You've probably already guessed that the answer is no. Remember,
    names that are defined in the programming layer have no meaning
    to ROOT's internal layer. Even if we choose to use the same
    name in the programming layer as in the internal layer, ROOT
    has no direct way of matching those names between layers.

[^default]: If we omit the list of columns in `Define`, ROOT will
    assume that the user function takes every column in the n-tuple
    as an argument. For the extremely simple n-tuple `tree1`, you
    might be able to live with that; e.g.,

    :::{code-block} c++
    float pt_func( float c2, float eb, int ev,
                   float xmom, float ymom, float zmom,
                   float zvertex ) {
        return sqrt( xmom*xmom + ymom*ymom );
    }
    :::

    Then we could omit that third argument to `Define`:

    :::{code-block} c++
    auto pt_dataframe = dataframe.Define("pt",pt_func);
    :::

    However:

    - The compiler may emit a lot of warning messages about
      "unused variables". This is accurate, since our function
      does not refer (for example) to `zvertex` in its body.

    - You can't always control the order of the columns in
      an n-tuple. In particular, if you look at the n-tuple
      that you created using {ref}`Snapshot <snapshot>`,
      you may see that the method did not necessarily add the
      new columns to the end of the n-tuple.

    - The n-tuples in real experiments often have hundreds of
      columns. It's impractical to list them all in the function
      definition. If you don't, you may get "function not found"
      error messages when you compile your program; the number of
      arguments in your function (like the two in `pt_func`) won't
      match the number of arguments assumed by the compiler
      (hundreds?).

This gives us a recipe:

- Define a function that returns a value; e.g.,

      float some_function( float value1, float value2, ... ) {
         // Lines of code that use value1, value2, ...
         // to calculate a result.
         return result;
      }

- Use that function in a `Define`, supplying the n-tuple columns to be
  passed to the function as a list of strings:

      auto new_dataframe =
          dataframe.Define("new_column",some_function,{"column1", "column2", ...});
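
As a concrete instance of this recipe, here is a sketch of a second function. The name `theta_func` and the column names in the comment are assumptions made for illustration; they are not defined elsewhere in this section.

```cpp
#include <cmath>

// Hypothetical function that follows the recipe: two input values, one result.
// Returns the polar angle (in radians) given the transverse momentum
// and the z-momentum.
float theta_func( float pt, float zmom ) {
    return std::atan2( pt, zmom );
}

// With ROOT available, the matching call would look like this, assuming
// the dataframe already has float columns named "pt" and "pz":
//
//     auto theta_dataframe = pt_dataframe.Define("theta",theta_func,{"pt","pz"});
```

The order of the strings in the list matters: the first column is passed as the first argument of the function, the second column as the second argument.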

If you're writing a function that will be called by `Filter`, the
recipe is almost the same, except that the function has to return a
[boolean](https://www.w3schools.com/cpp/cpp_booleans.asp) result
(`true` or `false`). For example:

:::{code-block} c++
:caption: An example of a function that could be used as an argument to `Filter`

bool energy_cut( float energy ) {
    return energy < 145;
}
:::
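
To show what such a function does to a set of entries, here is a toy sketch in plain C++ with no ROOT involved. The helper `keep_passing` is hypothetical; it imitates the effect of `Filter` on a simple list of values by keeping only those for which `energy_cut` returns `true`.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

bool energy_cut( float energy ) {
    return energy < 145;
}

// Toy stand-in for Filter (hypothetical helper, not part of ROOT):
// copy only the values that pass the cut into a new vector.
std::vector<float> keep_passing( const std::vector<float>& energies ) {
    std::vector<float> kept;
    std::copy_if( energies.begin(), energies.end(),
                  std::back_inserter(kept), energy_cut );
    return kept;
}

// With ROOT available, the matching call would look like this, assuming
// the n-tuple has a float column named "ebeam":
//
//     auto cut_dataframe = dataframe.Filter(energy_cut,{"ebeam"});
```

As with `Define`, the list of column names tells `Filter` which column to pass to the function's argument.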