(df_define)=
# Walkthrough: Defining an RDataFrame
**(10 minutes)**

Defining an `RDataFrame` is usually simple. Here's how to do it in both C++ and Python:

:::{code-block} c++
:name: cpp-rdf-def
:caption: RDataFrame definition (C++)
auto ntupleName = "tree1";
auto fileName = "experiment.root";
auto dataframe = ROOT::RDataFrame(ntupleName, fileName);
:::

:::{code-block} python
:name: python-rdf-def
:caption: RDataFrame definition (Python)
import ROOT

ntupleName = "tree1"
fileName = "experiment.root"
dataframe = ROOT.RDataFrame(ntupleName, fileName)
:::

:::{note}
Actually, unless I'm writing a program that accepts the name of the n-tuple or
its file as arguments, I usually don't define separate variables like **`ntupleName`**
or **`fileName`** the way I do in the above examples. I'm more likely to simply do:

    dataframe = ROOT.RDataFrame("tree1", "experiment.root")

I'm doing this the long way so you can get a sense of what the arguments mean.

The name **`dataframe`** in this example is arbitrary. If you visit the ROOT
website's [RDataFrame](https://root.cern/doc/master/classROOT_1_1RDataFrame.html)
page, you'll see they typically use a short name like **`df`** to save on typing. Since I know how
to use copy-and-paste, I've opted to use a longer variable name for clarity.
:::
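For the case mentioned in the note above, where a program accepts the name of the n-tuple or its file as arguments, here's a minimal sketch of one way to do it in Python. The function names `parse_names` and `make_dataframe`, and the option names `--ntuple` and `--file`, are my own inventions for illustration; they're not part of ROOT.

```python
import argparse


def parse_names(argv=None):
    """Read the n-tuple and file names from a list of command-line arguments."""
    parser = argparse.ArgumentParser(description="Open an n-tuple as an RDataFrame")
    # Hypothetical option names; the defaults match the walkthrough's examples.
    parser.add_argument("--ntuple", default="tree1", help="name of the n-tuple")
    parser.add_argument("--file", default="experiment.root", help="name of the ROOT file")
    args = parser.parse_args(argv)
    return args.ntuple, args.file


def make_dataframe(argv=None):
    """Build an RDataFrame from the parsed names."""
    import ROOT  # imported here so parse_names() can be tried without ROOT

    ntupleName, fileName = parse_names(argv)
    return ROOT.RDataFrame(ntupleName, fileName)
```

In a script, `dataframe = make_dataframe()` would read the names from the command line. In a notebook, you'd pass a list instead, such as `make_dataframe(["--ntuple", "tree1", "--file", "experiment.root"])`.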

:::{note}
For now, I'm showing both C++ and Python examples of the code. Eventually,
once I've shown enough examples for you to convert one to the
other, I'll stop showing both in parallel. You've probably already noticed
that, at least for `RDataFrame`, the code is very similar.
:::

:::{note}
I assume that you're working through {ref}`rdataframe` interactively,
probably in a {ref}`notebook <notebook>`.
You only have to define your dataframe once per session, so I usually
won't include the above commands in the listings below. If you restart
ROOT or the notebook kernel, be sure to initialize **`dataframe`** again.
:::

If you'd like to see the names of the columns in the dataframe, it's easy
to do interactively:[^describe]

[^describe]: If the `Describe` or `Display` methods don't work for
    you, don't panic. These were added in relatively recent versions of
    ROOT. While I try to keep the ROOT versions up-to-date for
    everyone on the Nevis particle-physics systems, sometimes (for
    complex reasons that are beyond irrelevant to you, trust me)
    I can't offer you the latest-and-greatest.

:::{code-block} python
:name: rdf-describe
:caption: RDataFrame description
dataframe.Describe()
:::
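If `Describe` isn't available in your version of ROOT, a rough substitute is to ask the dataframe for its column names and types directly. This is only a sketch: the helper name `list_columns` is my own, and it assumes your dataframe offers the `GetColumnNames` and `GetColumnType` methods.

```python
def list_columns(df):
    """Return a list of (column name, column type) pairs for a dataframe."""
    columns = []
    for name in df.GetColumnNames():
        name = str(name)  # PyROOT may hand back a C++ string object
        columns.append((name, str(df.GetColumnType(name))))
    return columns
```

With a dataframe defined as above, you could print the result with `for name, ctype in list_columns(dataframe): print(name, ctype)`.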

If you'd like a peek at the first few rows, use `Display`, which is roughly equivalent to the
`TTree::Scan()` method in {ref}`cpath` or {ref}`pythonpath`:[^pointers]

[^pointers]: If you're using C++: Watch my examples to see when an
    `RDataFrame` method returns a {ref}`pointer <pointers>`; that is,
    when you have to use `->` to access a method. Remember that you
    had to deal with this back when you were
    {ref}`fitting histograms <fitting-histogram>`.

:::{code-block} c++
:name: cpp-rdf-display
:caption: RDataFrame - displaying the first few rows (C++)
dataframe.Display()->Print();
:::

:::{code-block} python
:name: python-rdf-display
:caption: RDataFrame - displaying the first few rows (Python)
dataframe.Display().Print()
:::
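By default, `Display` shows only a handful of rows. If you want more, or only want some of the columns, `Display` accepts a column-name pattern and a row count. Here's a small sketch that wraps this up; the helper name `show_rows` and its default of ten rows are my own choices, not anything defined by ROOT.

```python
def show_rows(df, n_rows=10, pattern=""):
    """Print the first n_rows rows of the columns whose names match pattern."""
    # An empty pattern selects every column in the dataframe.
    df.Display(pattern, n_rows).Print()
```

For example, `show_rows(dataframe, 20)` would print the first 20 rows of every column.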

Give these commands a try to see what they tell you about the n-tuple `tree1`.

:::{figure-md} health_data-fig
:align: center

<img src="https://imgs.xkcd.com/comics/health_data.png" alt="xkcd health_data" width="75%">

<https://xkcd.com/2620/> by Randall Munroe
:::