(df_concepts)=
# RDataFrame concepts
**(5 minutes)**
Let's start with some definitions. For purposes of this tutorial, an
{dfn}`n-tuple`, a {dfn}`spreadsheet`, and a {dfn}`dataframe` are all
the same thing.[^switch]$^,$[^f103] It's something that looks like this:
[^switch]: I frequently switch from one term to another, sometimes
in the middle of a sentence.
[^f103]: The term "dataframe" also plays an important role in the
Python data analysis package
[pandas](https://pandas.pydata.org/docs/getting_started/index.html),
the [R programming
language](https://www.tutorialspoint.com/r/r_data_frames.htm),
and the [HDF5](https://www.neonscience.org/resources/learning-hub/tutorials/about-hdf5) file format.
It pretty much means the same thing in all these environments.
If you're curious why high-energy physics prefers the ROOT
file format over HDF5, here's a [2018
paper](https://iopscience.iop.org/article/10.1088/1742-6596/1085/3/032020/pdf)
comparing the use of different file formats and databases in a
typical analysis. The TL;DR version: HDF5 is better at storing
large multi-dimensional arrays, often found in {abbr}`HPC (High
Performance Computing)` applications associated with Deep
Learning. ROOT is a better choice for storing complex data
structures.
:::{figure-md} spreadsheet-fig
:align: center
You saw this in the class introduction. These are the first few
rows and columns in the n-tuple `tree1` in file {file}`experiment.root`.
:::
Some more equivalences: a {dfn}`row` in the spreadsheet can also be
called an {dfn}`entry` in the n-tuple; a {dfn}`column` in the spreadsheet
is a {dfn}`branch` in the n-tuple.
:::{note}
In ROOT, the individual cells can have full-fledged C++ structures
in them. To keep things simple I'm sticking with numeric values
({dfn}`leaves` in ROOT's terminology) for this tutorial.
:::
Since we can think of {numref}`Figure %s <spreadsheet-fig>` as a spreadsheet,
let's think of the kinds of physics-analysis tasks we might do with the
columns and rows in a program like [Microsoft Excel](https://blog.hubspot.com/marketing/microsoft-excel),
[Google Sheets](https://www.google.com/sheets/about/), or [Apple Numbers](https://support.apple.com/guide/numbers/intro-to-numbers-tan0eca1a9ab/mac):
- Sum the values in a column. While this comes up a lot in the
business world, it's not common in a physics analysis.
- Statistics: Take the mean or standard deviation of a column, or find its minimum or maximum value.
- Make a histogram of the values in a column. You've already done this if you
went through the {ref}`TreeViewer ` section.
- Add new columns to the spreadsheet, with the new columns derived from
formulas based on existing columns.
The idea behind [RDataFrame](https://root.cern/doc/master/classROOT_1_1RDataFrame.html) is to
provide a simple way to perform tasks like these.
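
To make the four spreadsheet tasks listed above concrete, here's a minimal sketch in plain Python. It deliberately does not use ROOT or RDataFrame; it only shows what each task means when an n-tuple is viewed as a set of columns. The column names (`px`, `py`, `pt`), the values, and the `histogram` helper are all made up for illustration and are not taken from {file}`experiment.root`.

```python
import math
import statistics

# Hypothetical n-tuple: each key is a column (branch) and each list
# index is a row (entry). Names and values are invented for this sketch.
ntuple = {
    "px": [3.0, -1.0, 0.0, 6.0],
    "py": [4.0, 2.0, -2.0, 8.0],
}

# Task 1: sum the values in a column.
px_sum = sum(ntuple["px"])

# Task 2: statistics on a column.
px_mean = statistics.mean(ntuple["px"])
px_std = statistics.stdev(ntuple["px"])  # sample standard deviation
px_min, px_max = min(ntuple["px"]), max(ntuple["px"])

# Task 4 (done before the histogram so there is something to plot):
# add a new column computed from existing columns, row by row.
ntuple["pt"] = [math.hypot(x, y) for x, y in zip(ntuple["px"], ntuple["py"])]


def histogram(values, n_bins, low, high):
    """Count values in n_bins equal-width bins covering [low, high)."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if low <= v < high:
            # Clamp the index to guard against floating-point round-off.
            index = min(int((v - low) / width), n_bins - 1)
            counts[index] += 1
    return counts


# Task 3: histogram the values in a column.
pt_hist = histogram(ntuple["pt"], n_bins=3, low=0.0, high=12.0)

print("sum:", px_sum, "mean:", px_mean, "min/max:", px_min, px_max)
print("pt histogram counts:", pt_hist)
```

Each of the four tasks is a single operation on one or two whole columns, with no explicit bookkeeping about which row you're on beyond the loop itself. That's the way of thinking to carry into RDataFrame.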