(df_concepts)=
# RDataFrame concepts

**(5 minutes)**

Let's start with some definitions. For purposes of this tutorial, an {dfn}`n-tuple`, a {dfn}`spreadsheet`, and a {dfn}`dataframe` are all the same thing.[^switch]$^,$[^f103] It's something that looks like this:

[^switch]: I frequently switch from one term to the other, sometimes within the middle of the same sentence.

[^f103]: The term "dataframe" is also an important component of the Python data analysis package [pandas](https://pandas.pydata.org/docs/getting_started/index.html), the [R programming language](https://www.tutorialspoint.com/r/r_data_frames.htm), and the [HDF5](https://www.neonscience.org/resources/learning-hub/tutorials/about-hdf5) file format. It pretty much means the same thing in all these environments.

    If you're curious why high-energy physics prefers the ROOT file format over HDF5, here's a [2018 paper](https://iopscience.iop.org/article/10.1088/1742-6596/1085/3/032020/pdf) comparing the use of different file formats and databases in a typical analysis. The TL;DR version: HDF5 is better at storing large multi-dimensional arrays, often found in {abbr}`HPC (High Performance Computing)` applications associated with Deep Learning. ROOT is a better choice for storing complex data structures.

:::{figure-md} spreadsheet-fig
:align: center

A spreadsheet

You saw this in the class introduction. These are the first few rows and columns in the n-tuple `tree1` in file {file}`experiment.root`.
:::

Some more equivalences: a {dfn}`row` in the spreadsheet can also be called an {dfn}`entry` in the n-tuple; a {dfn}`column` in the spreadsheet is a {dfn}`branch` in the n-tuple.

:::{note}
In ROOT, the individual cells can have full-fledged C++ structures in them. To keep things simple, I'm sticking with numeric values ({dfn}`leaves` in ROOT's terminology) for this tutorial.
:::

Since we can think of {numref}`Figure %s <spreadsheet-fig>` as a spreadsheet, let's think of the kinds of physics-analysis tasks we might do with the columns and rows in a program like [Microsoft Excel](https://blog.hubspot.com/marketing/microsoft-excel), [Google Sheets](https://www.google.com/sheets/about/), or [Apple Numbers](https://support.apple.com/guide/numbers/intro-to-numbers-tan0eca1a9ab/mac):

- Sum the values in a column. While this comes up a lot in the business world, it's not common in a physics analysis.

- Statistics: Take the mean or standard deviation of a column, or find its minimum or maximum value.

- Make a histogram of the values in a column. You've already done this if you went through the {ref}`TreeViewer ` section.

- Add new columns to the spreadsheet, with the new columns derived from formulas based on existing columns.

The idea behind [RDataFrame](https://root.cern/doc/master/classROOT_1_1RDataFrame.html) is to provide a simple way to perform tasks like these.
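
To make the four spreadsheet tasks in the list above concrete, here is a minimal sketch in plain Python. It deliberately does not use RDataFrame or ROOT; it only shows what each task means when the n-tuple is held as a dictionary of columns. The column names (`px`, `py`, `pt`), the values, and the `histogram` helper are all made up for illustration and are not taken from {file}`experiment.root`.

```python
import math
import statistics

# Hypothetical toy n-tuple: each key is a column (branch),
# each list index is a row (entry). Values are invented.
ntuple = {
    "px": [3.0, 0.0, 6.0, 5.0],
    "py": [4.0, 1.0, 8.0, 12.0],
}

# Task 1: sum the values in a column.
px_sum = sum(ntuple["px"])

# Task 2: statistics on a column.
py_mean = statistics.mean(ntuple["py"])
py_std = statistics.stdev(ntuple["py"])  # sample standard deviation
py_min, py_max = min(ntuple["py"]), max(ntuple["py"])


# Task 3: histogram a column by counting entries in equal-width bins.
def histogram(values, nbins, low, high):
    """Count how many values fall in each of nbins bins on [low, high)."""
    counts = [0] * nbins
    width = (high - low) / nbins
    for v in values:
        if low <= v < high:  # values outside the range are ignored
            counts[int((v - low) / width)] += 1
    return counts


py_hist = histogram(ntuple["py"], nbins=3, low=0.0, high=15.0)

# Task 4: add a new column derived from a formula on existing columns.
ntuple["pt"] = [math.hypot(x, y) for x, y in zip(ntuple["px"], ntuple["py"])]

print(px_sum, py_mean, py_min, py_max, py_hist, ntuple["pt"])
```

Each task here needs its own explicit loop or helper over the rows. RDataFrame has methods for the same jobs (`Sum`, `Mean`, `StdDev`, `Min`, `Max`, `Histo1D`, and `Define` for derived columns), so you describe what you want from a column rather than writing the loop yourself.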