(chains)=
# TChain: An n-tuple in multiple files

In ROOT, it's possible to distribute a single n-tuple or `TTree`
across many files. Typically you'd need this when you're running batch
jobs (as described in {ref}`batch-systems`), perhaps thousands of
them, each of which independently creates a file containing an n-tuple with
the same name and structure.

Within a single program, you can read all these files as if they
were one continuous n-tuple. The way to do this is with a
[TChain](https://root.cern.ch/doc/master/classTChain.html).[^tchain] You
construct a `TChain` using the name of the n-tuple, then use the
`TChain::Add` method to define all the files that are part of the
chain.

[^tchain]: If you clicked on that `TChain` link, you'll see another
    important ROOT class whose documentation is sorely lacking. I
    suggest doing a search within the {ref}`ROOT tutorials
    <references>` to see some examples of `TChain` in use:

        cd `root-config --tutdir`
        grep -rli tchain *


    Another tangent:

    [grep](https://www.gnu.org/software/grep/)
    is a program that interprets [regular
    expressions](https://www.guru99.com/linux-regular-expressions.html)
    (also known as "regexes"), a powerful method for searching,
    replacing, and processing text. More sophisticated programs that
    use regular expressions include
    [sed](https://www.gnu.org/software/sed/),
    [awk](https://www.tutorialspoint.com/awk/index.htm), and
    [perl](https://www.perl.org/docs.html); there are also regex
    libraries in Python and C++. 

    Regexes are used in manipulating
    text, not numerical calculations. Their deep nitty-gritty is
    rarely relevant in physics. On the other hand, I use them all the
    time; e.g., searching the ROOT tutorials for hints.

    Regular expressions are a complex topic, and it can take a lifetime
    to learn about them. (I've lost track of the number of your
    lifetimes I've spent. You're probably tired of the joke anyway.)

    There's a cool xkcd cartoon about regular expressions. It's too big
    to put into a footnote, so you'll have to click on the link
    yourself: <https://xkcd.com/208/>


Here's an example: Suppose instead of the single file {file}`experiment.root`
that we used in the Walkthroughs and Exercises, the n-tuple was
distributed in files {file}`experiment0.root`,
{file}`experiment1.root`, {file}`experiment2.root`, and so on through
{file}`experiment9.root`. They'd all contain the n-tuple `tree1` with
the same variables, but with different values. We could then define a
chain by:

:::{code-block} c++
:name: chain-c-code
:caption: Example use of TChain (C++)

auto tree1 = new TChain("tree1");
tree1->Add("experiment0.root");
tree1->Add("experiment1.root");
// ... and so on
tree1->Add("experiment9.root");
:::

:::{code-block} python
:name: chain-python-code
:caption: Example use of TChain (Python)

mychain = ROOT.TChain("tree1")
mychain.Add("experiment0.root")
mychain.Add("experiment1.root")
# ... and so on
mychain.Add("experiment9.root")
:::

Note that in the example scripts given earlier in the tutorial (here
are the {ref}`C++ <cpp-run-analyze>` and {ref}`Python
<analyze-tree-python>` versions), these `TChain` definitions would
_replace_ the use of the `TFile` to define the n-tuple input file.

In {ref}`rdataframe`, after you've defined the `TChain` as above, you
can just supply the chain itself as an argument to `RDataFrame`. In
C++, `RDataFrame` takes the tree by reference, so dereference the
pointer; e.g.,

:::{code-block} c++
auto dataframe = ROOT::RDataFrame(*tree1);
:::

or

:::{code-block} python
dataframe = ROOT.RDataFrame(mychain)
:::

:::{figure-md} chain_diagram-fig
:align: center

<img src="ChainDiagram.png" alt="ROOT Chain example diagram" width="100%">

This diagram may clarify what a ROOT Chain does. In this example, the TTree
`expTree` is distributed between three different files: `file1.root`,
`file2.root`, and `file3.root`. Because this is a slide from a C++
talk, the example code uses {ref}`TTreeReader <analyze-tree-c-reader>`
to access the columns of the n-tuple. However, the concept applies no matter which
language and method you use to access the tree.
:::

The above code is fine if you only have a few files to add to the chain,
though copying and pasting a line for each of those
{file}`experimentN.root` files would be a bit tedious. But what if
you've got thousands of files? Fortunately, you don't have to specify
each one of them in your program.

`TChain::Add` can accept some wildcard characters to
match against file names. The wildcard you'll probably find to be the
most useful is `*`, which matches any sequence of characters
(including none). So you can do something like this:

    mychain.Add("experiment*.root")

Note that this would also match {file}`experiment.root` and
{file}`experiment-test.root`, which may not be what you want.
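If the `*` wildcard catches files you don't want, one option is to pick
out the file names yourself in Python and add them to the chain one at a
time. Here's a minimal sketch, assuming the files you want are named
`experiment` followed by one or more digits and then `.root`. The
function names `numbered_files` and `build_chain` are made up for this
example; they're not part of ROOT.

```python
import re

def numbered_files(filenames, prefix="experiment", suffix=".root"):
    """Keep only names of the form <prefix><digits><suffix>.

    The result is sorted by the number in the name, so that
    experiment2.root comes before experiment10.root.
    This expects bare file names, without a directory in front.
    """
    pattern = re.compile(re.escape(prefix) + r"(\d+)" + re.escape(suffix))
    matches = []
    for name in filenames:
        m = pattern.fullmatch(name)
        if m:
            matches.append((int(m.group(1)), name))
    return [name for _, name in sorted(matches)]

def build_chain(filenames, treename="tree1"):
    """Create a TChain and add each of the given files to it."""
    import ROOT  # imported here so numbered_files can be used without ROOT
    chain = ROOT.TChain(treename)
    for name in filenames:
        chain.Add(name)
    return chain
```

You could then create the chain from the files in the current directory
with something like `mychain = build_chain(numbered_files(os.listdir(".")))`
(after an `import os`). With this filter, {file}`experiment.root` and
{file}`experiment-test.root` would be skipped.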

If you've learned enough programming to create loops and manipulate
strings, you can also do something like this:

:::{code-block} c++
:name: chain-c-loop code
:caption: Example of using a loop to make a TChain (C++)

auto tree1 = new TChain("tree1");
for ( int i = 0; i < 10; ++i ) {
    std::string filename = "experiment" + std::to_string(i) + ".root";
    tree1->Add(filename.c_str());
}
:::

:::{code-block} python
:name: chain-loop-python-code
:caption: Example of using a loop to make a TChain (Python)

mychain = ROOT.TChain("tree1")
for i in range(10):
    filename = "experiment" + str(i) + ".root"
    mychain.Add(filename)
:::

Extending these slight examples to thousands of files is left as an
exercise for the student.

Another approach would be to store the names of the ROOT files in a 
text file (or even another n-tuple!), read the filenames from this text file, then
add each one. Again, I leave this as a potential exercise for you. 

:::{figure-md} tech_loops-fig
:align: center

<img src="https://imgs.xkcd.com/comics/tech_loops.png" alt="xkcd tech_loops" width="75%">

<https://xkcd.com/1579/> by Randall Munroe
:::