(chains)=
# TChain: An n-tuple in multiple files

In ROOT, it's possible to distribute a single n-tuple or `TTree` across many files. Typically you'd need this when you're running batch jobs (as described in {ref}`batch-systems`), perhaps thousands of them, each of which independently creates a file containing an n-tuple with the same name and structure.

Within a single program, you can read all these files as if they were one continuous n-tuple. The way to do this is with a [TChain](https://root.cern.ch/doc/master/classTChain.html).[^tchain] You construct a `TChain` using the name of the n-tuple, then use the `TChain::Add` method to define all the files that are part of the chain.

[^tchain]: If you clicked on that `TChain` link, you'll see another important ROOT class whose documentation is sorely lacking. I suggest doing a search within the {ref}`ROOT tutorials ` to see some examples of `TChain` in use:

        cd `root-config --tutdir`
        grep -rli tchain *

    Another tangent: [grep](https://www.gnu.org/software/grep/) is a program that interprets [regular expressions](https://www.guru99.com/linux-regular-expressions.html) (also known as "regexes"), a powerful method for searching, replacing, and processing text. More sophisticated programs that use regular expressions include [sed](https://www.gnu.org/software/sed/), [awk](https://www.tutorialspoint.com/awk/index.htm), and [perl](https://www.perl.org/docs.html); there are also regex libraries in Python and C++.

    Regexes are used in manipulating text, not numerical calculations. Their deep nitty-gritty is rarely relevant in physics. On the other hand, I use them all the time; e.g., searching the ROOT tutorials for hints.

    Regular expressions are a complex topic, and it can take a lifetime to learn about them. (I've lost track of the number of your lifetimes I've spent. You're probably tired of the joke anyway.)

    There's a cool xkcd cartoon about regular expressions.
    It's too big to put into a footnote, so you'll have to click on the link yourself:

Here's an example: Suppose instead of the single file {file}`experiment.root` that we used in the Walkthroughs and Exercises, the n-tuple was distributed in files {file}`experiment0.root`, {file}`experiment1.root`, {file}`experiment2.root`, and so on through {file}`experiment9.root`. They'd all contain the n-tuple `tree1` with the same variables, but with different values. We could then define a chain by:

:::{code-block} c++
:name: chain-c-code
:caption: Example use of TChain (C++)

auto tree1 = new TChain("tree1");
tree1->Add("experiment0.root");
tree1->Add("experiment1.root");
// ... and so on
tree1->Add("experiment9.root");
:::

:::{code-block} Python
:name: chain-python-code
:caption: Example use of TChain (Python)

mychain = ROOT.TChain("tree1")
mychain.Add("experiment0.root")
mychain.Add("experiment1.root")
# ... and so on
mychain.Add("experiment9.root")
:::

Note that in the example scripts given earlier in the tutorial (here are the {ref}`C++ ` and {ref}`Python ` versions), these `TChain` definitions would _replace_ the use of the `TFile` to define the n-tuple input file.

In {ref}`rdataframe`, after you've defined the `TChain` as above, you can just supply the chain as an argument to `RDataFrame`; e.g.,

:::{code-block} c++
// RDataFrame takes a TTree reference, so dereference the TChain pointer.
auto dataframe = ROOT::RDataFrame(*tree1);
:::

or

:::{code-block} python
dataframe = ROOT.RDataFrame(mychain)
:::

:::{figure-md} chain_diagram-fig
:align: center

ROOT Chain example diagram

This diagram may clarify what a ROOT Chain does. In this example, the TTree `expTree` is distributed between three different files: `file1.root`, `file2.root`, and `file3.root`. Because this is a slide from a C++ talk, the example code uses {ref}`TTreeReader ` to access the columns of the n-tuple. However, the concept applies no matter which language and method you use to access the tree.
:::

The above code is fine if you only have a few files to add to the chain, though doing the copy-and-pasting of the lines for all those {file}`experimentN.root` files would be a bit tedious.

But what if you've got thousands of files? Fortunately, you don't have to specify each one of them in your program. `TChain::Add` can accept some wildcard characters to match against file names. The wildcard you'll probably find to be the most useful is `*`, which matches any sequence of characters (including none). So you can do something like this:

    mychain.Add("experiment*.root")

Note that this would also match {file}`experiment.root` and {file}`experiment-test.root`, which may not be what you want.

If you've learned enough programming to create loops and manipulate strings, you can also do something like this:

:::{code-block} c++
:name: chain-c-loop code
:caption: Example of using a loop to make a TChain (C++)

auto tree1 = new TChain("tree1");
for ( int i = 0; i < 10; ++i ) {
    std::string filename = "experiment" + std::to_string(i) + ".root";
    tree1->Add(filename.c_str());
}
:::

:::{code-block} Python
:name: chain-loop-python-code
:caption: Example of using a loop to make a TChain (Python)

mychain = ROOT.TChain("tree1")
for i in range(10):
    filename = "experiment" + str(i) + ".root"
    mychain.Add(filename)
:::

Extending these slight examples to thousands of files is left as an exercise for the student.

Another approach would be to store the names of the ROOT files in a text file (or even another n-tuple!), read the filenames from this text file, then add each one. Again, I leave this as a potential exercise for you.

:::{figure-md} tech_loops-fig
:align: center

xkcd tech_loops by Randall Munroe
:::
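
If you'd like a head start on the text-file approach mentioned above, here is a minimal sketch of one way it might look in Python. The function names `read_file_list` and `add_files_to_chain`, the file name `filelist.txt`, and the convention of skipping blank lines and lines that start with `#` are all my own inventions for illustration. The only ROOT feature the sketch relies on is the `Add` method of the chain, so treat it as a starting point rather than the one right answer.

```python
from pathlib import Path


def read_file_list(list_path):
    """Return the file names listed in a text file, one name per line.

    Assumption for this sketch: blank lines and lines starting
    with '#' are ignored.
    """
    names = []
    for line in Path(list_path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.append(line)
    return names


def add_files_to_chain(chain, list_path):
    """Call chain.Add() once per listed file; return the number of files added."""
    names = read_file_list(list_path)
    for name in names:
        chain.Add(name)
    return len(names)


# Hypothetical usage inside a script that has already done `import ROOT`
# (filelist.txt is an assumed file containing one ROOT file name per line):
#   mychain = ROOT.TChain("tree1")
#   add_files_to_chain(mychain, "filelist.txt")
```

Keeping the file reading separate from the chain building means you can check the list of names on its own before handing it to ROOT.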