TChain: An n-tuple in multiple files

In ROOT, it’s possible to distribute a single n-tuple or TTree across many files. Typically you’d need this when you’re running batch jobs (as described in Batch Systems), perhaps thousands of them, each of which independently creates a file containing an n-tuple with the same name and structure.

Within a single program, you can read all these files as if they were one continuous n-tuple. The way to do this is with a TChain.[1] You construct a TChain using the name of the n-tuple, then use the TChain::Add method to define all the files that are part of the chain.

Here’s an example: Suppose instead of the file experiment.root that we used in the walkthroughs and Exercises, we had files experiment0.root, experiment1.root, experiment2.root, and so on through experiment9.root, all containing the n-tuple tree1 with the same variables. We could then define a chain by:

Listing 42: Example use of TChain (C++)
auto tree1 = new TChain("tree1");
tree1->Add("experiment0.root");
tree1->Add("experiment1.root");
// ... and so on
tree1->Add("experiment9.root");
Listing 43: Example use of TChain (Python)
mychain = ROOT.TChain("tree1")
mychain.Add("experiment0.root")
mychain.Add("experiment1.root")
# ... and so on
mychain.Add("experiment9.root")

Note that in the example scripts given earlier in the tutorial (here are the C++ and Python versions), these TChain definitions would replace the use of the TFile to define the n-tuple input file.

In The RDataFrame Path, after you’ve defined the TChain as above, you can pass the chain itself (not its name) as the argument to RDataFrame. In the C++ listing, tree1 is a pointer, so it has to be dereferenced; e.g.,

auto dataframe = ROOT::RDataFrame(*tree1);

or

dataframe = ROOT.RDataFrame(mychain)

The above code is fine if you only have a few files to add to the chain, though copying and pasting a line for each of those experimentN.root files would be a bit tedious. But what if you’ve got thousands of files? Fortunately, you don’t have to specify each one of them in your program.

TChain::Add can accept some wildcard characters to match against file names. The wildcard you’ll probably find to be the most useful is *, which matches any sequence of characters (including none). So you can do something like this:

mychain.Add("experiment*.root")

Note that this would also match experiment.root and experiment-test.root, which may not be what you want.
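
If the bare * turns out to be too greedy, one option is to select the files yourself with Python’s standard glob module and add them to the chain one at a time. What follows is only a sketch under a couple of assumptions: the helper name add_numbered_files is made up for this illustration, and the default pattern experiment[0-9].root assumes the files carry a single-digit number, as in the example above.

```python
import glob


def add_numbered_files(chain, pattern="experiment[0-9].root"):
    """Add every file matching `pattern` to `chain`; return the names added.

    `chain` can be any object with an Add(filename) method,
    such as a ROOT.TChain. The helper name is hypothetical.
    """
    # [0-9] matches exactly one digit, so experiment.root and
    # experiment-test.root are not selected.
    filenames = sorted(glob.glob(pattern))
    for filename in filenames:
        chain.Add(filename)
    return filenames
```

With ROOT imported, this could be called as add_numbered_files(ROOT.TChain("tree1")), or you could keep the chain in a variable first and pass that. Returning the list of names makes it easy to print what was actually picked up before you start a long analysis.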

If you’ve learned enough programming to create loops and manipulate strings, you can also do something like this:

Listing 44: Example of using a loop to make a TChain (C++)
auto tree1 = new TChain("tree1");
for ( int i = 0; i < 10; ++i ) {
    std::string filename = "experiment" + std::to_string(i) + ".root";
    tree1->Add(filename.c_str());
}
Listing 45: Example of using a loop to make a TChain (Python)
mychain = ROOT.TChain("tree1")
for i in range(10):
    filename = "experiment" + str(i) + ".root"
    mychain.Add(filename)

Extending these small examples to thousands of files is left as an exercise for the student.

Another approach would be to store the names of the ROOT files in a text file (or even another n-tuple!), read the filenames from this text file, then add each one. Again, I leave this as a potential exercise for you.
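
If you’d like a starting point for the text-file approach, here is one possible sketch. It assumes a plain text file with one ROOT file name per line; the helper name add_files_from_list and the file name filelist.txt mentioned afterwards are made up for this illustration, and skipping blank lines and lines that start with # is my own convention, not anything ROOT requires.

```python
def add_files_from_list(chain, list_path):
    """Add each file named in the text file `list_path` to `chain`.

    `chain` can be any object with an Add(filename) method,
    such as a ROOT.TChain. Returns the number of files added.
    """
    count = 0
    with open(list_path) as list_file:
        for line in list_file:
            filename = line.strip()
            # Skip blank lines and comment lines starting with '#'.
            if not filename or filename.startswith("#"):
                continue
            chain.Add(filename)
            count += 1
    return count
```

After creating mychain = ROOT.TChain("tree1"), a call like add_files_from_list(mychain, "filelist.txt") would add every file named in that list. The returned count gives you a quick check that the list contained as many files as you expected.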

Figure 65: https://xkcd.com/1579/ by Randall Munroe


[1] If you clicked on that TChain link, you’ll see another important ROOT class whose documentation is sorely lacking. I suggest doing a search within the ROOT tutorials to see some examples of TChain in use:

cd $ROOTSYS/tutorials
grep -rli tchain *