TChain: An n-tuple in multiple files
In ROOT, it’s possible to distribute a single n-tuple or TTree
across many files. Typically you’d need this when you’re running batch
jobs (as described in Batch Systems), perhaps thousands of
them, each of which independently creates a file containing an n-tuple with
the same name and structure.
Within a single program, you can read all these files as if they
were one continuous n-tuple. The way to do this is with a
TChain.[1] You
construct a TChain using the name of the n-tuple, then use the
TChain::Add method to add each of the files that are part of the
chain.
Here’s an example: Suppose instead of the file experiment.root
that we used in the walkthroughs and Exercises, we had files
experiment0.root, experiment1.root,
experiment2.root, and so on through experiment9.root,
all containing the n-tuple tree1 with the same variables. We could
then define a chain by:
auto tree1 = new TChain("tree1");
tree1->Add("experiment0.root");
tree1->Add("experiment1.root");
// ... and so on
tree1->Add("experiment9.root");
or, in Python:
mychain = ROOT.TChain("tree1")
mychain.Add("experiment0.root")
mychain.Add("experiment1.root")
# ... and so on
mychain.Add("experiment9.root")
Note that in the example scripts given earlier in the tutorial (here
are the C++ and Python versions), these TChain definitions would
replace the use of the TFile to define the n-tuple input file.
In The RDataframe Path, after you’ve defined the TChain as above, you
can just supply the chain itself as the argument to RDataFrame. In C++,
RDataFrame takes the tree by reference, so dereference the pointer;
e.g.,
auto dataframe = ROOT::RDataFrame(*tree1);
or
dataframe = ROOT.RDataFrame(mychain)
The above code is fine if you only have a few files to add to the chain,
though copying and pasting a line for each of those
experimentN.root files would be a bit tedious. But what if
you’ve got thousands of files? Fortunately, you don’t have to specify
each one of them in your program.
TChain::Add can accept some wildcard characters to
match against file names. The wildcard you’ll probably find to be the
most useful is *, which matches any sequence of characters
(including none). So you can do something like this:
mychain.Add("experiment*.root")
Note that this would also match experiment.root and
experiment-test.root, which may not be what you want.
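If the * wildcard matches more than you want, one alternative is to pick out the file names yourself in Python and add them one at a time. What follows is only a sketch under some assumptions: the helper names numbered_files and make_chain are made up for this example, and the pattern assumes your files are named "experiment" followed by one or more digits and ".root".

```python
import os
import re


def numbered_files(directory=".", pattern=r"experiment\d+\.root"):
    """Return the sorted paths in `directory` whose names fully match `pattern`.

    Hypothetical helper: the default pattern accepts experiment0.root or
    experiment12.root, but not experiment.root or experiment-test.root.
    """
    regex = re.compile(pattern)
    names = [name for name in os.listdir(directory) if regex.fullmatch(name)]
    # Plain string sort: experiment10.root comes before experiment2.root.
    return sorted(os.path.join(directory, name) for name in names)


def make_chain(tree_name, filenames):
    """Build a TChain for `tree_name` and add every file in `filenames`."""
    import ROOT  # imported here so numbered_files() can be used without ROOT

    chain = ROOT.TChain(tree_name)
    for name in filenames:
        chain.Add(name)
    return chain
```

With ROOT available, you might then write mychain = make_chain("tree1", numbered_files()) in place of the single wildcard Add.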
If you’ve learned enough programming to create loops and manipulate strings, you can also do something like this:
auto tree1 = new TChain("tree1");
for ( int i = 0; i < 10; ++i ) {
    std::string filename = "experiment" + std::to_string(i) + ".root";
    tree1->Add(filename.c_str());
}
or, in Python:
mychain = ROOT.TChain("tree1")
for i in range(10):
    filename = "experiment" + str(i) + ".root"
    mychain.Add(filename)
Extending these simple examples to thousands of files is left as an exercise for the student.
Another approach would be to store the names of the ROOT files in a text file (or even another n-tuple!), read the filenames from this text file, then add each one. Again, I leave this as a potential exercise for you.
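If you’d like a starting point for the text-file approach, here is one possible sketch in Python. It assumes a list file with one ROOT file name per line; the helper names read_file_list and add_files, and the convention of skipping blank lines and lines starting with #, are my own choices for this example rather than anything ROOT requires.

```python
def read_file_list(list_path):
    """Return the file names listed in a text file, one name per line.

    Blank lines and lines starting with '#' are skipped (an assumed
    convention for this sketch).
    """
    filenames = []
    with open(list_path) as list_file:
        for line in list_file:
            name = line.strip()
            if name and not name.startswith("#"):
                filenames.append(name)
    return filenames


def add_files(chain, filenames):
    """Call chain.Add() once for each name, then return the chain."""
    for name in filenames:
        chain.Add(name)
    return chain
```

Assuming you’ve created a list called files.txt (a hypothetical name), you could then define mychain = ROOT.TChain("tree1") as before and call add_files(mychain, read_file_list("files.txt")).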
Figure 65: https://xkcd.com/1579/ by Randall Munroe
[1] If you clicked on that TChain link, you’ll see another important ROOT class whose documentation is sorely lacking. I suggest doing a search within the ROOT tutorials to see some examples of TChain in use:
cd $ROOTSYS/tutorials
grep -rli tchain *