Layers

Definition of “jit”

The process of compiling code while a program is executing is called just-in-time compilation, or JIT. In programmers’ slang, this is often called “jitting,” a term that you’ll see in some of RDataFrame’s documentation. 1

Up until now, we’ve used Define() and Filter() with operations that could be expressed in a short string of characters. Those strings were “jit’ed” by ROOT’s C++ interpreter (even if you work in Python). For example:

.Define("ptot","sqrt(px*px + py*py + pz*pz)")
.Filter("ptot > 40 && zv > 10")

This might not be sufficient if:

  • The operation you want to perform is much more complex, perhaps requiring many lines of code.

  • You want to get the greatest possible speed out of using RDataFrame. Supplying ROOT with a compiled operation is typically faster than using “jitted” code, which has to be compiled while your program is running.
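As a rough sketch of what a “compiled operation” might look like, here are the two strings from above rewritten as ordinary C++ functions. The names total_momentum and passes_cuts are made up for this illustration, and I’m assuming the columns px, py, pz, and zv are stored as doubles. The comment at the bottom shows one way such functions could be handed to RDataFrame, using the forms of Define() and Filter() that take a callable followed by a list of column names.

```cpp
#include <cmath>

// Hypothetical helper: the same arithmetic as the string
// "sqrt(px*px + py*py + pz*pz)", but compiled ahead of time.
double total_momentum(double px, double py, double pz) {
    return std::sqrt(px * px + py * py + pz * pz);
}

// Hypothetical helper: the same test as the string "ptot > 40 && zv > 10".
bool passes_cuts(double ptot, double zv) {
    return ptot > 40.0 && zv > 10.0;
}

// With ROOT available, these could take the place of the strings
// (assuming the columns really are stored as double):
//   .Define("ptot", total_momentum, {"px", "py", "pz"})
//   .Filter(passes_cuts, {"ptot", "zv"})
```

Because the functions don’t depend on ROOT, they can be checked on their own before they go anywhere near a dataframe.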

When working with jit in a software toolkit like ROOT, you may have to deal with multiple operational layers. In ROOT, these layers are:

  1. The notebook layer. This layer implements the cell layout, keeps track of variables between cells, etc. (If you’re programming on the command line, you don’t have this layer.)

  2. The programming/interpreter layer. In ROOT C++, this is cling; in Python, it’s… well… Python.

  3. ROOT’s internal layer of variables, methods, libraries, etc.

I vaguely alluded to the last layer when I had you define the double-Gaussian function in The Basics. Remember the first line I had you type into ROOT?

TF1 f1("func1","sin(x)/x",0,10)

The name f1 was created in the programming layer. The name func1 was created in ROOT’s internal layer. If you want to draw the function, you have to use the name defined in the programming layer:

f1.Draw()

If you want to use the function within ROOT’s internal layer, you have to use its internal name:

TH1D somehistogram("histogramName","Histogram Title",100,-3,3)
somehistogram.FillRandom("func1",1000)

As a general rule, if you’re supplying a name or operation within a character string that’s an argument to a ROOT method (e.g., the func1 in FillRandom("func1",1000)), then that string is being interpreted within ROOT’s internal layer. To the programming layer, it’s just a character string. 2
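If the two-layer idea still feels abstract, here is a toy model of it in plain C++. To be clear, this is not how ROOT is implemented; ToyFunction, internal_layer, and internal_lookup are names I invented for the sketch. The only point is that a name given inside a string goes into a separate table, one that the C++ variable name never enters.

```cpp
#include <functional>
#include <map>
#include <string>

// Toy stand-in for ROOT's internal layer: a table of names that
// can only be reached through string lookups. (Not ROOT's real design.)
std::map<std::string, std::function<double(double)>>& internal_layer() {
    static std::map<std::string, std::function<double(double)>> table;
    return table;
}

// Toy stand-in for a class like TF1: constructing it registers the
// internal name. The C++ variable that holds the object is a separate
// name that lives only in the programming layer.
struct ToyFunction {
    std::function<double(double)> formula;
    ToyFunction(const std::string& internal_name,
                std::function<double(double)> f)
        : formula(f) {
        internal_layer()[internal_name] = f;
    }
    double Eval(double x) const { return formula(x); }
};

// Toy stand-in for a method that is handed a name in a string:
// all it can do is look that string up in the internal table.
bool internal_lookup(const std::string& name) {
    return internal_layer().count(name) > 0;
}
```

After a declaration such as ToyFunction f1("func1", [](double x){ return 2.0 * x; }); you can call f1.Eval(3.0) through the programming-layer name, and internal_lookup("func1") succeeds, but internal_lookup("f1") fails because f1 was never a name in the internal table.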

Separate names for separate layers

I could have shown you examples in which I gave these variables the same name, e.g.:

TF1 func1("func1","sin(x)/x",0,10)

I didn’t do that because, although it can be convenient, it can create the false impression that the programming layer and the internal layer share names.

This will be important as we think about constructing our functions in the following pages. In particular:

  • The internal layer “knows” the names of all the columns in the RDataFrame; it gets them when you define the dataframe.

  • The programming layer does not have access to the variables and functions defined in the internal layer.

  • The internal layer usually does not have access to the functions defined in the programming layer. The exception is when you can define a link between the two, as I show later for C++ and Python.


1

Like many programmers, I’m sloppy with my grammar. I use jit as a noun, verb, adjective, adverb, gerund, and insult without any form of consistency.

Anyone who can’t accept that is a jitting jit who jitly jits their jits. That definitely makes me a jit as well.

2

In theory, this lets you create code within character strings dynamically; e.g.:

   // Needs <iostream> and <string>. Assumes 'a' (an int) and
   // 'dataframe' (an RDataFrame) were already defined.
   std::string operation;
   if ( a == 1 )
      { operation = "<"; }
   else
      { operation = ">"; }
   int limit;
   std::cout << "Enter limit: " << std::endl;
   std::cin >> limit;
   std::string code = "pz" + operation + std::to_string(limit);
   // If 'a' was set to 1, and the limit the user entered is 145,
   // then the value of variable 'code' is "pz<145".
   auto pzcut = dataframe.Filter(code.c_str());

In practice, this can get really confusing and is prone to errors.
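One way to keep string-built cuts a bit less error-prone is to funnel them through a small helper that checks its pieces before gluing them together. This is only a sketch under my own assumptions: make_cut is a hypothetical function, not part of ROOT, and the list of allowed operators is arbitrary.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: build a cut such as "pz<145" from its parts,
// rejecting an empty column name or an operator outside a short list.
std::string make_cut(const std::string& column,
                     const std::string& op,
                     int limit) {
    if (column.empty()) {
        throw std::invalid_argument("column name is empty");
    }
    if (op != "<" && op != ">" && op != "<=" && op != ">=") {
        throw std::invalid_argument("unsupported operator: " + op);
    }
    return column + op + std::to_string(limit);
}
```

The result of make_cut("pz", "<", 145) is the string "pz<145", which could then be passed to Filter() just like the hand-built string above. The helper can’t tell whether the column actually exists in the dataframe; only the internal layer knows that.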