Batch Farm

Let’s take a look at a model of how you can submit a job from one computer and have it run on many other computers.

batch-manager

Figure 104: An example of a batch system managing multiple instances of a program on multiple computers.

You can probably guess how things flow from top to bottom in the above figure: You submit a program from your computer; we’ll get into how that happens later. The computer that receives the submission is the “batch manager”. The batch manager looks over its collection of “batch nodes”, that is, a group of computers that do nothing but run users’ programs remotely, and sends copies of your program to them.

The batch manager keeps track of which copy of your program has been submitted to which node. It also controls which copy is actually running on a given node. Other copies of your program are “on hold”, waiting until a queue on that batch node is free for your program to start running.
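If it helps to see that bookkeeping written out, here is a minimal Python sketch of the idea. It is only a toy model, not how any real batch system is implemented. The class `ToyBatchManager`, its method names, the node names, the round-robin assignment of copies to nodes, and the limit of one running copy per node are all assumptions I made up for illustration.

```python
from collections import deque
from itertools import cycle


class ToyBatchManager:
    """Toy model of a batch manager (hypothetical; for illustration only)."""

    def __init__(self, node_names):
        # Assumption: each node runs at most one copy at a time.
        self.running = {name: None for name in node_names}
        # Copies that are "on hold" on each node, in arrival order.
        self.on_hold = {name: deque() for name in node_names}
        # Assumption: copies are handed to nodes in round-robin order.
        self._next_node = cycle(node_names)

    def submit(self, program, copies):
        """Send the requested number of copies of a program to the nodes."""
        for i in range(copies):
            node = next(self._next_node)
            self.on_hold[node].append(f"{program}#{i}")
            self._start_next(node)

    def _start_next(self, node):
        """If the node is idle and has a copy on hold, start that copy."""
        if self.running[node] is None and self.on_hold[node]:
            self.running[node] = self.on_hold[node].popleft()

    def finish(self, node):
        """Mark the node's running copy as done and start the next one."""
        done = self.running[node]
        self.running[node] = None
        self._start_next(node)
        return done


manager = ToyBatchManager(["node1", "node2"])
manager.submit("analysis", 5)

print("running:", manager.running)
print("on hold:", {node: list(jobs) for node, jobs in manager.on_hold.items()})

# When the copy on node1 finishes, the next copy on hold there starts running.
manager.finish("node1")
print("running:", manager.running)
```

With two nodes and five copies, each node runs one copy while the remaining three sit on hold until the node they were assigned to becomes free. A real batch system makes much more sophisticated decisions about where and when each copy runs, but the basic record-keeping is the same.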

Of course, your programs are being interleaved with the programs submitted by all the other users of this “batch farm.”

“Hey, if I have to run my program just five times, what’s the point of all this fuss?” Don’t forget the “more” principle: I’ve seen researchers submit 20,000 instances of their program to be executed on a batch farm. You may not need a batch system now, but if you continue your scientific research, you almost certainly will in the future.

xkcd software_development

Figure 105: https://xkcd.com/2021/ by Randall Munroe