Batch Services at Nevis

This is a description of the batch job submission services available on the Linux cluster at Nevis Labs. The topics discussed are:

This web page, like the batch system itself, is a work in progress. It was last modified on 06-Jun-2006.


Batch server

As of 16-Aug-2007, the following paragraph is obsolete. The machine hermes is being repaired. The replacement Condor batch manager is riverside.nevis.columbia.edu. However, as documented below, you don't need to log in to riverside to use Condor.

The system responsible for administering batch services is hermes.nevis.columbia.edu. Users typically cannot log in to this machine; you submit and monitor jobs from your local box on the Linux cluster. As far as job submission and execution are concerned, the existence of hermes may be completely transparent to you.

In addition to any RAID drives attached to your workgroup's servers, there are "common" RAID drives that are intended to be shared among the users of the Nevis batch system. They are initially to be used by the ATLAS and D0 groups, as noted below, but may be made available to other groups as the need arises. These disks are available via automount on the Linux cluster; each has a capacity of about 1.5TB. The names of these drives are:

For example, the permissions on the drives have been set so that you can do the following from any machine on the Linux cluster (if you're a member of the ATLAS group):

cd /a/data/condor/array3/atlas/
mkdir $user
cd $user
# ... create whatever files you want

Important! If you're skimming this page, stop and read the following paragraph!

The files on these /data partitions, like those on the /data partitions of any other systems on the Nevis cluster, are not backed up. They are stored on RAID5 arrays, which are a reliable form of storage; there is monitoring software that warns if any individual drives have failed. However, RAID arrays have been known to fail (and we've had at least one such failure at Nevis). If you have any critical data stored on these drives, make sure you back up the files yourself.

One more time: the disks on these partitions are not backed up!
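Since backing up is left to you, here is a minimal sketch of one way to do it: archive a directory into a dated tarball somewhere that is backed up. The function name backup_dir and the destination path in the usage comment are hypothetical; they are not part of any Nevis tool, and you would substitute a location you know to be safe.

```shell
# backup_dir SRC DEST_DIR
# Archive the directory SRC into DEST_DIR as NAME-YYYYMMDD.tar.gz and
# print the path of the archive. (Hypothetical helper, not a Nevis command.)
backup_dir() {
    src=$1
    dest=$2
    name=$(basename "$src")
    archive="$dest/$name-$(date +%Y%m%d).tar.gz"

    # Create the destination if needed, then archive SRC relative to its parent
    # so the tarball contains "NAME/..." rather than the full absolute path.
    mkdir -p "$dest" || return 1
    tar -czf "$archive" -C "$(dirname "$src")" "$name" || return 1
    echo "$archive"
}

# Example usage (the destination is an assumption; pick a disk that IS backed up):
# backup_dir /a/data/condor/array3/atlas/$USER /path/to/backed-up/area
```

You can check what went into an archive with tar -tzf before relying on it.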


Submitting batch jobs

The batch job submission system we're using at Nevis is Condor, developed at the University of Wisconsin. You can learn more about Condor from the User's Manual.

To use Condor at Nevis, the simplest way is to use the setup command:

setup condor

This will set the variable $CONDOR_CONFIG to ~condor/etc/condor_config, and add ~condor/bin to your $PATH.
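If the setup command is not available in your session, the same effect can probably be had by hand. The sketch below assumes an sh/bash-style shell (this page does not say which shell setup targets) and introduces a helper variable, CONDOR_HOME, purely for illustration; whether the real setup command prepends or appends to $PATH is also an assumption.

```shell
# Manual equivalent of "setup condor" as described on this page.
# Assumption: sh/bash syntax; CONDOR_HOME is an illustrative variable only.
CONDOR_HOME=~condor    # home directory of the condor account

# Point Condor at its configuration file.
export CONDOR_CONFIG="$CONDOR_HOME/etc/condor_config"

# Put the Condor programs at the front of the search path.
export PATH="$CONDOR_HOME/bin:$PATH"
```

Afterwards, echo $CONDOR_CONFIG should show the path to the configuration file.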

Condor tips and tricks

Many of the above tips, and others, have been combined into a set of example scripts. They are in ~seligman/condor/; start with the README file, which will point you to the other relevant files in the directory.
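For orientation before you open those scripts, here is a minimal sketch of a Condor submission. It is not taken from the files in ~seligman/condor/; the file names (hello.submit, hello.out, and so on) and the choice of /bin/echo as the job are assumptions made for illustration. The submit-file keywords and the condor_submit and condor_q commands are standard Condor.

```shell
# Write a minimal submit description file (hypothetical file and job names).
cat > hello.submit <<'EOF'
universe   = vanilla
executable = /bin/echo
arguments  = hello from condor
output     = hello.out
error      = hello.err
log        = hello.log
queue
EOF

# Submit and list your jobs only if the Condor tools are on $PATH
# (for example, after running "setup condor").
if command -v condor_submit >/dev/null 2>&1; then
    condor_submit hello.submit
    condor_q
else
    echo "condor_submit not found; run 'setup condor' first"
fi
```

When the job finishes, its standard output should appear in hello.out, and hello.log records when the job was submitted, started, and completed.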


Availability of batch services

Condor is not available on all systems at Nevis. If you would like access to the batch services (or feel that your system was omitted in error), please contact both Gustaaf Brooijmans and Bill Seligman.

