Nevis 2005 Server Upgrades

We're putting in three major new servers in Dec 2005. The new servers are:
  • kolya, which replaces an old ATLAS server;
  • karthur, which replaces an old D0 server;
  • shang, which replaces the broken DOE server.

Here are the details of the replacement servers. Some of the details are in flux.

Before I get into the configuration details, here's the proposed transition plan for shang:

Here's the transition plan for kolya and karthur:


kolya (ATLAS server)

Here's a comparison between the old and new machine configurations:


                    "old" kolya                "new" kolya
Processors          Dual 1.7 GHz Intel Xeon    Dual 3.2 GHz Intel Xeon with hyperthreading;
                                               four effective processor queues
RAM                 500 MB                     4 GB
size of /home       13 GB                      35 GB
size of /data       68 GB                      2.5 TB
operating system    Redhat 9                   Fedora Core 3

In addition, all the partitions have been set up using LVM. The net effect of this is that we can non-destructively adjust the partition sizes later on if we wish. In particular, the 2.5TB storage area is a RAID5 array on a chassis that has only 8 drives out of a possible 24; if we get more drives for the array, we can expand the size of /data. (Contrast this setup with hermes, which was set up without LVM: we have the awkward /data/array1, /data/array2, and /data/array3.)
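
To get a rough feel for what filling the chassis could mean, here is a back-of-the-envelope sketch in Python. It assumes (and this is only an assumption, not the actual array layout) that the 2.5 TB is the usable size of a single RAID5 set built from 8 identical drives, and that a full chassis would again be a single RAID5 set. The helper raid5_usable_tb is hypothetical, and the 200GB /usr/local area is ignored.

```python
def raid5_usable_tb(n_drives: int, drive_tb: float) -> float:
    """Usable capacity of one RAID5 set: one drive's worth of space goes to parity."""
    if n_drives < 3:
        raise ValueError("RAID5 needs at least 3 drives")
    return (n_drives - 1) * drive_tb


# Assumption: 2.5 TB is the usable size of a single 8-drive RAID5 set.
drive_tb = 2.5 / (8 - 1)  # implied space per drive under that assumption
full_chassis_tb = raid5_usable_tb(24, drive_tb)

print(f"implied per-drive capacity: {drive_tb:.3f} TB")
print(f"usable space with all 24 slots filled: {full_chassis_tb:.1f} TB")
```

Under those assumptions a full chassis would come to roughly 8 TB; the real figure depends on the drives we buy and on how the array is actually laid out.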

Before you get too excited about this new storage space, remember our backup policy: the /data partition is not backed up! RAID arrays have been known to fail. If you need to back up a big file on this system, consider making a copy on hermes or karthur.
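
If you do make your own copy of a big file, it's worth checking that the copy is intact. Here is a minimal sketch of one way to do that in Python; the function names (sha256_of, copy_and_verify) and the paths in the example comment are hypothetical, and it assumes the destination disk is visible as an ordinary directory on the machine where you run it.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large file does not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src: Path, dest_dir: Path) -> Path:
    """Copy src into dest_dir and confirm the copy matches the original."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = Path(shutil.copy2(src, dest_dir))
    if sha256_of(src) != sha256_of(dest):
        raise IOError(f"checksum mismatch copying {src} to {dest}")
    return dest


# Example (hypothetical paths; substitute wherever the other server's disk is mounted):
# copy_and_verify(Path("/data/myuser/big.root"), Path("/path/to/other/server/area"))
```

Reading the file twice is slow for very large files, but it catches a truncated or corrupted copy before you rely on it.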

After the server is fully integrated into the Nevis Linux cluster, it will be included in our batch processing system. Unlike hermes, kolya is a machine that users are expected to log in to; also, it's not clear that hyperthreading provides any benefits to Monte Carlos. Therefore, only two Condor queues will be implemented on this machine.

When you log in to the new server, you'll notice a /usr/local partition. All three of the new servers are "donating" 200GB of storage space to the general needs of Nevis. The space on kolya is used as a backup applications server. Please do not attempt to write files or alter the directories in this area.


karthur (ATLAS server, shared with D0)

Here's a comparison between the old and new machine configurations:


                            "old" karthur               "new" karthur
Processors                  Dual 500 MHz Pentium III    Dual 3.2 GHz Intel Xeon with hyperthreading;
                                                        four effective processor queues
RAM                         768 MB                      4 GB
size of /home               4 GB                        35 GB
size of /data plus /work    50 GB                       2.5 TB
operating system            Redhat 9                    Fedora Core 3

Note that in migrating files from the old machine, I've merged the old /data and /work partitions into a single /data partition on the new machine.

In addition, all the partitions have been set up using LVM. The net effect of this is that we can non-destructively adjust the partition sizes later on if we wish. In particular, the 2.5TB storage area is a RAID5 array on a chassis that has only 8 drives out of a possible 24; if we get more drives for the array, we can expand the size of /data. (Contrast this setup with hermes, which was set up without LVM: we have the awkward /data/array1, /data/array2, and /data/array3.)

Before you get too excited about this new storage space, remember our backup policy: the /data partition is not backed up! RAID arrays have been known to fail. If you need to back up a big file on this system, consider making a copy on hermes or kolya.

After the server is fully integrated into the Nevis Linux cluster, it will be included in our batch processing system. Unlike hermes, karthur is a machine that users are expected to log in to; also, it's not clear that hyperthreading provides any benefits to Monte Carlos. Therefore, only two Condor queues will be implemented on this machine.

In practice, most of the D0 users who were using the old karthur have done, or will do, some ATLAS-related work for one reason or another. Although the borders are fuzzy (and will probably grow fuzzier over time), the general intent is that Nevis D0 users will continue to use the machine name they already know (karthur) even though they are now sharing an ATLAS server.

When you log in to the new server, you'll notice a /usr/local partition. All three of the new servers are "donating" 200GB of storage space to the general needs of Nevis. The space on karthur is used for the applications server. Please do not attempt to write files or alter the directories in this area.


shang (DOE server)

The DOE users will recall that their old server experienced a hardware failure back in May 2005. Since then, you've been using "mini-shang", a spare box sitting on a table in the computer room at Nevis. Now you're moving up to some serious hardware.

If you want a "sneak peek" at the new server, log in to dhcp210.nevis.columbia.edu; you'll have to do so from another Nevis machine, since the firewall blocks access to a DHCP address from remote sites. Please don't try to change any files on that machine. Any changes you make will be overwritten due to a nightly synchronization between the directories on the "old" shang and the "new" one.

Here's a comparison between the old and new machine configurations:


                    "old" mini-shang       "new" shang
Processors          860 MHz Pentium III    Quad 2 GHz AMD Opteron
                                           (the first 64-bit machine at Nevis)
RAM                 128 MB                 4 GB
size of /home       33 GB                  35 GB
size of /data       145 GB                 1.2 TB
operating system    Redhat 9               Fedora Core 3 (compiled for x86_64)

It's reasonable to ask: You're moving to a machine with 1.5TB or so of storage; why aren't you getting a bigger /home partition than before? The answer is that you're already at the limit of what we can comfortably back up. Remember our backup policy: the /home directory is backed up nightly, but the /data partition is not! RAID arrays have been known to fail; in fact, the system arrived with one bad drive, which had to be replaced.

The partitions have been set up using LVM. The net effect of this is that we can non-destructively adjust the partition sizes later on if we wish. We can adjust things if it turns out that you need more space for /home and we get a backup server with more capacity.

When you log in to the new server, you'll notice a /file partition. All three of the new servers are "donating" 200GB of storage space to the general needs of Nevis. The /file partition is used for the files from nevis1; when nevis1 is decommissioned, shang will effectively be a file server for the users who did not migrate their files to the Nevis Linux cluster. There is a certain fairness to this, since most of the users with home directories that are still on nevis1 are in the DOE group.

However, I'm trying to encourage those users to migrate elsewhere; please do not attempt to directly write files or alter the directories in this area, even if your home directory is in that area. Please continue to use the indirect references to your home directory: ~, $HOME, or via automount by /a/file/<whatever>.
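
The same advice applies to scripts: build paths from the home directory instead of hard-coding where it physically lives. Here is a minimal Python sketch; in_home is a hypothetical helper, and the file name is made up for illustration.

```python
from pathlib import Path


def in_home(*parts: str) -> Path:
    """Build a path under the user's home directory without hard-coding its location."""
    # Path.home() resolves "~" for the current user; on Unix it uses $HOME when set.
    return Path.home().joinpath(*parts)


# "analysis/notes.txt" is a hypothetical file name used only for illustration.
notes = in_home("analysis", "notes.txt")
print(notes)
```

A script written this way should keep working if your home directory is later migrated to a different disk or server.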

As noted above, this is the first 64-bit machine at Nevis. We hope to add it to the Nevis batch processing system, but it's not yet clear how well the Condor software will work with the 64-bit architecture, even though everything should be backwards compatible and all the legacy libraries are installed.