Nevis 2005 Server Upgrades
We're putting in three major new servers in December 2005. The new servers are kolya, karthur, and shang.
Here are the details of the replacement servers. Some of the details are in flux.
Before I get into the configuration details, here's the proposed transition plan for shang:
At 6PM, I'm going to synchronize the files on the machine one last time. This means that any changes that you make to files on shang after that, including reading your mail, may not be saved. Log off and quit your e-mail program.
Assuming there are no other problems, I'll take the "old" shang off the network and put in the "new" one. The transition should be fast (about five minutes). I will probably also have to restart the mail server to make sure it "sees" the new machine.
By 7PM, the new machine should be online. Note: This transition was completed on schedule with no visible problems.
Here's the transition plan for kolya and karthur:
At 6PM, I'm going to synchronize the files on these machines one last time. This means that any changes that you make to files on kolya or karthur after that, including reading your mail, may not be saved. Log off and quit your e-mail program.
Assuming there are no other problems, I'll take the "old" kolya and karthur off the network and put in the "new" ones. The transition should be fast (about five minutes). I will probably also have to restart the mail server to make sure it "sees" the new machines.
By 7PM, the new machines should be online. Note: This transition was completed on schedule with no visible problems.
kolya (ATLAS server)
Here's a comparison between the old and new machine configurations:
| | "old" kolya | "new" kolya |
|---|---|---|
| Processors | Dual 1.7GHz Intel Xeon | Dual 3.2GHz Intel Xeon with hyperthreading; four effective processor queues |
| RAM | 500 MB | 4 GB |
| Size of /home | 13 GB | 35 GB |
| Size of /data | 68 GB | 2.5 TB |
| Operating system | Red Hat 9 | Fedora Core 3 |
In addition, all the partitions have been set up using LVM. The net effect of this is that we can non-destructively adjust the partition sizes later on if we wish. In particular, the 2.5TB storage area is a RAID5 array on a chassis that has only 8 drives out of a possible 24; if we get more drives for the array, we can expand the size of /data. (Contrast this setup with hermes, which was set up without LVM: we have the awkward /data/array1, /data/array2, and /data/array3.)
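To see why adding drives lets /data grow: RAID5 spends one drive's worth of capacity on parity, so the usable space of an n-drive array is (n - 1) × drive size. The drive size isn't given here, so the sketch below assumes a hypothetical 400 GB per drive and ignores filesystem overhead; once the array grows, LVM is what lets the existing /data volume be extended into the new space.

```python
def raid5_usable_gb(n_drives: int, drive_gb: float) -> float:
    """Usable capacity of a RAID5 array: one drive's worth holds parity."""
    if n_drives < 3:
        raise ValueError("RAID5 needs at least 3 drives")
    return (n_drives - 1) * drive_gb


# Hypothetical per-drive size; the actual drives in the chassis may differ.
DRIVE_GB = 400

if __name__ == "__main__":
    # Current chassis (8 drives) versus partially and fully populated (24).
    for n in (8, 16, 24):
        print(f"{n:2d} drives -> {raid5_usable_gb(n, DRIVE_GB):,.0f} GB usable")
```

The point is the scaling, not the exact numbers: tripling the drive count roughly triples the usable space, and with LVM that space can be added to /data instead of appearing as a separate /data/array2-style partition.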
Before you get too excited about this new storage space, remember our backup policy: the /data partition is not backed up! RAID arrays have been known to fail. If you need to back up a big file on this system, consider making a copy on hermes or karthur.
After the server is fully integrated into the Nevis Linux cluster, it will be included in our batch processing system. Unlike hermes, kolya is a machine that users are expected to log in to interactively; also, it's not clear that hyperthreading provides any benefit to Monte Carlo jobs. Therefore, only two Condor queues will be implemented on this machine, rather than one for each of the four effective processor queues.
When you log in to the new server, you'll notice a /usr/local partition. All three of the new servers are "donating" 200GB of storage space to the general needs of Nevis. The space on kolya is used as a backup applications server. Please do not attempt to write files or alter the directories in this area.
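If you do copy a large file to hermes or karthur, it's worth checking that the copy arrived intact before relying on it. A minimal Python sketch, assuming the destination directory is reachable from kolya as an ordinary mounted path; the function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MB chunks so large files don't have to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src: Path, dest_dir: Path) -> Path:
    """Copy src into dest_dir, then confirm the copy is byte-identical."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)  # copy2 also preserves the modification time
    if sha256_of(src) != sha256_of(dest):
        raise OSError(f"checksum mismatch copying {src} to {dest}")
    return dest
```

A call would look like `copy_and_verify(Path("/data/yourname/big_file"), Path("/path/on/hermes"))`, where both paths are placeholders for your own file and a directory you own on the other machine.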
karthur (ATLAS server, shared with D0)
Here's a comparison between the old and new machine configurations:
| | "old" karthur | "new" karthur |
|---|---|---|
| Processors | Dual 500MHz Pentium III | Dual 3.2GHz Intel Xeon with hyperthreading; four effective processor queues |
| RAM | 768 MB | 4 GB |
| Size of /home | 4 GB | 35 GB |
| Size of /data plus /work | 50 GB | 2.5 TB |
| Operating system | Red Hat 9 | Fedora Core 3 |
Note that in migrating files from the old machine, I've merged the old /data and /work partitions into a single /data partition on the new machine.
In addition, all the partitions have been set up using LVM. The net effect of this is that we can non-destructively adjust the partition sizes later on if we wish. In particular, the 2.5TB storage area is a RAID5 array on a chassis that has only 8 drives out of a possible 24; if we get more drives for the array, we can expand the size of /data. (Contrast this setup with hermes, which was set up without LVM: we have the awkward /data/array1, /data/array2, and /data/array3.)
Before you get too excited about this new storage space, remember our backup policy: the /data partition is not backed up! RAID arrays have been known to fail. If you need to back up a big file on this system, consider making a copy on hermes or kolya.
After the server is fully integrated into the Nevis Linux cluster, it will be included in our batch processing system. Unlike hermes, karthur is a machine that users are expected to log in to interactively; also, it's not clear that hyperthreading provides any benefit to Monte Carlo jobs. Therefore, only two Condor queues will be implemented on this machine, rather than one for each of the four effective processor queues.
As a practical matter, most of the D0 users who were using the old karthur have done, or will do, some ATLAS-related work for one reason or another. Although the borders are fuzzy (and will probably grow fuzzier over time), the general intent is that Nevis D0 users will continue to use the machine name they already know (karthur), even though they are now sharing an ATLAS server.
When you log in to the new server, you'll notice a /usr/local partition. All three of the new servers are "donating" 200GB of storage space to the general needs of Nevis. The space on karthur is used for the applications server. Please do not attempt to write files or alter the directories in this area.
shang (DOE server)
The DOE users will recall that their old server suffered a hardware failure in May 2005. Since then, you've been using "mini-shang", a spare box sitting on a table in the computer room at Nevis. Now you're moving up to some serious hardware.
If you want a "sneak peek" at the new server, log in to dhcp210.nevis.columbia.edu; you'll have to do so from another Nevis machine, since the firewall blocks access to DHCP addresses from remote sites. Please don't try to change any files on that machine: any changes you make will be overwritten by the nightly synchronization that copies the directories from the "old" shang to the "new" one.
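To see why edits on the new machine don't survive, it helps to model the synchronization as a one-way mirror: the "old" shang is the source, and any file on the "new" one that differs from its source copy is simply overwritten. The Python sketch below is a simplified, hypothetical model of that behavior, not the tool actually used for the synchronization (which isn't specified here); unlike a full mirror, it never deletes files that exist only on the destination.

```python
import filecmp
import shutil
from pathlib import Path


def mirror_one_way(src: Path, dst: Path) -> list:
    """Make every file under src appear identically under dst.

    Any dst file whose contents differ from the src version is overwritten,
    so local edits on dst are lost. Returns the destination paths copied.
    """
    copied = []
    for path in src.rglob("*"):
        if not path.is_file():
            continue  # directories are created on demand below
        target = dst / path.relative_to(src)
        if target.exists() and filecmp.cmp(path, target, shallow=False):
            continue  # already identical; leave it alone
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
        copied.append(target)
    return copied
```

Running it twice in a row copies nothing the second time, which is why a file you leave untouched on the new machine stays put while one you edit reverts overnight.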
Here's a comparison between the old and new machine configurations:
| | "old" mini-shang | "new" shang |
|---|---|---|
| Processors | 860MHz Pentium III | Quad 2GHz AMD Opteron (the first 64-bit machine at Nevis) |
| RAM | 128 MB | 4 GB |
| Size of /home | 33 GB | 35 GB |
| Size of /data | 145 GB | 1.2 TB |
| Operating system | Red Hat 9 | Fedora Core 3 (compiled for x86_64) |
It's reasonable to ask: you're moving to a machine with 1.5TB or so of storage, so why aren't you getting a much bigger /home partition than before? The answer is that you're already at the limit of what we can comfortably back up. Remember our backup policy: the /home directory is backed up nightly, but the /data partition is not! RAID arrays have been known to fail; in fact, this system arrived with one bad drive, which had to be replaced.
The partitions have been set up using LVM, so we can non-destructively adjust the partition sizes later on if we wish. For example, if it turns out that you need more space for /home and we get a backup server with more capacity, we can enlarge /home.
When you log in to the new server, you'll notice a /file partition. All three of the new servers are "donating" 200GB of storage space to the general needs of Nevis. The /file partition holds the files from nevis1; when nevis1 is decommissioned, shang will effectively be a file server for the users who did not migrate their files to the Nevis Linux cluster. There is a certain fairness to this, since most of the users whose home directories are still on nevis1 are in the DOE group.
However, I'm trying to encourage those users to migrate elsewhere. Please do not attempt to directly write files or alter the directories in this area, even if your home directory is there. Instead, continue to refer to your home directory indirectly: ~, $HOME, or the automount path /a/file/<whatever>.
As noted above, this is the first 64-bit machine at Nevis. We hope to add it to the Nevis batch processing system, but it's not clear how the Condor software will work with the 64-bit architecture. In principle, everything should be backward compatible, and all the legacy libraries are installed, but this has not been confirmed yet.
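The same advice applies inside your own scripts: build paths from your home directory rather than hard-coding a location under /file. A minimal Python sketch; the helper names and the example file names are hypothetical.

```python
from pathlib import Path


def home_dir() -> Path:
    """Return the current user's home directory without hard-coding it.

    Path.home() uses $HOME when it is set (the same value the shell expands
    ~ to), so the result stays correct even if the storage behind the home
    directory moves.
    """
    return Path.home()


def in_home(*parts: str) -> Path:
    # Build paths relative to home instead of writing out an absolute
    # location under the /file partition.
    return home_dir().joinpath(*parts)
```

For example, `in_home("analysis", "run1.txt")` resolves to the right place whether your home directory currently lives under /file or somewhere else on the Nevis Linux cluster.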
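If you're unsure whether a program on shang is running as 64-bit code, it helps to separate the machine's architecture from the word size of the program itself; a 32-bit build running on a 64-bit machine is exactly the backward-compatibility case described above. A minimal Python sketch of both checks; the function names are hypothetical.

```python
import platform
import struct


def interpreter_bits() -> int:
    """Pointer width of the running Python build: 32 or 64."""
    return struct.calcsize("P") * 8


def describe_host() -> str:
    # platform.machine() reports the machine type from uname (for example
    # "x86_64" on a 64-bit Fedora install), while interpreter_bits() reports
    # how this particular program was built. A 32-bit build on a 64-bit
    # machine makes the two disagree.
    return f"machine={platform.machine()} interpreter={interpreter_bits()}-bit"


if __name__ == "__main__":
    print(describe_host())
```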