# Shared filesystems

We're almost finished with the subject of batch systems. Hang in there. There's a really sick xkcd cartoon at the end.

After lecturing you on keeping your execution environment clean and independent of outside directories, I have to confess: I wasn't telling you the whole truth. The reality is that in many batch-node installations, there is an external filesystem of some sort that all the nodes share. Here are a couple of reasons why that might be needed:

- Software libraries, such as those maintained in [conda](https://docs.conda.io/projects/conda/en/latest/index.html). Keeping such libraries in a shared filesystem may be the only practical way to assure that all the batch nodes have access to the same software.[^adm]

- Large data files. Condor is pretty good about transferring files, but there can be problems when those files get bigger than 1GB or so. It may be easier to read those files via a shared filesystem, even though there'll be a speed and/or bandwidth penalty when many programs read a large file over the network at the same time.

There's no single standard for shared filesystems and condor. The two accelerator labs with which I'm the most familiar, CERN and Fermilab, each have their own method. You've already guessed that I'm going to describe what I've set up on the Nevis particle-physics batch farms, because it's the one you're most likely to use if you've read my condor pages so far.

From this point forward, everything I describe is in the context of the Nevis particle-physics [Linux cluster](https://twiki.nevis.columbia.edu/twiki/bin/view/Main/LinuxCluster). If you're not in one of the Nevis particle-physics groups, or you're outside Nevis, you'll have to ask how they handle their shared filesystem (if any).

## Nevis shared filesystem

The cluster shares its files via [NFS](https://www.atera.com/blog/what-is-nfs-understanding-the-network-file-system/), a standard protocol for systems to view each other's directories.
At this point, you may want to read the Nevis wiki page on [automount](https://twiki.nevis.columbia.edu/twiki/bin/view/Main/Automount). The basic summary: if a system (let's say `olga`) has a partition (perhaps `share` as an example), then to view that partition use the path `/nevis/olga/share`.[^milne]

Anyone can set up [permissions](https://www.guru99.com/file-permissions.html) to restrict others from viewing their directories. For the most part, you can view others' directories without having accounts on the individual machines. That's why you can view the directory `~seligman/root-class/`, which expands to `/nevis/tanya/home/seligman/root-class/`, even though you can't log in to my desktop computer `tanya`.

## Nevis filesystems and the batch farms

At this point, you may be thinking, "Hey, I don't have to bother with all the {ref}`resource planning` stuff.[^stuff] Bill just said everything is shared, right? So I'll just copy-and-paste the same lines I use to run my program into a `.sh` file and submit that. Easy-peasy!"

Sorry, but that won't work. The reasons why:

- It may be troublesome to figure out how to use `input=`, `transfer_input_files=`, and `transfer_output_files=`, but condor's file-transfer mechanism is much more robust than NFS. I've seen systems running hundreds of [shadow](https://htcondor.readthedocs.io/en/latest/users-manual/managing-a-job.html) processes without slowing down the system from which the jobs' files came.
- The NFS file-sharing scheme has been deliberately set up in such a way that you *can't* refer to your home directory within a condor job.

It's reasonable to ask "Why not?" Consider what might happen if the batch nodes could access your home directory, and all the batch nodes on a cluster wanted to access that directory at once:

:::{figure-md} dont-do-fig
:align: center
This is what we *don't* permit the NFS-based shared filesystem to do.
:::
NFS is a robust protocol, but handling hundreds of access requests to
the same location on a single partition is a bit much for it. If it's
just reading, the server may slow down so much that it becomes
unusable, which irritates any users who are logged into that server
to do their work. If those hundreds of jobs are *writing* to that
directory at the same time, the server will crash.
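Returning to the first reason above: the file-transfer commands go in your condor submit file. Here's a minimal sketch of what that looks like; the executable and file names are placeholders I've made up, not files from this course.

```
# Sketch of a submit file using condor's file-transfer mechanism.
# All file names are placeholders.
executable              = run_analysis.sh
transfer_input_files    = histograms.root, calibration.dat
transfer_output_files   = results.root
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
queue
```

With these commands, condor itself stages the files to and from the batch node, instead of hundreds of jobs hammering a shared NFS partition at once.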
The servers with users' home directories are called [login
servers](https://twiki.nevis.columbia.edu/twiki/bin/view/Main/DiskSharing),
because those are the servers that users primarily log in to. If a
login server slows down or crashes, users can't log in. Since our mail
server requires a user's home directory be available to process email,
if a login server slows down or crashes, our email slows down or
crashes.[^email]
Each Nevis particle-physics group resolves this issue by having
dedicated file servers that are distinct from their login
servers. Remember the diagram I showed you at the beginning of the
first class?
:::{figure-md} LinuxCluster-fig
:align: center
On the first day of class, I predicted that you'd forget this diagram.
Was my prediction correct?
:::
The file servers are the smaller boxes to the right in the above
figure. Each one of those file servers has at least two partitions:
- `/share`

  This partition is meant for read-only access by the batch farms.
  `/share` is intended for software libraries or similar resources that
  a job may need in order to execute.[^backup] The size of the `/share`
  partition is typically on the order of 150GB, and it's shared among
  the users in that research group.

- `/data`

  This partition is meant for big data files (either inputs or
  outputs), and any other recreatable files associated with your
  jobs.[^recreate] Typically `/data` is about a dozen or more terabytes,
  though that varies widely between file servers.
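Inside a job script, those conventions might look like the following sketch. The server name `olga` and the username `jdoe` are placeholders; substitute your group's file server and your own account.

```shell
# Sketch of the Nevis path conventions inside a job script.
# The server name "olga" and the username "jdoe" are placeholders.
server=olga
user=jdoe
share_dir=/nevis/${server}/share/${user}   # read-only software and scripts
data_dir=/nevis/${server}/data/${user}     # large inputs and outputs
echo "software (read-only): ${share_dir}"
echo "large files:          ${data_dir}"
```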
This is how it looks:
:::{figure-md} do-do-fig
:align: center
This is what we permit the NFS-based shared filesystem to do.
:::
It's still possible to crash a file server in this way. But if you do,
it only affects your research group, not all the Nevis
particle-physics groups, faculty, or email. Your *group* may be irritated
with you, but that's a different story.[^havent]
## Planning a batch job
Here's the general workflow:
- Develop a program or procedure in your home directory. Get the
program to work on a small scale.
- Once you're confident you have a working program, copy the
relevant files to a `/share` partition. Typically you can do this
(using `olga` as an example file server):
> mkdir -p /nevis/olga/share/$USER
> cp -arv
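If you want to see what that staging step does without touching a real file server, here's a self-contained sketch that uses a temporary directory in place of `/nevis/olga/share/$USER`; the directory and file names are made up for illustration.

```shell
# Simulate copying a working directory to a shared partition.
# "share" is a temp directory standing in for /nevis/<server>/share/$USER.
share=$(mktemp -d)
mkdir -p work
echo 'echo "running analysis"' > work/run.sh
cp -arv work "${share}/"     # -a preserves attributes, -v lists each copy
ls "${share}/work/run.sh"    # confirm the files arrived
```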
:::{figure-md} bring-job-fig
:align: center

A sketch of how one might "bring the job to the data." In this
example, our program needs to access `bigfile4.root`.
:::
To some degree this brings us all the way back to {numref}`Figure %s