Information specific to sutton, rondeau, and pelee
A CFI proposal by Profs. Stastna, Poulin, and Lamb resulted in the acquisition of new servers in late 2023.
Hostnames and Hardware
- sutton.math.private.uwaterloo.ca (Dell R7625)
- 2 AMD EPYC 9454 2.75 GHz 48-core CPUs, 768 GB RAM, ~90 TB storage
- pelee.math.private.uwaterloo.ca (Dell XE8545)
- 2 AMD EPYC 7713 2.0 GHz 64-core CPUs, 1 TB RAM, four NVIDIA A100 40 GB GPUs
- rondeau.math.private.uwaterloo.ca (Dell XE8545)
- same as pelee
Operating System Software and vendor add-ons
- Ubuntu 22.04 LTS
System Administration
- MFCF administers these machines. Users do not have root access.
- System management is done with SaltStack, unlike hood, which is managed using MFCF's XHier
- this means some things will be different from hood
- applications are not under the /software directory anymore
- default PATH does not have everything in it
- details below
File System Storage
- home directories are NFS-mounted from sutton
- That's where you are when you log in.
- Do not run models from your home directory: the storage device used for home directories is relatively small and could fill up quickly!
- /fsys1 and /fsys2 are large file systems on sutton, and are exported by NFS to the others.
- These are under /mnt/autofs/sutton.math/
- Just 'cd' there and they will appear.
- /scratch is a large file system on bow exported by NFS to the others
- This is under /mnt/autofs/bow.math/.
- Just 'cd' there and it appears.
- /local_scratch is a 3.4 TB local file system on each of pelee and rondeau.
- This will be fastest for saving output of model runs.
- When done, copy important results from /local_scratch to somewhere else (e.g., /fsys1 or /fsys2) for safekeeping and delete them from /local_scratch so there's room for others to do their runs (see the example after this list).
- If you don't have a personal directory in these various places, ask MFCF to set one up for you.
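For example, an end-of-run clean-up might look like the following sketch. The userid, run directory, and file names here are hypothetical, and it assumes you are on pelee or rondeau with a personal directory under /fsys1:

  # copy the results you want to keep from local scratch to /fsys1 on sutton
  mkdir -p /mnt/autofs/sutton.math/fsys1/myuserid/run01
  cp -a /local_scratch/myuserid/run01/*.nc /mnt/autofs/sutton.math/fsys1/myuserid/run01/
  # after confirming the copy succeeded, free the local scratch space for others
  rm -rf /local_scratch/myuserid/run01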
Storage Utilization Guidelines
It is important to follow standard storage utilization guidelines and best practices when handling large amounts of data, so as not to accidentally cause unintended interruptions to the shared computing environment and disrupt the work of other researchers.
To avoid causing problems on a given computing server, it is important to know the physical limitations (and not just the storage limits) of the machine itself at the time of usage. Here are some general concepts to consider:
1. How busy is the machine (how many CPUs are being used and how 'heavy' is the utilization) right now?
- This can be checked using a utility like 'top' or 'htop', or by checking howbusy.math (fluids). If the machine is very busy, what kind of work can I do with it right now without causing a disruption?
2. What about main CPU system memory (i.e., RAM)? Is there enough free memory for me to do what I am trying to accomplish?
- Again, check howbusy.math or top/htop for how much free memory is available.
- Generally speaking, try to keep any single file under 1 GB in size so it will easily fit into main memory and not take too long to load and/or process.
- If you end up with files much bigger than 1 GB, think about how the data can be partitioned, sliced, or chopped up.
- For example, most time-varying simulations should be sliced along the time-dimension (one output file per physical unit of 'output time').
- If time-sliced files are still much larger than 1 GB, consider partitioning (dividing up) your spatial grid and 'stitching' the files together later during visualization and/or analysis.
- The ParaView visualization tool will quite easily, automatically, and effectively visualize multiple files at once on the same set of axes. It works well with NetCDF and vtk output formats, as well as some others.
3. How much storage will my job consume? Is there enough free storage left on the target device?
- Consider what you are outputting and how big the result will be. Is the storage filesystem you're targeting for output files going to be big enough to hold everything and have room left over for others to continue to do their work?
- Quick back-of-the-envelope calculations can help piece this together.
Example: I am running a 3D SPINS simulation with grid specification Nx=256, Nz=128, Ny=64. I am outputting four fields (density and the velocity field components), will have about 100 000 time-steps and am outputting every 50th time-step.
Then,
1 field in double precision (storage in bytes) = Nx * Ny * Nz * sizeof(double) = 256*128*64*(8 bytes) = 16,777,216 B (about 17 MB).
Four fields in double precision = 17 MB * 4 = 68 MB
Number of outputs is (100 000 / 50) = 2000.
Total storage consumption (MB) = (Number of outputs) * (Storage required per output) = 2000 * 68 MB = 136,000 MB, or about 136 GB. (A shell version of this arithmetic is sketched after this list.)
- The 'df' UNIX utility can also give a system-wide summary of all available storage.
- Run 'df -h' for "human-readable" output.
- Lastly, a storage utilization report (from 'df -h') for all storage volumes, including those that may not be mounted yet, is provided upon login to these machines.
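As a sketch, the back-of-the-envelope estimate above can be done directly in the shell with bash arithmetic. The numbers are the ones from the SPINS example; the exact total comes out near 134 GB because the 17 MB per-field figure above is rounded:

  nx=256; ny=64; nz=128                      # grid size
  nfields=4                                  # density plus three velocity components
  nsteps=100000; outevery=50                 # total time-steps and output interval
  bytes_per_field=$(( nx * ny * nz * 8 ))    # double precision = 8 bytes per value
  bytes_per_output=$(( bytes_per_field * nfields ))
  noutputs=$(( nsteps / outevery ))
  total_bytes=$(( bytes_per_output * noutputs ))
  echo "approx. $(( total_bytes / 1000000000 )) GB of output"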
Application Software, Development Environment
Wherever satisfactory versions are provided by the OS distribution, things like library and include files are installed at default locations such as /usr/lib64 and /usr/include.
Third party applications are installed as modules under /opt. Run the command module avail
to see what's available.
Login shell environment
- the recommended .cshrc and .profile files automatically set up your environment with the compilers, libraries, etc. mentioned below, so that they work with the configuration files included with models such as SPINS, MITgcm, and IGW
- see the Login script wiki page for details
- some optional software comes in "module" format (see
man module
)- to see a list of available modules, run
module avail
- this also shows which ones you have loaded already
- load one using its exact name as shown in the list, e.g. for MATLAB,
module load matlab/2022b-research
- use
module unload modulename
to unload a module, e.g. if you want to switch to an alternative one
Compilers
- gcc 11.4 is in standard search rules; no need to load it with a module command
MPI environments
- MPICH is not installed yet. Let MFCF know if it is needed.
- OpenMPI 4.1.2
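As a quick sanity check that the OpenMPI toolchain is on your PATH (hello_mpi.c is just a placeholder name for your own MPI source file):

  mpicc --version                  # OpenMPI compiler wrapper around gcc
  mpicc -O2 -o hello_mpi hello_mpi.c
  mpirun -np 4 ./hello_mpi         # run with 4 processes on the local machine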
MATLAB
- version 2022b
module load matlab/2022b-research
- check
module avail
to see available versions
Python
- default python3 is 3.10.x
- NumPy, SciPy, etc. are installed
- you can install other packages, such as matplotlib, via:
pip3 install --user matplotlib
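A quick way to confirm that the scientific Python stack is present (it simply prints whatever versions the system provides):

  python3 -c "import numpy, scipy; print(numpy.__version__, scipy.__version__)"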
Models
- MITgcm - in progress
- eventually, see config file called sutton_gcc_openmpi in the MITgcm section of the main Models page in this wiki
- ensure you call genmake2 with the -mpi flag
- NCL NCARG - not installed yet
- set environment variable NCARG_ROOT to /opt/ncl-6.4.0 and add $NCARG_ROOT/bin to your $PATH (see the example after this list)
- optionally, make a .hluresfile in your home directory if you want to customize your NCL graphical environment
- SPINS
- use the sutton.gcc.openmpi.blas.sh configuration file from the SPINS systems folder
- the spins2netcdf SPINS-to-NetCDF converter is not installed yet
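Once NCL is installed, the NCARG_ROOT setup described above would look like this in a bash-style login script (csh users would use setenv instead):

  export NCARG_ROOT=/opt/ncl-6.4.0
  export PATH=$NCARG_ROOT/bin:$PATH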
Visualization
- ParaView - 5.10.1
module load paraview/5.10.1
- VisIt - not installed yet
- FFMPEG 4.4.x
- in standard search rules; see the example below
- the ImageMagick suite
- commands such as 'display' and 'convert'; see "man imagemagick"
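For example, a common FFMPEG task here is stitching numbered image frames into a movie; the frame-name pattern and frame rate below are hypothetical, so adjust them to your own output:

  ffmpeg -framerate 24 -i frame_%04d.png -c:v libx264 -pix_fmt yuv420p movie.mp4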
GPU computing
- pelee and rondeau have GPUs and the CUDA development environment
module load cuda/12.3.2
- use
module avail
to check for other versions
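After loading the CUDA module, two quick checks confirm that the GPUs and the toolkit are visible:

  nvidia-smi         # lists the A100 GPUs and their current utilization
  nvcc --version     # reports the CUDA toolkit release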