Info specific to bow, minnewanka, waterton
A CFI proposal by Profs. Stastna, Lamb, and Waite resulted in acquisition of new servers in 2017.
Hostnames and Hardware
- bow.math.private.uwaterloo.ca (SGI C2112-GP2)
- minnewanka.math.private.uwaterloo.ca (SGI C1104-GP2)
- waterton.math.private.uwaterloo.ca (SGI C1104-GP2)
Each machine has:
- 2x Intel Xeon E5-2690v4 (Broadwell) CPUs, 2.6 GHz, 14 core
- 128 GB RAM
- 40 gigabit private network link for faster MPI (see "MPI jobs that span the machines" section below)
- 1 gigabit public network link (except bow has 10 gigabit)
Operating System Software and vendor add-ons
- CentOS 7.9
- HPE (SGI) MPI 1.9
- includes features formerly in SGI Performance Suite and Accelerate
- accelerated MPI called MPT (Message Passing Toolkit) 2.25
- MPI performance analyzer and diagnostic suite called MPInside
- Intel oneAPI
- compilers, numerical libraries, MPI, performance tools
System Administration
- MFCF administers these machines. Users do not have root access.
- System management is done by SaltStack software, unlike hood and thelon which are managed using MFCF's XHier
- this means things will be different
- applications are not under the /software directory anymore
- default PATH does not have everything in it
- details below
File Systems
- home directories are NFS-mounted from hood.math
- that's where you are when you log in
- do not run models from your home directory: it's too small
- the /fsys2, /fsys3, /fsys4, /fsys5, /fsys6, /fsys7 file systems are NFS-mounted from hood.math
- these are under /nfs/hood.math/
- /scratch is a very large local file system on bow, shared over the network by minnewanka and waterton
- if you don't have a personal directory here, ask MFCF
- faster for models to use this file system than NFS-mounted file systems from hood
- /fsys1 is for application and management software, no user files
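Staging a run under /scratch is straightforward; a minimal sketch ("myrun" is a hypothetical run name, and the block falls back to a temporary directory so it can be tried off-cluster):

```shell
# Stage a model run on the fast local /scratch file system instead of
# your (small, NFS-mounted) home directory. "myrun" is a hypothetical
# name; /scratch/$USER must already exist (ask MFCF if it doesn't).
scratch="/scratch/$USER"
# fall back to a temporary directory when /scratch is unavailable,
# e.g. when trying this outside the cluster
[ -d "$scratch" ] || scratch="$(mktemp -d)"
mkdir -p "$scratch/myrun"
cd "$scratch/myrun"
pwd
```

Copy your input files into that directory and launch the model from there, rather than from your NFS home directory.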
Application Software, Development Environment
Wherever satisfactory versions are provided by the OS distribution, things like library and include files are installed at default locations such as /usr/lib64 and /usr/include.
Third party applications that come as precompiled bundles (but not Linux RPMs) are installed under /opt. E.g., /opt/visit/bin/visit
Third party applications that we have to compile from source are installed under /usr/local.
Login shell environment
- the recommended .cshrc and .profile files automatically set up your environment with the recommended compilers, libraries, etc. mentioned below, so that it works with the configuration files included with models such as SPINS, MITgcm, and IGW
- see the Login script wiki page for details
- some optional software comes in "module" format (see "man module")
- to see a list of available modules, run
module avail
- this also shows which ones you have loaded already
- load one using its exact name as shown in the list, e.g. for MATLAB,
module load matlab/2017a-research
- to unload a module, e.g. if you want to switch to an alternative one, use
module unload modulename
- some optional software comes in "Software Collection" format (see "man scl")
- to see a list of available collections, run
scl -l
- load one using its name followed by your preferred shell (it starts a sub-shell), e.g.
scl enable rh-python36 /bin/bash
Compilers
- gcc 4.8.5 is in standard search rules, no need to load it with a module command
- gcc 9.1.1 (9.3.1 on CentOS 7.9) is available with devtoolset-9; load it with
scl enable devtoolset-9 /bin/bash
- Intel oneAPI development suite, version 2021.1.0
- icc, icpc, ifort compilers
- MKL numerical library
- MPI parallel tools
- profiler, analyzer, debugger, etc.
source /opt/intel/oneapi/setvars.sh
- documentation at
/opt/intel/oneapi/readme-get-started-linux-base-kit.html
and
/opt/intel/oneapi/readme-get-started-linux-hpc-kit.html
MPI environments
Choose from MPICH, OpenMPI, HPE/SGI MPT, and Intel MPI
- MPICH 3.0
module load mpi/mpich-x86_64
- OpenMPI 1.10
module load mpi/openmpi-x86_64
- HPE/SGI MPT
- This is a tuned MPI that should perform best. You may wish to run some comparisons to see which MPI implementation works best for you.
module load hmpt/2.25
for a version that is ABI-compatible with other MPICH-based MPI implementations, or
module load mpt/2.25
for a version that is not compatible in that way
- Includes a performance analysis and diagnostic suite, MPInside
module load MPInside/4.2.52
- Documentation at /usr/share/doc/packages/hpe-mpi-1/
- Intel oneAPI MPI
source /opt/intel/oneapi/setvars.sh
MATLAB
- check
module avail
to see available versions, then load one, e.g.
module load matlab/2017a-research
or
module load matlab/2020a-research
Python
- default python is 2.7.5. Default python3 is 3.6.8
- NumPy, SciPy, etc. are installed
- prior to late December 2021, the default python3 was 3.4.5 and one had to use the SCL for rh-python36 to get 3.6
- if you had installed optional modules into your 3.4 environment, you'll need to repeat for 3.6
- you can install other packages, such as matplotlib, via:
pip3 install --user matplotlib
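To confirm which interpreter you are getting and where --user installs land, you can run the following generic commands (nothing cluster-specific):

```shell
# Show the default python3 version
python3 --version
# "pip3 install --user <pkg>" installs into your user site directory,
# typically ~/.local/lib/python3.N/site-packages
python3 -m site --user-site
```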
Models
- MIT GCM
- see config file called bow_mpt in the MITgcm section of the main Models page in this wiki
- first need to load an MPI module, such as mpt/2.16 or hmpt/2.25
- ensure you call genmake2 with the -mpi flag
- GNU Fortran compiler seems pickier about some things than Intel compiler so you may see new complaints about previously-working code
- here is a sample SLURM MITgcm script. See more SLURM information below.
- NCL NCARG version 6.4.0, without OPeNDAP
- set environment variable NCARG_ROOT to /opt/ncl-6.4.0 and add $NCARG_ROOT/bin to your $PATH
- optionally, make a .hluresfile in your home directory if you want to customize your NCL graphical environment
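For bash-family shells, the NCL setup described above amounts to the following (csh users would use setenv instead of export):

```shell
# Point NCL at its installation and put its commands on your PATH
export NCARG_ROOT=/opt/ncl-6.4.0
export PATH="$NCARG_ROOT/bin:$PATH"
```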
- SPINS
- so far we have one system configuration file
- bow.gcc.mpt.blas.sh
- with symbolic link to that called bow.sh, since it's the only one we have so far
- as the name suggests, it is set up to expect GCC, SGI/HPE MPT, and default BLAS and LAPACK
- so you need to load the MPT module first
- we expect to develop alternative configuration files for other compilers, MPI implementations, and numerical libraries for comparison to find an optimum set-up
- the spins2netcdf SPINS to NetCDF converter is available in standard search rules at /usr/local/bin/spins2netcdf (Version 2.0.0.0 from April 2018)
Visualization
- ParaView 5.4.1
- installed at /opt/paraview/
- VisIT 2.13.0
- installed at /opt/visit/
- FFMPEG
- installed at /opt/ffmpeg/
- the ImageMagick suite
- commands such as 'display' and 'convert'; see "man imagemagick"
Scheduler
These machines use SLURM to schedule jobs: instead of running jobs interactively, you submit them to a queue. See the Submitting Jobs section of the Graham Tips page for more about SLURM.
Expected use of SLURM
- SLURM head node is fluids-pr1-01.math.private, alias fluidssubmit.math.private
- runs a different OS from compute nodes
- can't compile there, only submit jobs there
- need SLURM module in your environment
- recommended .cshrc and .profile do that for you (see the Login script page)
- must include a
#SBATCH --partition=<partition_name>
line in your submit script
- use partition name
fluids_short
for jobs under 8 hours
- use partition name
fluids_long
for jobs up to 40 days
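Putting the requirements above together, a minimal submit script for a short job might look like this sketch ("my_program" and the resource values are placeholders for your own):

```shell
#!/bin/bash
#SBATCH --partition=fluids_short   # under 8 hours; fluids_long allows up to 40 days
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
#SBATCH --output=%j.out
#SBATCH --error=%j.err

./my_program   # <== replace with your own program
```

Submit it from the head node with sbatch, e.g. "sbatch myjob.sh".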
MPI jobs that span the machines: The easy way (but not recommended)
When you want to span an MPI job across multiple machines, performance should be better if you force the use of the private 40 Gb network that the three machines share. This is a bit awkward with SLURM (see next section). As long as there is no contention for resources, you may run such jobs outside of SLURM. With the mpirun command, use "-a mountainlakes" to specify the machine-spanning array, and comma-separated special hostnames with -np for number of processes. The special hostnames have "-pn" (for "private network") appended to identify the dedicated high-speed network interfaces. For example:
% mpirun -a mountainlakes waterton-pn -np 2, minnewanka-pn -np 2 ./a.out
Hello, world! I am 2 of 4 on host minnewanka.
Hello, world! I am 0 of 4 on host waterton.
Hello, world! I am 3 of 4 on host minnewanka.
Hello, world! I am 1 of 4 on host waterton.
See "man array" for ways you can inspect activities across the array of machines.
MPI jobs that span the machines: The awkward way (recommended)
If there is any contention for resources, use SLURM. Special steps are needed to force it to use the private high-speed network because the SLURM head node cannot see that network itself. By using the --nodelist option, the SLURM_JOB_NODELIST environment variable, and some list manipulation, we can pass the special private network hostnames (suffix "-pn") through to the mpirun command. Here is an example sbatch script:
#!/bin/bash
#SBATCH --mail-user=<userid>@uwaterloo.ca
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --job-name="mpi_job"
#SBATCH --account=igw_acct              #<== your slurm account
#SBATCH --partition=fluids_long         #<== partition: fluids_long or fluids_short
#SBATCH --time=00:01:00
#SBATCH --nodelist=waterton,minnewanka  #<== specify machines
#SBATCH --ntasks=16
#SBATCH --ntasks-per-node=8
#SBATCH --mem-per-cpu=1GB
#SBATCH --output=%j.out
#SBATCH --error=%j.err

# Your code below this line
# First set the environment for using mpt
module load mpt

# compile your code
make clean
make

# create an array of nodes from the SLURM_JOB_NODELIST env. variable
nodes=($(echo $SLURM_JOB_NODELIST | tr "," " "))

# debugging messages
echo "=========================================="
echo "slurm ntasks      = $SLURM_NTASKS"
echo "slurm ntasks/Node = $SLURM_NTASKS_PER_NODE"
echo "slurm Nodes       = $SLURM_JOB_NODELIST"
echo "slurm Nodes[0]    = ${nodes[0]}"
echo "slurm Nodes[1]    = ${nodes[1]}"
echo "=========================================="
echo
echo
set -x
mpirun -a mountainlakes ${nodes[0]}-pn -np $SLURM_NTASKS_PER_NODE, ${nodes[1]}-pn -np $SLURM_NTASKS_PER_NODE ./my_mpi_program
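The nodelist manipulation the script relies on can be tried on its own; here SLURM_JOB_NODELIST is hard-coded to stand in for the value SLURM would set:

```shell
# SLURM_JOB_NODELIST is a comma-separated node list; tr splits it into
# words and the shell array gives per-node access, so each hostname can
# get the "-pn" (private network) suffix before being passed to mpirun.
SLURM_JOB_NODELIST="waterton,minnewanka"   # stand-in for SLURM's value
nodes=($(echo $SLURM_JOB_NODELIST | tr "," " "))
echo "${nodes[0]}-pn ${nodes[1]}-pn"       # prints: waterton-pn minnewanka-pn
```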