info specific to bow, minnewanka, waterton

From Fluids Wiki
Revision as of 19:06, 19 December 2018 by Rblander (talk | contribs) (Explain use of mpirun across the array for the time being, until we integrate that with SLURM.)


A CFI proposal by Profs. Stastna, Lamb, and Waite resulted in acquisition of new servers in 2017.

Hostnames and Hardware

  • bow.math.private.uwaterloo.ca (SGI C2112-GP2)
  • minnewanka.math.private.uwaterloo.ca (SGI C1104-GP2)
  • waterton.math.private.uwaterloo.ca (SGI C1104-GP2)

Each machine has:

  • 2x Intel Xeon E5-2690v4 (Broadwell) CPUs, 2.6 GHz, 14 cores each
  • 128 GB RAM
  • 40 gigabit private network link for faster MPI
  • 1 gigabit public network link (10 gigabit on bow)

Operating System Software and vendor add-ons

  • CentOS 7.4
  • SGI Foundation 2.16
  • SGI Accelerate 1.14
  • SGI Performance Suite 1.14 with an accelerated MPI called MPT 2.16

System Administration

  • MFCF administers these machines. Users do not have root access.
  • Home directories are shared with the previous machines via NFS.
  • System management is done with SaltStack, unlike hood and thelon, which are managed using MFCF's XHier.
    • this means some things work differently than on those machines
    • applications are no longer under the /software directory
    • the default PATH does not include every application
    • details are below

Application Software, Development Environment

Wherever satisfactory versions are provided by the OS distribution, things like library and include files are installed at default locations such as /usr/lib64 and /usr/include.

Third party applications that come as precompiled bundles (but not Linux RPMs) are installed under /opt. E.g., /opt/visit/bin/visit

Third party applications that we have to compile from source are installed under /usr/local.

Login shell environment

  • the recommended .cshrc and .profile files automatically set up your environment with the recommended compilers, libraries, etc. mentioned below, so that it works with the configuration files included with models such as SPINS, MITgcm, and IGW
  • some optional software comes in "module" format (see man module)
    • to see a list of available modules, run module avail
      • this also shows which ones you have loaded already
    • load one using its exact name as shown in the list, e.g. for Matlab: module load matlab/2017a-research
    • use module unload modulename to unload a module, e.g. if you want to switch to an alternative one
  • some optional software comes in "Software Collection" format (see man scl)
    • to see a list of available collections, run scl -l
    • load one using its name followed by your preferred shell (it starts a sub-shell), e.g. scl enable rh-python36 /bin/bash
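
The module commands above can be combined into a short session. This is a minimal sketch for sh-style shells: the module name is the Matlab example from the list, and have_cmd is a helper name made up for this example. Its only job is to let the snippet fail gracefully in a shell where the module command is not defined, for instance if the recommended login files are not in use.

```shell
# Report whether a command (or shell function) is available in this shell.
# (have_cmd is a hypothetical helper name used only in this example.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd module; then
    module avail                          # list the available modules
    module load matlab/2017a-research     # exact name as shown by "module avail"
    module list                           # confirm what is now loaded
    module unload matlab/2017a-research   # unload, e.g. before switching versions
else
    echo "module command not found; check your .cshrc or .profile"
fi
```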

Compilers

  • gcc 4.8.5 is on the default search path, so there is no need to load it with a module command
  • Intel compilers etc. are not yet installed

MPI environments

Choose from MPICH, OpenMPI, and MPT

  • MPICH 3.0
    • module load mpi/mpich-x86_64
  • OpenMPI 1.10
    • module load mpi/openmpi-x86_64
  • SGI/HPE MPT
    • This is advertised as a tuned MPI that should perform best. You may wish to run some comparisons to see which MPI works best for you.
    • Includes an mpicc command.
    • Documentation at /opt/hpe/hpc/mpt/mpt-2.16/doc/README.relnotes
    • module load mpt/2.16
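
When comparing the MPI environments, a small test program built against each one in turn is a convenient first check. The sketch below writes a hello-world MPI program in C and compiles it if an mpicc wrapper is on your PATH. The file name hello_mpi.c is an arbitrary choice, and the snippet assumes you have already loaded exactly one of the MPI modules listed above.

```shell
# Write a minimal MPI test program (the file name is an arbitrary choice).
cat > hello_mpi.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);
    printf("Hello, world! I am %d of %d on host %s.\n", rank, size, host);
    MPI_Finalize();
    return 0;
}
EOF

# Compile only if an MPI compiler wrapper is available, e.g. after
#   module load mpt/2.16
if command -v mpicc >/dev/null 2>&1; then
    mpicc -o hello_mpi hello_mpi.c && echo "built ./hello_mpi"
else
    echo "mpicc not found; load one of the MPI modules first"
fi
```

Rebuild after switching modules so that the binary is linked against the MPI you intend to test.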

Matlab

  • module load matlab/2017a-research

Python

  • default python is 2.7.5; default python3 is 3.4.5
  • NumPy, SciPy, etc. are installed
  • Python 3.6 (including matching pip, NumPy, SciPy, virtualenv, etc.) is available using scl enable rh-python36 /bin/bash (or shell of your choice)
    • Once the collection is enabled, you can install other packages, such as matplotlib, with: pip install --user matplotlib
    • You may also want to upgrade pip with: pip install --upgrade --user pip
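
In scripts it can be more convenient to run a single command inside the collection than to start an interactive sub-shell. A minimal sketch, assuming the rh-python36 collection named above; run_in_py36 is a helper name invented for this example, and it passes its arguments to scl as one command string, so arguments containing spaces would need extra quoting.

```shell
# Run one command inside the rh-python36 collection.
# (run_in_py36 is a hypothetical helper name used only in this example.)
run_in_py36() {
    if command -v scl >/dev/null 2>&1; then
        scl enable rh-python36 "$*"
    else
        echo "scl not found; cannot run: $*" >&2
        return 1
    fi
}

# Check which Python the collection provides.
run_in_py36 python --version || true
```

For example, run_in_py36 pip install --user matplotlib would install matplotlib into your home directory using the collection's pip.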

Models

  • MITgcm
    • see config file called bow_mpt in the MITgcm section of the main Models page in this wiki
    • need to load the mpt/2.16 module first
    • ensure you call genmake2 with the -mpi flag
    • The GNU Fortran compiler seems pickier about some things than the Intel compiler, so you may see new complaints about previously working code
    • here is a sample SLURM MITgcm script. See more SLURM information below.
  • NCL NCARG version 6.4.0, without OPeNDAP
    • set environment variable NCARG_ROOT to /opt/ncl-6.4.0 and add $NCARG_ROOT/bin to your $PATH
    • optionally, make a .hluresfile in your home directory if you want to customize your NCL graphical environment
  • SPINS
    • so far we have one system configuration file
      • bow.gcc.mpt.blas.sh
      • with a symbolic link to it called bow.sh, since it is the only one we have so far
    • as the name suggests, it is set up to expect GCC, SGI/HPE MPT, and default BLAS and LAPACK
      • so you need to load the MPT module first
    • we expect to develop alternative configuration files for other compilers, MPI implementations, and numerical libraries for comparison to find an optimum set-up
    • the spins2netcdf SPINS to NetCDF converter is on the default search path at /usr/local/bin/spins2netcdf (Version 2.0.0.0 from April 2018)
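
The NCL settings listed above can be placed in your login file. The following is a minimal sketch for sh-style shells (for example in .profile); csh users would use setenv in .cshrc instead. The helper name path_append is made up for this example and only prevents the same directory from being added twice.

```shell
# Append a directory to PATH only if it is not already there.
# (path_append is a hypothetical helper name, not something provided on these machines.)
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, do nothing
        *) PATH="$PATH:$1" ;;
    esac
}

# Location of NCL as given in the list above.
NCARG_ROOT=/opt/ncl-6.4.0
export NCARG_ROOT

path_append "$NCARG_ROOT/bin"
export PATH
```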

Visualization

  • ParaView 5.4.1
    • installed at /opt/paraview/
  • VisIt 2.13.0
    • installed at /opt/visit/

Scheduler

These machines use SLURM to schedule jobs. Instead of being run interactively, jobs must be submitted to a queue. See the Submitting Jobs section of the Graham Tips page for more about SLURM.

Exception for MPI jobs that span the machines

SLURM is not yet integrated with the MPI configuration that lets an MPI job span multiple machines. In the short term, while there is no contention for resources, you may run such jobs outside of SLURM. With the mpirun command, use "-a mountainlakes" to specify the machine-spanning array, followed by comma-separated special hostnames, each with -np for its number of processes. The special hostnames have "-pn" (for "private network") appended to identify the dedicated high-speed network interfaces. For example:

% mpirun -a mountainlakes waterton-pn -np 2, minnewanka-pn -np 2 ./a.out
Hello, world! I am 2 of 4 on host minnewanka.
Hello, world! I am 0 of 4 on host waterton.
Hello, world! I am 3 of 4 on host minnewanka.
Hello, world! I am 1 of 4 on host waterton.
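
For longer host lists, the host part of the command can be generated rather than typed. The sketch below simply mirrors the syntax of the example above and has not been checked against the MPT documentation beyond that; the function name mpt_host_spec is made up for this example. It prints the command instead of running it.

```shell
# Build the host list in the form used in the example above:
#   "<host>-pn -np N, <host>-pn -np N"
# (mpt_host_spec is a hypothetical helper name used only in this example.)
mpt_host_spec() {
    np=$1
    shift
    spec=""
    for h in "$@"; do
        if [ -n "$spec" ]; then
            spec="$spec, "
        fi
        spec="${spec}${h}-pn -np ${np}"
    done
    printf '%s\n' "$spec"
}

# Print the full command for two processes on each of two machines.
echo "mpirun -a mountainlakes $(mpt_host_spec 2 waterton minnewanka) ./a.out"
```

With these arguments the printed command is the same as the one in the example.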

Once we integrate SLURM with the MPI array, you'll use the following normal method instead.

(See "man array" for ways you can inspect activities across the array of machines.)

Expected use of SLURM

  • SLURM head node is fluids-pr1-01.math.private, alias fluidssubmit.math.private
    • runs a different OS from compute nodes
    • can't compile there, only submit jobs there
  • you need the SLURM module in your environment
    • recommended .cshrc and .profile do that for you (see the Login script page)
  • you must include a #SBATCH --partition=<partition_name> line in your submit script
    • use partition name fluids_short for jobs under 8 hours
    • use partition name fluids_long for jobs up to 40 days
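
The partition rules above can be sketched as a small script that picks a partition from the expected run time and writes a submit script. The function name choose_partition, the file name submit_example.sh, the job name, and the job body (a plain hostname call) are all placeholders made up for this example; replace the job body with your own model run.

```shell
# Pick a partition from the expected run time in whole hours:
# under 8 hours -> fluids_short, up to 40 days (960 hours) -> fluids_long.
# (choose_partition is a hypothetical helper name used only in this example.)
choose_partition() {
    h=$1
    if [ "$h" -lt 8 ]; then
        echo fluids_short
    elif [ "$h" -le 960 ]; then
        echo fluids_long
    else
        echo "no partition allows $h hours" >&2
        return 1
    fi
}

hours=2
partition=$(choose_partition "$hours")

# Write a small submit script; the job body is a placeholder.
cat > submit_example.sh <<EOF
#!/bin/bash
#SBATCH --partition=$partition
#SBATCH --time=${hours}:00:00
#SBATCH --ntasks=1
#SBATCH --job-name=example
#SBATCH --output=example-%j.out

hostname
EOF

# Submit only where sbatch is available (the SLURM head node).
if command -v sbatch >/dev/null 2>&1; then
    sbatch submit_example.sh
else
    echo "sbatch not found; submit from the SLURM head node with the SLURM module loaded"
fi
```

A job of exactly 8 hours goes to fluids_long here, since fluids_short is described as being for jobs under 8 hours.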