Sample SLURM MITgcm script

Non-trivial jobs on the mountain lake machines are expected to be submitted via SLURM. Below is a sample SLURM script called submit.sh for an MITgcm run. Some important things to notice:

  • run time estimate of three hours for this example (the --time value 00-03:00 is in days-hours:minutes format)
  • memory estimate of 4 GB per CPU
  • running on waterton
  • using the fluids_short partition (or queue)
  • 4 MPI processes (tasks)
    • This must agree with the processor decomposition defined in the SIZE.h file in the model's code directory, or mpirun will fail. A quick check is sketched after this list.
    • In this example, SIZE.h has nPx = nPy = 2, so nPx × nPy = 4.
    • Setting #SBATCH --ntasks=4 also sets the environment variable $SLURM_NTASKS, which we can conveniently pass to the mpirun -np flag.
  • the required modules are loaded explicitly. They must already be in your login environment for you to have compiled the model, but SLURM runs the script on your behalf in a fresh shell, not in your interactive session, so the script has to set up its own environment.
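
As a sanity check before submitting, you can confirm that the decomposition in SIZE.h agrees with --ntasks. A minimal sketch, assuming the model's code directory is ~/MITgcm/test_trial/code (a hypothetical path; point it at wherever your SIZE.h actually lives):

# Hypothetical code directory; adjust the path to your own setup.
grep -E 'nPx|nPy' ~/MITgcm/test_trial/code/SIZE.h

The two values printed should multiply to the --ntasks value in submit.sh.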

To invoke this SLURM script, either log in to the SLURM head node for the Fluids machines (described at info specific to bow, minnewanka, waterton) and run

  • sbatch ~/path to wherever you put it/submit.sh

or invoke it remotely:

  • ssh fluidssubmit.math.private "sbatch ~/path to wherever you put it/submit.sh"

Invoking it remotely avoids the nuisance of logging in to the SLURM head node, but if you want to run other SLURM commands such as checking the queue or cancelling your job, you might as well log in there.
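
If you do log in to the head node, the usual SLURM commands are available there; for example (the job ID 12345 is only a placeholder):

squeue -u $USER     # list your queued and running jobs
scancel 12345       # cancel the job with ID 12345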

The sample script follows.

#!/bin/bash
# Wall-clock limit, in days-hours:minutes format: three hours.
#SBATCH --time=00-03:00
#SBATCH --job-name="MITgcm_test"
# Collect stdout and stderr in a file named "results".
#SBATCH --output=results
#SBATCH --mem-per-cpu=4G
# Run on waterton, in the fluids_short partition.
#SBATCH --nodelist=waterton
#SBATCH --partition=fluids_short
# Number of MPI processes; must equal nPx * nPy from SIZE.h.
#SBATCH --ntasks=4

# Load required modules explicitly; the batch shell does not inherit
# your interactive login environment.
module load slurm/17.11.8
module load mpt/2.16

# Run the model from its run directory.
cd ~/MITgcm/test_trial/run.slurm
mpirun -np $SLURM_NTASKS ./mitgcmuv < /dev/null
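
When you submit the script, sbatch prints the job ID, and the --output=results directive collects the job's stdout and stderr in a file named results in the directory you ran sbatch from. A quick way to follow a run, as a sketch:

sbatch submit.sh    # prints something like: Submitted batch job 12345
tail -f results     # watch the model's output as it is written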