Sample SLURM MITgcm script
Non-trivial jobs on the mountain lake machines are expected to be submitted via SLURM. Below is a sample SLURM script called submit.sh for an MITgcm run. Some important things to notice:
- run time estimate of three hours for this example (the --time=00-03:00 value is in days-hours:minutes format)
- memory estimate of 4 GB per CPU
- running on waterton
- using the fluids_short partition (or queue)
- 4 MPI tasks (processes)
- This must agree with the processor decomposition defined in the SIZE.h file in the model's code directory, or the mpirun program will fail.
- In this example, SIZE.h has nPx = nPy = 2, so nPx times nPy is 4; a quick way to check this is shown after this list.
- setting #SBATCH --ntasks=4 means SLURM also sets the environment variable $SLURM_NTASKS, which we can conveniently pass to the mpirun -np flag.
- the required modules are loaded explicitly, even though they must already be in your login environment for you to have compiled the model, because SLURM runs this script on your behalf rather than you running it directly, so your interactive environment is not assumed.
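Before submitting, you can confirm that the decomposition matches --ntasks by grepping SIZE.h. The path below assumes the model's code directory is ~/MITgcm/test_trial/code; adjust it to wherever your SIZE.h actually lives.
grep -E 'nPx|nPy' ~/MITgcm/test_trial/code/SIZE.h
The two values reported must multiply to the number given to --ntasks.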
To invoke this SLURM script, either log in to the SLURM head node for the Fluids machines (described at info specific to bow, minnewanka, waterton) and run
sbatch ~/path to wherever you put it/submit.sh
or invoke it remotely:
ssh fluidssubmit.math.private "sbatch ~/path to wherever you put it/submit.sh"
Invoking it remotely avoids the nuisance of logging in to the SLURM head node, but if you want to run other SLURM commands such as checking the queue or cancelling your job, you might as well log in there.
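For reference, checking the queue and cancelling a job use the standard SLURM commands; the job ID below is just a placeholder for the number that sbatch prints when you submit.
squeue -u $USER
scancel 123456
or, remotely:
ssh fluidssubmit.math.private "squeue -u $USER"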
The sample script follows.
#!/bin/bash
#SBATCH --time=00-03:00
#SBATCH --job-name="MITgcm_test"
#SBATCH --output=results
#SBATCH --mem-per-cpu=4G
#SBATCH --nodelist=waterton
#SBATCH --partition=fluids_short
#SBATCH --ntasks=4
module load slurm/17.11.8
module load mpt/2.16
# run from the directory containing the compiled mitgcmuv executable
cd ~/MITgcm/test_trial/run.slurm
# one MITgcm process per SLURM task; stdin is not needed
mpirun -np $SLURM_NTASKS ./mitgcmuv < /dev/null
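Once submitted, sbatch prints the job ID and returns immediately. The run's standard output goes to the file named results (from the --output line above), normally created in the directory you ran sbatch from, so you can follow progress with, for example:
tail -f results
If job accounting is enabled on the cluster, sacct also reports the state and elapsed time, with 123456 again standing in for your actual job ID:
sacct -j 123456 --format=JobID,State,Elapsed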