Graham Tips

The following is a list of scripts, functions, and aliases that will make your life on the SHARCNET Graham cluster much easier. Section 1, [[#Submitting Jobs|Submitting Jobs]], provides submission scripts for your jobs. Section 2, [[#~/.bashrc|~/.bashrc]], lists other useful commands for checking job information (such as memory usage, expected start time, etc.) and for changing directories to a running job.


== Submitting Jobs ==
 
The following are recommendations for the use of our contributed resources on Graham. As the UW fluids group, we have been allocated 832 processor-years (for 2017-2018). Until these processor-years have been expended, we have priority to run jobs on any processor ahead of regular users. Much of what you need to know for submitting and monitoring jobs is presented here; further questions may be answered on the Compute Canada [https://docs.computecanada.ca/wiki/Running_jobs#Accounts_and_projects Running jobs] page.
 
Remember to be courteous about your memory usage and use only what you need! However, if you are using entire nodes then this does not apply since no one else will have access to that memory.
 
=== Submit script ===
An MPI job may be submitted to the Graham scheduler with either of the following bash scripts (a suggested name is <code>submit.sh</code>). The first script uses entire nodes, while the second allows the processors to be spread out over many nodes. The first will wait longer in the queue, but should in theory run quicker since the processors are grouped onto whole nodes and communication between them crosses fewer inter-node connections. The second will likely start sooner, since it does not have to wait for entire nodes to free up and can begin whenever enough processors become available. Each script is broken into two parts: 1) the run-dependent parameters and 2) the permanent parameters.
 
To submit the job, execute <code>sbatch submit.sh</code> in a login window. Since this file stays in the working directory, it is also a handy record of the submission parameters you used (number of processors, memory, etc.).
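For example, a minimal submit-and-check sequence looks like this (sbatch prints a line of the form "Submitted batch job <jobID>"):
<syntaxhighlight lang="bash">
sbatch submit.sh   # submit the job; prints the assigned job ID
squeue -u $USER    # confirm the job is pending (PD) or running (R)
</syntaxhighlight>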
 
One thing to be aware of is that the memory requirement MUST be an integer; decimals will not be accepted. The unit can be changed to gigabytes (G) or other units.
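For example, to request roughly three and a half gigabytes per node, round to an integer number of megabytes (or gigabytes) rather than writing a decimal:
<syntaxhighlight lang="bash">
#SBATCH --mem=3584M   # accepted: integer number of megabytes
#SBATCH --mem=4G      # accepted: integer number of gigabytes
#SBATCH --mem=3.5G    # rejected: decimals are not allowed
</syntaxhighlight>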
 
The requested time will play a large role in the time the job spends in the queue. This is because the nodes on Graham only accept jobs with run times below specific values. That is, there are far fewer nodes that accept jobs with a run time of 28 days than there are that accept a 3 hour run time, since a 3 hour job can run on any node (by back-filling into gaps on the other nodes), while a longer job will only run on the nodes which accept it. The partitions are split by maximum run time as follows:
*3 hours or less
*12 hours or less
*24 hours (1 day) or less
*72 hours (3 days) or less
*7 days or less
*28 days or less
 
In summary, pick a value on this list and not something just larger than that (for example, pick 24 hours, not 25).
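For reference, the <code>--time</code> option accepts formats such as HH:MM:SS and DD-HH:MM, so requests matching the partition limits above can be written as:
<syntaxhighlight lang="bash">
#SBATCH --time=03:00:00   # 3 hours
#SBATCH --time=12:00:00   # 12 hours
#SBATCH --time=1-00:00    # 1 day
#SBATCH --time=3-00:00    # 3 days
#SBATCH --time=7-00:00    # 7 days
#SBATCH --time=28-00:00   # 28 days
</syntaxhighlight>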
 
==== Complete Nodes ====
The run-dependent parameters are the first three items: the number of nodes (and therefore the total number of processors, since each node has 32), the duration of the job, and an identifiable name for the job. The remaining permanent parameters need not be changed from run to run. You will need to replace the mail option with your own email address. Additional options exist for choosing when to be emailed (e.g. start/end/failure of the job), and for many other possibilities such as waiting for another job to complete (which is useful for a post-processing job); see the sbatch manual page for more information.
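As a sketch (the dependency job ID here is purely illustrative), such options look like:
<syntaxhighlight lang="bash">
#SBATCH --mail-type=BEGIN,END,FAIL      # email at job start, end, and on failure
#SBATCH --dependency=afterok:1234567    # start only after job 1234567 finishes successfully
</syntaxhighlight>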
 
The following script will use the UW Fluids contributed account which has high priority. If for any reason you decide to submit the job to the regular queue, change the account info to <code>--account=def-supervisor</code> where supervisor is the username of your supervisor (this could be def-mmstastn, or def-kglamb, ...).


Here is the script to put into <code>submit.sh</code>:
<syntaxhighlight lang="bash">
#!/bin/bash
# bash script for submitting a job to the sharcnet Graham queue

#SBATCH --nodes=2              # number of nodes to use
#SBATCH --time=03-00:00        # time (DD-HH:MM)
#SBATCH --job-name="mode2rot"  # job name

#SBATCH --ntasks-per-node=32                # tasks per node
#SBATCH --mem=128000M                       # memory per node
#SBATCH --output=sim-%j.log                 # log file
#SBATCH --error=sim-%j.err                  # error file
#SBATCH --mail-user=username@uwaterloo.ca   # who to email
#SBATCH --mail-type=FAIL                    # when to email
#SBATCH --account=ctb-mmstastn              # UW Fluids designated resource allocation
srun ./case1.x
</syntaxhighlight>


==== Partial Nodes ====
The run-dependent parameters are the first four items: the number of processors ("tasks"), the memory per processor, the duration of the job, and an identifiable name for the job. As in the Complete Nodes section, the remaining permanent parameters need not be changed from run to run.

Here is the script to put into <code>submit.sh</code>:
<syntaxhighlight lang="bash">
#!/bin/bash
# bash script for submitting a job to the sharcnet Graham queue

#SBATCH --ntasks=64            # number of MPI processes
#SBATCH --mem-per-cpu=3G       # memory per processor (default is in MB)
#SBATCH --time=03-00:00        # time (DD-HH:MM)
#SBATCH --job-name="mode2rot"  # job name

#SBATCH --output=sim-%j.log                # log file
#SBATCH --error=sim-%j.err                 # error file
#SBATCH --mail-user=username@uwaterloo.ca  # who to email
#SBATCH --mail-type=FAIL                   # when to email
#SBATCH --account=ctb-mmstastn             # UW Fluids designated resource allocation
srun ./case1.x
</syntaxhighlight>


== ~/.bashrc ==
Copy the following aliases and scripts into '''~/.bashrc''' to make them available to you at the command line.

=== Check Job Status and Nodes Usage ===
* sqm gives a summary of all of the running jobs by <userid>
* sqa gives a summary of all jobs running with the UW Fluids group contributed resources
* sqs gives the fair share values and recent cpu-seconds used by all users of the UW Fluids contributed resources
You will want to replace <userid> with your userid when adding these to '''~/.bashrc'''.
<syntaxhighlight lang="bash">
alias sqm='squeue -u <userid>'
alias sqa='squeue --account=ctb-mmstastn_cpu,rrg-mmstastn_cpu'
alias sqs='sshare -a -A ctb-mmstastn_cpu -A rrg-mmstastn_cpu'
</syntaxhighlight>

See [https://docs.computecanada.ca/wiki/Job_scheduling_policies Scheduling Policies] to find out more about fair share, but the basic idea is that values closer to 1 have the highest priority and values closer to 0 have the lowest. A value of 0.5 results in a job wait time that is roughly the "average" wait time for all jobs on the cluster.
 
Currently, we have 832 processor-years allocated to our group for the year (2017-Apr 2018). We would like to have, on average, about 832 processors running at any given moment. There is no harm in going over or under this number, so long as we roughly complete 832 processor-years by the time our allocation expires. To check the current number of CPUs that are running or pending, use the following function:
 
<syntaxhighlight lang="bash">
# How busy is our resource allocation
function nodeUsage() {
    jobsR=`squeue --account=ctb-mmstastn_cpu,rrg-mmstastn_cpu -o %C -t R`
    jobsP=`squeue --account=ctb-mmstastn_cpu,rrg-mmstastn_cpu -o %C -t PD`

    cpuSUM_R=0
    cpuSUM_P=0
    count=0

    IFS='
    '
    for x in $jobsR;
    do
        if [ "$count" -gt "0" ]; then
            cpuSUM_R=$((x + cpuSUM_R));
        else
            count=1;
        fi
    done

    count=0;

    IFS='
    '
    for x in $jobsP;
    do
        if [ "$count" -gt "0" ]; then
            cpuSUM_P=$((x + cpuSUM_P));
        else
            count=1;
        fi
    done

    perc_R=$((100*$cpuSUM_R/832 + 200*$cpuSUM_R/832 % 2)) # 2nd term is for rounding
    perc_P=$((100*$cpuSUM_P/832 + 200*$cpuSUM_P/832 % 2)) # 2nd term is for rounding
    perc_all=$((100*($cpuSUM_R + $cpuSUM_P)/832 + 200*($cpuSUM_R + $cpuSUM_P)/832 % 2))

    echo "Processors running:   $cpuSUM_R ($perc_R%)";
    echo "Processors pending:   $cpuSUM_P ($perc_P%)";
    echo "Processors total:     $((cpuSUM_P+cpuSUM_R)) ($perc_all%)";
    echo "Processors available: $((832 - cpuSUM_R-cpuSUM_P))";
}
</syntaxhighlight>
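Calling <code>nodeUsage</code> at the command line then prints a summary along these lines (the counts shown are purely illustrative):
 Processors running:   512 (62%)
 Processors pending:   128 (15%)
 Processors total:     640 (77%)
 Processors available: 192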
 
For these to work properly, you must also have the following in '''~/.bash_profile''':
<syntaxhighlight lang="bash">
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
     . ~/.bashrc
fi
</syntaxhighlight>
 
=== Resource Allocation Usage ===
 
The total amount of cpu-years charged to our contributed resources is not accessible from a login node and must be checked online. Follow these steps to see the allocated resource usage:
# Sign in to the [https://ccdb.computecanada.ca/security/login Compute Canada Database]
# Select View Group Usage under My Account
# Select By Resource Allocation Project
# Select a given year and then choose a project (currently it is pim-260-ab)
 
Disk usage can be displayed in a login window with the command <code>diskusage_report</code>.
 
=== Job details ===
 
The following commands give information about a particular job.
 
* scj (slurm control job): processors and nodes used by a job
* saj (slurm account job): memory and time used by a job (use on completed jobs)
* ssj (slurm status job):  job memory (use on running jobs)
* job_summary: print all of the above
 
The memory used is the AveRSS (in KB), which is the average memory per task (cpu).
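For example (with illustrative numbers): if <code>ssj</code> reports an AveRSS of 2000000K (about 2 GB per task) for a job with 64 tasks, the job is using roughly 2 GB × 64 = 128 GB of memory in total.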
 
<syntaxhighlight lang="bash">
function scj() {
    scontrol show jobid -dd $1
}
function saj() {
    sacct --format=jobid,JobName,ncpus,ntasks,state,reserved,elapsed,End,MaxVMSize,AveVMSize,MaxRSS,ReqMem -j $1
}
function ssj() {
    sstat --format=jobid,ntasks,AveVMSize,MaxVMSize,AveRSS,MaxRSS -j $1
}
function job_summary() {
    scj $1;
    saj $1;
    ssj $1;
}
</syntaxhighlight>
 
Usage is:
$ scj <jobID>
 
=== Move to a Simulation/Job Directory ===
 
You may find that your directory tree becomes rather involved after a while, and so changing into a simulation directory (or just remembering the path) can start to be cumbersome. A useful function is cdJob, which takes you into the working directory for a submitted job, provided that you know the jobID (which can be given by sqm).
 
This command only works for running jobs.
 
<syntaxhighlight lang="bash">
function cdJob() {
    pth=$(squeue -o %Z -j $1 | sed '1d')
    echo "cd-ing to ${pth}"
    cd ${pth}
}
</syntaxhighlight>
 
Usage is:
$ cdJob <jobID>
 
=== List Submitted Jobs ===
sq_hist defaults to showing the jobs submitted in the last week. An optional argument of the form YYYY-MM-DD selects a different start date.
<syntaxhighlight lang="bash">
function sq_hist() {
    # Date for one week ago
    dt=$(perl -e 'use POSIX;print strftime "%Y-%m-%d",localtime time-604800;')
 
    # If argument given (YYYY-MM-DD), use that. Else default to last week.
    TIME=${1:-${dt}};
    echo $TIME;
    sacct --starttime ${TIME} -X --format=jobid,jobname,reserved,alloccpus,state,exitcode;
}
</syntaxhighlight>
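Usage is (the date shown is illustrative):
$ sq_hist 2017-12-01
Calling <code>sq_hist</code> with no argument shows jobs from the last week.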
