Orca Tips

Courteous and Efficient Usage

The following recommendations for courteous usage are not required, but they are strongly recommended (especially the first two).

  • If your simulation is using 16 processors or fewer, consider using the ppn flag (see the example submission sketched after this list)
    • This flag specifies the number of processors per node, and helps to keep your jobs using whole nodes when available (avoids fragmenting jobs over more nodes than necessary)
    • Using no more than 16 processors means that your job can fit on a single node, which will reduce communication costs and provide a speed-up.
      • If you are using 8 processors, then use --ppn=8.
  • When submitting jobs, consider how much memory you will need.
    • The mpp flag specifies the memory per processor
    • If you are using whole nodes (--ppn=16), then you may as well use all of the memory (--mpp=4032M, for the low memory nodes)
    • If you are not using a whole node, then try to only request as much memory as is needed. This will permit other jobs to run on the same node.
  • When submitting many jobs, consider linking them using the w flag
    • This flag instructs a job to wait until other jobs have finished.
      • Suppose you want to submit two large (processor-wise) jobs. If the first job is assigned job id 1234567 when it is submitted, and the second job is submitted with -w 1234567, then the second job will not run until the first has finished.
      • While the first job is running, the second job will continue to build priority, so it will be more likely to start shortly after the first job finishes (especially if the second job doesn't require more processors than the first).
        • NOTE: If the first job dies (runs out of time or memory, or has an internal error), then the second job will not run.
    • Note that this flag is only important when the queue is busy and your jobs would otherwise monopolize the available processors
      • As a rough guide, try to limit yourself to 128 processors at a time, although more can certainly be used when necessary.
    • The w flag can also be very useful during holidays/conferences/vacations to submit a series of jobs without blocking other users unnecessarily.
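
For concreteness, here is a sketch of two linked submissions that combine the flags above, assuming the usual SHARCNET sqsub submission command. It is only an illustration: the solver name and its arguments, the runtime, the output files, and the job id 1234567 are placeholders, and the other options shown (-q, -n, -r, -o) are the standard sqsub queue, processor-count, runtime, and output-file options, so check the sqsub documentation and adapt everything to your own jobs.

# 16-process MPI job packed onto a single low-memory node, using all of its memory
sqsub -q mpi -n 16 --ppn=16 --mpp=4032M -r 3d -o run1.log ./mysolver case1.cfg

# If the scheduler reports the first job as id 1234567, chain a second job behind it
sqsub -q mpi -n 16 --ppn=16 --mpp=4032M -r 3d -o run2.log -w 1234567 ./mysolver case2.cfg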


~/.bashrc

Copy the following lines of code into ~/.bashrc to make them available to you at the command line. Note: Thanks to Mike Dunphy and John Yawney for some of these.

Checking Job Status and Node Usage

  • sqa gives a summary of all jobs submitted by the kglamb group
  • sqmpi gives a summary of all mpi jobs
  • sqh gives a node-by-node summary of the kglamb nodes
  • sqm gives a summary of all of the jobs submitted by <userid>
    • You will want to replace <userid> with your userid when adding these to ~/.bashrc
alias sqa='showq -w class=kglamb'
alias sqmpi='showq -w class=mpi'
alias sqh='sqhosts orc361-392'
alias sqm='showq -w user=<userid>'
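
For example, after replacing <userid> with your own userid and adding the lines above to ~/.bashrc, reload the file and use the aliases directly:

source ~/.bashrc   # or log out and back in
sqm                # your own queued and running jobs
sqh                # node-by-node summary of the kglamb nodes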

Accessing Development Nodes

The development nodes provide an opportunity to run code directly, which can be very useful for development and testing. The following functions simplify the connection process.

  • DevUsage provides a summary of how busy the development nodes are (in terms of load; each node has 16 processors, so the lower the better)
  • DevConnect automatically connects to the least used development node.
# Load on each development node, sorted from least to most busy
alias DevUsage="pdsh -w orc-dev[1-4] uptime | awk '{print \$1,\$NF}' | sort -n -k 2"

# Connect (with X forwarding) to the least-loaded development node
function DevConnect() {
    # "nodename: load" pairs, sorted by load average; the least busy node comes first
    var=$(pdsh -w orc-dev[1-4] uptime | awk '{print $1,$NF}' | sort -n -k 2)
    # The node name (e.g. orc-dev1) is the first 8 characters of that output
    var2=${var:0:8}
    printf "\n*** Accessing dev node: %s ***\n\n" "$var2"
    ssh -X "$var2"
}
export -f DevConnect
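
A typical use, assuming the lines above have been added to ~/.bashrc and sourced:

DevUsage     # check the current load on orc-dev1 through orc-dev4
DevConnect   # open an ssh session (with X forwarding) on the least-loaded node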