Computing Resources: Difference between revisions

From Fluids Wiki
(Emphasize variety among machines and importance of setting up environment to match with config files for models (SPINS, MITgcm, etc.).)
m (update the last-edited date at the top)
 
(12 intermediate revisions by 3 users not shown)
Line 1: Line 1:
As a member of the uWaterloo Applied Math Fluids Group you have access to a fairly substantial array of computing resources. These are correct as of March 2018.
As a member of the uWaterloo Applied Math Fluids Group you have access to a fairly substantial array of computing resources. These are correct as of August 2024.


== Compute Canada ==
== Digital Research Alliance of Canada ==


Compute Canada is perhaps the largest supplier of computing services available to us, and it provides significant computing power. To begin, you will need to create a Compute Canada account (follow instructions on [https://www.computecanada.ca/research-portal/account-management/apply-for-an-account/ the Compute Canada page]). Account creation will require approval from your supervisor. The main components of Compute Canada that are relevant to us are: SHARCNet, SciNet, and WestGrid.
Digital Research Alliance of Canada (formerly, Compute Canada) is perhaps the largest supplier of computing services available to us, and it provides significant computing power. To begin, you will need to create a Compute Canada account (follow instructions on [https://www.computecanada.ca/research-portal/account-management/apply-for-an-account/ the Compute Canada page]). Account creation will require approval from your supervisor. The main components of Compute Canada that are relevant to us are: SHARCNet, SciNet, and WestGrid.
 
=== SHARCNet ===
SHARCNet has many computing clusters. However, there are two clusters to which we have a degree of priority access: Orca and Graham
 
Help with these systems can be accessed through the SHARCNet [https://www.sharcnet.ca/my/problems/submit ticket system].
 
==== Orca (listed to be decommissioned) ====
 
Orca is now (as of Graham) a legacy system. Fluids contributed 512 processors to the compute cluster and, as a result, has priority on those compute nodes.
For information on using Orca, see [[Orca_Tips]].
For information on the Orca system and hardware, see [https://www.sharcnet.ca/my/systems/show/73 the official SHARCNet page here] and [https://www.sharcnet.ca/help/index.php/Orca here].
 
==== Graham ====
 
Graham is a significantly larger computing cluster than Orca, and provides the opportunity to run much larger simulations. Fluids contributed 1024 cores and 4 TB of memory to the Graham cluster. As of writing this, the group has been awarded dedicated computation time.
For information on using Graham, see [[Graham Tips]].
For information on the Graham system and hardware, see [https://www.sharcnet.ca/help/index.php/Graham the official documentation].


=== SciNet ===
=== SciNet ===
Line 27: Line 10:


Niagara is another large system run by Compute Canada. Our group does not have dedicated resources, but that need not stop you from running anything there.
Niagara is another large system run by Compute Canada. Our group does not have dedicated resources, but that need not stop you from running anything there.
For information on the Graham system and hardware, see documentation [https://docs.scinet.utoronto.ca/index.php/Main_Page here] and [https://docs.computecanada.ca/wiki/Niagara here].
For information on the Niagara system and hardware, see documentation [https://docs.scinet.utoronto.ca/index.php/Main_Page here] and [https://docs.computecanada.ca/wiki/Niagara here].


=== Westgrid ===
=== Westgrid ===
Line 45: Line 28:


=== Fluids-owned machines ===
=== Fluids-owned machines ===
==== kazan ====
This system is fairly old, and unless you specifically need to use it you may be better served by some of the other machines. However, some information can be found on our page about [[info specific to winisk and kazan]].


==== hood ====
==== hood ====


[[info specific to hood.math | more information can be found here]].
[[info specific to hood.math | More information can be found here]].


==== bow, minnewanka, waterton (the "mountain lakes") ====
==== bow, minnewanka, waterton (the "mountain lakes") ====


These are new (2017) systems with high-speed interconnects and are managed through the SLURM scheduler (see [[Graham Tips]] for some useful SLURM-related commands). [[info specific to bow, minnewanka, waterton | More information can be found here]].
These are (2017) systems with high-speed interconnects and are managed through the SLURM scheduler (see [[Graham Tips]] for some useful SLURM-related commands). [[info specific to bow, minnewanka, waterton | More information can be found here]].


==== kesagami, kuujjua ====
==== kesagami, kuujjua ====


These new (2018) systems are intended primarily to run the IGW model.
These (2018) systems are intended primarily to run the IGW model.
 
==== sutton, peleee, rondeau (the "provincial parks") ====
 
See details [[info_specific_to_sutton,_rondeau,_pelee | here]].


=== Faculty-wide machines ===
=== Faculty-wide machines ===
Line 68: Line 51:
== Lab Systems ==
== Lab Systems ==


Lab machines are maintained by lab members (currently Aaron Coutino), not MFCF. Accounts should exist for Fluid Lab members (if you don't have one but would like one, ask Aaron / your supervisor). Some standard software is installed.  
Lab machines are maintained by lab members (currently Andrew Grace), not MFCF. Accounts should exist for Fluid Lab members (if you don't have one but would like one, ask Aaron / your supervisor). Some standard software is installed.  


''Note: these machines do not have a queue system, so please compute responsibly and do not swamp the machines.''
''Note: these machines do not have a queue system, so please compute responsibly and do not swamp the machines.''


=== Belize ===
=== Belize2 (formerly Boogaloo) ===
As of 18 March 2018, Belize has been taken off-line. Data can still be accessed, but requires using the machine directly / in-person.
belize2 can be accessed either in-person in the Fluids Lab or via ssh with yourUserID@boogaloo.math.uwaterloo.ca.  
 
=== Boogaloo ===
Boogaloo can be accessed either in-person in the Fluids Lab or via ssh with yourUserID@boogaloo.math.uwaterloo.ca.  
See [[info specific to boogaloo.math and belize.math | our wiki page]] for information on the available hardware.
See [[info specific to boogaloo.math and belize.math | our wiki page]] for information on the available hardware.
Boogaloo has both CPU and GPU capabilities.
Boogaloo has both CPU and GPU capabilities.
Line 82: Line 62:
=== Onyx ===
=== Onyx ===


Onyx is a Windows machine that is primarily intended for visualization and is new as of March 2018. Both VisIt and ParaView are installed and should be GPU-aware.  
Onyx (2018) is a Windows machine that is primarily intended for visualization. Both VisIt and ParaView are installed and should be GPU-aware.  
This machine should be used in-person to perform high-powered visualization of your datasets.
This machine should be used in-person to perform high-powered visualization of your datasets.
Onyx is ''not intended'' for heavy computation, but is capable to running CUDA models.
Onyx is ''not intended'' for heavy computation, but is capable to running CUDA models.
=== Belize3 ===
belize3.math.uwaterloo.ca (2021) is a high-power Linux workstation co-managed by MFCF.
It has a moderate GPU in addition to two server-class multi-core CPUs.

Latest revision as of 14:13, 26 August 2024

As a member of the uWaterloo Applied Math Fluids Group you have access to a fairly substantial array of computing resources. These are correct as of August 2024.

Digital Research Alliance of Canada

The Digital Research Alliance of Canada (formerly Compute Canada) is perhaps the largest supplier of computing services available to us, and it provides significant computing power. To begin, you will need to create a Compute Canada account (follow the instructions on the Compute Canada page). Account creation requires approval from your supervisor. The main components of Compute Canada that are relevant to us are SHARCNet, SciNet, and WestGrid.

SciNet

Niagara

Niagara is another large system run by Compute Canada. Our group does not have dedicated resources, but that need not stop you from running anything there. For information on the Niagara system and hardware, see documentation here and here.

Westgrid

Cedar

Our group has essentially no experience with this system, but it is available to us. Documentation is here.

MFCF-administered Systems

The Math Faculty Computing Facility (MFCF) provides a central computing environment for the Faculty (excluding Computer Science). It also maintains several Linux/Unix servers that belong to the Fluids group. These servers are much smaller in scale than the Compute Canada clusters (typically tens of processors), but they are hosted locally and can be very useful tools for running smaller simulations or getting your code working properly before scaling up to Compute Canada. The MFCF staff are also very helpful regarding software installation and general computing problems / questions.

It is important to understand that these machines do not all run the same operating system. They also have different compilers, numerical libraries, MPI libraries, and so on. In order to build and run models such as SPINS, MITgcm, and IGW, your environment must be set up appropriately for each machine. The recommended shell control files (.cshrc, .profile) provided by MFCF do that for you automatically. In some cases you may wish to choose different options (e.g. a different version of gcc or a different MPI library), but the recommended shell control files align with the model configuration files published in this wiki.
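Because the toolchain differs from machine to machine, it can help to check what your current environment actually provides before building a model. The following is a minimal Python sketch, not an MFCF-provided tool: the function name `report_toolchain` and the list of tool names are assumptions chosen for illustration, and the tools your model's configuration file expects may differ.

```python
import shutil

# Hypothetical list of tools to look for; adjust it to whatever the
# configuration file for your model expects on a given machine.
DEFAULT_TOOLS = ("gcc", "g++", "gfortran", "mpicc", "mpicxx", "mpirun")


def report_toolchain(tools=DEFAULT_TOOLS):
    """Map each tool name to the path found on PATH, or None if absent."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    # Print one line per tool so differences between machines are easy to spot.
    for tool, path in report_toolchain().items():
        print(f"{tool:10s} {path or 'not found'}")
```

Running this on two different machines and comparing the output is a quick way to see whether the same compiler and MPI wrappers are being picked up on each.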

Fluids-owned machines

hood

More information can be found here.

bow, minnewanka, waterton (the "mountain lakes")

These systems (2017) have high-speed interconnects and are managed through the SLURM scheduler (see Graham Tips for some useful SLURM-related commands). More information can be found here.
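As a rough illustration of what a SLURM submission involves, the sketch below generates the text of a minimal batch script. It is written under assumptions: the helper name `make_sbatch_script`, the executable `./model.x`, and the choice of `srun` as the launcher are all hypothetical, and these machines may require additional options (such as a partition or account) that are not shown here. Consult the linked pages for the settings that actually apply.

```python
def make_sbatch_script(job_name, ntasks, walltime, command):
    """Return the text of a minimal SLURM batch script.

    walltime uses SLURM's HH:MM:SS format; command is the program to launch.
    """
    if ntasks < 1:
        raise ValueError("ntasks must be at least 1")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --ntasks={ntasks}",
        f"#SBATCH --time={walltime}",
        "",
        # Assumption: srun launches the MPI tasks; some setups use mpirun.
        f"srun {command}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # ./model.x is a placeholder executable name.
    print(make_sbatch_script("test_run", 16, "01:00:00", "./model.x"))
```

The generated text can be saved to a file and submitted with `sbatch`.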

kesagami, kuujjua

These (2018) systems are intended primarily to run the IGW model.

sutton, pelee, rondeau (the "provincial parks")

See details here.

Faculty-wide machines

The central shared computing environment comprises numerous Linux and Windows servers and a small number of GPU servers. Even if you do not use those machines, the central file server is a smart place to store a copy of your important work, such as your thesis in progress, for safety. The central file service "files.math" is accessible via the Mac mini provided to you by the department, as well as via ssh/scp to the shared Linux machines.

Lab Systems

Lab machines are maintained by lab members (currently Andrew Grace), not MFCF. Accounts should exist for Fluid Lab members (if you don't have one but would like one, ask the current maintainer or your supervisor). Some standard software is installed.

Note: these machines do not have a queue system, so please compute responsibly and do not swamp the machines.
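Since there is no queue, one way to compute responsibly is to look at the current load before starting a large run. The sketch below is one possible check, not a lab policy: the helper name `has_spare_capacity` and the 75% threshold are assumptions for illustration. It relies on `os.getloadavg()`, which is available on Unix-like systems only.

```python
import os


def has_spare_capacity(load_1min, n_cores, max_fraction=0.75):
    """Return True if the 1-minute load is below max_fraction of the cores."""
    return load_1min < max_fraction * n_cores


if __name__ == "__main__":
    load_1min, _, _ = os.getloadavg()  # Unix only
    n_cores = os.cpu_count() or 1
    if has_spare_capacity(load_1min, n_cores):
        print(f"Load {load_1min:.1f} on {n_cores} cores: OK to start a job.")
    else:
        print(f"Load {load_1min:.1f} on {n_cores} cores: machine is busy.")
```

A check like this only reflects CPU load; memory and GPU usage by other users are worth checking separately.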

Belize2 (formerly Boogaloo)

Belize2 can be accessed either in person in the Fluids Lab or via ssh with yourUserID@boogaloo.math.uwaterloo.ca. See our wiki page for information on the available hardware. The machine has both CPU and GPU capabilities.

Onyx

Onyx (2018) is a Windows machine that is primarily intended for visualization. Both VisIt and ParaView are installed and should be GPU-aware. This machine should be used in person to perform high-powered visualization of your datasets. Onyx is not intended for heavy computation, but is capable of running CUDA models.

Belize3

belize3.math.uwaterloo.ca (2021) is a high-power Linux workstation co-managed by MFCF. It has a moderate GPU in addition to two server-class multi-core CPUs.