MITgcm tips

From Fluids Wiki
Jump to navigation Jump to search

This page is intended to supplement the MITgcm documentation in a way that is more focused on how to use the model, rather than the the inner workings of the model. Recommended parameters and flags are listed, and the EXF and SEAICE packages are described.

The first section is relevant to the core model. It covers choosing an equation of state, using checkpoint files, and specifying the model grid. The following sections cover the EXF, SEAICE, and DIAGNOSTICS packages that supplement the core model.

Maybe the most important section here (at the time of writing) is the SEAICE default settings. There is currently an active pull request to fix the defaults, but there are some strongly recommended settings to use with this package.

Core model parameters and recommendations

Choosing an equation of state

The equation of state relates temperature, density, and salinity of water. The MITgcm has several choices available. The default choice is a linear equation of state. This is generally a poor choice when using SEAICE. Freshwater is most dense at 4 C, so if temperatures are near 4 C the linear equation of state will not represent the true densities. A much better choice is to use a nonlinear equation of state that includes the density maximum. In MITgcm, a good choice is the JMD95Z equation of state. More detail about this EOS is given in the EOS documentation page.

Setting the equation of state

The equation of state is specified by the namelist entry eosType in the PARM01 namelist of the data file. To choose the JMD95Z equation of state, you would include the following in your data file

&PARM01
...
eosType='JMD95Z',
....

Using checkpoint files

At some point you will be running the model, and it will fail. This can happen on Graham if the node fails, the power flicks or goes out, or any number of other reasons. This can cost you a lot of time when you're running big jobs. Therefore, it's best practice to use checkpoint files. MITgcm has built-in rolling and permanent checkpoint files. As the names suggest, rolling checkpoint files save the model current state with one of two file names (pickup.ckptA.data and pickup.ckptB.data), overwriting the oldest of the two to save each new checkpoint file. Permanent checkpoint files are saved with the model iteration number in their name and don't get overwritten. The frequencies are controlled separately by the keywords chkptFreq for rolling checkpoints and pchkptFreq for permanent checkpoint files. These keywords belong in the time namelist PARM03 (with other time-stepping parameters) in the data file.

Specifying a model grid

The size of the model grid needs to be specified at compile time and at run time. Each compiled executable must always be run with the same number of grid points and processors, but can be run for different resolutions on the same grid.

Compile-time options

The model grid is first specified in the SIZE.h file. The full domain (size Nx x Ny) is split into tiles in X and Y. Each tile is split into a number of grid points in X and Y (variables sNx and sNy). A group of tiles (size nSx x nSy) is sent to each processor, and the number of processors in the X and Y directions are specified by nPx and nPy. Therefore, we need Nx = sNx * nSx * nPx and Ny = sNy * nSy * nPy.

Each tile must overlap so each tile sharing the edge agrees on the field values on the edge. The overlap extent is specified by OLx and OLy. The required values depend on the advection scheme and packages used.

See the example SIZE.h file below. You can copy this file from MITgcm/model/inc/SIZE.h. You will need to comment out the first few lines, and change the values to match your configuration.

    CBOP
    C    !ROUTINE: SIZE.h
    C    !INTERFACE:
    C    include SIZE.h
    C    !DESCRIPTION: \bv
    C     *==========================================================*
    C     | SIZE.h Declare size of underlying computational grid.
    C     *==========================================================*
    C     | The design here supports a three-dimensional model grid
    C     | with indices I,J and K. The three-dimensional domain
    C     | is comprised of nPx*nSx blocks (or tiles) of size sNx
    C     | along the first (left-most index) axis, nPy*nSy blocks
    C     | of size sNy along the second axis and one block of size
    C     | Nr along the vertical (third) axis.
    C     | Blocks/tiles have overlap regions of size OLx and OLy
    C     | along the dimensions that are subdivided.
    C     *==========================================================*
    C     \ev
    C
    C     Voodoo numbers controlling data layout:
    C     sNx :: Number of X points in tile.
    C     sNy :: Number of Y points in tile.
    C     OLx :: Tile overlap extent in X.
    C     OLy :: Tile overlap extent in Y.
    C     nSx :: Number of tiles per process in X.
    C     nSy :: Number of tiles per process in Y.
    C     nPx :: Number of processes to use in X.
    C     nPy :: Number of processes to use in Y.
    C     Nx  :: Number of points in X for the full domain.
    C     Ny  :: Number of points in Y for the full domain.
    C     Nr  :: Number of points in vertical direction.
    CEOP
          INTEGER sNx
          INTEGER sNy
          INTEGER OLx
          INTEGER OLy
          INTEGER nSx
          INTEGER nSy
          INTEGER nPx
          INTEGER nPy
          INTEGER Nx
          INTEGER Ny
          INTEGER Nr
          PARAMETER (
         &           sNx =  30,
         &           sNy =  15,
         &           OLx =   2,
         &           OLy =   2,
         &           nSx =   2,
         &           nSy =   4,
         &           nPx =   1,
         &           nPy =   1,
         &           Nx  = sNx*nSx*nPx,
         &           Ny  = sNy*nSy*nPy,
         &           Nr  =   4)
    
    C     MAX_OLX :: Set to the maximum overlap region size of any array
    C     MAX_OLY    that will be exchanged. Controls the sizing of exch
    C                routine buffers.
          INTEGER MAX_OLX
          INTEGER MAX_OLY
          PARAMETER ( MAX_OLX = OLx,
         &            MAX_OLY = OLy )

Run-time options

The resolution is specified at run time in namelist PARM04 in the data file. The grids are specified as (int: number of points) * (float: distance between points) or by specifying a list of distances between neighbouring points. See example namelist below

&PARM04
usingCartesianGrid =.TRUE.,
usingSphericalPolarGrid=.FALSE.,
delX =400∗12.50,
delY =400∗12.50,
delR =50∗0.2 ,

Available Packages

The sections discusses the diagnostics, seaice, and exf packages. All packages required some compile time and some run time options. At minimum, packages must be enabled at compile time by adding the package name to the packages.conf file. At run time, the package must be enabled in data.pkg

   packages.conf:
   
   oceanic
   diagnostics
   exf
   cal
   seaice
   data.pkg:
   
   &PACKAGES
   useEXF        =.TRUE.,
   useCAL        =.TRUE.,
   useSEAICE.    =.TRUE.,
   useDIAGNOSTICS=.TRUE.,
   &

DIAGNOSTICS

The diagnostics package exists to make it easier to control the output files generated by the model. It makes it easy to choose which output fields are written and how frequently output files are generated.

Compile time options

At compile time you should have the file DIAGNOSTICS_SIZE.h. This can be copied from the package code pkg/diagnostics/DIAGNOSTICS_SIZE.h. The following is an example:

   C     Diagnostics Array Dimension
   C     ---------------------------
   C     ndiagMax   :: maximum total number of available diagnostics
   C     numlists   :: maximum number of diagnostics list (in data.diagnostics)
   C     numperlist :: maximum number of active diagnostics per list (data.diagnostics)
   C     numLevels  :: maximum number of levels to write    (data.diagnostics)
   C     numdiags   :: maximum size of the storage array for active 2D/3D diagnostics
   C     nRegions   :: maximum number of regions (statistics-diagnostics)
   C     sizRegMsk  :: maximum size of the regional-mask (statistics-diagnostics)
   C     nStats     :: maximum number of statistics (e.g.: aver,min,max ...)
   C     diagSt_size:: maximum size of the storage array for statistics-diagnostics
   C Note : may need to increase "numdiags" when using several 2D/3D diagnostics,
   C  and "diagSt_size" (statistics-diags) since values here are deliberately small.
         INTEGER    ndiagMax
         INTEGER    numlists, numperlist, numLevels
         INTEGER    numdiags
         INTEGER    nRegions, sizRegMsk, nStats
         INTEGER    diagSt_size
         PARAMETER( ndiagMax = 500 )
         PARAMETER( numlists = 10, numperlist = 50, numLevels=2*Nr )
         PARAMETER( numdiags = 10*Nr )
         PARAMETER( nRegions = 0 , sizRegMsk = 1 , nStats = 4 )
         PARAMETER( diagSt_size = 10*Nr )
   
   
   CEH3 ;;; Local Variables: ***
   CEH3 ;;; mode:fortran ***
   CEH3 ;;; End: ***

In this example, you can only output up to 10 diagnostics. By increasing the maximum sizes to numlists = 25 and numdiags = 25*Nr you could output up to 25 diagnostics.

Run time options

The diagnostics and frequency are specified in the file data.diagnostics. Specify the field names in the field and filename lines, and set the frequency. The file needs to have namelists &DIAGNOSTICS_LIST (for writing en entire field to file) and DIAG_STATISPARMS (for writing per-level statistics), even if one of them is empty. Example:

   # Diagnostic Package Choices
   #-----------------
   # for each output-stream:
   #  filename(n) : prefix of the output file name (only 8.c long) for outp.stream n
   #  frequency(n):< 0 : write snap-shot output every |frequency| seconds
   #               > 0 : write time-average output every frequency seconds
   #  timePhase(n)     : write at time = timePhase + multiple of |frequency|
   #  averagingFreq(n) : frequency (in s) for periodic averaging interval
   #  averagingPhase(n): phase     (in s) for periodic averaging interval
   #  repeatCycle(n)   : number of averaging intervals in 1 cycle
   #  levels(:,n) : list of levels to write to file (Notes: declared as REAL)
   #                 when this entry is missing, select all common levels of this list
   #  fields(:,n) : list of diagnostics fields (8.c) (see "available_diagnostics.log"
   #                 file for the list of all available diag. in this particular config)
   #−−−−−−−−−−−−−−−−−
   &DIAGNOSTICS_LIST
   # diag_mnc = .FALSE. ,
   # dumpAtLast = .TRUE. ,
   #==============================
   frequency(1) = −3600.0 ,
   timePhase(1) =0,
   fields(1, 1) = ’THETA’ ,
   filename( 1) =’T’,
   #−−−−−−−−−−−−−−−−−
   frequency(2) = −3600.0,
   timePhase(2) = 0 ,
   fields(1, 2) = ’UVEL’,
   filename( 2) = ’U’,
   #−−−−−−−−−−−−−−−−− 
   # ...
   &
   
   &DIAG_STATIS_PARMS
   &
         

In this example, only the temperature and U velocity will be written to file. Their instantaneous value is written to file once per hour of model time. We could add more diagnostics by continuing the namelist in the same way.

All available diagnostics are written to the file available_diagnostics.log when the model runs.

SEAICE

The SEAICE package includes physical parameterizations for ice dynamics and simple thermodynamics. The dynamical model is described in the documentation.

Compile time options

The SEAICE packages requires the files SEAICE_OPTIONS.h and SEAICE_SIZE.h to be present at compile time. You can copy the files from the SEAICE tutorial or the package code pkg/seaice/SEAICE_OPTIONS.h and modify as desired.

Run time options

Run time options are specified in the file data.seaice. The SEAICE package defaults are some questionable. See this GitHub issue and corresponding Pull Request. In short, you should at minimum have the following in your data.seaice file

   # SEAICE parameters
   
   &SEAICE_PARM01
   SEAICEuseDYNAMICS      =.TRUE.,
   SEAICEscaleSurfStress  =.TRUE.,
   SEAICE_useMultiDimSnow = .TRUE.,
   SEAICEetaZmethod       = 3
   SEAICE_drag            = 0.001,
   SEAICEadvHEFF          = .TRUE.,
   SEAICEadvAREA          = .TRUE.,
   
   # Always use either 33 or 77
   SEAICEadvScheme        = 33,
   
   # These help thin ice behave better
   SEAICE_area_reg        = 1.0E-5,
   SEAICE_hice_reg        = 1.0E-5,
   &
   
   &SEAICEPARM03
   &

EXF

The external forcing (EXF) package allows the model to be forced with real (or simulated) observations of meteorological data. This package has temporal interpolation abilities which will be convenient when working with real observations.

Compile time options

The compile time options are specified in file EXF_OPTIONS.h. This file can be copied from the EXF package code pkg/exf/EXF_OPTIONS.h. Some CPP options are also required in CPP_OPTIONS.h (see table below).

The package has 6 different combinations of forcing fields that can be specified to force the model. The headers relate to CPP options:

ALLOW_ATM_TEMP TEMP
ALLOW_DOWNWARD_RADIATION DOWN
ALLOW_BULKFORMULAE BULK
EXF_READ_EVAP EVAP
ALLOW_READ_TURBFLUXES TURB


Possible EXF forcing combinations
# Temp DOWN BULK EVAP TURB Actions
(1) - - - - - Read in hflux, swflux, and sflux
(2) - def - - - Read in hflux, swdown, and sflux. Compute swflux.
(3) def def def - - Read in atemp, aqh, swdown, lwdown, precip, and runoff. Compute hflux, swflux, and sflux.
(4) def - def - - Read in atemp, aqh, swflux, lwflux, precip, and runoff. Compute hflux and sflux.
(5) def def - def def Read in hs, hl, swdown, lwdown, evap, precip, and runoff. Compute flux, swflux, and flux.
(6) def - - def def Read in hs, hl, swflux, lwflux, evap, precip, and runoff. Compute hflux and flux.

Configurations (3)-(6) are compatible with SEAICE. I have usually run with configuration (3).


Run time options

The run time options are specified in file data.exf. Forcing files are specified in EXF_NML_02 with the keywords [fieldname]file.

This package is intended to handle time-dependent forcing. You must specify the time between consecutive entries in the forcing files, as well as a global repeat time for the period of the forcing.

The time between consecutive entries is specified for each field as [fieldname]period (EXF_NML_02), and the global repeat period is repeatPeriod (EXF_NML_01). The snippet below is part of the data.exf file for forcing that repeats daily, with 1 hour increments in the forcing fields.

The starting date for the forcing file is specified in the entry [fieldname]startdate1 (YYYYMMDD format), and the time is startdate2 (hhmmss format).

&EXF_NML_01
repeatPeriod = 86400.0,
&

&EXF_NML_02
uwindstartdate1 = 20050401,
uwindstartdate2 = 000000,
uwindperiod = 3600.0,
uwindfile = 'u_wind.bin',
&

You will also have to specify the same start dates in the data.cal file,

&CAL_NML
TheCalendar='gregorian',
startDate_1 = 20050401,
startDate_2 = 000000,
&

In EXF, when using the integrated bulk forcing, you are able to specify the following coefficients in data.exf:

cdrag_1 = 0.0027000,
cdrag_2 = 0.0001420,
cdrag_3 = 0.0000764,
cstanton_2 = 0.0327,
cstanton_1 = 0.0180,
cdalton = 0.0346,
ocean_emissivity  = 0.9700

Some attention must be paid when changing these coefficients, as cdalton is not the actual Dalton number, nor are cstanton_1 or cstanton_2 parts of the actual Stanton number. Rather, the Dalton and Stanton numbers are calculated as:

,

,

,

where the choice of Stanton number depends on whether the system is stable or unstable. EXF in its default configuration (not using the Large and Yeager formulation) considers it stable if , and unstable otherwise. By default, the product .

The drag coefficient depends on the 10m wind speed and is calculated as:

.

Note that since the drag coefficient is undefined when the wind speed is equal to 0 m/s, the MITgcm enforces a minimum wind speed, set to 0.5 m/s by default.

OBCS

OBCS allows the user to specify 4 separate, irregular boundaries which can act as inflows or outflows for tracers, temperature, velocity, etc. Boundaries are initialized at compile-time, and their points are set through data.obcs at run time. Northern and southern boundaries are nx by nz by forcing step number arrays, and eastern and western boundaries are ny by nz by forcing step number arrays. OBCS can be set to read from files describing these arrays which contain values for the fields which are conditioned at these boundary points. All essential fields (tracers, temperature, and at least one component of velocity) must be conditioned at each boundary in order for the simulation to run correctly.

The discretization of the MITgcm utilizes an Arakawa C grid with cell-centred tracer values. The velocity grid goes from index 1 to index nx+1 in the x direction (see http://mitgcm.org/public/r2_manual/latest/code_reference/vdb/code/101.htm), but OBCS utilizes indices on the tracer grid which goes from index 1 to index nx. Thus, for each cell in the tracer grid, there are North, South, East, and West velocities. Conditioning the west boundary of cell (x, y) is the same as conditioning the east boundary of cell (x-1, y). OBCS syntax is such that setting a boundary at 0 is equivalent to toggling it off, and setting a boundary in 1 through to nx (or ny) is to condition the face bordering that cell in the direction of the boundary.

Since each boundary reads from only one file for each field, if a boundary is irregular careful attention must be paid to ensure the points selected in data.obcs correspond to the same points in the file's array. This is best communicated by an example.

Consider a 200x200 empty basin, where the user intends a northern boundary between x=0 and x=99 at y=99 and between x=100 and x=199 at y=199 (indices read from 0). The proper syntax for data.obcs to select this boundary is:

OB_Jnorth(1:100) = 100*100,
OB_Jnorth(101:200) = 100*200, (or  OB_Jnorth(101:200) = 100*-1,)
OB_Iwest(100:200) = 100*100,

You might also initialize the entire boundary at one value and then modify the points to take on another value, like so:

OB_Jnorth = 200*-1,
OB_Jnorth(1:100) = 100*100,
OB_Iwest(100:200) = 100*100,

Now since OBCS reads from only one file, the OBCSN files must contain the information for the first boundary in the first 100 indices, and the information for the second in the following 100 indices. Note that the west boundary must be included unless bathymetry provides boundary conditions along the boundary between y=100 and y=200 at x=100. There are many alternative formulations which are possible, but all must condition the fluid around the inset northern boundary slice to ensure that the boundary slice does not have to function both as an inflow and as an outflow/wall. In the example above, the fluid in the upper left region segmented by the west and northern boundary points is separated from the fluid in the rest of the basin, and every cell is fully conditioned. These considerations can become very complex with complicated bathymetry, and so it is important to be thorough.

Sometimes it's pointless to bother figuring out the precise issue in the OBCS check, and better to just brute-force the errors away. For this purpose, there is a script available which determines which points are throwing errors in the OBCS check and sets their boundaries so that the error is corrected.

EXCH2

The Extended Cubed Sphere Topology (exch2) package is a powerful package that can be used to control the processor decomposition of the simulation domain, and was originally designed to implement the cubed-sphere topology for spherical domains in the MITgcm. It is flexible enough to be used for other purposes, however, an example of which is the removal of non-fluid-containing tiles (blanks) from the processor decomposition. If a run contains large areas where there is only land, this functionality can reduce resource requirements considerably. An example of how this particular functionality of exch2 may be implemented is outlined below.

First, enable the exch2 package in packages.conf. Create a file called W2_EXCH2_SIZE.h in the same directory as packages.conf that looks as follows:

   CBOP
   C    !ROUTINE: W2_EXCH2_SIZE.h
   C    !INTERFACE:
   C    include W2_EXCH2_SIZE.h
   C    !DESCRIPTION: \bv
   C     *==========================================================*
   C     | W2_EXCH2_SIZE.h
   C     | Declare size of Wrapper2-Exch2 arrays
   C     *==========================================================*
   C     | Expected to be modified for unconventional configuration
   C     | (e.g., many blank-tiles) or specific topology.
   C     *==========================================================*
   CEOP
   C---   Size of Tiling topology structures
   C  W2_maxNbFacets   :: Maximum number of Facets (also and formerly called
   C                   :: "domains" or "sub-domains") of this topology.
   C  W2_maxNeighbours :: Maximum number of neighbours any tile has.
   C  W2_maxNbTiles    :: Maximum number of tiles (active+blank) in this topology
   C  W2_ioBufferSize  :: Maximum size of Single-CPU IO buffer.
          INTEGER W2_maxNbFacets
          INTEGER W2_maxNeighbours
          INTEGER W2_maxNbTiles
          INTEGER W2_ioBufferSize
          INTEGER W2_maxXStackNx
          INTEGER W2_maxXStackNy
          INTEGER W2_maxYStackNx
          INTEGER W2_maxYStackNy
   C---   Default values :
   C      (suitable for 6-face Cube-Sphere topology, compact global I/O format)
   C      W2_maxNbTiles = Nb of active tiles (=nSx*nSy*nPx*nPy) + Max_Nb_BlankTiles
   C      default assume a large Max_Nb_BlankTiles equal to Nb of active tiles
   C      resulting in doubling the tile number.
          PARAMETER ( W2_maxNbFacets = 10 )
          PARAMETER ( W2_maxNeighbours = 8 )
          PARAMETER ( W2_maxNbTiles = nSx*nSy*nPx*nPy * 2 )
          PARAMETER ( W2_ioBufferSize = W2_maxNbTiles*sNx*sNy )
          PARAMETER ( W2_maxXStackNx = W2_maxNbTiles*sNx )
          PARAMETER ( W2_maxXStackNy = W2_maxNbTiles*sNy )
          PARAMETER ( W2_maxYStackNx = W2_maxNbTiles*sNx )
          PARAMETER ( W2_maxYStackNy = W2_maxNbTiles*sNy )
   C- Note: Overestimating W2_maxNbFacets and, to less extent, W2_maxNeighbours
   C        have no or very little effects on memory footprint.
   C        overestimated W2_maxNbTiles does not have large effect, except
   C        through ioBufferSize (if related to, as here).
   C---+----1----+----2----+----3----+----4----+----5----+----6----+----7-|--+----|

If you anticipate that the number of land tiles will exceed the number of fluid-containing tiles (note that these are tiles in the processor decomposition, not grid cells), ensure that W2_maxNbTiles is set to the total number of processors you intend to use.

Next, edit SIZE.h so that nPx is the total number of processors you intend to use, and nPy is 1. Compile the MITgcm in this configuration.

Add a file called data.exch2 to your run directory that looks like the following:

   # EXCH2 Package: Wrapper-2 User Choice
   #--------------------
   #  preDefTopol   :: pre-defined Topology selector:
   #                :: = 0 : topology defined from processing "data.exch2";
   #                :: = 1 : simple, single facet topology;
   #                :: = 2 : customized topology (w2_set_myown_facets)
   #                :: = 3 : 6-facet Cube (3 face-dims: nRed, nGreen, nBlue).
   #  dimsFacets    :: facet pair of dimensions (n1x,n1y, n2x,n2y ...)
   #  facetEdgeLink :: Face-Edge connectivity map:
   #    facetEdgeLink(i,j)=XX.1 : face(j)-edge(i) (i=1,2,3,4 <==> N,S,E,W)
   #    is connected to Northern edge of face "XX" ; similarly,
   #    = XX.2 : to Southern.E, XX.3 = Eastern.E, XX.4 = Western.E of face "XX"
   #  blankList     :: List of "blank" tiles
   #  W2_mapIO      :: global map IO selector (-1 = old type ; 0 = 1 long line in X
   #                :: 1 = compact, mostly in Y dir)
   #  W2_printMsg   :: option for information messages printing
   #                :: <0 : write to log file ; =0 : minimum print ; 
   #                :: =1 : no duplicated print ; =2 : all processes do print
   #--------------------
    &W2_EXCH2_PARM01
    W2_printMsg= 0,
   # preDefTopol= 0,
     W2_mapIO   = 1,
     dimsFacets = NX, NY,
    &

where NX, NY are replaced with the true domain size (in grid cells) of your domain. Next, run your simulation using this setup for a single time-step so that the STDOUT files are output for each processor. Ensure for this step that the #undef SINGLE_DISK_IO flag is set.

After this has completed, process the STDOUT files to determine the list of blank processors. This can be done by executing the following terminal commands:

% GET THE BLANK TILES
grep Empty STDOUT.* > trash1.txt
%CROP BEGINNG OF THIS TEXT
sed -r 's .{47}  ' trash1.txt > trash2.txt
%AND FIX THE END
sed 's/ (bi,bj=   1   1 )/, /g' trash2.txt > trash3.txt
%NOW GRAB THE TILE NUMBERS
cat trash3.txt | cut -c 1-8 >  trash4.txt
%MAKE ALL ONE LINE
cat trash4.txt | tr '\n' ' ' > blanks.txt
%count how many blanks
wc -l trash4.txt 

Now copy the list of blanks into data.exch2. If you have, for example, 19 blank tiles, you would write:

 blankList(1:19)=1,2,3,4,5,6,7,8,9,10,19,24,28,29,30,31,32,33,34, 

When you do this ensure that, if you have many blank tiles, that the list is able to fit within fortran's character limit (per line), such that if you have many blank processors you need to split the list across various lines. Now recompile the run with nPx being smaller by the number of blanks (e.g. if it was 119, its now 100. You now only need 100 cores).

Finally, submit the run with the new, smaller number of processors, and it should now run with a smaller processor allocation.


Visualization tools

Some of the available tools for visualizing MITgcm data are:

  • The gcmpy python postprocessing routines exist for making quick animations of model output
  • 3D visualization with Visit. See Visualization, Visit, and Visit scripting
  • ParaView, using the netCDF files generated by gcmpy.