De Mey, P., 2007 : sequoiamanual.pdf 272.95 kB , Documentation package of SEQUOIA system 29pp.
Sequoia User Manual
Table of contents
 1 Installation
 2 Configuration
 3 Testing
 4 Bug Fixes
 5 Known Problems and Workarounds
 6 Release Notes and Version History
 7 Command Line
 8 Output Files
 9 Parameter Modules
 10 Parameter File
 11 SEQUOIA/Model Coupling Reference
 11.1 1. FORTRAN INTERFACES
 11.2 2. ESTIMATION GRID
 11.3 3. VERTICAL ORDERING: THE MULTIVARIATE VERTICAL SEQUENCE (MVS)
 11.4 4. VARIABLE TYPES AND OBSERVATION OPERATORS
 11.5 5. KD TREE SEARCH
 11.6 6. FORECAST ERROR STATISTICS
 11.7 7. DIMENSIONALITY
 11.8 8. COORDINATES
 11.9 9. TIME
 11.10 10. SEQUENCING AND COUPLING WITH THE NUMERICAL MODEL
 11.11 11. ENSEMBLE EXECUTION
 11.12 12. SUMMARY OF USERWRITTEN ROUTINES
 11.13 13. SUMMARY OF USERCALLABLE ROUTINES
 11.14 14. EVENT LOGGING
Installation
DISTRIBUTION TYPES
SEQUOIA is distributed in two forms: an installer, or individual tarballs.
INSTALLER
If you get an installer, uncompress it at the current location and run make install. Normally, the installer will contain the right tarballs and be configured by default for your type of application (analysis kernel, etc.), but of course the implementation require some tuning and some interfacing with a numerical model. If you want to check or change the configuration, have a look at what we say in the next sections.
INDIVIDUAL TARBALLS
If you get individual tarballs, they will be of the following types:
sequoiatreeVV.tgz  Install that archive first, this is the main SEQUOIA directory tree (VV=version) 
sequoiakernelsharedVV.tgz  Shared kernel library (VV=version) 
sequoiasapVV.tgz  SAP (sequential analysis platform) library (VV=version) 
sequoiadocVV.tgz  Documentation (VV=version) 
sequoiaKKKKVV.tgz  Specific analysis kernel library (KKKK=kernel) (VV=version) 
sequoiaKKKKulibUUUUVV.tgz  Specific user library (ulibUUUU=ULIB type) (KKKK=kernel) (VV=version) 
To run SEQUOIA, you must have (1) the tree, (2) the shared kernel library, (3) the SAP library, (4) one analysis kernel of your choice, and (5) one corresponding user library. Check the README files in each archive for Release Notes and Compatibility information.
At this time the following Analysis Kernels are available:
mantaray  Reducedorder analysis with 3D multivariate EOFs (akin to SEEK filter analysis) Kernel archive name: sequoiamantarayVV.tar.gz 
sofa  Reducedorder analysis with 1D vertical multivariate EOFs (akin to SOFA analysis) Kernel archive name: sequoiasofaVV.tar.gz 
beluga  Dualspace (observationspace) Kalmantype analysis Can be run in Ensemble Filter mode Kernel archive name: sequoiasofaVV.tar.gz 
User Libraries contain all of the userdefined code and parameters, including the interface with the numerical model, as well as the main programs (launchers). Typically at this time the following types of User Libraries (ULIBs) are available (note that some types of ULIBs only exist for some specific kernels and not others):
“ulib1”  User Library 1. Use this library as a skeleton for building your own User Library. (Contains a “null model” interface to be replaced by yours.) Note that the numerical model library itself does not have to be placed in an ULIB — its location can be defined via the macro EXTLIBS in sequoia.conf 
“ulibktest”  User Library for kernel tests. This shortcuts the SAP library and directly interfaces with the analysis kernel. 
“ulibCCCC”  User Library for particular test cases (CCCC=name of test case). This tests pretty much all of an installed and configured SEQUOIA system. 
Here are the general installation steps if you get individual tarballs (if you get an installer, this will be done automatically for you):
(1) Copy the sequoiatree tarball to the appropriate location of your choice. Then simply unpack the files (tar xzf sequoiatreeVV.tgz). This will create the SEQUOIA directory tree, including the documentation. cd to the root of the tree. Some of the main files and directories at the root of the tree are shown below:
sequoia.conf.default  The default configuration file 
Makefile  The main Makefile 
README  Information on the Contents, License, and Release Notes on this version 
gpl.htm  The GPL license — DO NOT REMOVE 
components/  The directory where plugin components are installed 
work/  Work directory where executables and parameter files normally are and where execution will occur, unless the WORKDIR macro is changed in sequoia.conf 
(2) Copy the desired tarballs to the components/ directory. cd components and unpack those tarballs.
Configuration
Here are the general configuration steps, to be repeated each time you modify the configuration:
(1) Optionally, install and unpack new tarballs in the components/ directory
(2) At the tree root, edit sequoia.conf and specify your own configuration parameters (new kernel, new ULIB, new compiler, these sorts of things)
(3) make config will configure your installation. Note that this also creates direct links to the active components: shak/ akern/ ulib/ sap/ doc/ . Then to access the active ULIB, no need to go to components/ and to the longname directory for that ULIB, just use cd ulib from the tree root. Please note that make config must be issued whenever something is changed in the configuration in sequoia.conf, such as upgrading or adding a component, or moving the whole tree to a new location.
(4) make from the tree root will compile everything and make the executable. make cleanall will clean all objects and libraries. make tarballall will make all tarballs, ready to be copied to a safe location.
Typically, you will have to interface a numerical model to work with with SEQUOIA. This is usually done by creating a new ULIB which will contain that interface. It is a good idea to start from an existing “ulib1” package for your analysis kernel (see what is said about the various ULIBs in the Installation section). Rename the existing “ulib1” package, reconfigure SEQUOIA to use it, and modify it to suit the needs of your interface. Guidelines on how to build an interface for your numerical model are given in the Coupling section. At this time, we are aware of existing interfaces for the following numerical models: SYMPHONIE, POLCOMS, POM, OPA/NEMO, MOM (the last three have been interfaced with the previous SOFA code and would require additional work to be coupled with SEQUOIA).
Testing
See what we say about ULIB libraries in the Installation section. See which testing ULIB tarballs are available for your kernel. Install those tarballs if necessary and reconfigure your installation.
To run the tests, see the README file and Makefile in the corresponding ULIB directory.
Bug Fixes
See the README file associated with the particular component you are interested in.
Known Problems and Workarounds
See the README file associated with the particular component you are interested in.
KP01 — sofa kernel — Need to introduce the notion of vertical distance making it possible to select several observations on the same vertical profile. Right now this is not possible since dist_rel==0 in that case. — Workaround: set AK_IF_THIN==.FALSE. but this makes the calculations costlier. This notion has been introduced in version 1.2 of the kernels.
KP02 — shared kernel library — The code does not work near the pole since distances use the local cartesian approximation. — Workaround: none in this version, unless the pole is rotated externally prior to analysis.
Release Notes and Version History
See the README file associated with the particular component you are interested in.
Command Line
When the SEQUOIA system has been properly compiled, a launcher is available for execution.
General case: The launcher is made at the location pointed to by the WORKDIR macro in sequoia.conf. SEQUOIA can then be launched as follows:
launcher_name parameter_file 
where “launcher_name” is the name of the launcher (defined by the LAUNCHER macro), and “parameter_file” is the name of the input parameter file.
Ensemble assimilation case: Each identical instance of the program is invoked with a member rank:
launcher_name m–mastermember_rank parameter_file 
Member instances can be run on the same machine or on different machines (for instance on a cluster with fast network connections).
Case of testing ULIBs: The executable is made at the root of the ULIB directory, or in a subdirectory. See the README file associated with the ULIB.
Output Files
In addition to exchanging information with the numerical model, the execution of SEQUOIA produces several types of output files depending on your configuration, including:
 A log output with detailed information about the configuration and the run (this is output to stdout but can be redirected to a file).
 An OffLine Analysis file named label.ola, where “label” is defined in the run parameters. The caribou program which is made along with the launcher and has been introduced in version 1.2.1 can be used to explore those files. More functions should appear in the next versions.
Parameter Modules
Fortran MODULEs containing static parameters can be found in almost all SEQUOIA components. They are named after something_mod.f . Here are a few of them which you may want to change if needed. Comments are provided in each file. In most cases, the default values are sufficient. In some particular cases (in red below), the default choices must be reviewed with more care as they may not work for your configuration.
shak/  ak_constants_mod.f (general constants) ak_efs_mod.f (ensemble forecasting system integration) ak_event_mod.f (message level and logging) ak_geo_mod.f (topology and grid definition parameters) ak_kdtree_lib.f (kD tree search) ak_obs_mod.f (datarelated parameters including maximum sizes, etc.) 
akern/  ak_params_mod.f (numerical switches) ak_stats_mod.f (error statistical parameters and sizes) 
Parameter File
SEQUOIA reads its variable parameters from the parameter file specified on the Command Line. Some of those parameters are needed by the Analysis Kernel while some others are needed by SAP. The input format, as well as the definition of each parameter, are respectively given in two usermodifiable files in ULIB: ulib/u_io_ak_prm.f and ulib/u_io_sap_prm.f. The standard format in “ulib1” uses FORTRAN NAMELIST statements. Each NAMELIST corresponds to the samename group below.
Please note that the list of parameters can change in future versions. The ultimate references remain ulib/u_io_ak_prm.f and ulib/u_io_sap_prm.f .
“LOG” GROUP: IDENTIFICATION PARAMETERS
CHARACTER(LEN=800) :: LABEL (sap/sap_mod.f)
The label of the run, which will be used to build output filenames. Default= “saprun”.
“OBS” GROUP: OBSERVATIONRELATED PARAMETERS
INTEGER :: AK_S (shak/ak_obs_mod.f)
Number of data files (also called datasets) in input. Default=1, maximum=AK_DATASETS.
CHARACTER(LEN=800) :: FNAME(AK_S) (shak/ak_obs_mod.f)
The names of the files containing data to be analyzed by SEQUOIA. No default.
INTEGER :: DFORMAT(AK_S) (shak/ak_obs_mod.f)
Specifies the data format associated with each data file. this is passed to u_io_data.f which must then select the appropriate format processing. No default.
INTEGER :: STATUS(AK_S) (shak/ak_obs_mod.f)
Specifies the status of observations in the dataset. Useful choices are indicated below (note that the actual numeric values are defined in shak/ak_obs_mod.f and can be different from the values shown in the table below!). No default.
STATUS  
0 (AK_REG)  this dataset contains normal observations to be read (this datasetlevel status definition will be superseded by observationlevel status definitions if they are different) 
1 (AK_VERIF)  this dataset contains verification observations which will be used to produce an innovation, but that part of the innovation will not be used in the analyses (this datasetlevel status definition will supersede observationlevel status definitions unless they are “bogus” in which case an error will be reported) 
2 (AK_GZR)  this dataset contains GZR (zeroguess) “bogus” observations: at these locations the model guess will be assumed to be zero (equivalent to objective analysis with no prior guess) (this datasetlevel status definition will supersede observationlevel status definitions) 
3 (AK_OZR)  this dataset contains OZR (observe zero) “bogus” observations, which come back to “observing” the climatology when assimilating fluctuations — this is a “statistical damping term” (this datasetlevel status definition will supersede observationlevel status definitions) 
5 (AK_ONC)  this dataset contains ONC (observe no change) “bogus” observations: at these locations the misfits are assumed to be zero (which is an information in itself) — useful for regularization and to avoid changing fields at some locations (this datasetlevel status definition will supersede observationlevel status definitions) 
REAL :: AMPMAX (shak/ak_obs_mod.f)
The maximum absolute value of the input data. Data values outside of the range [AMPMAX,+AMPMAX] are discarded. Default=1e10.
NOTE: In its present form, AMPMAX is not very useful as it does not depend on the dataset or data type, but it has been added for compatibility reasons with SOFA. It is better to use ulib/u_range.f which allows typedependent skimming of the innovation vector.
INTEGER :: QCRANGE(2) (shak/ak_obs_mod.f)
The acceptable min/max values for the data quality flag. Default=(/ 0, 0 /).
INTEGER :: NSEL(AK_S) (shak/ak_obs_mod.f) (sofa kernel only)
The maximum number of influential observations per dataset in the suboptimal data selection. Default=20.
REAL :: REDUN(AK_S) (shak/ak_obs_mod.f) (sofa kernel only)
The informational redundancy factor in the suboptimal data selection. Default=1.0.
“REGMASK” GROUP: REGIONAL MASKS
INTEGER :: NMASKS (shak/ak_geo_mod.f)
The number of regional masks to be defined for diagnostic calculations. Default=0.
NOTE: This feature is yet unimplemented
“SEQ” GROUP: TIME SEQUENCING PARAMETERS (see also this comment on time units)
REAL :: TTMIN (sap/sap_mod.f)
The starting date of the run. No default.
INTEGER :: NTIMES (sap/sap_mod.f)
The number of integration/analysis cycles. No default.
REAL :: TINT (sap/sap_mod.f)
The interval between analyses. No default.
LOGICAL :: IF_SMOOTHER (sap/sap_mod.f)
Chooses how the innovation in the past and future will be handled. Default=.FALSE.
IF_SMOOTHER  
.FALSE.  filter mode. Only innovation prior to the analysis time is used in the correction. The innovation is calculated as differences between the data and the model at the same time and location as the observations. 
.TRUE.  smoother mode. The innovation within TINT time distance of the analysis time (both in past and in future) is used. This requires running the model some time in the “future” in order to calculate the modeldata differences. 
SEQUOIA/Model Coupling Reference
1. FORTRAN INTERFACES
The detailed Fortran INTERFACEs to userwritten and usercallable routines are contained in ulib/u_interface_mod.f. That file contains the arguments lists, the types and description of the routine parameters, and the returned types of FUNCTIONs.
Please note that the INTERFACEs can change in future versions. The ultimate reference remains ulib/u_interface_mod.f.
2. ESTIMATION GRID
SEQUOIA can handle both finiteelement grids made up of triangular elements, and finitedifference grids in which the cells are quadrangular. Finitedifference quadrangles are actually split into two finiteelement triangles internally. Both types of cells can therefore be mixed in a single grid. This is useful for instance in order to reproduce a smoother coastline in coastal areas.
The estimation grid can have any geometry and any topology, including islands or lakes, and can lie in a plane or on a sphere. In the spherical case it can be periodic. However the builtin geometric routines will fail in polar or quasipolar situations.
At this time, there is another important limitation: SEQUOIA assumes that all state vector variables are available at each node of the estimation grid. This is due to the builtin interpolation algorithm which interpolates over triangles to observation locations. As a consequence, if required, the user must interpolate the missing model variables at SEQUOIA grid nodes in routines such as ulib/u_yf.f. For instance, if the numerical model is solved on an Arakawa B grid, the user has two solutions:
 Set up SEQUOIA on one of the overlapping model grids, i.e. either the velocity grid (on the nodes of which the velocity components are defined) or the mass grid (where all scalar variables are defined); whenever u_yf() asks the model for a forecast variable at a location where it does not exist (because it is defined on the “other grid”), the user must interpolate that variable on the B grid; then the user must interpolate the SEQUOIA correction on the whole B grid after analysis
 Set up SEQUOIA on the grid made up of the velocity grid + the mass grid; a similar interpolation must occur in u_yf(); but there is no need for any further interpolation of the correction since it is available on both components of the B grid.
The same type of comment holds for some types of finite elements where variables are defined at the middle of triangle sides and not at vertices.
The model grid is declared in the userwritten routine ulib/u_ingrid.f. That routine must define the grid variables and arrays listed in shak/ak_geo_mod.f: gridlevel properties, nodes and the way they are connected, grid parameters, etc. There are two ways of doing this in practice in u_ingrid.f:
 Simply USE ak_geo_mod and set the required variables and arrays using the comments given in shak/ak_geo_mod.f
 Use the kernel calls ak_geo_declare_grid(), ak_geo_declare_node(), ak_geo_declare_Dcell() (where D=3 or D=4 depending on whether a finiteelement cell or finitedifference cell is declared).
The second method is more errorproof and is preferred (it is the method used in the distribution files). Here are the INTERFACEs for those kernel calls as well as the description of the call parameters:
LOGICAL FUNCTION ak_geo_declare_grid( ak_nodes, ak_cells, ak_mvs, coord_type, pradius, orglon, orglat)  Declares gridlevel parameters. See argument details in shak/ak_geo_mod.f . 
INTEGER :: ak_nodes  Number of allocated grid nodes (this must be equal to or larger than the actual number of grid nodes) 
INTEGER :: ak_cells  Number of allocated triangular cells (this must be equal to or larger than the actual number of triangular cells) (reminder: one quadrangular cell = two triangular cells!) 
INTEGER :: ak_mvs  Number of allocated MVS slots on the vertical, i.e. max. number of variables on the vertical in the state vector 
INTEGER :: coord_type  Horizontal coordinate type: 0=cartesian, 1=tangentplane(e.g. betaplane) 2=spherical 
REAL :: pradius  When coord_type == 1, this is the planet radius in kilometers — A few choices:

REAL :: orglon, orglat  When coord_type == 1, these coordinates are the longitude and latitude of the origin of the tangent plane (point of tangency). 
LOGICAL FUNCTION ak_geo_declare_node( node_id, xnod, ynod, znod, nmvs, rmvs, tmvs)  Declares an individual node. The gridlevel parameters must have been defined beforehand. See argument details in shak/ak_geo_mod.f . 
INTEGER :: node_id  A unique node identification number 
REAL :: xnod, ynod  Horizontal node coordinates 
REAL, DIMENSION(nmvs) :: znod  Vertical node coordinates (“depths” or “altitudes”) in Packed MVS format, i.e. for all elements of the Reduced Multivariate Vertical Sequence (MVS) 
INTEGER :: nmvs  The number of elements of the Reduced Multivariate Vertical Sequence (MVS) at that location (this is typically the number of active state vector elements at that location, for instance the total size of the state vector minus the number of elements which are “under the seafloor”) 
REAL, DIMENSION(nmvs) :: rmvs  The Reduced Multivariate Vertical Sequence (MVS) at that location 
REAL, DIMENSION(nmvs) :: tmvs  The variable types in Packed MVS format, i.e. for all elements of the Reduced Multivariate Vertical Sequence (MVS) at that location. 
LOGICAL FUNCTION ak_geo_declare_3cell( node_ids)  Declares an individual triangular cell. The corresponding nodes must have been declared beforehand. See argument details in shak/ak_geo_mod.f . 
INTEGER, DIMENSION(3) :: node_ids  The identification numbers of the nodes located at the vertices of the triangle. 
LOGICAL FUNCTION ak_geo_declare_4cell( node_ids)  Declares an individual quadrangular cell. The corresponding nodes must have been declared beforehand. See argument details in shak/ak_geo_mod.f . 
INTEGER, DIMENSION(4) :: node_ids  The identification numbers of the nodes located at the vertices of the quadrangle. WARNING: Nodes must follow each other and be ordered in anticlockwise (direct) sequence around the quadrangle. 
3. VERTICAL ORDERING: THE MULTIVARIATE VERTICAL SEQUENCE (MVS)
SEQUOIA uses hypotheses on how things are ordered on the vertical. This is explained in more detail now.
Let us consider the list of prognostic model variables at one (x,y,t) location. Depending on the model, this list could look like {u(z),v(z),w(z),h,T(z),S(z)} with classic notations. In order to be restarted after correction, the model will need the new values of all these variables (which we call the “model state”) derived in a consistent manner.
It is generally possible to identify a short list of corrections which will in turn infer corrections on the rest of the model state. For instance, if we assumed geostrophy on increments, it would be sufficient to calculate corrections on {T(z),S(z),h}, or on {T(z),S(z),bstr} where h is the surface height anomaly and bstr is the barotropic transport streamfunction (for a rigidlid model). We will call this short list of variables defined at any (x,y,t) location the “full state” or “filter state”. We will call that sequence of variables the Multivariate Vertical Sequence, or MVS. For instance, the MVS might be:
 31 levels of temperature
 31 levels of salinity
 1 instance of barotropic transport streamfunction
which makes a sequence with a length of 63. Of course, sone of the “slots” in the sequence might be empty at some locations (e.g. when some model levels are not present because of bottom topography or even land).
Note that the full state needs not be composed of prognostic variables only. Diagnostic variables are also appropriate if they permit to go back to the complete model state and if they are observable.
At some location (x,y,t), SEQUOIA will expect a profile to appear in “Packed MVS” format. Let us assume that in the previous example there are only two model levels available. Therefore, only five slots will be filled in the MVS. Such a profile stored in MVS format would use too much space. In Packed MVS format, two vectors are defined: a vector of values, and a vector of pointers to the original MVS sequence. For our example:
 vector_1 = {T(1),T(2),S(1),S(2),bstr}
 vector_2 = {1,2,32,33,63}
vector_2 above is called a Reduced MVS or “RMVS” in the SEQUOIA system. vector_1 is said to be in a Packed MVS format. The index allowing to access elements of the RMVS and of MVSpacked arrays is called MVS index (in our example, the MVS index can vary between 1 and 5 at that location).
The Reduced MVS must be defined at every model gridpoint during the definition of the estimation grid.
4. VARIABLE TYPES AND OBSERVATION OPERATORS
As many variable types can be defined in SEQUOIA as needed within the limit of AK_TYPE in shak/ak_obs_mod.f. For instance, temperature, salinity, humidity, x/y components of wind, x/y components of currents, sea elevation, etc. are variable types. Variable types must be defined:
 At each node of the estimation grid, to qualify the model variables which are available there
 For each input observation. An observation operator must be provided for each observational variable type. Observation operators must be entered by the user in ulib/u_obsop.f (simple examples as well as comments are given in the distribution, see also the summary of userwritten routines). In SEQUOIA, observation operators are local and act only in the vertical and crossvariable dimensions.
Obviously, variable types definitions for the model must match the definitions for the observations (e.g. if temperature is type 2, if will be 2 both for the model and for the observations).
5. KD TREE SEARCH
By default, SEQUOIA uses fast search routines based on kD tree search (akin to multidimensional dichotomy). The parameters to switch off kD tree search or to modify its behavior are located in shak/ak_kdtree_lib.f. If kD tree search is switched off, slower exhaustive search is used. The results should be the same, but kD tree search is much faster if the data count and/or grid size are large.
6. FORECAST ERROR STATISTICS
The required forecast error statistics are defined dynamically at each analysis time in ulib/u_setstats.f. They can therefore be timedependent and calculated as part of the integration of a filter. SEQUOIA can obtain those error statistics in either one of the following ways:
 If IF_EFS=.TRUE. (Ensemble execution), the kernel will use ensemble statistics calculated from member samples gathered, e.g. from the cluster nodes (see ensemble execution). In this way, an Ensemble filter is programmed. For instance, for a fullorder kernel such as BELUGA, an Ensemble Kalman Filter can be programmed. If the kernel is a reducedorder kernel, a ReducedOrder Ensemble Kalman Filter is programmed.
 Otherwise, by default, error statistics will have to be directly specified in ulib/u_setstats.f. This will be the case if some sort of Optimal Interpolation scheme is programmed. If the kernel is a reducedorder kernel, a ReducedOrder Optimal Interpolation scheme will be programmed.
The actual statistics, fields and parameters which must be set inside ulib/u_setstats.f depend on the type of analysis kernel which is used. Examples follow: (see the distribution versions of ulib/u_setstats.f for more details)
 “sofa” reducedorder kernel: inverse of the simplification operator on the vertical (s_inv) in Packed MVS format, correlation radii (rcx, rcy, rct), size of the influence bubble (multiplicative factors rix_factor, riy_factor, rit_factor)
 “mantaray” reducedorder kernel: inverse of the 3D simplification operator (s_inv) in Packed MVS format, size of the influence bubble if analysis is local (ri_x, ri_y)
 “beluga” local, fullorder kernel: local representers and representer matrix, size of influence bubble (ri_x, ri_y)
The reducedorder kernels (sofa, mantaray) also call ulib/u_ef_std.f which must return the forecast error standard deviation in reduced state space.
7. DIMENSIONALITY
One natural way of solving an assimilation problem with SEQUOIA is to keep everything dimensional: the observation operator in u_obsop(), the forecast error statistics in u_setstats(), the analyzed state increment returned by ak_dxa(). However most analysis kernels involve an algebraic inversion of some matrix which can be illconditioned in multivariate problems depending on how units are chosen. It might therefore be more efficient to use a nondimensional version of the state vector. This comes back to changing the norm for the problem at hand.
In univariate problems, the norm of the state vector is simply the sum of its squares. In multivariate estimation problems, a norm which will not depend on the choice of units must be defined. In general, this comes back to defining a constant scale for each variable type (e.g., 0.5°C for temperature, etc.) and defining a derived norm by (1) normalizing the vector with the scales, then (2) calculating a standard norm (sum of squares). When temperature (or salinity, etc.) profiles are included in the state vector, it is good practice to divide the corresponding values by the square root of the number of levels so that the “profiles” part of the state vector does not dominate the norm.
If we nondimensionalize our problem along those lines:
 The values returned by u_obsop() (i.e. the elements of the observation operator) must be dimensional and must therefore have been multiplied by the ad hoc scales inside u_obsop()
 The forecast error statistics in u_setstats() must be nondimensional (they can be calculated from dimensional state vector samples whose elements have been divided by the scales)
 The analyzed state increment returned by ak_dxa() is nondimensional and must be multiplied by the scales before using it for correcting the model forecast
 The observations, innovation vector and observational error statistics remain dimensional.
The first two steps above are done automatically by the ak_geo_declare_scales() call (see shak/ak_geo_declare_scales.f).
8. COORDINATES
Grid coordinates and data coordinates must be in the same coordinate system (the coordinate system is set as part of the estimation grid definition). For horizontal coordinates:
coord_type == 0  all horizontal coordinates must be in cartesian units (e.g. km) 
coord_type == 1,2  all horizontal coordinates must be in degrees; the first coordinate is assumed to be longitude, and the second to be latitude 
The vertical coordinate is arbitrary but must be the same everywhere. The sole exception is the “ulib1” version of u_obsop.f which assumes that vertical coordinates are downwardincreasing depths (but that can be easily changed).
The horizontal scales (influence and correlation radii) needed by some kernels must be in cartesian coordinates (e.g. km).
9. TIME
All times in SEQUOIA are assumed to be Julian Dates or fractions thereof. All times (parameter file, data files, Fortran calls) must be in the same Julian calendar. Besides this, everything is possible, including using seconds or centuries as units; the time sequencing will be correct provided that everything is in the same unit.
10. SEQUENCING AND COUPLING WITH THE NUMERICAL MODEL
In the SEQUOIA system, the SAP component takes care of the sequencing and of the macrocoupling with (1) the analysis kernel, and (2) the numerical model.
SAP needs to control the model’s basic functions: initialization, integration, and conclusion. To that end, three LOGICAL FUNCTIONs must be provided by the user as part of the User Library to serve as an interface between SEQUOIA and the numerical model:
 u_modini(i_efs,t) must initialize the model for member i_efs at time t
 u_modrun(i_efs,t1,t2,t3) must correct the model fields at time t1 using the SEQUOIA analysis and integrate the model from time t1 to time t3 while saving a restart file at intermediate time t2
 u_modend(i_efs,t) must conclude the model run at time t
 In addition, the routine u_postan(), usually empty, is called as a postanalysis step during each assimilation cycle.
Again, details can be found in the distribution ULIBs, and in particular in ulib/u_interface_mod.f and u_nullmodel.f.
One possible handy way to split the model calculations into those three functions is to turn the main model PROGRAM into a SUBROUTINE and then to add ENTRY statements corresponding to those three sections. (The ENTRY statement is obsolete in Fortran90, but several models are still written in Fortran77.)
Inside u_modrun(i_efs,t1,t2,t3), we will find the following sections:
 Retrieve the restart fields at time t1
 Call ak_dxa() (see ulib/u_interface_mod.f) and correct the restart fields
 Timeloop from time t1 to time t3
 At the end of each time step, call ak_dy() to store the current innovation components
 At time t2, store fields to restart from on next iteration
One last item: at any time, the model must be able to provide a proxy (“model equivalent”) to any given observation which occurred during the current time step. This is done by means of u_yf(), which must be provided by the user. Note that u_yf() is a LOGICAL FUNCTION and can return .FALSE., which will result in rejecting the observation from the upcoming analysis. This mechanism permits for instance to omit observations which lie in a dynamically masked area such as a river plume, etc.
11. ENSEMBLE EXECUTION
At this time, Ensemble execution is only implemented with the BELUGA kernel, yielding a Local Ensemble Kalman Filter (LEnKF). There is no theoretical or practical obstacle to modifying the other kernels for Ensemble operation as well, but such an application has not been developed in SEQUOIA yet and would have to be programmed by the user.
Interprocess member synchronization is done by means of the ASEYA library, a part of the shared kernel library.
The ASEYA library has been developed for SEQUOIA but is available separately. It typically runs ensemble members as slaves of a master process which organizes “powwows” at different execution stages to synchronize their execution. The processes (identical instances of a SEQUOIA executable) can run on the same machine or on different machines (for instance on a cluster, with fast internal network connections). Typically in an ASEYA implementation a slave process is an ensemble member, while the master process takes care of the collection of data produced by the members in order to calculate the appropriate statistics for data assimilation. However, the master process can also be an ensemble member (with member rank 0) which will run the numerical model as well.
ASEYA has several nice features hardly found elsewhere:
 Delays and timeouts are managed and programmable (e.g. faulty communications do not hang Ensemble calculations forever)
 Hanging or crashed or otherwise unsatisfactory members or cluster nodes can be manually dropped at any time by the user, or automatically dropped after a timeout, making it possible to continue execution of the Ensemble with the remaining members and nodes
 Any member can kill itself (abort) as it encounters trouble, with the result that it will automatically and gracefully be dropped by the master and by the other members, allowing the execution to continue
 If the master aborts, all members will automatically abort
 It is possible to manually and simultaneously lock all members (e.g. in order to examine results or to see which ones should be dropped), and also to simultaneously abort all members.
This particular implementation of ASEYA is based on files. All members and the master must run in the same directory, which requires the use of NFS or similar technology on a cluster. It is then up to the user to decide how large data transfers to and from the master node are handled — as soon as they are ready, or as batches. Of course each member can also have its private files not shared with the master node (not crossexported).
Typically the steps needed for Ensemble execution of SEQUOIA on a cluster are as follows:
 Choose a master node, or the frontend, and export a working directory from that machine to all other cluster nodes where members will be run
 “make” the same SEQUOIA executable on all machines (make once and copy if architectures are the same)
 Run an instance of that executable on all machines, specifying the member rank on the command line.
More detailed information can be found in shak/aseya_lib.f .
12. SUMMARY OF USERWRITTEN ROUTINES
SEQUOIA calls routines which must be written by the user. They must be contained in the user’s customized User Library (ULIB). Their list and functions are summarized in the table below. The third columns shows the default versions which can be found in the “ulib1” version of the User Library. Detailed Fortran INTERFACEs can be found in ulib/u_interface_mod.f.
ROUTINE NAME  FUNCTION  DISTRIBUTION (sequoiaKKKKulib1VV.tar.gz) 
u_ingrid  sets the estimation grid  u_ingrid_FD.f: cartesian, rectangular, nonperiodic FD grid with one variable type and trivial MVS 
u_io_ak_prm  loads analysis kernel parameters  u_io_ak_prm.f: fully functional loader with NAMELIST input 
u_io_sap_prm  loads SAP parameters  u_io_sap_prm.f: fully functional loader with NAMELIST input 
u_io_data  loads data for the entire run  u_io_data.f: fully functional loader with choice of freeformat input (DFORMAT==0) or binary input (DFORMAT==1) 
u_modini  initializes the numerical model (cold start)  u_nullmodel.f: empty 
u_modrun  corrects the numerical model (warm start) and integrates it forward  u_nullmodel.f: only calls ak_dy() after each time step 
u_postan  perform postanalysis tasks  u_nullmodel.f: empty 
u_modend  perform numerical model conclusion tasks  u_nullmodel.f: empty 
u_yf  returns model forecast in data space  u_nullmodel.f: returns zero (null model) 
u_xf  returns model forecast in state space (needed by EFS only)  u_nullmodel.f: returns zero (null model) 
u_obsop  observation operator (see also Dimensionality)  u_obsop.f: simple observation operator with choice of sealevel (TYPE==1), temperature (TYPE==2), salinity (TYPE==3), uvelocity (TYPE=4), vvelocity (TYPE=5), vertical average (TYPE=6) 
u_efs  “ulib1”: not used “ulib2”: EFS integration; collect members results across cluster and calculate Ensemble assimilation statistics 
u_efs.f: “ulib1”: skeleton “ulib2”: contains skeletons of results query routines 
u_setstats  sets the forecast error statistics (see also Dimensionality)  u_setstats.f: kerneldependent simple statistics (e.g., stationary, homogeneous) 
u_ef_std  returns the reduced state space forecast error standard deviation in reduced order kernels (mantaray, sofa)  (for reducedorder kernels:) u_ef_std.f: kerneldependent simple statistics (e.g., stationary, homogeneous) 
u_range  checks if innovation is within acceptable range  u_range.f: no check (passthru) 
NOTE: u_wrola() and ulib/u_ola_lib.f do not exist anymore in version 1.2.1 (the OLA features are handled directly by the kernel)
13. SUMMARY OF USERCALLABLE ROUTINES
The table below lists SEQUOIA routines which need to be called by user code as part of the coupling between SEQUOIA and the numerical model:
ROUTINE NAME  FUNCTION  CALLER 
ak_dxa  gets the analyzed state increment (see also Dimensionality)  u_modrun: called initially by the numerical model in order to correct its forecast fields after an analysis 
ak_dy  updates the innovation vector at current model time  u_modrun: called by the numerical model at the very end of each time step 
ak_ea  gets the analysis error variance (see also Dimensionality)  only for information — not used in calculations 
14. EVENT LOGGING
Error reporting and most informational output is under the responsibility of the SEQUOIA event handler shak/ak_event.f. Events recognized in the standard version of ak_event() include: start of run, end of run, informational message, warning signal, fatal error signal. The behavior of SEQUOIA in those occurrences can be easily changed by modifying the code.
 Since all error codes are different, it is easy for instance in an operational system to intercept a particular error signal and launch a recovery procedure.
 It is also straightforward to add startup tasks and conclusion tasks.