Home
If you need additional help after reading this document, please contact the SI Team at siteam at gmao.gsfc.nasa.gov
This document provides an overview of the GEOS GCM: it describes the basic structure of the code, presents the supported computing environments, and explains how to obtain, compile, and run the model.
This page also points users of the GEOS GCM to further documentation.
The GEOS GCM is made up of a variety of gridded components linked together by an infrastructure layer called MAPL, which is based on ESMF. The model is built from a "fixture", a base repository that contains the CMake and control files needed to build the model.
The GEOS GCM uses a Python utility called mepo to manage multiple git repositories instead of using other technologies like Git submodules. mepo uses a YAML file that provides a list of components (and their versions) that are required for a particular configuration of the GEOS GCM.
To learn more about mepo, consult the following links:
- mepo command reference
- Suggested workflow for Feature development
- Suggested workflow for Science development
- Presentation about mepo
A "fixture" is what we call the "base" of the GEOS GCM. This repository, GEOSgcm, is the "fixture" for the GEOS GCM. If you clone it with git, you'll see that it is very light and contains no real source code. Instead, it is mainly CMake code plus one important file: components.yaml. This YAML file controls how our model is laid out and what it consists of.
The main components of GEOS are other repositories in the GEOS-ESM GitHub organization. Examples include:
These components are laid out in the source tree by the components.yaml file in the main fixture. In this file you'll see entries like:
GEOSgcm_GridComp:
  local: ./src/Components/@GEOSgcm_GridComp
  remote: ../GEOSgcm_GridComp.git
  tag: v1.17.3
  sparse: ./config/GEOSgcm_GridComp.sparse
  develop: develop

FVdycoreCubed_GridComp:
  local: ./src/Components/@GEOSgcm_GridComp/GEOSagcm_GridComp/GEOSsuperdyn_GridComp/@FVdycoreCubed_GridComp
  remote: ../FVdycoreCubed_GridComp.git
  tag: v1.12.1
  develop: develop
This file tells mepo to clone version v1.17.3 (tag) of GEOSgcm_GridComp from the ../GEOSgcm_GridComp.git repository (remote, where ../ means its URL is relative to the fixture's URL) and put it on disk at ./src/Components/@GEOSgcm_GridComp (local). For more information about mepo, see the above links or contact the GMAO SI Team.
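To make the relative remote rule concrete, here is a minimal bash sketch of how a ../ remote could be resolved against the URL the fixture was cloned from. It only illustrates the rule and is not mepo's actual implementation; the function name resolve_remote is hypothetical, and only a single leading ../ is handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: resolve a components.yaml "remote" that starts with
# "../" against the URL the fixture was cloned from.
# This only illustrates the rule; it is not how mepo is implemented.
resolve_remote() {
    local fixture_url="$1"   # e.g. git@github.com:GEOS-ESM/GEOSgcm.git
    local remote="$2"        # e.g. ../GEOSgcm_GridComp.git

    # Drop the last path element of the fixture URL (the fixture repository
    # itself), then append the remote with its leading "../" removed.
    printf '%s/%s\n' "${fixture_url%/*}" "${remote#../}"
}

# The same rule applies to both the SSH and the HTTPS form of the fixture URL.
resolve_remote "git@github.com:GEOS-ESM/GEOSgcm.git" "../GEOSgcm_GridComp.git"
resolve_remote "https://github.com/GEOS-ESM/GEOSgcm.git" "../FVdycoreCubed_GridComp.git"
```

Because the remote is relative, the sub-repositories are fetched with whichever protocol was used to clone the fixture.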
Most users of GEOS will build and run GEOS on NASA Supercomputing resources. This is mainly due to both the availability of the libraries needed to build and run GEOS as well as the boundary conditions, emissions, etc. needed to run the model.
In general, GEOS is supported at the NASA Center for Climate Simulation (NCCS) and the NASA Advanced Supercomputing (NAS) facility. The following links point to each center's documentation about using its systems:
GEOS does currently build Docker containers with each release, but this is not yet fully supported and is still at a testing/experimental stage. If you need information about this, please contact the SI Team.
The instructions to obtain, build, and run the GEOS GCM can be found in the main README for the fixture, which is often the canonical place to find them, but we also provide them here.
We recommend that users configure their shell startup files as below. For users of bash, in .bashrc:
umask 0022
ulimit -s unlimited
# Run things in this if-block only if we're in an interactive shell
if [[ $- == *i* ]]
then
   # Only put module use or other module commands here
   # The below lines are to get the SI Team maintained modulefiles.
   # Remove #NCCS if at NCCS and #NAS if at NAS
   #NCCS module use -a /discover/swdev/gmao_SIteam/modulefiles-SLES12
   #NAS module use -a /nobackup/gmao_SIteam/modulefiles
   module load GEOSenv
   ...
fi
and for users of csh or tcsh, we recommend adding to .cshrc or .tcshrc:
umask 0022
limit stacksize unlimited
# Run things in this if-block only if we are in an interactive shell
if ($?prompt) then
   # Only put module use or other module commands here
   # The below lines are to get the SI Team maintained modulefiles.
   # Remove #NCCS if at NCCS and #NAS if at NAS
   #NCCS module use -a /discover/swdev/gmao_SIteam/modulefiles-SLES12
   #NAS module use -a /nobackup/gmao_SIteam/modulefiles
   module load GEOSenv
   ...
endif
In the code above, we run umask 0022 and the limit/ulimit calls unconditionally; these are safe in any shell. The umask 0022 tells the shell to make all new files and directories readable by everyone by default (and writable only by you). The limit/ulimit calls set the stacksize to unlimited, which is needed by GEOS.
Finally, you'll see an if-block. Use this block for anything you'd like in bash or tcsh that should only happen in an interactive shell. You never want to run any module commands in a non-interactive shell, as they can have bad side effects on scripts, other commands, etc. We've added some lines to this block for loading our SI Team maintained modulefiles and a GEOSenv metamodule.
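As a small illustration of the interactive-shell test used in the bash example, the sketch below wraps the same check in a function that takes the shell's option flags (the value of $-) as an argument, so it can be tried with different inputs. The function name is hypothetical and is not part of any GEOS or SI Team script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a set of shell option flags
# (the value of "$-") marks an interactive shell.
is_interactive_flags() {
    case "$1" in
        *i*) return 0 ;;   # "i" present: interactive shell
        *)   return 1 ;;   # otherwise: script, batch job, remote command, ...
    esac
}

# Check the current shell. When this file is run as a script, the shell is
# non-interactive, so the second branch is taken.
if is_interactive_flags "$-"; then
    echo "interactive shell: module commands are safe here"
else
    echo "non-interactive shell: skipping module commands"
fi
```

In a real .bashrc, the module use and module load lines would go in the first branch, as in the example above.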
In your .bashrc, .tcshrc, or other rc file, add the line for your system in the interactive-only section (as shown above). At NCCS:
module use -a /discover/swdev/gmao_SIteam/modulefiles-SLES12
At NAS:
module use -a /nobackup/gmao_SIteam/modulefiles
On the GMAO desktops, the SI Team modulefiles should already appear when you run module avail; if they do not, add them with:
module use -a /ford1/share/gmao_SIteam/modulefiles
You can also run this in any interactive window you have, or just re-source your rc file. This gives you access to the modulefiles needed to correctly check out and build the model.
At NASA centers, we maintain a module, GEOSenv, which provides more recent git and CMake modules as well as access to mepo. You can get this by loading the GEOSenv module:
module load GEOSenv
Again, as above, you should only add this to .bashrc or .tcshrc in the interactive-only block. Running module load commands in shell startup files can have adverse effects otherwise.
On GitHub, there are three ways to clone the model: SSH, HTTPS, or GitHub CLI. The first two are "git protocols", which determine how git communicates with GitHub (either through https or ssh). The last is a CLI that uses either the ssh or https protocol underneath.
For developers of GEOS, the SSH git protocol is recommended as it can avoid some issues if two-factor authentication (2FA) is enabled on GitHub.
If you do not yet have a GitHub account and wish to develop GEOS, you will need to sign up for one. We recommend a username that "maps" to you if possible. For example, if your NASA AUID is jdoe, try for jdoe at GitHub. Any username will work, though, and we highly recommend you add your full name to your profile so that you can be easily found on GitHub.
Note that GEOS-ESM requires users to have two-factor authentication enabled on GitHub for security reasons.
If you are a NASA employee, you will also need to be added to the GEOS-ESM team so you can have write permissions to the GEOS-ESM repos. To obtain this, please send an email to the SI Team at siteam at gmao.gsfc.nasa.gov with your name and GitHub username.
Note that at the moment non-NASA contributors to GEOS should use forks to contribute.
Before you use git to make changes to the model, you should make sure git itself is set up to attribute your commits to you correctly. Please run:
git config --global user.name "First Last"
git config --global user.email "email@domain.com"
Note, the email address you set above should be an address your GitHub account knows. You can follow this page to add emails to your GitHub account.
To clone the GEOSgcm code using the SSH URL (starts with git@github.com), issue the command:
git clone -b vX.Y.Z git@github.com:GEOS-ESM/GEOSgcm.git
where vX.Y.Z is a tag from a GEOSgcm release. Note that if you don't use -b, you will get the main branch, which can change from day to day.
If this is your first time using GitHub with any SSH URL, you might get this error:
Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
If you do see this, you need to upload an SSH key to your GitHub account. This needs to be done for every machine from which you want to use the SSH URL.
Issues have been seen when using SSH access to GitHub since the update at NAS to TOSS4. There are two possible solutions for this. One replaces your SSH key and the other adds a new key and uses it. The former is probably the easiest method, but if you are concerned that the RSA key might be used elsewhere, adding a new key might be safer.
Replacing SSH Key
This section assumes that you are using an RSA key to interact with GitHub from NAS. The steps are as follows (assuming the key used on GitHub is id_rsa.pub):
- Go to the GitHub SSH Keys page and remove the NAS RSA key.
- At NAS, replace the old RSA key (below we remove the old key, but if you want you can rename it):
cd $HOME/.ssh
rm id_rsa.pub id_rsa
ssh-keygen -t rsa -b 4096
- Upload the new id_rsa.pub to GitHub
Create and Use New RSA Key
In this method, we make a new RSA key and tell SSH to use it for GitHub.
- Create a new RSA key
cd $HOME/.ssh
ssh-keygen -t rsa -b 4096 -f id_rsa_new
- Upload this new RSA key to GitHub
- Edit .ssh/config to tell SSH to use the new key for GitHub. For this, create or add to a section in .ssh/config like:
Host github.com
  ForwardX11 no
  IdentityFile ~/.ssh/id_rsa_new
  IdentitiesOnly yes
This is necessary because, by default, SSH will always offer id_rsa first if it exists, and if that key is still registered at GitHub, the bad behavior will persist.
To clone the model through HTTPS you run:
git clone -b vX.Y.Z https://github.com/GEOS-ESM/GEOSgcm.git
where vX.Y.Z is a tag from a GEOSgcm release. Note that if you don't use -b, you will get the main branch, which can change from day to day.
Note that if you use the HTTPS URL and have 2FA set up on GitHub, you will need to use a personal access token as your password.
You can also use the GitHub CLI with:
gh repo clone GEOS-ESM/GEOSgcm -- -b vX.Y.Z
where vX.Y.Z is a tag from a GEOSgcm release. Note that if you don't use -b, you will get the main branch, which can change from day to day.
Note that when you first use gh, it will ask which git protocol (https or ssh) you prefer to use "underneath". The caveats above will apply to whichever you choose.
Regardless of the cloning method you use, you will get the directory GEOSgcm/.
To build the model, you first need to go to the GEOSgcm/ directory:
cd GEOSgcm
From the head node, run the parallel_build.csh script:
./parallel_build.csh
Doing so will check out all the external repositories of the model and build it. When done, the resulting model build will be found in build/ and the installation will be found in install/, with setup scripts like gcm_setup and fvsetup in install/bin.
parallel_build.csh provides a special flag for checking out the development branches of GEOSgcm_GridComp, GEOSgcm_App, GMAO_Shared, and GEOS_Util. If you run:
./parallel_build.csh -develop
then mepo will run:
mepo develop GEOSgcm_GridComp GEOSgcm_App GMAO_Shared GEOS_Util
To obtain a debug version, you can run
./parallel_build.csh -debug
which will build with debugging flags. This will build in build-Debug/ and install into install-Debug/.
Note that running parallel_build.csh will create and install a tarfile of the source code at build time. If you wish to avoid this, run parallel_build.csh with the -no-tar option:
./parallel_build.csh -no-tar
While parallel_build.csh has many options, it does not cover all the CMake options available in GEOSgcm. If you wish to pass additional CMake options to parallel_build.csh, you can do so by using -- followed by the CMake options. Note that anything after the -- will be interpreted as a CMake option, which could lead to build issues if you are not careful.
For example, if you want to build a develop Debug build on Cascade Lake while turning on the StratChem reduced mechanism and the CODATA 2018 options:
./parallel_build.csh -develop -debug -cas -- -DSTRATCHEM_REDUCED_MECHANISM=ON -DUSE_CODATA_2018_CONSTANTS=ON
As noted above, all the "regular" parallel_build.csh options must be listed before the -- flag.
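The ordering rule can be illustrated with a short bash sketch that splits a command line at the first --. This is an assumption-laden illustration of the convention only: parallel_build.csh is a csh script and may parse its arguments differently, and the function and array names below are hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: split a command line at the first "--".
# Everything before it is treated as a script option, everything after it
# as a CMake option. The real parallel_build.csh may parse differently.
split_at_double_dash() {
    SCRIPT_OPTS=()
    CMAKE_OPTS=()
    local seen_separator=0
    local arg
    for arg in "$@"; do
        if [[ $seen_separator -eq 0 && $arg == "--" ]]; then
            seen_separator=1            # switch to collecting CMake options
        elif [[ $seen_separator -eq 0 ]]; then
            SCRIPT_OPTS+=("$arg")
        else
            CMAKE_OPTS+=("$arg")
        fi
    done
}

split_at_double_dash -develop -debug -cas -- \
    -DSTRATCHEM_REDUCED_MECHANISM=ON -DUSE_CODATA_2018_CONSTANTS=ON

echo "script options: ${SCRIPT_OPTS[*]}"
echo "CMake options:  ${CMAKE_OPTS[*]}"
```

Under this convention, a script option such as -debug placed after the -- would land in the CMake list, which is why the order matters.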
The steps detailed below are essentially those that parallel_build.csh performs for you. Either method should yield identical builds.
The GEOS GCM is composed of a set of sub-repositories. These are managed by a tool called mepo. To clone all the sub-repos, you can run mepo clone inside the fixture:
cd GEOSgcm
mepo clone
This command initializes the multi-repository and clones and assembles all the sub-repositories according to components.yaml.
To get development branches of GEOSgcm_GridComp, GEOSgcm_App, GMAO_Shared, and GEOS_Util (a la the -develop flag for parallel_build.csh), one needs to run the equivalent mepo command. As mepo itself knows (via components.yaml) what the development branch of each sub-repository is, the equivalent of -develop for mepo is to check out the development branches of GEOSgcm_GridComp, GEOSgcm_App, GMAO_Shared, and GEOS_Util:
mepo develop GEOSgcm_GridComp GEOSgcm_App GMAO_Shared GEOS_Util
This must be done after mepo clone, as it runs a git command in each sub-repository.
On tcsh:
source @env/g5_modules
or on bash:
source @env/g5_modules.sh
We currently do not allow in-source builds of GEOSgcm. So we must make a directory:
mkdir build
The advantage of this is that you can build both a Debug and a Release version with the same clone if desired.
CMake generates the Makefiles needed to build the model.
cd build
cmake .. -DBASEDIR=$BASEDIR/Linux -DCMAKE_Fortran_COMPILER=ifort -DCMAKE_INSTALL_PREFIX=../install
This will install to a directory parallel to your build directory. If you prefer to install elsewhere, change the path in:
-DCMAKE_INSTALL_PREFIX=<path>
and CMake will install there.
To obtain a debug version, you should add:
-DCMAKE_BUILD_TYPE=Debug
which will build with debugging flags.
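To show how Release and Debug builds can coexist in one clone, the sketch below prints (but does not run) a configure command for each build type. The function name is hypothetical, and using build-Debug/ and install-Debug/ for a manual Debug build is an assumption that simply mirrors the directory names parallel_build.csh uses; any separate directories would work.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the CMake configure command for a
# given build type. The Debug directory names mirror those used by
# parallel_build.csh; that choice is an assumption for manual builds.
configure_command() {
    local build_type="$1"   # "Release" or "Debug"
    local suffix=""
    local extra=""
    if [[ $build_type == "Debug" ]]; then
        suffix="-Debug"
        extra=" -DCMAKE_BUILD_TYPE=Debug"
    fi
    # Single quotes keep $BASEDIR literal, so it is expanded only when the
    # printed command is run in a shell where g5_modules has been sourced.
    printf 'mkdir -p build%s && cd build%s && cmake .. -DBASEDIR=$BASEDIR/Linux -DCMAKE_Fortran_COMPILER=ifort -DCMAKE_INSTALL_PREFIX=../install%s%s\n' \
        "$suffix" "$suffix" "$suffix" "$extra"
}

configure_command Release
configure_command Debug
```

Each printed command is meant to be run from the top of the GEOSgcm/ checkout after the modules have been loaded.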
Note that running parallel_build.csh will create and install a tarfile of the source code at build time. If CMake is run by hand, however, this is not the default action (as many who build with CMake by hand are developers and not often running experiments). To enable this at install time, add:
-DINSTALL_SOURCE_TARFILE=ON
to your CMake command.
make -jN install
where N is the number of parallel processes. On discover head nodes, this should only be as high as 2 due to limits on the head nodes. On a compute node, you can set N as high as you like, though 8-12 is about the limit of parallelism in our model's make system.
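The guidance on N can be sketched as a small bash helper. This is a hypothetical convenience function, not part of GEOSgcm: it assumes a cap of 2 on head nodes and 12 elsewhere (the upper end of the 8-12 range mentioned above), and it leaves the decision of whether you are on a head node to the caller.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose N for "make -jN install".
#   $1 = number of available cores
#   $2 = "head" on a head node, anything else on a compute node
# The caps (2 on head nodes, 12 elsewhere) follow the guidance above.
pick_make_jobs() {
    local cores="$1"
    local node_type="$2"
    local cap=12
    if [[ $node_type == "head" ]]; then
        cap=2
    fi
    # Never ask for more jobs than there are cores.
    if (( cores < cap )); then
        echo "$cores"
    else
        echo "$cap"
    fi
}

# nproc is part of GNU coreutils; fall back to 1 if it is unavailable.
cores="$(nproc 2>/dev/null || echo 1)"
echo "compute node: make -j$(pick_make_jobs "$cores" compute) install"
echo "head node:    make -j$(pick_make_jobs "$cores" head) install"
```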
Once the model has built successfully, you will have an install/ directory in your checkout. To run gcm_setup, go to the install/bin/ directory and run it there:
cd install/bin
./gcm_setup
You can find more information to run the model in either atmosphere/data ocean mode (aka AMIP):
or coupled atmosphere/ocean mode: