This repository represents a dockerized science processor for generating an ARIA Sentinel-1 Geocoded Unwrapped Interferogram (ARIA-S1-GUNW) product from a collection of valid Sentinel-1 IW-mode Single Look Complex (SLC) IDs across a date pair using ISCE2. The ARIA-S1-GUNW (or simply a GUNW) is an official NASA product. The initial development of the GUNW was done under the Getting Ready for NISAR initiative and a collection of related ARIA-funded projects. This work has continued under the Project Enabling Cloud-Based InSAR Science for an Exploding NASA InSAR Data Archive (ACCESS19-0023) funded under the ACCESS program. A description of the product can be found here: https://aria.jpl.nasa.gov/products/standard-displacement-products.html
This processor plugs into the HyP3 platform and therefore can spawn processing at scale from an API. All of the necessary datasets required for processing are determined from the input SLC IDs and then downloaded from public APIs. Thus, this repository accomplishes two goals:
- Integrates into the HyP3 platform so that this processing unit can be called directly from an API or the hyp3-sdk to generate ARIA GUNWs.
- Fashions a command line interface (CLI) for generating the GUNWs for local study and research.
We note all the input datasets are publicly available using a NASA Earthdata account. This codebase can be run locally either within a conda environment or within a docker container. To generate a GUNW, one needs only to specify valid SLC IDs that span a repeat-pass for a specific Sentinel-1 viewing geometry. The main
branch of this repository is the stable release deployed via the HyP3 platform and can be accessed via the appropriate API.
TopsApp is an ISCE2 InSAR workflow for Sentinel-1 constellation SLCs corresponding to a repeat-pass date pair. ISCE2 TopsApp generates numerous SAR analysis ready products including a geocoded unwrapped interferogram. The ARIA GUNW product packages the ISCE2 analysis ready data products into a NISAR netcdf file as discussed here. The ARIA project has generated numerous GUNWs over numerous Sentinel-1 tracks and for numerous date pairs (here are some GUNWs over JPL). These products were first generated using the topsApp Product Generation Executable (PGE) written by Mohammed Karim and David Bekaert in the ariamh repo. The repo was later reorganized here. The current processor (also dubbed a plugin) is adapted from these two repositories, adding the localization of required datasets so that this processor can be called via an API.
- Clone this repo
git clone https://github.com/ACCESS-Cloud-Based-InSAR/DockerizedTopsApp.git
- Navigate with your terminal to the repo.
- Create a new environment and install requirements using
conda env update --file environment.yml
(or use mamba to speed up the install)
- Install the package from cloned repo using
python -m pip install -e .
ISCE2 requires Intel x86_64 compiled conda-forge packages. Please follow the directions here, i.e.
CONDA_SUBDIR=osx-64 conda create -n topsapp_env python
conda activate topsapp_env
conda config --env --set subdir osx-64
Then check
python -c "import platform;print(platform.machine())" # Should print "x86_64"
echo "CONDA_SUBDIR: $CONDA_SUBDIR" # Should print "CONDA_SUBDIR: osx-64"
- Ensure that your ~/.netrc file has:
machine urs.earthdata.nasa.gov login <username> password <password>
machine dataspace.copernicus.eu login <username> password <password>
The first username/password pair are the appropriate Earthdata Login credentials that are used to access NASA data. The second pair are your credentials for the Copernicus Data Space Ecosystem. This file is necessary for downloading the Sentinel-1 files and auxiliary data. Additionally, the requests library automatically uses credentials stored in ~/.netrc for authentication when none are supplied.
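Before starting a long run, it can help to confirm that both entries are present. The following is a minimal sketch using only the Python standard library; `missing_netrc_hosts` is a hypothetical helper for illustration and is not part of `isce2_topsapp`.

```python
import netrc
from pathlib import Path

# Hosts named in the ~/.netrc instructions above.
REQUIRED_HOSTS = ("urs.earthdata.nasa.gov", "dataspace.copernicus.eu")


def missing_netrc_hosts(netrc_path=None, hosts=REQUIRED_HOSTS):
    """Return the hosts that have no `machine` entry in the netrc file.

    Hypothetical helper, not part of isce2_topsapp.
    """
    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    if not path.exists():
        return list(hosts)
    entries = netrc.netrc(str(path))
    # Inspect explicit `machine` entries only, so that a `default`
    # entry in the file does not hide a missing host.
    return [host for host in hosts if host not in entries.hosts]


# Example usage (reads your own ~/.netrc):
#   print(missing_netrc_hosts() or "all required hosts found")
```

An empty list means both hosts have an entry; it does not verify that the stored credentials are actually accepted by either service.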
Make sure you have ~/.netrc
as described above. Run the following command:
isce2_topsapp --reference-scenes S1A_IW_SLC__1SDV_20220212T222803_20220212T222830_041886_04FCA3_2B3E \
S1A_IW_SLC__1SDV_20220212T222828_20220212T222855_041886_04FCA3_A3E2 \
--secondary-scenes S1A_IW_SLC__1SDV_20220131T222803_20220131T222830_041711_04F690_8F5F \
S1A_IW_SLC__1SDV_20220131T222828_20220131T222855_041711_04F690_28D7 \
--frame-id 25502
Add > topsapp_img.out 2> topsapp_img.err
to avoid unnecessary output to your terminal and record the stdout and stderr as files.
This is reflected in the sample_run.sh
.
To be even more explicit, you can use tee to record output to both the terminal and the files by including > >(tee -a topsapp_img.out) 2> >(tee -a topsapp_img.err >&2).
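The scene IDs passed on the command line encode the acquisition times, so the date pair can be sanity-checked before any download starts. The sketch below relies on the standard Sentinel-1 product naming convention (start time in the sixth underscore-separated field); `acquisition_start` and `temporal_baseline_days` are hypothetical helpers, not functions of this plugin.

```python
from datetime import datetime


def acquisition_start(slc_id):
    """Parse the acquisition start time out of a Sentinel-1 SLC ID."""
    # "SLC__" contains a double underscore, so splitting on "_" leaves an
    # empty field at index 3 and the start time lands at index 5.
    fields = slc_id.split("_")
    return datetime.strptime(fields[5], "%Y%m%dT%H%M%S")


def temporal_baseline_days(reference_scenes, secondary_scenes):
    """Days between the earliest reference and earliest secondary scene."""
    ref_start = min(acquisition_start(s) for s in reference_scenes)
    sec_start = min(acquisition_start(s) for s in secondary_scenes)
    return (ref_start.date() - sec_start.date()).days


reference = [
    "S1A_IW_SLC__1SDV_20220212T222803_20220212T222830_041886_04FCA3_2B3E",
    "S1A_IW_SLC__1SDV_20220212T222828_20220212T222855_041886_04FCA3_A3E2",
]
secondary = [
    "S1A_IW_SLC__1SDV_20220131T222803_20220131T222830_041711_04F690_8F5F",
    "S1A_IW_SLC__1SDV_20220131T222828_20220131T222855_041711_04F690_28D7",
]
print(temporal_baseline_days(reference, secondary))  # 12
```

A baseline of zero days would indicate that the two scene groups come from the same pass and do not form a date pair. This check says nothing about whether the scenes share a viewing geometry.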
Each ARIA-S1-GUNW at the ASF is generated with a standard set of parameters, which ensures that down-stream analysis by ARIA-Tools and MintPy is done consistently and reproducibly. There are a number of exposed parameters in this plugin that we require to be set in a certain manner for a product to be considered "standard". We now discuss the standard parameters with respect to this plugin.
Since v3+, in addition to reference and secondary scenes, a frame-id
must be supplied for a standard product to be generated. This effectively restricts processing and the resulting product to be within this frame (technically, all bursts within the frame are included in the standard product). The geojson of spatially-fixed frames with their ids can be downloaded here. These are derived from ESA's burst map. More information about finding SLC pairs and their corresponding pairs can be found here and the generation of our spatially fixed-frames is discussed here.
All standard products have the following layers:
- Data Layers (0.00083333333 deg or ~90 m at the equator)
- Unwrapped phase
- Coherence
- Connected components
- Unfiltered coherence - new in version 3❗
- InSAR amplitude
- Correction Layers
- Ionosphere (0.00916 deg or ~1 km at the equator) - new in version 3❗
- Solid earth tide (.1 deg or ~11 km at the equator) - new in version 3❗
- Tropo correction layers if HRRR available (see RAiDER) - new in version 3❗
- Geometry Layers (.1 deg or ~11 km)
- Incidence angle
- Azimuth angle
- Parallel baseline
- Perpendicular baseline
- Lat/lon grids
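The approximate ground spacings quoted in the list follow from converting an angular posting to an arc length along the equator. A short worked example, assuming the WGS84 equatorial radius (the function name is ours, for illustration only):

```python
import math

WGS84_EQUATORIAL_RADIUS_M = 6_378_137.0


def degrees_to_meters_at_equator(spacing_deg):
    """Arc length along the equator spanned by `spacing_deg` degrees."""
    return math.radians(spacing_deg) * WGS84_EQUATORIAL_RADIUS_M


# Postings taken from the layer list above.
for label, spacing_deg in [
    ("data layers", 0.00083333333),   # 3 arcseconds
    ("ionosphere", 0.00916),
    ("geometry / solid earth tide", 0.1),
]:
    meters = degrees_to_meters_at_equator(spacing_deg)
    print(f"{label}: {spacing_deg} deg is about {meters:.0f} m")
```

Away from the equator the east-west spacing shrinks roughly with the cosine of latitude, so these figures are upper bounds for the longitudinal direction.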
Again, tropo corrections are controlled via a separate step-function and so are not included above. The repository is here. Turning off certain layers or adding available layers using the CLI arguments is permissible but will produce custom products (indicated with the prefix S1-GUNW_CUSTOM...). The parameters often simply expose certain topsApp parameters discussed here. Our template for topsapp that is utilized for ISCE is found here.
The command line string and relevant plugin version used to generate every product is included in the product itself and can be used to reproduce a product. These are attributes in the top level netcdf group.
We note that the ionosphere correction layer is the (hard) work of Marin Govorcin and David Bekaert, which utilizes ISCE2 in a creative fashion. Users should refer to this file for the process.
The command below shows all available arguments for product generation and the parameter values required for standard product generation (again, for a given pairing and frame, one must use the enumeration of pairs described here). Use isce2_topsapp --help for more information on the available arguments.
# Notes on the flags used below:
#   --frame-id: latitude aligned ARIA spatially fixed frame
#   --estimate-ionosphere-delay: ionosphere correction layers
#   --esd-coherence-threshold: if -1, ESD is not used; else should be a value in (0, 1)
#   --goldstein-filter-power: the power of the patch FFT filter used in the Goldstein filter
#   --output-resolution: either 30 or 90 meters
#   --unfiltered-coherence: this adds an unfiltered coherence layer
#   --dense-offsets: adds layers that compute patch wise correlation measurement done in range and azimuth which are helpful after significant surface changes
isce2_topsapp --reference-scenes S1A_IW_SLC__1SDV_20220212T222803_20220212T222830_041886_04FCA3_2B3E \
S1A_IW_SLC__1SDV_20220212T222828_20220212T222855_041886_04FCA3_A3E2 \
--secondary-scenes S1A_IW_SLC__1SDV_20220131T222803_20220131T222830_041711_04F690_8F5F \
S1A_IW_SLC__1SDV_20220131T222828_20220131T222855_041711_04F690_28D7 \
--frame-id 25502 \
--estimate-ionosphere-delay True \
--esd-coherence-threshold -1. \
--compute_solid_earth_tide True \
--goldstein-filter-power 0.5 \
--output-resolution 90 \
--unfiltered-coherence True \
--dense-offsets False
or as a json:
{
"reference_scenes": [
"S1A_IW_SLC__1SDV_20220212T222803_20220212T222830_041886_04FCA3_2B3E",
"S1A_IW_SLC__1SDV_20220212T222828_20220212T222855_041886_04FCA3_A3E2"
],
"secondary_scenes": [
"S1A_IW_SLC__1SDV_20220131T222803_20220131T222830_041711_04F690_8F5F",
"S1A_IW_SLC__1SDV_20220131T222828_20220131T222855_041711_04F690_28D7"
],
"frame_id": 25502,
"estimate_ionosphere_delay": true,
"compute_solid_earth_tide": true,
"output_resolution": 90,
"unfiltered_coherence": true,
"goldstein_filter_power": 0.5,
"dense_offsets": false,
"wrapped_phase_layer": false,
"esd_coherence_threshold": -1.0
}
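Because a full run takes hours, it may be worth checking a parameter set against the constraints stated above before submitting it. The sketch below encodes only the constraints mentioned in this README (resolution of 30 or 90, ESD threshold of -1 or within (0, 1), a frame id, non-empty scene lists); `validate_job_parameters` is a hypothetical helper, and it assumes every key is given explicitly rather than left to a default.

```python
def validate_job_parameters(params):
    """Return a list of problems found in a job-parameter dict (empty if none).

    Hypothetical helper; it mirrors the constraints described in this README,
    not the plugin's own argument parsing.
    """
    problems = []
    for key in ("reference_scenes", "secondary_scenes"):
        scenes = params.get(key)
        if not isinstance(scenes, list) or not scenes:
            problems.append(f"{key} must be a non-empty list of SLC IDs")
    if not isinstance(params.get("frame_id"), int):
        problems.append("frame_id must be an integer for a standard product")
    if params.get("output_resolution") not in (30, 90):
        problems.append("output_resolution must be 30 or 90 (meters)")
    esd = params.get("esd_coherence_threshold")
    esd_in_range = isinstance(esd, (int, float)) and 0 < esd < 1
    if not (esd == -1 or esd_in_range):
        problems.append("esd_coherence_threshold must be -1 or within (0, 1)")
    return problems


example = {
    "reference_scenes": ["S1A_IW_SLC__1SDV_20220212T222803_20220212T222830_041886_04FCA3_2B3E"],
    "secondary_scenes": ["S1A_IW_SLC__1SDV_20220131T222803_20220131T222830_041711_04F690_8F5F"],
    "frame_id": 25502,
    "output_resolution": 60,          # deliberately invalid
    "esd_coherence_threshold": -1.0,
}
for problem in validate_job_parameters(example):
    print(problem)
```

The same dict can be loaded from a file with `json.load` if the parameters are kept as JSON.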
- When running locally with root privileges (i.e. at your local workstation), build the docker image using:
docker build -f Dockerfile -t topsapp_img .
In a managed cluster/server without root privileges, build the docker image with arguments for your user's UID and GID:
docker build -f Dockerfile -t topsapp_img --build-arg UID=$(id -u) --build-arg GID=$(id -g) .
- Create a directory to mount the data files so you can inspect them outside of your docker container. Call it topsapp_data. Navigate to it. Copy the sample_run.sh into this directory, modifying it to add your Earthdata username and password, e.g.
isce2_topsapp --reference-scenes S1A_IW_SLC__1SDV_20220212T222803_20220212T222830_041886_04FCA3_2B3E \
S1A_IW_SLC__1SDV_20220212T222828_20220212T222855_041886_04FCA3_A3E2 \
--secondary-scenes S1A_IW_SLC__1SDV_20220131T222803_20220131T222830_041711_04F690_8F5F \
S1A_IW_SLC__1SDV_20220131T222828_20220131T222855_041711_04F690_28D7 \
--frame-id 25502 \
> topsapp_img.out 2> topsapp_img.err
- Take a look around a docker container built from the image, mounting a volume, with:
docker run -ti -v $PWD:/home/ops/topsapp_data --entrypoint /bin/bash topsapp_img
You can even run jupyter notebooks within the docker container, mirroring ports with -p 1313:1313.
- Run the topsapp process within a docker container:
cd /home/ops/topsapp_data && conda activate topsapp_env && source /home/ops/topsapp_data/sample_run.sh
Create a new directory (for all the intermediate files) and navigate to it.
docker run -ti -v $PWD:/home/ops/topsapp_data topsapp_img \
--reference-scenes S1A_IW_SLC__1SDV_20220212T222803_20220212T222830_041886_04FCA3_2B3E \
S1A_IW_SLC__1SDV_20220212T222828_20220212T222855_041886_04FCA3_A3E2 \
--secondary-scenes S1A_IW_SLC__1SDV_20220131T222803_20220131T222830_041711_04F690_8F5F \
S1A_IW_SLC__1SDV_20220131T222828_20220131T222855_041711_04F690_28D7 \
--frame-id 25502 \
--username <username> \
--password <password> \
--esa-username <esa-username> \
--esa-password <esa-password>
where the username/password are the Earthdata credentials for accessing NASA data. We note the command line magic of the above is taken care of by isce2_topsapp/etc/entrypoint.sh (written by Joe Kennedy), which automatically runs certain bash commands on startup of the container, i.e. the run command also calls the isce2_topsapp command line function, as can be seen here.
ISCE2, gdal, and xarray are hard to balance. Ideally, we would have a dependabot to increment packages and integration tests to make sure datasets are generated correctly with each update. Unfortunately, this is not currently the case. So, we are including some snippets (credit to Joseph Kennedy) for determining where packages might fail. We have some caps in our environment.yml file. This is how we find them. Sometimes even with our rather minimal integration tests and builds, in 24 hours, a new package can entirely throw something awry with respect to builds.
The easiest way to find the last working build is to check out the docker images for the last build. latest refers to the latest production build on main. test refers to the last build on dev. But each merge to dev gets an image that is recorded.
- Click one of the images and it will tell you how to download an image into docker, e.g.
docker pull ghcr.io/access-cloud-based-insar/dockerizedtopsapp:0.2.2.dev148_gab75888
- Load the image and get into interactive mode, e.g.
docker run --entrypoint /usr/bin/bash -it --rm ghcr.io/access-cloud-based-insar/dockerizedtopsapp:0.2.2.dev136_ga2d5389 -l
- Check the packages
conda list | grep xarray
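When comparing a working image against a broken one, it can be convenient to dump several package versions at once from inside each container. This is a small sketch based on the standard library's `importlib.metadata`; it only sees packages that ship Python distribution metadata, so `conda list` remains the authoritative source for conda-only packages. `installed_versions` is a hypothetical helper.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if not found."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Run this in both containers and diff the output.
for name, version in installed_versions(["xarray", "numpy"]).items():
    print(f"{name}=={version}")
```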
This is an open-source plugin and we welcome contributions. Because we use this plugin for producing publicly available datasets, there are some additional tests and requirements for any new features to be integrated.
- Clone or fork this repo. If you are a member of ACCESS or working on new features of this plugin, ask to become a member of this Github organization. The integration tests will be easier to run if you are pushing to branches of this repository as opposed to a fork (the tests require organization secrets). This will ensure new features are more quickly integrated, particularly into hyp3. Otherwise, please add secrets to your repository as indicated below.
- Navigate with your terminal to the repo.
- Create a new environment and install requirements using
conda env update --file environment.yml
- Activate the environment:
conda activate topsapp_env
- Install the package from cloned repo using
python -m pip install -e .
- Create a new branch with your feature.
Note: because forked repositories do not have access to the repository's secrets, you will have to add secrets to your own repository. You will need to add your Earthdata username and password to the secrets as:
EARTHDATA_USERNAME=...
EARTHDATA_PASSWORD=...
The main entrypoint is in the __main__.py
file here.
The whole standard ++slc
workflow that generates the ARIA-S1-GUNW
product is summarized in this file.
The workflow takes 1.5 - 3 hours to complete.
Likely, when developing a small new feature, we need to modify only a portion of this long running workflow, e.g. the packaging of ISCE2 outputs into a netcdf file, staging/preparation of auxiliary data, etc.
Thus, for development, it is recommended to not rerun the workflow each time, but utilize the intermediate ISCE data files and the metadata stored as json by this plugin.
We have some sample notebooks to load the relevant metadata to make this "jumping" into the code slightly easier.
We note that ISCE2
generates a lot of intermediate files.
For our workflow, this can require between 100 and 150 GB of disk space.
So be warned!
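A quick check of free disk space in the working directory can save a failed multi-hour run. The sketch below uses `shutil.disk_usage` from the standard library; the 150 GB default is taken from the upper end of the range quoted above, and `has_free_space` is a hypothetical helper.

```python
import shutil


def has_free_space(path=".", required_gb=150):
    """Return True if the filesystem holding `path` has enough free space."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb


if not has_free_space("."):
    print("Warning: less than 150 GB free; ISCE2 intermediate files may not fit.")
```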
We have a test suite, but it is far from complete.
Passing the test suite does not guarantee successful generation of a GUNW!
The test suite helps make sure portions of the cloud workflow, e.g. downloading metadata, a simulated CMR handshake, and packaging, occur as expected.
However, the plugin takes about 1.5 to 3 hours (depending on the number of corrections requested) to generate a final GUNW.
Therefore, these integration tests are not sufficient to permit a new release (see more instructions below).
Until we have a complete end-to-end test of the workflow (via Hyp3), any new feature cannot be integrated into official production (i.e. the main
branch).
As a first step, it is imperative to share the output of a new feature (i.e. the GUNW file and the command to generate it).
Here is a notebook that demonstrates how to compare GUNWs that will be very helpful: https://github.com/ACCESS-Cloud-Based-InSAR/DockerizedTopsapp-Debugging-NBs/blob/dev/2_Compare_GUNWs.ipynb
There are two branches: dev and main. The former is the "test" branch and the latter is the "production" branch. When submitting jobs to Hyp3, there are two related job types: INSAR_ISCE_TEST and INSAR_ISCE. The former corresponds to the dev branch and the latter to main. This allows us to test the plugin in the cloud incrementally before transitioning new features to production. This is a design feature of Hyp3 (thanks, Joseph Kennedy!). Since end-to-end tests are required as noted above, this ensures that our production branch is always well tested.
There are several versions to keep track of:
- Dataset Version: this is the GUNW version and is incremented manually only when the GUNW has changed with relation to the end-user. This is in the GUNW name, i.e. S1-GUNW-D-R-042-tops-20220429_20210528-140902-00123W_00034N-PP-e677-v2_0_5 has version 2.0.5. It is manually set here
- Software Version: Automatically tracked and incremented in CI/CD on merges to the production branch (main)
- Container Version: Automatically tracked and incremented in CI/CD on merges to dev (with test tag) and to main (with latest tag). The various images can be downloaded from the Packages sidebar (link).
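Since the dataset version is embedded at the end of the product name, it can be recovered with a short regular expression. This is a sketch based only on the example name above; `dataset_version` is a hypothetical helper and the optional `.nc` suffix is an assumption about how the file may be named on disk.

```python
import re


def dataset_version(gunw_name):
    """Extract the dotted dataset version from a GUNW product name."""
    # Matches a trailing "-v<major>_<minor>_<patch>", optionally followed
    # by a ".nc" extension (assumed).
    match = re.search(r"-v(\d+)_(\d+)_(\d+)(?:\.nc)?$", gunw_name)
    if match is None:
        raise ValueError(f"no version suffix found in {gunw_name!r}")
    return ".".join(match.groups())


name = "S1-GUNW-D-R-042-tops-20220429_20210528-140902-00123W_00034N-PP-e677-v2_0_5"
print(dataset_version(name))  # 2.0.5
```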
The release workflows within the (ASF) CI/CD allow us to be more efficient both in terms of tracking the releases and pushing container images for cloud processing. Each merge to main
or dev
creates a new docker image that is published through the github registry. The test
tag is the latest image built to dev
. The latest
tag is the most up-to-date main
release for production. We can view various containers here.
The plugin's software version is governed by git tags and semantic versioning. The software version is incremented by merges to main
(according to major
/minor
/patch
labels). Any merge into dev
is a (patch) test release so until dev
is merged into main
, the release number will correctly not increment. Software version can be seen via import isce2_topsapp; isce2_topsapp.__version__
.
When releasing a new version, i.e. merging dev into main, ASF CI/CD requires an update to the changelog with the correct version that will be captured by the major/minor/patch bump. If you use the label bumpless and only CI/CD workflows or documentation have been changed, the software version remains the same. Otherwise, use major, minor, or patch. For many more details see these notes.
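To decide which version number to write into the changelog, the effect of each label can be worked out by hand or with a few lines of code. The sketch below illustrates plain semantic versioning for the four labels; it is not the CI/CD implementation, which lives in the ASF workflows, and `bump_version` is a hypothetical helper.

```python
def bump_version(version, label):
    """Return the next semantic version for a given PR label."""
    major, minor, patch = (int(part) for part in version.split("."))
    if label == "major":
        return f"{major + 1}.0.0"
    if label == "minor":
        return f"{major}.{minor + 1}.0"
    if label == "patch":
        return f"{major}.{minor}.{patch + 1}"
    if label == "bumpless":
        # Only CI/CD workflows or documentation changed: version is unchanged.
        return version
    raise ValueError(f"unknown label: {label!r}")


for label in ("major", "minor", "patch", "bumpless"):
    print(label, bump_version("0.2.2", label))
```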
All new features will first be merged into dev
before being released (and merged into main
).
- If the software changes, please use major, minor, or patch labels on a PR for maintainers to track the type of changes you are making
- Update the changelog so that the bump (depending on major/minor/patch) is consistent with what the next release will hold (you may be including your changes in a larger release)
- Please share a sample GUNW (and the command to generate it) so that the team can understand how the final product has been changed - unfortunately the test suite does not ensure the workflow will complete end-to-end.
Here are more detailed notes.
Even though there are some integration tests and unit tests in our test suites, this CPU intensive workflow cannot be tested end-to-end using github actions (there is simply not enough memory and CPU for these workflows).
Therefore, even if all the tests pass, there is still a nontrivial chance a GUNW is not successfully generated (e.g. the topo
step of topsapp fizzles out because numpy
API was not successfully tracked in the latest ISCE2 release).
We request a sample GUNW be shared in a PR.
Ideally, the PR includes a comparison of the GUNW created with the new branch and an existing one (as done in this notebook).
Even if we go through careful accounting, once a PR is merged into dev
, we will use hyp3
to further inspect the new plugin e.g. through this notebook using a few sites.
It is important to use the INSAR_ISCE_TEST
job to ensure the features from the dev
branch are used.
Only after these manual checks will we continue with a release of the plugin.
- The docker build is taking a long time.
Answer: Make sure the time is spent with conda/mamba, not copying data files. The .dockerignore file should ignore ISCE2 data files (if you are running some examples within this repo directory, there will be GBs of intermediate files). It's crucial you don't include unnecessary ISCE2 intermediate files into the Docker image as this will bloat it.
- Need to install additional packages such as vim in your container?
Answer: Login as root user to the container and install the additional packages.
Make sure you know the container_id (e.g. docker ps -a). Then do the following steps:
$ docker start <container_id>
$ docker exec --user root -ti <container_id> /bin/bash
$ conda activate topsapp_env
$ conda install <package>
$ exit
Return to the terminal inside the container as non-root user:
docker exec -ti <container_id> /bin/bash