riverwidthEO version 1.2 (updated July 1st)

A Python package that processes river segments using satellite imagery and machine learning to estimate their surface area (results are delivered as a CSV file).

The package can process any set of user-defined points on rivers, or any of the 3,576,396 pre-defined points on rivers across the globe. Please see the example script for information on how to use the package.

Dataset

The River Segment Surface Area Dataset provides pre-computed surface area variations for 8,710 river segments in Ethiopia, derived from Sentinel-2 imagery from 2015 till 2010. The dataset is available through the MINT DataCatalog and can also be downloaded using the Jupyter notebook included with the package.

Background

In many earth science applications, calibrating physical models is a key challenge because ground observations are very scarce or completely absent in most regions. For example, hydrological models simulate the flow of water in a basin using physical principles, but they necessarily contain numerous parameters (e.g., soil conductivity at different grid points) whose values must be calibrated for each study region using observations. The most commonly used observations are discharge estimates (volume of water per second) from ground stations. These stations are costly to install and maintain, so they are limited in number. This paucity (or complete absence) of observation data often leads to poorly calibrated models that give incorrect predictions or have high uncertainty in practice.

Our approach is to provide this much-needed calibration data using novel machine learning techniques and multi-temporal satellite imagery that is freely available from Earth-observing satellite sensors such as Sentinel and Landsat. The latest version of the package (version 1.2) uses the Descartes Labs API to download Sentinel-2 imagery of any given river segment. The multi-spectral imagery is then converted into land/water maps using CNN-based deep learning techniques. The resulting surface area variations can be used to constrain hydrological models. Watch this video to see surface area variations of a river segment in Ethiopia.
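
Turning a series of land/water maps into surface area variations comes down to counting water pixels and multiplying by the pixel footprint. The sketch below is illustrative only: it assumes each map is a NumPy array with 1 = water on a 10 m grid (the native resolution of Sentinel-2's visible and near-infrared bands), and the function names are hypothetical, not the package's API.

```python
import numpy as np

# Assumption: each mask is on a 10 m grid, so one pixel covers 10 m x 10 m = 100 m^2.
PIXEL_AREA_M2 = 10.0 * 10.0


def surface_area_km2(mask: np.ndarray, pixel_area_m2: float = PIXEL_AREA_M2) -> float:
    """Return the water surface area in km^2 of a binary land/water mask (1 = water)."""
    water_pixels = int(np.count_nonzero(mask == 1))
    return water_pixels * pixel_area_m2 / 1e6


def area_time_series(masks: dict) -> dict:
    """Map each acquisition date (ISO string) to the surface area of its mask, in date order."""
    return {date: surface_area_km2(mask) for date, mask in sorted(masks.items())}


# Usage: a synthetic channel that widens from 10 to 30 pixels between two dates.
dry = np.zeros((100, 100), dtype=np.uint8)
dry[:, 40:50] = 1   # 100 rows x 10 columns = 1,000 water pixels
wet = np.zeros((100, 100), dtype=np.uint8)
wet[:, 30:60] = 1   # 100 rows x 30 columns = 3,000 water pixels
print(area_time_series({"2019-07-20": wet, "2019-01-05": dry}))
# {'2019-01-05': 0.1, '2019-07-20': 0.3}
```

Sorting by ISO date strings keeps the output in chronological order, which is what a calibration routine comparing against a time series would expect.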

Installation

Docker

Use the Dockerfile to build the Docker image:

docker build -f Dockerfile -t <image_tag> .

Use the following command to run the container (the -v flag mounts a local directory into the container):

sudo docker run -v <path_of_local_directory>:<docker_mount_path> -it <image_tag>

Inside the container, set the Descartes Labs API client ID and secret:

export DESCARTESLABS_CLIENT_ID=...
export DESCARTESLABS_CLIENT_SECRET=...

Anaconda

Install Anaconda if it is not already installed:

wget https://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh
sha256sum Anaconda3-5.0.1-Linux-x86_64.sh
bash Anaconda3-5.0.1-Linux-x86_64.sh
source ~/.bashrc

Set up the conda environment:

conda create --yes -n rweo numpy pandas tensorflow keras gdal shapely scikit-image fiona geopandas
source activate rweo
pip install s2cloudless
pip install progressbar
pip install descarteslabs

Descartes Labs API

Set up the client ID and secret:

export DESCARTESLABS_CLIENT_ID=...
export DESCARTESLABS_CLIENT_SECRET=...
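
Before running the package, it can help to confirm that both variables are actually visible to Python (for example, inside the Docker container). This is a minimal standard-library sketch; `missing_credentials` is a hypothetical helper, not part of riverwidthEO or the descarteslabs client.

```python
import os

# The two variables the installation steps above ask you to export.
REQUIRED_VARS = ("DESCARTESLABS_CLIENT_ID", "DESCARTESLABS_CLIENT_SECRET")


def missing_credentials(env=None):
    """Return the required variable names that are unset or empty in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Descartes Labs credentials are set.")
```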

Methodology

The package uses a deep neural network for semantic segmentation to convert a satellite image into a land/water mask. First, a CNN-based autoencoder was pre-trained on 11,000 unlabeled images of rivers from around the globe. The pre-trained weights were then used to initialize a semantic segmentation network, which was fine-tuned on 2,900 labeled images. To make the algorithm more robust to atmospheric disturbances, the land/water masks are then updated using physical principles: the pixels of a river segment do not change independently but are related to each other through hydraulic and bathymetric constraints. These constraints can be used to identify and correct physically inconsistent land/water labels produced by the machine learning model.
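
As a hypothetical illustration of such a constraint (a simplified stand-in, not the package's actual correction algorithm): assume each pixel's relative elevation is known, for example from bathymetry, so water must fill the lowest pixels first. A consistent mask then marks exactly the k lowest pixels as water, and the sketch picks the k that flips the fewest machine-learning labels. The function name and the elevation input are assumptions.

```python
import numpy as np


def enforce_elevation_order(labels: np.ndarray, elevation: np.ndarray) -> np.ndarray:
    """Return the elevation-consistent water mask (1 = water) closest to `labels`.

    Hypothetical constraint: water fills the lowest pixels first, so a consistent
    mask marks exactly the k lowest pixels as water for some k. The k chosen is
    the one that flips the fewest machine-learning labels.
    """
    flat = labels.ravel().astype(int)
    order = np.argsort(elevation.ravel(), kind="stable")  # lowest pixels first
    ranked = flat[order]
    # land_below[k]: land labels among the k lowest pixels (would become water)
    land_below = np.concatenate(([0], np.cumsum(1 - ranked)))
    # water_above[k]: water labels above the k lowest pixels (would become land)
    water_above = ranked.sum() - np.concatenate(([0], np.cumsum(ranked)))
    k = int(np.argmin(land_below + water_above))
    corrected = np.zeros_like(flat)
    corrected[order[:k]] = 1
    return corrected.reshape(labels.shape)


# Usage: a 1-D cross-section whose deepest pixel was mislabeled as land (e.g. under haze).
elevation = np.array([5, 3, 1, 2, 4, 6])
labels = np.array([0, 1, 0, 1, 0, 0])
print(enforce_elevation_order(labels, elevation).tolist())  # [0, 1, 1, 1, 0, 0]
```

The cumulative sums evaluate the cost of every k in one pass, so the correction is dominated by the O(n log n) sort.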

Limitations

The current version of the algorithm has three limitations that we intend to address in the next version:

  1. In certain cases, the cloud filter has omission errors (clouds that are not detected), which leads the algorithm to underestimate surface area.
  2. Similarly, in certain cases cloud shadows are incorrectly classified as water, which leads the algorithm to overestimate surface area.
  3. The current version does not mask out water bodies adjacent to the river segment when calculating area, which can lead to overestimation of the segment's surface area.

Changes in 1.2

Additional Training Data

The image classification model was updated with ~3,000 additional labeled images from cases where the previous version performed poorly. This model was initialized from a model pre-trained on ~90,000 cloud-free images from across Ethiopia.

Changes in 1.1

Cloud Filter

The previous cloud filter missed some clouds and hazy images. This version addresses the issue with a new cloud-filtering strategy:

  • Cloudy and hazy images show a high correlation between the aerosol band and the water vapor band, which can be used to flag them.
  • Images where the correlation between the aerosol and water vapor bands is greater than 0.8 are considered cloudy or hazy.
  • For these images, the cloud probability threshold is set to 0.1 instead of 0.4 (the value used in the previous version).
  • A side effect of this strategy is that river segments surrounded by bare soil also show high correlation. To handle this, the correlation values of all images of a segment are checked first; if a vast majority of the images show high correlation, the strategy is not applied to that segment.
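
The per-segment logic in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's code: each image is a pair of NumPy arrays for the aerosol and water vapor bands (in Sentinel-2 these are B01 and B09), the 0.8, 0.1, and 0.4 values come from the list, and the 0.9 cutoff for a "vast majority" is a hypothetical choice.

```python
import numpy as np

CORR_THRESHOLD = 0.8      # correlation above this flags an image as cloudy or hazy
HAZY_CLOUD_PROB = 0.1     # cloud-probability threshold for flagged images
DEFAULT_CLOUD_PROB = 0.4  # threshold used otherwise (previous version's value)
MAJORITY_FRACTION = 0.9   # assumption: what counts as a "vast majority"


def band_correlation(aerosol: np.ndarray, water_vapor: np.ndarray) -> float:
    """Pearson correlation between the aerosol and water vapor bands of one image."""
    return float(np.corrcoef(aerosol.ravel(), water_vapor.ravel())[0, 1])


def cloud_thresholds(images):
    """Return one cloud-probability threshold per image of a single river segment.

    `images` is a list of (aerosol, water_vapor) band arrays.
    """
    flagged = [band_correlation(a, w) > CORR_THRESHOLD for a, w in images]
    # Segments surrounded by bare soil correlate highly in almost every image,
    # so the stricter threshold is skipped when a vast majority is flagged.
    if np.mean(flagged) >= MAJORITY_FRACTION:
        return [DEFAULT_CLOUD_PROB] * len(images)
    return [HAZY_CLOUD_PROB if f else DEFAULT_CLOUD_PROB for f in flagged]


# Usage with synthetic bands.
rng = np.random.default_rng(0)
base = rng.random((32, 32))
hazy = (base, 2 * base + 0.01 * rng.random((32, 32)))  # strongly correlated bands
clear = (rng.random((32, 32)), rng.random((32, 32)))    # independent bands
print(cloud_thresholds([hazy, clear, clear]))  # [0.1, 0.4, 0.4]
print(cloud_thresholds([hazy, hazy]))          # [0.4, 0.4]
```

The returned thresholds would then be applied to per-pixel cloud probabilities (for example from s2cloudless, which the installation step installs) to decide which pixels to mask.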

Additional Training Data

The image classification model was updated with 542 additional image samples from cases where the previous version performed poorly.

Spatial Clustering combined with Semantic Segmentation

The strategy for combining clustering-based and semantic-segmentation-based maps was updated, improving detection performance in the presence of haze. A new strategy for identifying cloud shadows was also added to reduce errors.