Satlas aims to provide open AI-generated geospatial data that is highly accurate, available globally, and updated on a frequent (monthly) basis. One of the data applications in Satlas is globally generated Super-Resolution imagery for 2023.
This repository contains the training and inference code for the AI-generated Super-Resolution data found at https://satlas.allen.ai.
The training and validation data is available for download at this link.
The weights for our models, which take a varying number of Sentinel-2 images as input, are available for download at these links:
The dataset consists of Sentinel-2 and NAIP image pairs, where each pair is a time series of Sentinel-2 images that overlaps spatially and temporally (within 3 months) with a NAIP image. The imagery is from 2019-2020 and is limited to the USA.
The images adhere to the same Web-Mercator tile system as in SatlasPretrain.
There are two training sets: the full set, consisting of ~44 million pairs, and the urban set, with ~1.1 million pairs from locations within a 5km radius of cities in the USA with populations >= 50k.
There is one small validation set consisting of 30 image pairs that were held out for qualitative assessment.
Additionally, there is a test set containing eight Sentinel-2 tiles, each a 16x16 grid of chunks, from interesting locations including Dry Tortugas National Park, Bolivia, France, South Africa, and Japan.
The NAIP images included in this dataset are 25% of the original NAIP resolution. Each image is 128x128px.
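As a rough illustration of the Web-Mercator tile system mentioned above, standard slippy-map tile indices can be computed as in the sketch below. The zoom level is left as a parameter since the level used here is not stated, and treating the numeric tile names in the file paths below as such column/row indices is an assumption.

```python
import math

def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Standard Web-Mercator (slippy-map) tile column/row for a lon/lat point.

    The zoom level used by the Satlas tiling is not stated here, so treat this
    as a sketch of the indexing scheme rather than the exact Satlas mapping.
    """
    n = 2 ** zoom
    col = int((lon + 180.0) / 360.0 * n)
    row = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return col, row

# Example: the tile containing Seattle at a hypothetical zoom level.
print(lonlat_to_tile(-122.33, 47.61, zoom=13))
```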
In each set, there is a naip folder containing images in this format: naip/image_uuid/tci/1234_5678.png.
For each NAIP image, there is a time series of corresponding 32x32px Sentinel-2 images. These time series are saved as PNGs with shape [number_sentinel2_images * 32, 32, 3]. Before being run through the models, the data is reshaped to [number_sentinel2_images, 32, 32, 3].
In each set, there is a sentinel2 folder containing these time series in the format: sentinel2/1234_5678/X_Y.png, where X,Y is the column and row position of the NAIP image within the current Sentinel-2 image.
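As a minimal sketch of reading one of these pairs (the file paths are hypothetical examples following the formats above):

```python
import numpy as np
from PIL import Image

# Hypothetical example paths following the formats described above.
s2_path = "satlas-super-resolution-data/val_set/sentinel2/1234_5678/0_1.png"
naip_path = "satlas-super-resolution-data/val_set/naip/image_uuid/tci/1234_5678.png"

# The Sentinel-2 time series is stored as one tall PNG of shape
# [number_sentinel2_images * 32, 32, 3]; split it into individual 32x32 frames.
series = np.array(Image.open(s2_path))
n_images = series.shape[0] // 32
series = series.reshape(n_images, 32, 32, 3)  # -> [N, 32, 32, 3]

# The corresponding NAIP target is a single 128x128 RGB image.
naip = np.array(Image.open(naip_path))
print(series.shape, naip.shape)
```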
Our model is an adaptation of ESRGAN, with changes that allow the input to be a time series of Sentinel-2 images. All models are trained to upsample by a factor of 4.
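One way such a time-series input can be handled (an assumption for illustration, not a description of the exact architecture) is to stack the N Sentinel-2 frames along the channel dimension so an ESRGAN-style generator sees a single multi-channel low-resolution input:

```python
import torch

# Sketch only: assume the N Sentinel-2 frames are stacked along the channel
# axis before being fed to a 4x ESRGAN-style generator.
n_s2_images = 6                                  # e.g. a 6-image model
series = torch.rand(1, n_s2_images, 3, 32, 32)   # [batch, N, channels, H, W]
lr_input = series.reshape(1, n_s2_images * 3, 32, 32)

# A 4x generator would map [1, N*3, 32, 32] -> [1, 3, 128, 128],
# matching the 128x128 NAIP targets in the dataset.
print(lr_input.shape)
```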
To train a model on this dataset, run the following command, with the desired configuration file:
python -m ssr.train -opt ssr/options/urban_set_6images.yml
Make sure the configuration file specifies correct paths to your downloaded data.
Add the --debug flag to the above command if wandb logging, model saving, and visualization creation are not wanted.
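For example, to train on the urban set without those side effects:
python -m ssr.train -opt ssr/options/urban_set_6images.yml --debug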
To run inference on the provided validation or test sets, run the following command (--data_dir should point to your downloaded data):
python -m ssr.infer --data_dir satlas-super-resolution-data/{val,test}_set/sentinel2/ --weights_path PATH_TO_WEIGHTS --n_s2_images NUMBER_S2_IMAGES --save_path PATH_TO_SAVE_OUTPUTS
When running inference on an entire Sentinel-2 tile (consisting of a 16x16 grid of chunks), the --stitch flag will stitch the individual chunks together into one large image.
Try this feature out on the test set:
python -m ssr.infer --data_dir satlas-super-resolution-data/test_set/sentinel2/ --stitch
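For reference, the stitching step amounts to placing each 128x128 output chunk at its position in the 16x16 grid; a minimal sketch (the output directory and chunk filenames are hypothetical):

```python
import numpy as np
from PIL import Image

# Hypothetical sketch of stitching: place each 128x128 super-resolved chunk
# at its (X, Y) = (column, row) position within the 16x16 grid.
grid = np.zeros((16 * 128, 16 * 128, 3), dtype=np.uint8)
for x in range(16):          # column within the Sentinel-2 tile
    for y in range(16):      # row within the Sentinel-2 tile
        chunk = np.array(Image.open(f"outputs/{x}_{y}.png"))  # assumed filename
        grid[y * 128:(y + 1) * 128, x * 128:(x + 1) * 128] = chunk[:, :, :3]
Image.fromarray(grid).save("outputs/stitched.png")
```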
There are instances where the generated super-resolution outputs are incorrect.
Specifically:
- Sometimes the model generates vessels in the water or cars on a highway, but because the input is a time series of Sentinel-2 imagery (which can span a few months), it is unlikely that those things persist in one location.
- Sometimes the model generates natural objects like trees or bushes where there should be a building, or vice versa. This is more common in places that look vastly different from the USA, such as the example below in Kota, India.
Thanks to these codebases for foundational Super-Resolution code and inspiration:
If you have any questions, please email piperw@allenai.org.