Joss paper #114

Merged
merged 11 commits into from
Feb 12, 2024
23 changes: 23 additions & 0 deletions .github/workflows/draft_pdf.yml
@@ -0,0 +1,23 @@
on: [push]

jobs:
  paper:
    runs-on: ubuntu-latest
    name: Paper Draft
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Build draft PDF
        uses: openjournals/openjournals-draft-action@master
        with:
          journal: joss
          # This should be the path to the paper within your repo.
          paper-path: paper/paper.md
      - name: Upload
        uses: actions/upload-artifact@v1
        with:
          name: paper
          # This is the output path where Pandoc will write the compiled
          # PDF. Note, this should be the same directory as the input
          # paper.md
          path: paper/paper.pdf
83 changes: 83 additions & 0 deletions paper/paper.bib
@@ -0,0 +1,83 @@

@article{shan:2022,
title = {Towards constraining soil and vegetation dynamics in land surface models: Modeling ASCAT backscatter incidence-angle dependence with a Deep Neural Network},
journal = {Remote Sensing of Environment},
volume = {279},
pages = {113116},
year = {2022},
issn = {0034-4257},
doi = {10.1016/j.rse.2022.113116},
url = {https://www.sciencedirect.com/science/article/pii/S0034425722002309},
author = {Xu Shan and Susan Steele-Dunne and Manuel Huber and Sebastian Hahn and Wolfgang Wagner and Bertrand Bonan and Clement Albergel and Jean-Christophe Calvet and Ou Ku and Sonja Georgievska},
keywords = {ASCAT, Scatterometry, Radar, Vegetation, Land surface model, Machine learning, Deep Neural Network, Plant water dynamics, Soil moisture},
abstract = {A Deep Neural Network (DNN) is used to estimate the Advanced Scatterometer (ASCAT) C-band microwave normalized backscatter (σ40o), slope (σ′) and curvature (σ″) over France. The Interactions between Soil, Biosphere and Atmosphere (ISBA) land surface model (LSM) is used to produce land surface variables (LSVs) that are input to the DNN. The DNN is trained to simulate σ40o, σ′ and σ″ from 2007 to 2016. The predictive skill of the DNN is evaluated during an independent validation period from 2017 to 2019. Normalized sensitivity coefficients (NSCs) are computed to study the sensitivity of ASCAT observables to changes in LSVs as a function of time and space. Model performance yields a near-zeros bias in σ40o and σ′. The domain-averaged values of ρ are 0.84 and 0.85 for σ40o and σ′, compared to 0.58 for σ″. The domain-averaged unbiased RMSE is 8.6% of the dynamic range for σ40o and 13% for σ′, with land cover having some impact on model performance. NSC results show that the DNN-based model could reproduce the physical response of ASCAT observables to changes in LSVs. Results indicated that σ40o is sensitive to surface soil moisture and LAI and that these sensitivities vary with time, and are highly dependent on land cover type. The σ′ was shown to be sensitive to LAI, but also to root zone soil moisture due to the dependence of vegetation water content on soil moisture. The DNN could potentially serve as an observation operator in data assimilation to constrain soil and vegetation water dynamics in LSMs.}
}

@article{Forman:2014,
author = {Forman, B. and Reichle, Rolf},
year = {2014},
month = {06},
pages = {1-11},
title = {Using a Support Vector Machine and a Land Surface Model to Estimate Large-Scale Passive Microwave Brightness Temperatures Over Snow-Covered Land in North America},
volume = {8},
journal = {IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},
doi = {10.1109/JSTARS.2014.2325780}
}

@article{XUE:2015,
title = {Comparison of passive microwave brightness temperature prediction sensitivities over snow-covered land in North America using machine learning algorithms and the Advanced Microwave Scanning Radiometer},
journal = {Remote Sensing of Environment},
volume = {170},
pages = {153-165},
year = {2015},
issn = {0034-4257},
doi = {10.1016/j.rse.2015.09.009},
url = {https://www.sciencedirect.com/science/article/pii/S0034425715301322},
author = {Yuan Xue and Barton A. Forman},
keywords = {Sensitivity analysis, Machine learning, Brightness temperature, Snow, Data assimilation},
abstract = {Recent studies showed that machine learning (ML) algorithms (e.g., artificial neural network (ANN) and support vector machine (SVM)) reasonably reproduce passive microwave brightness temperature observations over snow-covered land as measured by the Advanced Microwave Scanning Radiometer (AMSR-E). However, these studies did not explore the sensitivities of the ML algorithms relative to ML inputs in order to determine the behavior and performance of each algorithm. In this current study, normalized sensitivity coefficients are computed to diagnose ML performance as a function of time and space. The results showed that when using the ANN, approximately 20% of locations across North America are relatively sensitive to snow water equivalent (SWE). However, more than 65% of locations in the SVM-based brightness temperature (Tb) estimates are sensitive relative to perturbations in SWE at all frequency and polarization combinations explored in this study. Further, the SVM-based results suggest the algorithm is sensitive in both shallow and deep SWE, SWE with and without overlying forest canopy, and during both the snow accumulation and snow ablation seasons. Therefore, these findings suggest that compared with the ANN, the SVM could potentially serve as a more efficient and effective measurement model operator within a Tb data assimilation framework for the purpose of improving SWE estimates across regional- and continental-scales.}
}

@article{Forman:2017,
author = {Barton A. Forman and Yuan Xue},
title = {Machine learning predictions of passive microwave brightness temperature over snow-covered land using the special sensor microwave imager (SSM/I)},
journal = {Physical Geography},
volume = {38},
number = {2},
pages = {176-196},
year = {2017},
publisher = {Taylor \& Francis},
doi = {10.1080/02723646.2016.1236606},
URL = {https://doi.org/10.1080/02723646.2016.1236606},
eprint = {https://doi.org/10.1080/02723646.2016.1236606}
}

@book{mccuen1998hydrologic,
title={Hydrologic Analysis and Design},
author={McCuen, R.H.},
isbn={9780131349582},
lccn={97044779},
series={Hewlett Packard Professional Books},
url={https://books.google.com.mt/books?id=qPdRAAAAMAAJ},
year={1998},
publisher={Prentice Hall}
}

@article{Hoyer_xarray_N-D_labeled_2017,
author = {Hoyer, Stephan and Hamman, Joseph},
doi = {10.5334/jors.148},
journal = {Journal of Open Research Software},
month = apr,
number = {1},
title = {{xarray: N-D labeled Arrays and Datasets in Python}},
volume = {5},
year = {2017}
}

@inproceedings{Rocklin2015DaskPC,
title={Dask: Parallel Computation with Blocked algorithms and Task Scheduling},
author={Matthew Rocklin},
booktitle={SciPy},
year={2015},
url={https://api.semanticscholar.org/CorpusID:63554230}
}
65 changes: 65 additions & 0 deletions paper/paper.md
@@ -0,0 +1,65 @@
---
title: 'MOTrainer: Distributed Measurement Operator Trainer for Data Assimilation Applications'
tags:
  - Python
  - Measurement Operator
  - Data Assimilation
  - Machine Learning
  - Kalman Filter
authors:
  - name: Ou Ku
    orcid: 0000-0002-5373-5209
    affiliation: 1
  - name: Fakhereh Alidoost
    affiliation: 1
  - name: Xu Shan
    affiliation: 2
  - name: Pranav Chandramouli
    affiliation: 1
  - name: Sonja Georgievska
    affiliation: 1
  - name: Meiert W. Grootes
    affiliation: 1
  - name: Susan Steele-Dunne
    corresponding: true
    affiliation: 2
affiliations:
  - name: Netherlands eScience Center, Netherlands
    index: 1
  - name: Delft University of Technology, Netherlands
    index: 2
date: 22 Dec 2023
bibliography: paper.bib
---

## Summary

Data assimilation (DA) is an essential procedure in Earth and environmental sciences, enabling physical model states to be constrained using observational data.

In the DA process, observations are integrated into the physical model through the application of a Measurement Operator (MO) – a connection model mapping physical model states to observations. Researchers have observed that employing a Machine-Learning (ML) model as a surrogate MO can bypass the limitations associated with using an overly simplified MO [@Forman:2014; @XUE:2015; @Forman:2017].
Member

What process are you referring to in the first part of this sentence, "In the DA process, ..."?

Contributor Author

I wanted to lead into Measurement Operators, so I started with the function of an MO in a DA process. Maybe I should say "In a DA process, ..."?


## Statement of Need

A surrogate MO, as an ML model, is trained under the assumption that a single MO applies when mapping physical model states to observations. At large spatio-temporal scales, multiple mapping processes may exist, which motivates training separate MOs for distinct spatial and/or temporal partitions of the dataset. As the number of partitions increases, distributing these training tasks effectively across the partitions becomes a challenge.
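
As a minimal sketch of the partitioning idea described above (this is not `MOTrainer`'s API), the example below groups toy samples by a hypothetical partition key and fits one independent least-squares model per partition. The names `fit_line` and `train_per_partition`, the record layout, and the linear model are all assumptions made for illustration. A thread pool from the Python standard library stands in for a distributed scheduler.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def fit_line(samples):
    """Ordinary least-squares fit of y = slope * x + intercept to (x, y) pairs."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def train_per_partition(records, max_workers=2):
    """Group (partition, x, y) records and fit each partition independently."""
    groups = defaultdict(list)
    for key, x, y in records:
        groups[key].append((x, y))
    keys = sorted(groups)
    # Each fit depends only on its own partition, so the tasks can run in
    # parallel; the thread pool is a stand-in for a distributed scheduler.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        models = list(pool.map(fit_line, (groups[k] for k in keys)))
    return dict(zip(keys, models))


# Toy data: two hypothetical partitions following different linear mappings.
records = [("north", float(x), 2.0 * x + 1.0) for x in range(5)]
records += [("south", float(x), -0.5 * x + 3.0) for x in range(5)]
models = train_per_partition(records)
```

Because no partition shares state with another, the same pattern scales by swapping the executor for any scheduler that can map a function over independent inputs.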

To address this challenge, we developed a novel approach for distributed training of MOs. We present the open-source Python library `MOTrainer`, which, to the best of our knowledge, is the first Python library for researchers who need to train independent MOs across extensive spatio-temporal coverage in a distributed manner. `MOTrainer` leverages Xarray's [@Hoyer_xarray_N-D_labeled_2017] support for multi-dimensional datasets to accommodate the spatio-temporal features of the input and output data of the training tasks. It provides user-friendly functionality implemented with the Dask [@Rocklin2015DaskPC] library, facilitating the partitioning of large spatio-temporal data for independent model training tasks. Additionally, it streamlines the train-test data split based on customized spatio-temporal coordinates. The Jackknife method [@mccuen1998hydrologic] is implemented as an external Cross-Validation (CV) method for Deep Neural Network (DNN) training, with support for Dask parallelization. This feature enables the scaling of training tasks across various computational infrastructures.
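
The Jackknife idea can be illustrated with a short, self-contained sketch: each group of samples is held out in turn, a model is trained on the remaining groups, and the error on the held-out group is recorded. This is only an illustration under stated assumptions, not the implementation in `MOTrainer`: the grouping by year, the function names `fit_line` and `jackknife_rmse`, and the use of a linear fit instead of a DNN are all hypothetical.

```python
import math


def fit_line(samples):
    """Ordinary least-squares fit of y = slope * x + intercept to (x, y) pairs."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def jackknife_rmse(samples_by_group):
    """Hold out one group at a time and return the held-out RMSE per group."""
    scores = {}
    for held_out, test in samples_by_group.items():
        # Train on every group except the one being held out.
        train = [
            pair
            for group, pairs in samples_by_group.items()
            if group != held_out
            for pair in pairs
        ]
        slope, intercept = fit_line(train)
        squared_errors = [(y - (slope * x + intercept)) ** 2 for x, y in test]
        scores[held_out] = math.sqrt(sum(squared_errors) / len(test))
    return scores


# Toy data: three hypothetical years following y = 2x + 1,
# with one outlying sample in the last year.
data = {
    2017: [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)],
    2018: [(3.0, 7.0), (4.0, 9.0), (5.0, 11.0)],
    2019: [(6.0, 13.0), (7.0, 15.0), (8.0, 20.0)],
}
scores = jackknife_rmse(data)
```

Each leave-one-out iteration is independent of the others, which is what makes this kind of cross-validation straightforward to parallelize.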

`MOTrainer` has been employed in a study of vegetation water dynamics [@shan:2022], where it facilitated the mapping of Land Surface Model (LSM) states to satellite radar observations.

## Tutorial

The `MOTrainer` package includes comprehensive [usage examples](https://vegewaterdynamics.github.io/motrainer/usage_split/), as well as tutorials for:

1. Converting input data to Xarray Dataset format: [Example 1](https://vegewaterdynamics.github.io/motrainer/notebooks/example_read_from_one_df/) and [Example 2](https://vegewaterdynamics.github.io/motrainer/notebooks/example_read_from_one_df/);

2. Training tasks on simpler ML models using `sklearn` and `dask-ml`: [Example Notebook](https://vegewaterdynamics.github.io/motrainer/notebooks/example_daskml/);

3. Training tasks on Deep Neural Networks (DNN) using TensorFlow: [Example Notebook](https://vegewaterdynamics.github.io/motrainer/notebooks/example_dnn/).

## Acknowledgements

The authors express sincere gratitude to the Dutch Research Council (Nederlandse Organisatie voor Wetenschappelijk Onderzoek, NWO) and the Netherlands Space Office for their generous funding of the MOTrainer development through the User Support Programme Space Research (GO) call, grant ALWGO.2018.036. Special thanks to SURF for providing valuable computational resources for MOTrainer testing via the grant EINF-339.

We would also like to thank Dr. Francesco Nattino, Dr. Yifat Dzigan, Dr. Paco López-Dekker, and Tina Nikaein for the insightful discussions, which were important contributions to this work.

## References