Joss paper #114
Merged
11 commits
7b48123
add joss paper
rogerkuou 70d395f
add tutorials and acknowledgements
rogerkuou f26d169
Merge branch 'main' into joss_paper
rogerkuou 186b380
formatting
rogerkuou 1c56087
GH action for compiling padf
rogerkuou 0ccbdbd
formatting
rogerkuou e409616
Update paper/paper.md
rogerkuou 3c7b106
Apply suggestions from code review
rogerkuou 7575692
Update paper/paper.md
rogerkuou 42739eb
add Xarray and DASK citations
rogerkuou 3eaf03f
Apply suggestions from code review
rogerkuou
@@ -0,0 +1,23 @@
on: [push]

jobs:
  paper:
    runs-on: ubuntu-latest
    name: Paper Draft
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Build draft PDF
        uses: openjournals/openjournals-draft-action@master
        with:
          journal: joss
          # This should be the path to the paper within your repo.
          paper-path: paper/paper.md
      - name: Upload
        uses: actions/upload-artifact@v4
        with:
          name: paper
          # This is the output path where Pandoc will write the compiled
          # PDF. Note, this should be the same directory as the input
          # paper.md
          path: paper/paper.pdf
@@ -0,0 +1,83 @@

@article{shan:2022,
  title = {Towards constraining soil and vegetation dynamics in land surface models: Modeling ASCAT backscatter incidence-angle dependence with a Deep Neural Network},
  journal = {Remote Sensing of Environment},
  volume = {279},
  pages = {113116},
  year = {2022},
  issn = {0034-4257},
  doi = {10.1016/j.rse.2022.113116},
  url = {https://www.sciencedirect.com/science/article/pii/S0034425722002309},
  author = {Xu Shan and Susan Steele-Dunne and Manuel Huber and Sebastian Hahn and Wolfgang Wagner and Bertrand Bonan and Clement Albergel and Jean-Christophe Calvet and Ou Ku and Sonja Georgievska},
  keywords = {ASCAT, Scatterometry, Radar, Vegetation, Land surface model, Machine learning, Deep Neural Network, Plant water dynamics, Soil moisture},
  abstract = {A Deep Neural Network (DNN) is used to estimate the Advanced Scatterometer (ASCAT) C-band microwave normalized backscatter (σ40o), slope (σ′) and curvature (σ″) over France. The Interactions between Soil, Biosphere and Atmosphere (ISBA) land surface model (LSM) is used to produce land surface variables (LSVs) that are input to the DNN. The DNN is trained to simulate σ40o, σ′ and σ″ from 2007 to 2016. The predictive skill of the DNN is evaluated during an independent validation period from 2017 to 2019. Normalized sensitivity coefficients (NSCs) are computed to study the sensitivity of ASCAT observables to changes in LSVs as a function of time and space. Model performance yields a near-zero bias in σ40o and σ′. The domain-averaged values of ρ are 0.84 and 0.85 for σ40o and σ′, compared to 0.58 for σ″. The domain-averaged unbiased RMSE is 8.6% of the dynamic range for σ40o and 13% for σ′, with land cover having some impact on model performance. NSC results show that the DNN-based model could reproduce the physical response of ASCAT observables to changes in LSVs. Results indicated that σ40o is sensitive to surface soil moisture and LAI and that these sensitivities vary with time, and are highly dependent on land cover type. The σ′ was shown to be sensitive to LAI, but also to root zone soil moisture due to the dependence of vegetation water content on soil moisture. The DNN could potentially serve as an observation operator in data assimilation to constrain soil and vegetation water dynamics in LSMs.}
}

@article{Forman:2014,
  author = {Forman, B. and Reichle, Rolf},
  year = {2014},
  month = {06},
  pages = {1-11},
  title = {Using a Support Vector Machine and a Land Surface Model to Estimate Large-Scale Passive Microwave Brightness Temperatures Over Snow-Covered Land in North America},
  volume = {8},
  journal = {IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},
  doi = {10.1109/JSTARS.2014.2325780}
}

@article{XUE:2015,
  title = {Comparison of passive microwave brightness temperature prediction sensitivities over snow-covered land in North America using machine learning algorithms and the Advanced Microwave Scanning Radiometer},
  journal = {Remote Sensing of Environment},
  volume = {170},
  pages = {153-165},
  year = {2015},
  issn = {0034-4257},
  doi = {10.1016/j.rse.2015.09.009},
  url = {https://www.sciencedirect.com/science/article/pii/S0034425715301322},
  author = {Yuan Xue and Barton A. Forman},
  keywords = {Sensitivity analysis, Machine learning, Brightness temperature, Snow, Data assimilation},
  abstract = {Recent studies showed that machine learning (ML) algorithms (e.g., artificial neural network (ANN) and support vector machine (SVM)) reasonably reproduce passive microwave brightness temperature observations over snow-covered land as measured by the Advanced Microwave Scanning Radiometer (AMSR-E). However, these studies did not explore the sensitivities of the ML algorithms relative to ML inputs in order to determine the behavior and performance of each algorithm. In this current study, normalized sensitivity coefficients are computed to diagnose ML performance as a function of time and space. The results showed that when using the ANN, approximately 20% of locations across North America are relatively sensitive to snow water equivalent (SWE). However, more than 65% of locations in the SVM-based brightness temperature (Tb) estimates are sensitive relative to perturbations in SWE at all frequency and polarization combinations explored in this study. Further, the SVM-based results suggest the algorithm is sensitive in both shallow and deep SWE, SWE with and without overlying forest canopy, and during both the snow accumulation and snow ablation seasons. Therefore, these findings suggest that compared with the ANN, the SVM could potentially serve as a more efficient and effective measurement model operator within a Tb data assimilation framework for the purpose of improving SWE estimates across regional- and continental-scales.}
}

@article{Forman:2017,
  author = {Barton A. Forman and Yuan Xue},
  title = {Machine learning predictions of passive microwave brightness temperature over snow-covered land using the special sensor microwave imager (SSM/I)},
  journal = {Physical Geography},
  volume = {38},
  number = {2},
  pages = {176-196},
  year = {2017},
  publisher = {Taylor \& Francis},
  doi = {10.1080/02723646.2016.1236606},
  URL = {https://doi.org/10.1080/02723646.2016.1236606},
  eprint = {https://doi.org/10.1080/02723646.2016.1236606}
}

@book{mccuen1998hydrologic,
  title={Hydrologic Analysis and Design},
  author={McCuen, R.H.},
  isbn={9780131349582},
  lccn={97044779},
  series={Hewlett Packard Professional Books},
  url={https://books.google.com.mt/books?id=qPdRAAAAMAAJ},
  year={1998},
  publisher={Prentice Hall}
}

@article{Hoyer_xarray_N-D_labeled_2017,
  author = {Hoyer, Stephan and Hamman, Joseph},
  doi = {10.5334/jors.148},
  journal = {Journal of Open Research Software},
  month = apr,
  number = {1},
  title = {{xarray: N-D labeled Arrays and Datasets in Python}},
  volume = {5},
  year = {2017}
}

@inproceedings{Rocklin2015DaskPC,
  title={Dask: Parallel Computation with Blocked algorithms and Task Scheduling},
  author={Matthew Rocklin},
  booktitle={SciPy},
  year={2015},
  url={https://api.semanticscholar.org/CorpusID:63554230}
}
@@ -0,0 +1,65 @@
---
title: 'MOTrainer: Distributed Measurement Operator Trainer for Data Assimilation Applications'
tags:
  - Python
  - Measurement Operator
  - Data Assimilation
  - Machine Learning
  - Kalman Filter
authors:
  - name: Ou Ku
    orcid: 0000-0002-5373-5209
    affiliation: 1
  - name: Fakhereh Alidoost
    affiliation: 1
  - name: Xu Shan
    affiliation: 2
  - name: Pranav Chandramouli
    affiliation: 1
  - name: Sonja Georgievska
    affiliation: 1
  - name: Meiert W. Grootes
    affiliation: 1
  - name: Susan Steele-Dunne
    corresponding: true
    affiliation: 2
affiliations:
  - name: Netherlands eScience Center, Netherlands
    index: 1
  - name: Delft University of Technology, Netherlands
    index: 2
date: 22 Dec 2023
bibliography: paper.bib
---

## Summary

Data assimilation (DA) is an essential procedure in Earth and environmental sciences, enabling physical model states to be constrained using observational data.

In the DA process, observations are integrated into the physical model through the application of a Measurement Operator (MO) – a connection model mapping physical model states to observations. Researchers have observed that employing a Machine-Learning (ML) model as a surrogate MO can bypass the limitations associated with using an overly simplified MO [@Forman:2014; @XUE:2015; @Forman:2017].
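
The role of the MO can be sketched in common DA notation. The symbols below are assumptions for illustration and are not taken from the library: $\mathbf{x}$ is the physical model state, $\mathbf{y}$ the observation, $\mathcal{H}$ the MO, $\boldsymbol{\epsilon}$ the observation error, and $\hat{\mathcal{H}}_{\theta}$ an ML surrogate with parameters $\theta$ fitted to $N$ paired samples. The squared-error loss is likewise an assumed, typical choice rather than a documented one.

$$
\mathbf{y} = \mathcal{H}(\mathbf{x}) + \boldsymbol{\epsilon},
\qquad
\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{N} \left\lVert \mathbf{y}_i - \hat{\mathcal{H}}_{\theta}(\mathbf{x}_i) \right\rVert^{2}
$$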

## Statement of Need

A surrogate MO, as an ML model, is trained under the assumption that a single MO applies when mapping physical model states to observations. At large spatio-temporal scales, multiple mapping processes may exist, which motivates training separate MOs for distinct spatial and/or temporal partitions of the dataset. As the number of partitions increases, distributing the training tasks effectively across the partitions becomes a challenge.

To address this challenge, we developed a novel approach for distributed training of MOs. We present the open Python library `MOTrainer`, which, to the best of our knowledge, is the first Python library catering to researchers who need to train independent MOs across extensive spatio-temporal coverage in a distributed manner. `MOTrainer` leverages Xarray's [@Hoyer_xarray_N-D_labeled_2017] support for multi-dimensional datasets to accommodate the spatio-temporal features of the input/output data of the training tasks. It provides user-friendly functionalities implemented with the Dask [@Rocklin2015DaskPC] library, facilitating the partitioning of large spatio-temporal data for independent model training tasks. Additionally, it streamlines the train-test data split based on customized spatio-temporal coordinates. The Jackknife method [@mccuen1998hydrologic] is implemented as an external Cross-Validation (CV) method for Deep Neural Network (DNN) training, with support for Dask parallelization. This feature enables the scaling of training tasks across various computational infrastructures.
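
The idea of independent, per-partition training with Jackknife-style validation can be illustrated with a minimal sketch. It does not use the `MOTrainer` API: all names (`fit_line`, `jackknife_rmse`, `train_all`, the `cell_a`/`cell_b` partitions) are hypothetical, a one-variable linear fit stands in for the DNN, a thread pool stands in for a Dask scheduler, and leaving out one year at a time is an assumed choice of Jackknife grouping.

```python
"""Sketch: leave-one-year-out (Jackknife-style) training per spatial partition.

Hypothetical illustration only; it does not use the MOTrainer API.
"""
from concurrent.futures import ThreadPoolExecutor
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def jackknife_rmse(samples):
    """Return {left_out_year: RMSE} for one partition.

    `samples` is a list of (year, state, observation) tuples.
    """
    years = sorted({year for year, _, _ in samples})
    scores = {}
    for left_out in years:
        train = [(x, y) for year, x, y in samples if year != left_out]
        test = [(x, y) for year, x, y in samples if year == left_out]
        # Fit the stand-in surrogate MO on all remaining years.
        slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
        # Evaluate on the year that was left out.
        squared_errors = [(y - (slope * x + intercept)) ** 2 for x, y in test]
        scores[left_out] = sqrt(sum(squared_errors) / len(squared_errors))
    return scores


def train_all(partitions, max_workers=4):
    """Run the Jackknife independently for every partition."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(jackknife_rmse, partitions.values())
        return dict(zip(partitions.keys(), results))


# Synthetic data: two grid cells with different state-to-observation mappings,
# three years of three samples each.
partitions = {
    "cell_a": [(2015 + i // 3, float(i), 2.0 * i + 1.0) for i in range(9)],
    "cell_b": [(2015 + i // 3, float(i), -0.5 * i + 4.0) for i in range(9)],
}
scores = train_all(partitions)
```

Because each partition is processed without reference to the others, the per-partition function is the natural unit of work to hand to a distributed scheduler.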

`MOTrainer` has been employed in a study of vegetation water dynamics [@shan:2022], where it facilitated the mapping of Land Surface Model (LSM) states to satellite radar observations.

## Tutorial

The `MOTrainer` package includes comprehensive [usage examples](https://vegewaterdynamics.github.io/motrainer/usage_split/), as well as tutorials for:

1. Converting input data to Xarray Dataset format: [Example 1](https://vegewaterdynamics.github.io/motrainer/notebooks/example_read_from_one_df/) and [Example 2](https://vegewaterdynamics.github.io/motrainer/notebooks/example_read_from_one_df/);

2. Training tasks on simpler ML models using `sklearn` and `dask-ml`: [Example Notebook](https://vegewaterdynamics.github.io/motrainer/notebooks/example_daskml/);

3. Training tasks on Deep Neural Networks (DNN) using TensorFlow: [Example Notebook](https://vegewaterdynamics.github.io/motrainer/notebooks/example_dnn/).

## Acknowledgements

The authors express sincere gratitude to the Dutch Research Council (Nederlandse Organisatie voor Wetenschappelijk Onderzoek, NWO) and the Netherlands Space Office for their generous funding of the MOTrainer development through the User Support Programme Space Research (GO) call, grant ALWGO.2018.036. Special thanks to SURF for providing valuable computational resources for MOTrainer testing via the grant EINF-339.

We would also like to thank Dr. Francesco Nattino, Dr. Yifat Dzigan, Dr. Paco López-Dekker, and Tina Nikaein for the insightful discussions, which are important contributions to this work.

## References
Review comment: What process are you referring to in the first part of this sentence, "In the DA process, ..."?

Reply: I wanted to start talking about Measurement Operators, so I started with the function of an MO in a DA process. Maybe I should say "In a DA process, ..."?