[WIP] Pipeline for SMAP soil moisture ingest #117
Conversation
This doesn't work (on DISCOVER) because of weird HDF5 reading errors.
This is a great start to having a Zarr workflow, thanks @ashiklom!

AWS CDK is the AWS Cloud Development Kit; we are using it to create AWS cloud-hosted deployments for our workflow pipelines. As discussed in #113, we are working on consolidating some of the existing dataset workflows into a single but configurable pipeline which discovers, processes, and publishes data. Given our current use cases, it should be reasonable to configure discovery from an S3 directory or from CMR, then configurably process the inputs to COGs (via custom functions), and then generate and publish STAC metadata for those COGs.

Right now, this workflow falls into the "manual" or "local" category of dataset workflows. Regardless of whether we get its functions into cloud-hosted services, we still want to record the code used to generate and publish the cloud-optimized format, so there are several next steps (not necessarily in this order) for completing the "local" version of this workflow.
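To make the consolidation concrete, here is a minimal sketch of what a single configurable discover/process/publish pipeline definition could look like. All names here (`PipelineConfig`, `discover`, `run_pipeline`) are hypothetical illustrations, not code from any existing repository:

```python
# Hypothetical sketch only: none of these names exist in the repository.
# It illustrates the single configurable discover -> process -> publish
# pipeline described above.
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class PipelineConfig:
    discovery: str                 # "s3" or "cmr", per the use cases above
    source: str                    # S3 prefix or CMR collection ID
    to_cog: Callable[[str], str]   # dataset-specific conversion to COG
    stac_collection: str           # STAC collection to publish items into


def discover(cfg: PipelineConfig) -> Iterable[str]:
    # Placeholder discovery step; a real version would list an S3 prefix
    # or query CMR, depending on cfg.discovery.
    raise NotImplementedError


def run_pipeline(cfg: PipelineConfig) -> None:
    """Discover granules, convert each to a COG, and publish STAC items."""
    for granule in discover(cfg):
        cog_path = cfg.to_cog(granule)
        # Placeholder for STAC item generation + publication.
        print(f"would publish STAC item for {cog_path} to {cfg.stac_collection}")
```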
For creating a cloud deployment, I think the workflow could be broken up into the same discover, process, and publish steps described above. There are a couple of options for moving forward with a cloud deployment of this workflow.
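For reference, since "set up a CDK" comes up below: a minimal AWS CDK v2 app in Python looks roughly like the following. The stack and resource names are placeholders, not the actual VEDA infrastructure:

```python
# Minimal AWS CDK v2 app in Python. Stack and resource names are
# placeholders; this only illustrates what "setting up a CDK" involves.
from aws_cdk import App, Stack, aws_s3 as s3
from constructs import Construct


class IngestStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # Bucket to hold pipeline outputs (e.g., COGs or Zarr stores).
        s3.Bucket(self, "OutputBucket")


app = App()
IngestStack(app, "smap-ingest-stack")  # placeholder stack name
app.synth()
```

Running `cdk deploy` against an app like this is what actually provisions the AWS resources.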
@abarciauskas-bgse Thanks for the detailed response! I'm not sure I'll be of much use on most of the points above, so here's another idea: how about we create a separate public repository for manual workflows like this one, which we will be developing for EIS and related work? Then you all can use that repository as a collection of use cases for creating automated ingest workflows, Pangeo Forge recipes, etc. I think an approach like that may be more efficient than me trying to learn all of these procedures. Thoughts?
I like that idea. We might want to iterate on it as we develop a longer-term process for taking science to cloud workflows, but this sounds good for now. How about a repo named "veda-data-scripts" or "veda-data-processing"? I'm wondering if we need anything specific to each EIS application, but I believe many of these datasets will be shared, so it probably makes more sense to have one shared repository for all EIS and VEDA science applications.
Following up on this. I just created https://github.com/ashiklom/veda-data-processing, which includes the SMAP workflow from this PR. All the code in there is under the MIT License, so you can try to do some fancy …
Awesome, thanks @ashiklom! Should we close this PR then?
This is a minimal, very much work-in-progress workflow for converting SMAP L3 soil moisture data into Zarr. This is nowhere close to being merged, and I'm totally fine with this being replaced with someone else's implementation of a "real" workflow for this same dataset.
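For context, the core HDF5-to-Zarr conversion is roughly the sketch below. This is not the code from this PR; the group name, `phony_dims` handling, and chunk sizes are assumptions based on a typical SMAP L3 (SPL3SMP-style) file layout:

```python
# Hedged sketch of the HDF5 -> Zarr conversion, assuming an SPL3SMP-style
# layout; the group, dimension, and chunk names below are assumptions,
# not taken from the scripts in this PR.
import xarray as xr


def smap_h5_to_zarr(h5_path: str, zarr_path: str) -> None:
    # SMAP L3 files store retrievals in HDF5 groups; the "h5netcdf"
    # engine can read them when the file is netCDF4-compatible.
    ds = xr.open_dataset(
        h5_path,
        engine="h5netcdf",
        group="Soil_Moisture_Retrieval_Data_AM",  # assumed group name
        phony_dims="access",  # SMAP HDF5 lacks dimension scales
    )
    # Rechunk for cloud-friendly access before writing the Zarr store.
    ds.chunk({"phony_dim_0": 256, "phony_dim_1": 256}).to_zarr(
        zarr_path, mode="w"
    )
```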
That said, the Python scripts here do work. My question is: what to do next (in the specific context of EIS)? Do I need to set up a CDK (also: what is a CDK?)? Where do I put the resulting Zarr?
We will be repeating very similar workflows for a lot more datasets as EIS moves forward, so I wanted to use this as a pathfinder to understand how this process works.