Source Dataset
Temperature and salinity are two fundamental ocean variables for many applications. While Sea Surface Temperature (SST) has been observed by satellites for many decades (and is hopefully added to the cloud soon in #20), Sea Surface Salinity (SSS) has only 'recently' been added to remote observations (starting with the European SMOS mission in 2009). Here I propose to create a pipeline to bring data from the currently active NASA SMAP platform into ARCO storage.
Link to the website / online documentation for the data: SMAP data is processed at two time frequencies (8-day running mean, monthly mean) by two different data centers with different algorithms (JPL and RSS). All datasets are available via PO.DAAC.
The file format (e.g. netCDF, csv): netCDF
How are the source files organized? (e.g. one file per day): The files are organized as one file per day (8-day running mean) and one file per month (monthly mean).
How are the source files accessed (e.g. FTP): The data is available via OPeNDAP/THREDDS (example for the JPL 8-day running mean)
Any special steps required to access the data (e.g. password required): nope
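Since the source is one file per day served over OPeNDAP, the access pattern can be sketched as building one URL per day and opening them lazily. The URL template below is purely illustrative; the real pattern must be taken from the PO.DAAC catalog.

```python
from datetime import date, timedelta

# Hypothetical URL template for the JPL 8-day running mean product served
# over OPeNDAP -- NOT the real PO.DAAC path, just an illustration of the
# one-file-per-day layout.
URL_TEMPLATE = (
    "https://opendap.example.org/SMAP/JPL/8day_running/"
    "{d:%Y}/{d:%j}/SMAP_L3_SSS_{d:%Y%m%d}_8DAYS.nc"
)

def daily_urls(start: date, end: date) -> list[str]:
    """Build one URL per day in the inclusive range [start, end]."""
    n_days = (end - start).days + 1
    return [URL_TEMPLATE.format(d=start + timedelta(days=i)) for i in range(n_days)]

urls = daily_urls(date(2020, 1, 1), date(2020, 1, 10))
print(len(urls))  # -> 10
# The list could then be opened lazily, e.g. with xr.open_mfdataset(urls).
```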
Transformation / Alignment / Merging
Single time steps for the JPL product are about 30 MB, so chunking in time (maybe 3 time steps per chunk) might make sense, but is not strictly necessary.
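As a rough back-of-the-envelope check: the 30 MB figure is from above, while the ~100 MB target chunk size is an assumed rule of thumb, not a requirement of any tool.

```python
# Rough chunking arithmetic for the JPL product.
BYTES_PER_STEP = 30 * 2**20       # ~30 MB per daily time step (from above)
TARGET_CHUNK_BYTES = 100 * 2**20  # assumed target chunk size (rule of thumb)

# Integer number of time steps that fit in one target-sized chunk.
time_chunk = max(1, TARGET_CHUNK_BYTES // BYTES_PER_STEP)
print(time_chunk)  # -> 3
# In a recipe this would translate to something like
# target_chunks={"time": time_chunk} (parameter name is illustrative).
```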
Output Dataset
I think one Zarr store per time frequency and algorithm would be the ideal way to access this data.
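With two frequencies and two algorithms, that layout amounts to four independent stores. A minimal sketch, with placeholder bucket and path names (not actual storage locations):

```python
# One Zarr store per (time frequency, algorithm) combination.
FREQUENCIES = ["8day_running", "monthly"]
ALGORITHMS = ["JPL", "RSS"]

# Placeholder bucket/prefix; the real location would be set by the project.
stores = {
    (freq, alg): f"gs://example-bucket/smap-sss/{alg.lower()}/{freq}.zarr"
    for freq in FREQUENCIES
    for alg in ALGORITHMS
}
print(len(stores))  # -> 4 independent stores, one per product variant
```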
cc @cisaacstern @hscannell