Commit
Merge pull request #101 from nsidc/issue-44
Issue 44
juliacollins authored Dec 12, 2024
2 parents f02182f + a546c01 commit d52906e
Showing 8 changed files with 1,460 additions and 30 deletions.
91 changes: 79 additions & 12 deletions README.md
@@ -42,18 +42,85 @@ this can be accomplished are detailed in the **AWS Credentials** section below.

## Assumptions

- Checksums are all SHA256
- In the data files to be ingested:
- The global attribute "date_modified" exists and will be used to represent
the production date and time.
- Global attributes "time_coverage_start" and "time_coverage_end" exist and
will be used for the time range metadata values.
- Only one coordinate system is used by all variables (i.e. only one grid mapping variable is present in a file)
- (x[0],y[0]) represents the upper left corner of the spatial coverage.
- x,y coordinates represent the center of the pixel
- The grid mapping variable contains a GeoTransform attribute (which defines the pixel size ), and
can be used to determine the padding added to x and y values.
- Date/time strings can be parsed using `datetime.fromisoformat`
* Checksums are all SHA256
* NetCDF files have an extension of `.nc` (required by CF conventions)
* (x[0],y[0]) represents the upper left corner of the spatial coverage.
* x and y coordinate values represent the center of the pixel
* Date/time strings can be parsed using `datetime.fromisoformat`
* Only one coordinate system is used by all data variables (i.e. only one grid
mapping variable is present in a file)
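
The date/time assumption can be illustrated with a short sketch. The helper name `parse_timestamp` is hypothetical, and treating timezone-naive values as UTC is an assumption made for this example only, not documented MetGenC behavior. The sketch also normalizes a trailing `Z`, which `datetime.fromisoformat` rejects on Python versions before 3.11.

```python
from datetime import datetime, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 string such as a `time_coverage_start` value.

    Hypothetical helper: naive results are assumed to be UTC.
    """
    # Python < 3.11 does not accept a trailing "Z", so rewrite it as an offset.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    # Assumption for this sketch: a value without an offset is in UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


start = parse_timestamp("2024-12-12T10:30:00Z")
end = parse_timestamp("2024-12-12T11:30:00")
```

A string that is not ISO 8601 (for example `12/12/2024`) raises `ValueError`, which is one way a file violating this assumption would surface.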

### Reference links

* https://wiki.esipfed.org/Attribute_Convention_for_Data_Discovery_1-3
* https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html

### NetCDF Attributes Used to Populate UMM-G

- **Required** required
- **RequiredC** conditionally required
- **R+** highly or strongly recommended
- **R** recommended
- **S** suggested

| Attribute in use (location) | ACDD | CF Conventions | NSIDC Guidelines | Note |
| ----------------------------- | ---- | -------------- | ---------------- | ------- |
| date_modified (global) | S | | R | 1 |
| time_coverage_start (global) | R | | R | 2 |
| time_coverage_end (global) | R | | R | 2 |
| crs_wkt (`crs` variable) | | | R | 3 |
| GeoTransform (`crs` variable) | | | R | 4 |
| data (`x` variable) | | | R | 5 |
| data (`y` variable) | | | R | 6 |


| Attributes not currently used | ACDD | CF Conventions | NSIDC Guidelines | Comments |
| ----------------------------- | ---- | -------------- | ---------------- | -------- |
| Conventions (global) | R+ | Required | R | |
| standard_name (variable) | R+ | R+ | | |
| grid_mapping (data variable) | | RequiredC | R+ | 7 |
| grid_mapping_name (variable) | | RequiredC | R+ | 7 |
| `projection_x_coordinate` standard name (variable) | | RequiredC | | 8 |
| `projection_y_coordinate` standard name (variable) | | RequiredC | | 9 |
| axis (variable) | | R | | 8, 9 |
| geospatial_bounds (global) | R | | R | |
| geospatial_bounds_crs (global)| R | | R | |
| geospatial_lat_min (global) | R | | R | |
| geospatial_lat_max (global) | R | | R | |
| geospatial_lat_units (global) | R | | R | |
| geospatial_lon_min (global) | R | | R | |
| geospatial_lon_max (global) | R | | R | |
| geospatial_lon_units (global) | R | | R | |

Notes:
1. Used to populate the production date and time values in UMM-G output.
2. Used to populate the time begin and end UMM-G values.
3. The `crs_wkt` ("well known text") value is handed to the
`CRS` and `Transformer` modules in `pyproj` to conveniently deal
with the reprojection of (y,x) values to EPSG 4326 (lon, lat) values.
4. The `GeoTransform` value provides the pixel size per data value, which is then used
to calculate the padding added to x and y values to create a GPolygon enclosing all
of the data.
5. The `x` coordinate variable values are reprojected and thinned to create a GPolygon.
6. The `y` coordinate variable values are reprojected and thinned to create a GPolygon.
7. A grid mapping variable is required if the horizontal spatial coordinates are not
longitude and latitude and the intent of the data provider is to geolocate
the data. `grid_mapping` and `grid_mapping_name` allow programmatic identification of
the variable holding information about the horizontal coordinate reference system.
`metgenc` code currently assumes a variable named `crs` exists with grid
information. **TODO:** Identify the coordinate reference system variable by
looking for the `grid_mapping_name` or `grid_mapping` attribute.
8. `metgenc` code currently assumes a coordinate variable `x` exists whose
data values represent spatial information in meters.
**TODO:** Identify the x-axis coordinate variable by looking for the `standard_name`
attribute with a value of `projection_x_coordinate`, or an `axis` attribute with
the value `X`, rather than assuming the variable is named `x`.
9. `metgenc` code currently assumes a coordinate variable `y` exists whose
data values represent spatial information in meters.
**TODO:** Identify the y-axis coordinate variable by looking for the `standard_name`
attribute with a value of `projection_y_coordinate`, or an `axis` attribute with
the value `Y`, rather than assuming the variable is named `y`.
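
The attribute-based lookup proposed in the TODOs of notes 7 to 9 might look roughly like the sketch below. It is not the `metgenc` implementation: the function names are hypothetical, the file is represented as a plain mapping of variable names to attribute dictionaries so the example has no NetCDF dependency, and only the simple form of `grid_mapping` (a single variable name) is handled.

```python
# Ordered (attribute, value) pairs to try for each axis, per notes 8 and 9.
X_CANDIDATES = [("standard_name", "projection_x_coordinate"), ("axis", "X")]
Y_CANDIDATES = [("standard_name", "projection_y_coordinate"), ("axis", "Y")]


def find_variable(variables, candidates):
    """Return the first variable name whose attributes match a candidate pair.

    Candidates are tried in order, so `standard_name` wins over `axis`.
    """
    for attr, expected in candidates:
        for name, attrs in variables.items():
            if attrs.get(attr) == expected:
                return name
    return None


def find_grid_mapping(variables):
    """Return the name of the grid mapping variable, or None if absent."""
    # Prefer the variable referenced by a data variable's `grid_mapping`.
    for attrs in variables.values():
        target = attrs.get("grid_mapping")
        if target in variables:
            return target
    # Otherwise fall back to any variable carrying `grid_mapping_name`.
    for name, attrs in variables.items():
        if "grid_mapping_name" in attrs:
            return name
    return None


# Made-up example file in which nothing is named `x`, `y`, or `crs`.
variables = {
    "easting": {"standard_name": "projection_x_coordinate", "axis": "X"},
    "northing": {"axis": "Y"},
    "polar_stereographic": {"grid_mapping_name": "polar_stereographic"},
    "ice_conc": {"grid_mapping": "polar_stereographic"},
}

x_name = find_variable(variables, X_CANDIDATES)
y_name = find_variable(variables, Y_CANDIDATES)
crs_name = find_grid_mapping(variables)
```

Returning `None` rather than raising leaves it to the caller to decide whether a missing coordinate or grid mapping variable is an error.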


## Installing MetGenC

6 changes: 3 additions & 3 deletions src/nsidc/metgen/cli.py
@@ -4,7 +4,7 @@

from nsidc.metgen import config, constants, metgen

LOGGER = logging.getLogger("metgenc")
LOGGER = logging.getLogger(constants.ROOT_LOGGER)


@click.group(epilog="For detailed help on each command, run: metgenc COMMAND --help")
@@ -133,13 +133,13 @@ def process(config_filename, dry_run, env, number, write_cnm, overwrite):
config.validate(configuration)
metgen.process(configuration)
except config.ValidationError as e:
logger = logging.getLogger("metgenc")
logger = logging.getLogger(constants.ROOT_LOGGER)
logger.error("\nThe configuration is invalid:")
for error in e.errors:
logger.error(f" * {error}")
exit(1)
except Exception as e:
logger = logging.getLogger("metgenc")
logger = logging.getLogger(constants.ROOT_LOGGER)
logger.error("\nUnable to process data: " + str(e))
exit(1)
click.echo("Processing complete")
2 changes: 1 addition & 1 deletion src/nsidc/metgen/config.py
@@ -34,7 +34,7 @@ class Config:
def show(self):
# TODO: add section headings in the right spot
# (if we think we need them in the output)
LOGGER = logging.getLogger("metgenc")
LOGGER = logging.getLogger(constants.ROOT_LOGGER)
LOGGER.info("")
LOGGER.info("Using configuration:")
for k, v in self.__dict__.items():
5 changes: 5 additions & 0 deletions src/nsidc/metgen/constants.py
@@ -8,9 +8,14 @@
DEFAULT_NUMBER = 1000000
DEFAULT_DRY_RUN = False

# Logging
ROOT_LOGGER = "metgenc"

# JSON schema locations and versions
CNM_JSON_SCHEMA = ("nsidc.metgen.json-schema", "cumulus_sns_schema.json")
CNM_JSON_SCHEMA_VERSION = "1.6.1"
UMMG_JSON_SCHEMA = ("nsidc.metgen.json-schema", "umm-g-json-schema.json")
UMMG_JSON_SCHEMA_VERSION = "1.6.6"

# Configuration sections
SOURCE_SECTION_NAME = "Source"
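
The logging change in this commit replaces the repeated `"metgenc"` literal with `constants.ROOT_LOGGER`. A minimal sketch of why a single named constant is enough: `logging.getLogger` returns the same logger object for the same name, and dotted names form a hierarchy beneath it. The `get_logger` helper and the `"config"` child name are hypothetical and are not part of this commit.

```python
import logging

ROOT_LOGGER = "metgenc"  # mirrors the constant added in constants.py


def get_logger(module_name=None):
    """Return the shared logger, or a child of it (hypothetical helper)."""
    if module_name is None:
        return logging.getLogger(ROOT_LOGGER)
    # Dotted names make the result a descendant of the shared logger.
    return logging.getLogger(f"{ROOT_LOGGER}.{module_name}")


root = get_logger()
child = get_logger("config")

# Configuration applied once to the shared logger also covers its children.
root.setLevel(logging.INFO)
child.info("Using configuration:")
```

Because every call site resolves the name through one constant, renaming the logger becomes a one-line change.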