Merge pull request #185 from dcs4cop/forman-181-no_metadata_written
forman authored Sep 25, 2019
2 parents 80ecb8c + 8387e82 commit 4ab769b
Showing 13 changed files with 430 additions and 270 deletions.
6 changes: 3 additions & 3 deletions CHANGES.md
@@ -1,14 +1,13 @@
## Changes in 0.2.0.dev2 (in dev)

* Reorganisation of the Documentation and Examples Section (partly addressing #106)
* Loosened python conda environment to satisfy conda-forge requirements

### New

* Added first version of the [xcube documentation](https://xcube.readthedocs.io/) generated from `./docs` folder.

### Enhancements

* Reorganisation of the Documentation and Examples Section (partly addressing #106)
* Loosened python conda environment to satisfy conda-forge requirements
* Making CLI parameters consistent and removing or changing parameter abbreviations where they were used twice for different parameters. (partly addressing #91)
For every CLI command that generates an output, a path must be provided by the option `-o`, `--output`. If not provided by the user, a default output path is generated.
The following CLI parameters have changed and their abbreviations are no longer enabled:
@@ -39,6 +38,7 @@

### Fixes

* `xcube gen` CLI now updates metadata correctly. (#181)
* It was no longer possible to use the `xcube gen` CLI with `--proc` option. (#120)
* `totalCount` attribute of time series returned by Web API `ts/{dataset}/{variable}/{geom-type}` now
contains the correct number of possible observations. Was always `1` before.
34 changes: 18 additions & 16 deletions docs/source/cli/xcube_gen.rst
@@ -19,14 +19,16 @@ Generate xcube dataset.
::

Usage: xcube gen [OPTIONS] [INPUT]...
Generate xcube dataset. Data cubes may be created in one go or successively in
append mode, input by input. The input paths may be one or more input
files or a pattern that may contain wildcards '?', '*', and '**'. The
input paths can also be passed as lines of a text file. To do so, provide
exactly one input file with ".txt" extension which contains the actual
input paths to be used.

Generate xcube dataset. Data cubes may be created in one go or
successively for all given inputs. Each input is expected to provide a
single time slice which may be appended, inserted or which may replace an
existing time slice in the output dataset. The input paths may be one or
more input files or a pattern that may contain wildcards '?', '*', and
'**'. The input paths can also be passed as lines of a text file. To do
so, provide exactly one input file with ".txt" extension which contains
the actual input paths to be used.

Options:
-P, --proc INPUT-PROCESSOR Input processor name. The available input
processor names and additional information
@@ -36,8 +38,8 @@ Generate xcube dataset.
with simple datasets whose variables have
dimensions ("lat", "lon") and conform with
the CF conventions.
-c, --config CONFIG xcube dataset configuration file in YAML format.
More than one config input file is
-c, --config CONFIG xcube dataset configuration file in YAML
format. More than one config input file is
allowed.When passing several config files,
they are merged considering the order passed
via command line.
@@ -50,8 +52,7 @@ Generate xcube dataset.
"<width>,<height>".
-R, --region REGION Output region using format "<lon-min>,<lat-
min>,<lon-max>,<lat-max>"
--variables, --vars VARIABLES
Variables to be included in output. Comma-
--variables, --vars VARIABLES Variables to be included in output. Comma-
separated list of names which may contain
wildcard characters "*" and "?".
--resampling [Average|Bilinear|Cubic|CubicSpline|Lanczos|Max|Median|Min|Mode|Nearest|Q1|Q3]
@@ -68,16 +69,17 @@ Generate xcube dataset.
--prof Collect profiling information and dump
results after processing.
--sort The input file list will be sorted before
creating the xcube dataset. If --sort parameter
is not passed, order of input list will be
kept.
creating the xcube dataset. If --sort
parameter is not passed, order of input list
will be kept.
-I, --info Displays additional information about format
options or about input processors.
--dry_run Just read and process inputs, but don't
produce any outputs.
--help Show this message and exit.



Below is the output of an ``xcube gen --info`` call showing five input processors installed via plugins.

::
@@ -108,7 +110,7 @@ Configuration File
==================

Configuration files passed to ``xcube gen`` via the ``-c, --config`` option use `YAML format`_.
Multiple configuration files may be given. In this case all configuration are merged into a single one.
Multiple configuration files may be given. In this case all configurations are merged into a single one.
Parameter values will be overwritten by subsequent configurations if they are scalars. If
they are objects / mappings, their values will be deeply merged.
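The merge rule described above (scalar values overwritten by later config files, nested mappings merged recursively) can be sketched in plain Python. This is an illustrative sketch of the behaviour, not xcube's actual implementation, and the example configuration values are hypothetical:

```python
def deep_merge(base: dict, update: dict) -> dict:
    """Merge ``update`` into ``base``: scalar (and list) values from
    ``update`` overwrite those in ``base``, while nested mappings
    are merged recursively, as described for -c/--config above."""
    merged = dict(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Later config files win for scalars; nested mappings are combined.
config_1 = {'output_size': [512, 512], 'output_metadata': {'title': 'A'}}
config_2 = {'output_size': [1024, 512], 'output_metadata': {'project': 'xcube'}}
merged = deep_merge(config_1, config_2)
# merged == {'output_size': [1024, 512],
#            'output_metadata': {'title': 'A', 'project': 'xcube'}}
```

Passing the config files in a different order would flip which scalar value survives, which is why the order given on the command line matters.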

35 changes: 35 additions & 0 deletions docs/source/examples/xcube_gen.rst
@@ -195,6 +195,41 @@ The metadata of the xcube dataset can be viewed with :doc:`cli/xcube dump` as we
Dimensions without coordinates: bnds
Data variables:
analysed_sst (time, lat, lon) float64 dask.array<shape=(3, 5632, 10240), chunksize=(1, 704, 640)>
Attributes:
acknowledgment: Data Cube produced based on data provided by ...
comment:
contributor_name:
contributor_role:
creator_email: info@brockmann-consult.de
creator_name: Brockmann Consult GmbH
creator_url: https://www.brockmann-consult.de
date_modified: 2019-09-25T08:50:32.169031
geospatial_lat_max: 62.666666666666664
geospatial_lat_min: 48.0
geospatial_lat_resolution: 0.002604166666666666
geospatial_lat_units: degrees_north
geospatial_lon_max: 10.666666666666664
geospatial_lon_min: -16.0
geospatial_lon_resolution: 0.0026041666666666665
geospatial_lon_units: degrees_east
history: xcube/reproj-snap-nc
id: demo-bc-sst-sns-l2c-v1
institution: Brockmann Consult GmbH
keywords:
license: terms and conditions of the DCS4COP data dist...
naming_authority: bc
processing_level: L2C
project: xcube
publisher_email: info@brockmann-consult.de
publisher_name: Brockmann Consult GmbH
publisher_url: https://www.brockmann-consult.de
references: https://dcs4cop.eu/
source: CMEMS Global SST & Sea Ice Anomaly Data Cube
standard_name_vocabulary:
summary:
time_coverage_end: 2017-06-08T00:00:00.000000000
time_coverage_start: 2017-06-05T00:00:00.000000000
title: CMEMS Global SST Anomaly Data Cube

The metadata for the variable ``analysed_sst`` can be viewed:

91 changes: 76 additions & 15 deletions test/api/gen/default/test_gen.py
@@ -1,6 +1,6 @@
import os
import unittest
from typing import Tuple, Optional
from typing import Tuple, Optional, Dict, Any

import numpy as np
import xarray as xr
@@ -12,7 +12,7 @@


def clean_up():
files = ['l2c-single.nc', 'l2c.nc', 'l2c.zarr', 'l2c-single.zarr']
files = ['l2c-single.nc', 'l2c-single.zarr', 'l2c.nc', 'l2c.zarr']
for file in files:
rimraf(file)
rimraf(file + '.temp.nc') # May remain from Netcdf4DatasetIO.append()
@@ -27,11 +27,15 @@ def setUp(self):
def tearDown(self):
clean_up()

def test_process_inputs_single(self):
def test_process_inputs_single_nc(self):
status, output = gen_cube_wrapper(
[get_inputdata_path('20170101-IFR-L4_GHRSST-SSTfnd-ODYSSEA-NWE_002-v2.0-fv1.0.nc')], 'l2c-single.nc')
self.assertEqual(True, status)
self.assertTrue('\nstep 8 of 8: creating input slice in l2c-single.nc...\n' in output)
self.assert_cube_ok(xr.open_dataset('l2c-single.nc', autoclose=True), 1,
dict(date_modified=None,
time_coverage_start='2016-12-31T12:00:00.000000000',
time_coverage_end='2017-01-01T12:00:00.000000000'))

def test_process_inputs_append_multiple_nc(self):
status, output = gen_cube_wrapper(
@@ -40,6 +44,20 @@ def test_process_inputs_append_multiple_nc(self):
self.assertEqual(True, status)
self.assertTrue('\nstep 8 of 8: creating input slice in l2c.nc...\n' in output)
self.assertTrue('\nstep 8 of 8: appending input slice to l2c.nc...\n' in output)
self.assert_cube_ok(xr.open_dataset('l2c.nc', autoclose=True), 3,
dict(date_modified=None,
time_coverage_start='2016-12-31T12:00:00.000000000',
time_coverage_end='2017-01-03T12:00:00.000000000'))

def test_process_inputs_single_zarr(self):
status, output = gen_cube_wrapper(
[get_inputdata_path('20170101-IFR-L4_GHRSST-SSTfnd-ODYSSEA-NWE_002-v2.0-fv1.0.nc')], 'l2c-single.zarr')
self.assertEqual(True, status)
self.assertTrue('\nstep 8 of 8: creating input slice in l2c-single.zarr...\n' in output)
self.assert_cube_ok(xr.open_zarr('l2c-single.zarr'), 1,
dict(date_modified=None,
time_coverage_start='2016-12-31T12:00:00.000000000',
time_coverage_end='2017-01-01T12:00:00.000000000'))

def test_process_inputs_append_multiple_zarr(self):
status, output = gen_cube_wrapper(
@@ -48,6 +66,10 @@ def test_process_inputs_append_multiple_zarr(self):
self.assertEqual(True, status)
self.assertTrue('\nstep 8 of 8: creating input slice in l2c.zarr...\n' in output)
self.assertTrue('\nstep 8 of 8: appending input slice to l2c.zarr...\n' in output)
self.assert_cube_ok(xr.open_zarr('l2c.zarr'), 3,
dict(date_modified=None,
time_coverage_start='2016-12-31T12:00:00.000000000',
time_coverage_end='2017-01-03T12:00:00.000000000'))

def test_process_inputs_insert_multiple_zarr(self):
status, output = gen_cube_wrapper(
@@ -59,6 +81,10 @@ def test_process_inputs_insert_multiple_zarr(self):
self.assertTrue('\nstep 8 of 8: creating input slice in l2c.zarr...\n' in output)
self.assertTrue('\nstep 8 of 8: appending input slice to l2c.zarr...\n' in output)
self.assertTrue('\nstep 8 of 8: inserting input slice before index 0 in l2c.zarr...\n' in output)
self.assert_cube_ok(xr.open_zarr('l2c.zarr'), 3,
dict(date_modified=None,
time_coverage_start='2016-12-31T12:00:00.000000000',
time_coverage_end='2017-01-03T12:00:00.000000000'))

def test_process_inputs_replace_multiple_zarr(self):
status, output = gen_cube_wrapper(
@@ -71,6 +97,10 @@ def test_process_inputs_replace_multiple_zarr(self):
self.assertTrue('\nstep 8 of 8: creating input slice in l2c.zarr...\n' in output)
self.assertTrue('\nstep 8 of 8: appending input slice to l2c.zarr...\n' in output)
self.assertTrue('\nstep 8 of 8: replacing input slice at index 1 in l2c.zarr...\n' in output)
self.assert_cube_ok(xr.open_zarr('l2c.zarr'), 3,
dict(date_modified=None,
time_coverage_start='2016-12-31T12:00:00.000000000',
time_coverage_end='2017-01-03T12:00:00.000000000'))

def test_input_txt(self):
f = open((os.path.join(os.path.dirname(__file__), 'inputdata', "input.txt")), "w+")
@@ -81,6 +111,30 @@ def test_input_txt(self):
f.close()
status, output = gen_cube_wrapper([get_inputdata_path('input.txt')], 'l2c.zarr', sort_mode=True)
self.assertEqual(True, status)
self.assert_cube_ok(xr.open_zarr('l2c.zarr'), 3,
dict(time_coverage_start='2016-12-31T12:00:00.000000000',
time_coverage_end='2017-01-03T12:00:00.000000000'))

def assert_cube_ok(self, cube: xr.Dataset, expected_time_dim: int, expected_extra_attrs: Dict[str, Any]):
self.assertEqual({'lat': 180, 'lon': 320, 'bnds': 2, 'time': expected_time_dim}, cube.dims)
self.assertEqual({'lon', 'lat', 'time', 'lon_bnds', 'lat_bnds', 'time_bnds'}, set(cube.coords))
self.assertEqual({'analysed_sst'}, set(cube.data_vars))
expected_attrs = dict(title='Test Cube',
project='xcube',
date_modified=None,
geospatial_lon_min=-4.0,
geospatial_lon_max=12.0,
geospatial_lon_resolution=0.05,
geospatial_lon_units='degrees_east',
geospatial_lat_min=47.0,
geospatial_lat_max=56.0,
geospatial_lat_resolution=0.05,
geospatial_lat_units='degrees_north')
expected_attrs.update(expected_extra_attrs)
for k, v in expected_attrs.items():
self.assertIn(k, cube.attrs)
if v is not None:
self.assertEqual(v, cube.attrs[k], msg=f'key {k!r}')

def test_handle_360_lon(self):
status, output = gen_cube_wrapper(
@@ -96,7 +150,7 @@ def test_illegal_proc(self):
gen_cube_wrapper(
[get_inputdata_path('20170101120000-UKMO-L4_GHRSST-SSTfnd-OSTIAanom-GLOB-v02.0-fv02.0.nc')],
'l2c-single.zarr', sort_mode=True, input_processor_name="")
self.assertEqual('Missing input_processor_name', f'{e.exception}')
self.assertEqual('input_processor_name must not be empty', f'{e.exception}')

with self.assertRaises(ValueError) as e:
gen_cube_wrapper(
@@ -106,7 +160,7 @@ def test_illegal_proc(self):


# noinspection PyShadowingBuiltins
def gen_cube_wrapper(input_paths, output_path, sort_mode=False, input_processor_name='default') \
def gen_cube_wrapper(input_paths, output_path, sort_mode=False, input_processor_name=None) \
-> Tuple[bool, Optional[str]]:
output = None

@@ -117,13 +171,20 @@ def output_monitor(msg):
else:
output += msg + '\n'

config = get_config_dict(dict(input_paths=input_paths, output_path=output_path))
return gen_cube(input_processor_name=input_processor_name,
output_size=(320, 180),
output_region=(-4., 47., 12., 56.),
output_resampling='Nearest',
output_variables=[('analysed_sst', dict(name='SST'))],
sort_mode=sort_mode,
dry_run=False,
monitor=output_monitor,
**config), output
config = get_config_dict(
input_paths=input_paths,
input_processor_name=input_processor_name,
output_path=output_path,
output_size='320,180',
output_region='-4,47,12,56',
output_resampling='Nearest',
output_variables='analysed_sst',
sort_mode=sort_mode,
)

output_metadata = dict(
title='Test Cube',
project='xcube',
)

return gen_cube(dry_run=False, monitor=output_monitor, output_metadata=output_metadata, **config), output
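The ``assert_cube_ok`` helper added in this diff checks each expected attribute for presence, but compares its value only when the expected value is not ``None`` (``date_modified=None`` means "must exist, value not checked"). A minimal standalone sketch of that pattern, using plain ``assert`` instead of ``unittest`` and hypothetical attribute values:

```python
from typing import Any, Dict


def assert_attrs_ok(attrs: Dict[str, Any], expected: Dict[str, Any]) -> None:
    """Check that every expected key is present in ``attrs``.
    An expected value of None means the attribute must exist but
    its value is not checked (e.g. a generated timestamp)."""
    for key, value in expected.items():
        assert key in attrs, f'missing attribute {key!r}'
        if value is not None:
            assert attrs[key] == value, \
                f'attribute {key!r}: {attrs[key]!r} != {value!r}'


# A timestamp-like attribute is only checked for presence:
cube_attrs = {'title': 'Test Cube',
              'project': 'xcube',
              'date_modified': '2019-09-25T08:50:32'}
assert_attrs_ok(cube_attrs, {'title': 'Test Cube', 'date_modified': None})
```

This keeps the test robust against values that legitimately change on every run, such as ``date_modified``, while still failing if the metadata is not written at all, which was the bug (#181) this commit fixes.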
