Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More informative metadata when loading earth_relief grids #494

Closed
wants to merge 3 commits into from

Conversation

weiji14
Copy link
Member

@weiji14 weiji14 commented Jun 26, 2020

Description of proposed changes

Return a richer set of attributes when running pygmt.datasets.load_earth_relief(), to include all the NetCDF header info metadata such as a proper title, description (including the DOI) and GMT command history used to produce the grid. In particular, the node_offset attribute will be useful for #476. This PR will also be useful for #489.

Before (using xr.open_dataarray(fname)):

<xarray.DataArray 'elevation' (lat: 180, lon: 360)>
array([[ 2999. ,  2997.5,  2995.5, ...,  3003.5,  3002.5,  3000.5],
       [ 3126.5,  3127. ,  3127. , ...,  3125. ,  3125.5,  3126. ],
       [ 3051. ,  3050.5,  3051.5, ...,  3058. ,  3056. ,  3053.5],
       ...,
       [-3888.5, -3884.5, -3885.5, ..., -3897.5, -3894. , -3890.5],
       [-3494.5, -3566. , -3612.5, ..., -3213.5, -3294. , -3402.5],
       [-3898.5, -3859. , -3812. , ..., -3982.5, -3966. , -3939.5]],
      dtype=float32)
Coordinates:
  * lon      (lon) float64 -179.5 -178.5 -177.5 -176.5 ... 177.5 178.5 179.5
  * lat      (lat) float64 -89.5 -88.5 -87.5 -86.5 -85.5 ... 86.5 87.5 88.5 89.5
Attributes:
    long_name:         elevation relative to the geoid
    units:             meters
    vertical_datum:    EMG96
    horizontal_datum:  WGS84

After (using xr.open_dataset(fname).to_array(name="elevation").squeeze(drop=True)):

<xarray.DataArray 'elevation' (lat: 180, lon: 360)>
array([[ 2999. ,  2997.5,  2995.5, ...,  3003.5,  3002.5,  3000.5],
       [ 3126.5,  3127. ,  3127. , ...,  3125. ,  3125.5,  3126. ],
       [ 3051. ,  3050.5,  3051.5, ...,  3058. ,  3056. ,  3053.5],
       ...,
       [-3888.5, -3884.5, -3885.5, ..., -3897.5, -3894. , -3890.5],
       [-3494.5, -3566. , -3612.5, ..., -3213.5, -3294. , -3402.5],
       [-3898.5, -3859. , -3812. , ..., -3982.5, -3966. , -3939.5]],
      dtype=float32)
Coordinates:
  * lon      (lon) float64 -179.5 -178.5 -177.5 -176.5 ... 177.5 178.5 179.5
  * lat      (lat) float64 -89.5 -88.5 -87.5 -86.5 -85.5 ... 86.5 87.5 88.5 89.5
Attributes:
    Conventions:       CF-1.7
    history:           grdfilter SRTM15+V2.1.nc -Fg111.2 -D1 -I01d -rp -Gearth/earth_relief/earth_relief_01d_p.grd=ns+s0.5+o0 --IO_NC4_DEFLATION_LEVEL=9 --IO_NC4_CHUNK_SIZE=4096 --PROJ_ELLIPSOID=Sphere
    GMT_version:       6.1.0_e80cfbd_2020.05.28 [64-bit]
    node_offset:       1
    title:             Earth Relief at 01 arc degree
    description:       Obtained by Gaussian Cartesian filtering (111.2 km fullwidth) from SRTM15+V2.1.nc [Tozer et al., 2019; http://dx.doi.org/10.1029/2019EA000658]
    long_name:         elevation relative to the geoid
    units:             meters
    vertical_datum:    EMG96
    horizontal_datum:  WGS84

Fixes #

Reminders

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst.
  • Write detailed docstrings for all functions/methods.
  • If adding new functionality, add an example to docstrings or tutorials.

@weiji14 weiji14 added the enhancement Improving an existing feature label Jun 26, 2020
@weiji14 weiji14 mentioned this pull request Jun 26, 2020
9 tasks
@weiji14 weiji14 marked this pull request as draft June 28, 2020 03:16
@weiji14
Copy link
Member Author

weiji14 commented Jun 28, 2020

Having second thoughts about this PR, data provenance is a pain...

  1. If we use xr.open_dataarray() as usual, the attributes (including node_offset) are not read, but we keep the encoding["source"] information.

  2. If we use xr.open_dataset().to_array as in 5e0644e, the attributes are kept, but the encoding["source"] information is lost.

  3. If we use xr.open_dataset().z, the attributes are lost, but the encoding information is kept. What we could do though, is to copy the dataset attributes to the dataarray.

See also pydata/xarray#1614.

@seisman
Copy link
Member

seisman commented Jun 28, 2020

Do users care about the encode information?

@weiji14
Copy link
Member Author

weiji14 commented Jun 28, 2020

Do users care about the encode information?

It's used at https://github.com/GenericMappingTools/pygmt/pull/476/files#diff-18246882665270482cba82d6b754c43cR1280 for the auto-detect cartesian/geographic type functionality (though we'll probably drop that). I've got the code to work on option 3 already, just need to write some tests first.

@seisman
Copy link
Member

seisman commented Jun 28, 2020

for the auto-detect cartesian/geographic type functionality (though we'll probably drop that)

I believe we'll drop it anyway. For any netCDF grids produced by GMT, node_offset is enough. For other grids, GMT has its own complicated way to detect/guess grid registration. We probably can ask GMT to provide it as an API function.

We still need encode["source"] for non-GMT-produced grids, because we rely on GMT/grdinfo to guess its registration.

@seisman
Copy link
Member

seisman commented Jun 28, 2020

As I mentioned before, in GMT 6.1.0 gmt grdinfo -C outputs the grid registration in the last column, which could simplify the Python code. It seems useful if it can also output the grid coordinate system (i.e., GMT_GRID_IS_CARTESIAN or GMT_GRID_IS_GEO). I believe GMT also has its own complicated way to detect it.

The to_array method doesn't keep the encoding dictionary, so we're refactoring the code. Updated test to check that these metadata are indeed kept. Also partially revert some of 5e0644e.
@weiji14
Copy link
Member Author

weiji14 commented Jun 28, 2020

It seems useful if it can also output the grid coordinate system (i.e., GMT_GRID_IS_CARTESIAN or GMT_GRID_IS_GEO). I believe GMT also has its own complicated way to detect it.

Yes, it would be great if xarray can read the Geographic/Cartesian type directly from the NetCDF header. Too bad it's not in the CF Convention.

@seisman
Copy link
Member

seisman commented Jun 28, 2020

It seems useful if it can also output the grid coordinate system (i.e., GMT_GRID_IS_CARTESIAN or GMT_GRID_IS_GEO). I believe GMT also has its own complicated way to detect it.

Yes, it would be great if xarray can read the Geographic/Cartesian type directly from the NetCDF header. Too bad it's not in the CF Convention.

FYI, the feature is implemented in GenericMappingTools/gmt#3551 and will be available in GMT 6.1.0

@weiji14
Copy link
Member Author

weiji14 commented Jul 11, 2020

Superseded by #500 which is a nicer way to get get grid registration and coordinate information. We're still missing lots of metadata, but it might be possible in the future to add the metadata into the GMT xarray accessor (e.g. have grid.gmt.history, grid.gmt.description).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improving an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants