Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Variable names inferred from grib2 files through Nio.open_file - or xr.open_dataset with engine pynio #19

Open
chiaral opened this issue May 25, 2018 · 5 comments

Comments

@chiaral
Copy link

chiaral commented May 25, 2018

I posted this on SO, but I think it belongs here as well.

Below an edited version:

I am using the data found here (Note, these are rotating files of forecast data, so the actual date will change as time goes by, you might need to update the date in my example in a few days)

!wget ftp://ftpprd.ncep.noaa.gov/pub/data/nccf/com/cfs/prod/cfs/cfs.20180524/00/6hrly_grib_01/pgbf2018052400.01.2018052400.grb2


tmp2 = xr.open_dataset('pgbf2018052400.01.2018052400.grb2', 
                   engine='pynio')

which is read as a dataset with some coordinates and variables

<xarray.Dataset>
Dimensions:                   (lat_0: 181, lon_0: 360, lv_AMSL1: 4, lv_HTGL6: 2, lv_ISBL0: 37, lv_ISBL4: 32, lv_ISBL5: 2, lv_PVL2: 2, lv_SIGL3: 4)
Coordinates:
  * lon_0                     (lon_0) float32 0.0 1.0 2.0 3.0 4.0 5.0 6.0 ...
  * lv_ISBL0                  (lv_ISBL0) float32 100.0 200.0 300.0 500.0 ...
  * lv_ISBL5                  (lv_ISBL5) float32 50000.0 100000.0
  * lv_ISBL4                  (lv_ISBL4) float32 1000.0 2000.0 3000.0 5000.0 ...
  * lat_0                     (lat_0) float32 90.0 89.0 88.0 87.0 86.0 85.0 ...
  * lv_PVL2                   (lv_PVL2) float32 -2e-06 2e-06
  * lv_AMSL1                  (lv_AMSL1) float32 1829.0 2743.0 3658.0 4572.0
Dimensions without coordinates: lv_HTGL6, lv_SIGL3
Data variables:
    TOZNE_P0_L200_GLL0        (lat_0, lon_0) float32 ...
    HGT_P0_L100_GLL0          (lv_ISBL0, lat_0, lon_0) float32 ...
    VVEL_P0_L100_GLL0         (lv_ISBL0, lat_0, lon_0) float32 ...
    UGRD_P0_L6_GLL0           (lat_0, lon_0) float32 ...
    HGT_P0_L7_GLL0            (lat_0, lon_0) float32 ...
    CIN_P0_2L108_GLL0         (lat_0, lon_0) float32 ...
    PRES_P0_L109_GLL0         (lv_PVL2, lat_0, lon_0) float32 ...
    UGRD_P0_L7_GLL0           (lat_0, lon_0) float32 ...
    UGRD_P0_L109_GLL0         (lv_PVL2, lat_0, lon_0) float32 ...
    PRES_P0_L7_GLL0           (lat_0, lon_0) float32 ...
    PLI_P0_2L108_GLL0         (lat_0, lon_0) float32 ...

and many other variables... i am not copying here...

my struggle right now is to understand how these variable names (i.e. PLI_P0_2L108_GLL0) are generated.

If I run directly Nio:

f = Nio.open_file('pgbf2018052400.01.2018052400.grb2')
print(f)

I get:

Nio file:	pgbf2018052400.01.2018052400.grb2
   global attributes:
   dimensions:
      lat_0 = 181
      lon_0 = 360
      lv_ISBL0 = 37
      lv_AMSL1 = 4
      lv_PVL2 = 2
      lv_SIGL3 = 4
      lv_ISBL4 = 32
      lv_ISBL5 = 2
      lv_HTGL6 = 2
   variables:
      float TMP_P0_L6_GLL0 [ lat_0, lon_0 ]
         center :	US National Weather Service - NCEP (WMC)
         production_status :	Operational products
         long_name :	Temperature
         units :	K
         _FillValue :	1e+20
         grid_type :	Latitude/longitude
         parameter_discipline_and_category :	Meteorological products, Temperature
         parameter_template_discipline_category_number :	[0, 0, 0, 0]
         level_type :	Maximum wind level
         level :	0
         forecast_time :	0
         forecast_time_units :	hours
         initial_time :	05/24/2018 (00:00)
      float TMP_P0_L7_GLL0 [ lat_0, lon_0 ]

etc...

but the names of the variables are already there. So it seems like these names are defined at Nio level (I think).

However, when I inspect the data with wgrib2 I have the following:

wgrib2 pgbf2018052400.01.2018052400.grb2

1:0:d=2018052400:PRES:mean sea level:anl:
2:66570:d=2018052400:HGT:1 mb:anl:
3:102113:d=2018052400:TMP:1 mb:anl:
4:118939:d=2018052400:RH:1 mb:anl:
5:122609:d=2018052400:SPFH:1 mb:anl:
6:141106:d=2018052400:VVEL:1 mb:anl:
7.1:219137:d=2018052400:UGRD:1 mb:anl:
7.2:219137:d=2018052400:VGRD:1 mb:anl:
8:283014:d=2018052400:ABSV:1 mb:anl:
9:319471:d=2018052400:O3MR:1 mb:anl:
10:372630:d=2018052400:HGT:2 mb:anl:
11:408159:d=2018052400:TMP:2 mb:anl:
12:425397:d=2018052400:RH:2 mb:anl:
13:427490:d=2018052400:SPFH:2 mb:anl:
14:459259:d=2018052400:VVEL:2 mb:anl:
15.1:517010:d=2018052400:UGRD:2 mb:anl:
15.2:517010:d=2018052400:VGRD:2 mb:anl:
16:580043:d=2018052400:ABSV:2 mb:anl:
17:617506:d=2018052400:O3MR:2 mb:anl:
18:672805:d=2018052400:HGT:3 mb:anl:
19:707799:d=2018052400:TMP:3 mb:anl:
20:725593:d=2018052400:RH:3 mb:anl:
21:729682:d=2018052400:SPFH:3 mb:anl:
22:765030:d=2018052400:VVEL:3 mb:anl:
23.1:828383:d=2018052400:UGRD:3 mb:anl:
23.2:828383:d=2018052400:VGRD:3 mb:anl:
24:892759:d=2018052400:ABSV:3 mb:anl:
25:931422:d=2018052400:O3MR:3 mb:anl:
26:979930:d=2018052400:HGT:5 mb:anl:

The variable names are PRES, HGT, ABSV, VGRD, ACPCP, and so on.

The problem I have is that I am trying to run an open_mfdataset in xarray to concatenate along multiple dimensions (in my case forecast start time and forecast lead time) thousands of these little files, unfortunately the variable names (of what should be the same quantity, i.e. convective precipitation, which in the grib file is simply named ACPC) generated by xarray/PyNIO change throughout the dataset going from ACPCP_P8_L1_GLL0_acc, to ACPCP_P8_L1_GLL0_acc6h, to ACPCP_P0_L1_GLL0, making things complicated.

I tried to overwrite the variable names with some preprocessing function, but I wanted to understand the rationale behind those names, to be sure that I am doing it correctly.

So, where does the part of the variable name, like "_P0_L6_GLL0", attached to the grib variable name come from? is there a way to control that?

@chiaral
Copy link
Author

chiaral commented May 25, 2018

After digging in the PyNIO page I found the GRIB2 options, and a description of how the names are created. Apologies for not having gone through it all before.

GRIB2 data variable name encoding
(Note: examples show intermediate steps in the formation of the name)

if production status is TIGGE test or operational and matches entry in TIGGE table:
   <parameter_short_name> (ex: t) 
else if entry matching product discipline, parameter category, and parameter number is found:
   <parameter_short_name> (ex: TMP) 
else:
   VAR_<product_discipline_number>_<parameter_category_number>_<parameter_number> (ex: VAR_3_0_9)

_P<product_definition_template_number> (ex: TMP_P0)

if single level type:
   _L<level_type_number> (ex: TMP_P0_L103)
else if two levels of the same type: 
   _2L<level_type_number> (ex: TMP_P0_2L106)
else if two levels of different types: 
   _2L<_first_level_type_number>_<second_level_type_number> (ex: LCLD_P0_2L212_213)

if grid type is supported (fully or partially):
   _G<grid_abbreviation><grid_number> (ex: UGRD_P0_L108_GLC0)
else:
   _G<grid_number> (ex: UGRD_P0_2L104_G0)

if not statistically processed variable and not duplicate name the name is complete at this point.

if statistically-processed variable and constant statistical processing duration:
   if statistical processing type is defined:
      _<statistical_processing_type_abbreviation><statistical_processing_duration><duration_units> (ex: APCP_P8_L1_GLL0_acc3h) 
   else
      _<statistical_processing_duration><duration_units> (ex: TMAX_P8_L103_GCA0_6h)
else if statistically-processed variable and variable-duration processing always begins at initial time:
   _<statistical_processing_type_abbreviation> (ex: ssr_P11_GCA0_acc)

if variable name is duplicate of existing variable name (this should not normally occur):
   _n (where n begins with 1 for first duplicate) (ex: TMAX_P8_L103_GCA0_6h_1)

You also note on the same page linked above "Generally speaking all of the applicable characteristics must be the same for two records to be considered part of the same variable."

Is there a way/option to limit it to, for example, TMAX?

I am very new to GRIB2, and I usually extracted data using WGRIB2, or pygrib, using shortname to access the variable i am interested in. I am just curious to know why the choice to add these attributes (<statistical_processing_type_abbreviation><statistical_processing_duration><duration_units>, for example) to the definition of the name?
I know that GRIB2 files format is a bit of a wild west, and that this might make sense for the majority of the times, but some time we might not want that behavior.

Do you think this is a feature you would be interested in developing (meaning, limit the name to the variable name to just, i.e., parameter_short_name)?

That would make the usage of PyNIO within open_mfdataset() flawless, whereas now I need to write ad-hoc preprocessing function to change the variable name constructed by PyNIO.

thanks!

@david-ian-brown
Copy link
Contributor

david-ian-brown commented May 25, 2018 via email

@chiaral
Copy link
Author

chiaral commented May 25, 2018

Thanks! indeed this helped a little:

using the one example on the webpage and scrolling down to Nio Usage:

import Nio
Nio.option_defaults

{'CompressionLevel': -1,
 'DefaultNCEPPtable': 'operational',
 'ExplicitFillValues': None,
 'Format': 'classic',
 'HeaderReserveSpace': 0,
 'InitialTimeCoordinateType': 'numeric',
 'MaskAboveValue': None,
 'MaskBelowValue': None,
 'MaskedArrayMode': 'MaskedIfFillAtt',
 'MissingToFillValue': True,
 'PreFill': True,
 'SafeMode': False,
 'SingleElementDimensions': 'none',
 'ThinnedGridInterpolation': 'cubic',
 'TimePeriodSuffix': True,
 'UseAxisAttribute': False}

opt = Nio.options()
opt.TimePeriodSuffix = False

Then running
f = Nio.open_file('gribfile.grb2', options=opt)
gets rid of the hour. I still would like to get rid of the whole P0..... but at least it helps reducing some differences.

@chiaral
Copy link
Author

chiaral commented May 29, 2018

For future references, in order to modify any of Nio's options when we use it as an engine in xr.open_mfdataset(), one needs to modify the default values:

Nio.option_defaults['TimePeriodSuffix']=True

@NicWayand
Copy link

Hi @chiaral, have you gotten clean short variable names working with the new pynio backend_kwargs? I still get TypeError: __init__() got an unexpected keyword argument 'TimePeriodSuffix' errors. It doesn't seem to like any of my Nio backend keywords. I also can't get Nio.option_defaults['TimePeriodSuffix']=True to effect my xr.open_dataset(...). Although using Nio.open_file(..) with option, does drop the suffix as expect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants