Variable names inferred from grib2 files through Nio.open_file - or xr.open_dataset with engine pynio #19
After digging in the PyNIO page I found the GRIB2 options, and a description of how the names are created. Apologies for not having gone through it all before.
You also note on the same page linked above: "Generally speaking all of the applicable characteristics must be the same for two records to be considered part of the same variable." Is there a way or option to limit the name to, for example, TMAX? I am very new to GRIB2, and I have usually extracted data with wgrib2 or pygrib, using the short name to access the variable I am interested in. I am curious why these attributes (<statistical_processing_type_abbreviation><statistical_processing_duration><duration_units>, for example) were added to the definition of the name. Would you be interested in developing a feature that limits the variable name to just, e.g., parameter_short_name? That would make using PyNIO within open_mfdataset() seamless, whereas now I need to write an ad-hoc preprocessing function to change the variable names constructed by PyNIO. Thanks!
There is an Nio option that you can set called TimePeriodSuffix. It does
not seem to be mentioned in the PyNIO documentation, but here is what the
NCL documentation says. I don't think there is any reason it would not be
available for PyNIO as well. It may not do everything you want, but it may
help.
*TimePeriodSuffix*
*Default value*: *True*
This option applies to GRIB1- and GRIB2-formatted files. A value of True
indicates that statistically-processed variables such as averages and
accumulations have a time period and time unit added after the suffix
indicating the statistical variable type. For example, the suffix "_avg3h"
represents a 3 hour average. These suffixes are required to uniquely
characterize otherwise identically-named variables that have different
periods and/or units within the same file. However, when concatenating
variables from different files using the *addfiles*
<http://www.ncl.ucar.edu/Document/Functions/Built-in/addfiles.shtml> function
differences in these suffixes can prevent individual variables from being
concatenated into a single composite variable when it is actually
desirable. Setting this option to False removes the time period and units
from the variable name leaving only the statistical processing type (e.g.
"_avg" for an average or "_acc" for an accumulation).
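To make the effect described above concrete, here is a minimal sketch that mimics it on plain strings. It is not PyNIO's implementation: `drop_time_period` is a hypothetical helper, the variable names are illustrative, and the regular expression assumes the suffix is a lowercase type, an integer duration, and lowercase units.

```python
import re

# Matches a trailing "_<type><duration><units>" suffix such as "_avg3h".
# Assumption: type and units are lowercase letters, duration is an integer.
_PERIOD_SUFFIX = re.compile(r"_([a-z]+)\d+[a-z]+$")


def drop_time_period(name):
    """Return name with the period and units removed from a statistical suffix."""
    return _PERIOD_SUFFIX.sub(r"_\1", name)


# Illustrative names: two averages with different periods, one instantaneous field.
names = ["TMP_P8_L103_GLL0_avg3h", "TMP_P8_L103_GLL0_avg6h", "TMP_P0_L103_GLL0"]
collapsed = sorted({drop_time_period(n) for n in names})
print(collapsed)
```

The two averages collapse to one name, but the field with a different product template (`_P0` instead of `_P8`) keeps its own name, which is one reason the option may not do everything you want.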
On Fri, May 25, 2018 at 8:32 AM, chiaral wrote:
After digging in the PyNIO page I found the GRIB2 options
<https://www.pyngl.ucar.edu/NioFormats.shtml#GRIB2-support-details>, and
a description of how the names are created. Apologies for not having gone
through it all before.
GRIB2 data variable name encoding
(Note: examples show intermediate steps in the formation of the name)
    if production status is TIGGE test or operational and matches entry in TIGGE table:
        <parameter_short_name> (ex: t)
    else if entry matching product discipline, parameter category, and parameter number is found:
        <parameter_short_name> (ex: TMP)
    else:
        VAR_<product_discipline_number>_<parameter_category_number>_<parameter_number> (ex: VAR_3_0_9)
    _P<product_definition_template_number> (ex: TMP_P0)
    if single level type:
        _L<level_type_number> (ex: TMP_P0_L103)
    else if two levels of the same type:
        _2L<level_type_number> (ex: TMP_P0_2L106)
    else if two levels of different types:
        _2L<_first_level_type_number>_<second_level_type_number> (ex: LCLD_P0_2L212_213)
    if grid type is supported (fully or partially):
        _G<grid_abbreviation><grid_number> (ex: UGRD_P0_L108_GLC0)
    else:
        _G<grid_number> (ex: UGRD_P0_2L104_G0)
    if not statistically processed variable and not duplicate name the name is complete at this point.
    if statistically-processed variable and constant statistical processing duration:
        if statistical processing type is defined:
            _<statistical_processing_type_abbreviation><statistical_processing_duration><duration_units> (ex: APCP_P8_L1_GLL0_acc3h)
        else
            _<statistical_processing_duration><duration_units> (ex: TMAX_P8_L103_GCA0_6h)
    else if statistically-processed variable and variable-duration processing always begins at initial time:
        _<statistical_processing_type_abbreviation> (ex: ssr_P11_GCA0_acc)
    if variable name is duplicate of existing variable name (this should not normally occur):
        _n (where n begins with 1 for first duplicate) (ex: TMAX_P8_L103_GCA0_6h_1)
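As a rough illustration of the encoding rules above, a name can be split back into its components with a regular expression. This is a sketch written from the listing only, not code taken from PyNIO; `parse_nio_grib2_name` and the group names are hypothetical, and the pattern assumes lowercase statistical suffixes as in the examples.

```python
import re

# Pattern written from the rules listed above; an illustration, not PyNIO code.
_NAME = re.compile(r"""
    ^(?P<parameter>.+?)               # short name, or VAR_<d>_<c>_<n>
    _P(?P<template>\d+)               # product definition template number
    (?:_(?P<level>2?L\d+(?:_\d+)?))?  # level type(s); absent in some names
    _G(?P<grid>[A-Za-z]*\d+)          # optional grid abbreviation, then grid number
    (?:_(?P<stat>[a-z]*\d*[a-z]+))?   # statistical suffix, e.g. acc3h, 6h, acc
    (?:_(?P<dup>\d+))?$               # duplicate counter
""", re.VERBOSE)


def parse_nio_grib2_name(name):
    """Split a PyNIO-style GRIB2 variable name into its documented parts."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised GRIB2 variable name: {name!r}")
    return match.groupdict()


for example in ("APCP_P8_L1_GLL0_acc3h", "TMAX_P8_L103_GCA0_6h_1", "ssr_P11_GCA0_acc"):
    print(example, parse_nio_grib2_name(example))
```

Components that a name does not carry (for instance the level in `ssr_P11_GCA0_acc`) come back as `None`.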
You also note on the same page linked above "Generally speaking all of the
applicable characteristics must be the same for two records to be
considered part of the same variable."
Is there a way or option to limit the name to, for example, TMAX?
I am very new to GRIB2, and I have usually extracted data with wgrib2 or
pygrib, using the short name to access the variable I am interested in. I am
curious why these attributes
(<statistical_processing_type_abbreviation><statistical_processing_duration><duration_units>,
for example) were added to the definition of the name.
I know that the GRIB2 file format is a bit of a wild west, and that this
might make sense most of the time, but sometimes we might not want that
behavior.
Would you be interested in developing a feature that limits the variable
name to just, e.g., parameter_short_name?
That would make using PyNIO within open_mfdataset() seamless, whereas now
I need to write an ad-hoc preprocessing function to change the variable
names constructed by PyNIO.
Thanks!
Thanks! Indeed, this helped a little. Using the example on the webpage and scrolling down to Nio Usage:
Then running
For future reference, to modify any of Nio's options when using it as an engine in xr.open_mfdataset(), one needs to modify the default values:
Hi @chiaral, have you gotten clean short variable names working with the new pynio backend_kwargs? I still get
I posted this on SO, but I think it belongs here as well.
Below is an edited version:
I am using the data found here (note: these are rotating files of forecast data, so the actual date will change as time goes by; you might need to update the date in my example in a few days)
which is read as a dataset with some coordinates and variables
and many other variables that I am not copying here.
My struggle right now is understanding how these variable names (e.g. PLI_P0_2L108_GLL0) are generated.
If I run Nio directly:
I get:
etc...
but the names of the variables are already there. So it seems that these names are defined at the Nio level (I think).
However, when I inspect the data with wgrib2 I have the following:
The variable names are PRES, HGT, ABSV, VGRD, ACPCP, and so on.
The problem is that I am trying to run open_mfdataset in xarray to concatenate thousands of these little files along multiple dimensions (in my case, forecast start time and forecast lead time). Unfortunately, the variable names generated by xarray/PyNIO for what should be the same quantity (e.g. convective precipitation, which in the GRIB file is simply named ACPCP) change throughout the dataset, going from ACPCP_P8_L1_GLL0_acc, to ACPCP_P8_L1_GLL0_acc6h, to ACPCP_P0_L1_GLL0, which complicates things.
I tried to overwrite the variable names with a preprocessing function, but I wanted to understand the rationale behind those names to be sure that I am doing it correctly.
So, where does the part of the variable name attached to the GRIB variable name, like "_P0_L6_GLL0", come from? Is there a way to control that?
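For the preprocessing route, one possible sketch is to build a per-file rename mapping that keeps only the text before the `_P<n>` component and refuses to proceed when two variables in the same file would end up with the same short name. That collision case is also a plausible reason the extra components exist in the first place. `short_name` and `rename_map` are hypothetical helpers written for this illustration, not part of PyNIO or xarray.

```python
import re
from collections import defaultdict


def short_name(nio_name):
    """Return the text before the _P<n> component; other names pass through."""
    return re.split(r"_P\d+(?:_|$)", nio_name, maxsplit=1)[0]


def rename_map(nio_names):
    """Map each PyNIO-style name to its short name; refuse ambiguous mappings."""
    by_short = defaultdict(list)
    for name in nio_names:
        by_short[short_name(name)].append(name)
    clashes = {s: n for s, n in by_short.items() if len(n) > 1}
    if clashes:
        raise ValueError(f"short names are not unique within this file: {clashes}")
    return {names[0]: short for short, names in by_short.items() if names[0] != short}


# One list of variable names per file, using the names reported in this issue.
per_file = [["ACPCP_P8_L1_GLL0_acc"], ["ACPCP_P8_L1_GLL0_acc6h"], ["ACPCP_P0_L1_GLL0"]]
maps = [rename_map(names) for names in per_file]
print(maps)
```

With xarray, such a mapping could be applied inside the function passed as `preprocess` to `xr.open_mfdataset`, for example `lambda ds: ds.rename(rename_map(list(ds.data_vars)))`. Whether collapsing an accumulation and an instantaneous field into one name is physically meaningful is a separate question that the rename does not answer.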