Standard names: lake variables #25

Open
GeyerB opened this issue Aug 21, 2020 · 50 comments
Labels: frequently asked question, standard name (added by template)

Comments

@GeyerB

GeyerB commented Aug 21, 2020

Proposer's names Beate Geyer and Burkhardt Rockel
Date 2020/08/21

In numerical land surface models lake properties are taken into account. Lakes and reservoirs are included whereas rivers are excluded. Several quantities defined for sea water can be adapted. In #119 @taylor13 already expected such a demand - here we go:

By analogy with sea_floor_depth_below_sea_surface, we want to apply for the name lake_floor_depth_below_lake_surface. The variable is used for modelling of temperature, stratification and evaporation from inland lakes.

-Term lake_floor_depth_below_lake_surface
-Definition The lake_floor_depth_below_lake_surface is the vertical distance between the lake surface and the lake bed as measured at a given point in space.
or
The lake_floor_depth_below_lake_surface is the vertical distance between the lake or reservoir surface and the lake/reservoir bed as measured at a given point in space.
-Units m

By analogy with sea_water_temperature: lake_water_temperature

-Term lake_water_temperature
-Definition Lake water temperature is the in situ temperature of the lake water. To specify the depth at which the temperature applies use a vertical coordinate variable or scalar coordinate variable.
-Units K

By analogy with ocean_mixed_layer_thickness: lake_mixed_layer_thickness

-Term lake_mixed_layer_thickness
-Definition The lake mixed layer is the upper part of the lake, regarded as being well-mixed. Various criteria are used to define the mixed layer; this can be specified by using a standard name of lake_mixed_layer_defined_by_X. "Thickness" means the vertical extent of a layer.
-Units m

If agreement is reached on the wording "lake and reservoirs", all proposed descriptions will have to be adapted.

@GeyerB added the label "standard name (added by template)" on Aug 21, 2020
@roy-lowry

I think now is the time to question whether we start down the road of assigning parallel sets of Standard Names to cover different types of water bodies.

Coming from an observational background I am uncomfortable with the different name for different water body type approach. I have handled data sets where a small vessel measuring temperature, salinity, nutrients, etc. in the Humber Estuary system started in the open North Sea and finished in the River Ouse at York many miles inland. Were there separate Standard Names for sea_water_temperature and river_water_temperature would semantic labelling force a single, coherent data set to be split?

Another issue is whether temperature measurements in the Dead Sea would be labelled sea_water_temperature or lake_water_temperature.

This issue affects a huge number of Standard Names - a search for 'sea_water' returns 432 hits and there are many more potential candidates when phrases like sea_surface are considered. The number of new Standard Names in this proposal is relatively small but it sets a precedent for many, many more.

So, a couple of questions to discuss

  1. Do we start down this road of Standard Name propagation for multiple water body types?
  2. If not, do we:
    a) Alias large numbers of Standard Names to replace sea_water by something more generic?
    b) Use some form of weasel words in the description field to indicate that by 'sea water' CF means the water in everything from puddles through to oceans?

@cothel

cothel commented Aug 21, 2020

I am glad that Beate and Burkhardt are bringing up the lake issue.
Some regional climate models are already coupled with a lake model. At Ouranos, we are using CRCM5 coupled with FLake.
To define these new variables, we mimicked the "sea" or "ocean" standard names. We are currently using:
lake_area_fraction
lake_depth
water_temperature_at_lake_floor
lake_ice_fraction
lake_ice_thickness
lake_mixed_layer_temperature
lake_mixed_layer_thickness

FLake offers more variables, but these are the ones we decided to archive.
I think we really need a new realm "lake" in order to define where the coupling applies.

I certainly hear Roy's concerns, although I don't think that lake- or river-related standard names will be as numerous as for the ocean. At this point, I am more comfortable with the "parallel" naming approach. But as the resolution of climate models increases, modellers will soon face the same dilemmas raised by Roy from the observational community.

Hélène

@roy-lowry

I find the revelation that people are making up their own 'Standard' names without any attempt to get them incorporated into the accepted Standard Names list a concern. When we first attempted Europe-wide metadata interoperability 20 years ago in SEASEARCH every data centre had a copy of the 'standard' vocabulary set as locally-held Excel spreadsheets. Many extended their copy without telling anybody else as that was the 'easy' way. They then wondered why their metadata records were rejected by the ingestion tools developed for the project. Unless we work together and follow the rules ALL of the time the resulting standards are useless!

@cothel

cothel commented Aug 21, 2020

I agree with you Roy.
In my experience in regional climate modelling, variables officially requested in international initiatives (MIPs, CORDEX, etc.) are carefully handled, but for all the others we often have to find a quick solution in order to avoid delays in simulation production. This is especially true for RCMs since CORDEX requires a much shorter list of variables than CMIPs. By doing so, people in charge of data management are perfectly aware that extra efforts will be needed later to reach conformity. So planning ahead saves us precious resources. Interoperability is a journey.
Hélène

@roy-lowry

Can I suggest that in future, if a quick fix is needed, you place the draft names in the long_name attribute and leave the standard_name null, but not until the submission process for the draft names has been initiated. That way the files will always be CF-compliant (they are not compliant if there is something not in the list in the standard_name attribute). The problem with extra efforts later is that later never comes.
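
As a rough illustration of this rule (a sketch, not an official CF checker), the Python snippet below flags a standard_name that is absent from the table while accepting a draft name kept in long_name. The function name `standard_name_problem` and the three-entry `KNOWN_STANDARD_NAMES` set are assumptions made for the example; a real check would load the full published standard name table.

```python
# Tiny illustrative subset of the CF standard name table (assumption:
# a real check would load the full published table instead).
KNOWN_STANDARD_NAMES = {
    "sea_water_temperature",
    "sea_floor_depth_below_sea_surface",
    "area_fraction",
}


def standard_name_problem(attrs, known_names=KNOWN_STANDARD_NAMES):
    """Return a message if standard_name is not in the table, else None."""
    name = attrs.get("standard_name")
    if name is None:
        return None  # attribute absent: nothing to validate
    # A standard_name may carry a modifier after a space; check the first token.
    parts = name.split()
    base = parts[0] if parts else ""
    if base not in known_names:
        return f"standard_name '{name}' is not in the standard name table"
    return None


# Quick fix as suggested: draft name in long_name, no standard_name attribute.
draft_ok = {"long_name": "lake_water_temperature", "units": "K"}
# Home-made name placed directly in standard_name.
draft_bad = {"standard_name": "lake_water_temperature", "units": "K"}

print(standard_name_problem(draft_ok))
print(standard_name_problem(draft_bad))
```

The first call reports no problem and the second returns a message, which mirrors the point that a draft name belongs in long_name until it has been accepted.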

You are spot on about the need for thinking ahead. I have been managing vocabulary content and infrastructure for over 30 years and if I had a penny for every time I made that point I would be rich!

@taylor13

Good suggestion to place the proposed standard_name in long_name. As I recall we might have done something like that for one of the phases of CMIP. I'm not sure whether having an "empty" standard_name attribute would be considered compliant. Might be though. I think it's possible with the nco utility to add a global attribute to an existing netCDF file. Again not positive about this.

@roy-lowry

@taylor13 CF rules state that either the standard_name or the long_name needs to be present. By 'null' standard_name I was thinking of no attribute rather than having the attribute there but set blank: maybe that wasn't clear. Some communities using CF have declared the standard_name to be mandatory but with very good reason this has never been endorsed by CF (just think of the pressure it would put on Standard Names maintenance). BTW these are parameter attributes not global attributes.

@taylor13

Yes, of course, standard_name and long_name are not global attributes. Thanks for correcting this. To be sure, I thought neither was required by CF (although they're recommended). Do I have this wrong?

@roy-lowry

My understanding is that neither is required but one of them must be present. It doesn't matter which one.

@tobstac

tobstac commented Aug 25, 2020

I’d like to support Hélène's proposal to clearly separate sea and land-based water-related variables. On the one hand, this separation is necessary if data are to be interpolated between different grids (e.g. in the coupling of compartments) so that the origin of the data is still obvious. In satellite-based remote sensing, this separation was not done for a long time. However, after repeated requests by the land surface modelling and observing community, this has now been recognized as important.

Furthermore, it is always easier to merge several datasets into one (if necessary) than to split one, as the latter requires much more additional information. While agreeing with Roy about the nuisance of splitting a comprehensive dataset into sea and land components, depending on the analysis users might want to do this anyway. In such cases, prescribing different variable names would require the data producer to do the separation, who is the best qualified person anyway, instead of leaving end users to guess.

@roy-lowry

My understanding is that the mechanism in CF for differentiating between a temperature measurement in the sea or in a lake should be the area_type rather than the Standard Name. This is an ancillary variable with the Standard Name 'area_type', populated from a controlled vocabulary (http://cfconventions.org/Data/area-type-table/current/build/area-type-table.html). A quick eyeball indicates that 'land' and 'sea' are covered, but not 'lake' or 'river'. However, these could easily be added through a GitHub ticket.

The area_type can be a scalar variable for data from a single area type, which would be best for your use case where you clearly want to classify data by area type. However, it can also be a vector as in the example below from the Conventions document. This suits my use case better, providing a mechanism for me to have data from multiple area types.

dimensions:
  lat=73;
  lon=96;
  maxlen=20;
  ls=2;
variables:
  float surface_temperature(lat,lon);
    surface_temperature:cell_methods="area: mean where land";
  float surface_upward_sensible_heat_flux(ls,lat,lon);
    surface_upward_sensible_heat_flux:coordinates="land_sea";
    surface_upward_sensible_heat_flux:cell_methods="area: mean where land_sea";
  char land_sea(ls,maxlen);
    land_sea:standard_name="area_type";
data:
  land_sea="land","sea";

@JonathanGregory
Contributor

JonathanGregory commented Aug 25, 2020 via email

@tobstac

tobstac commented Aug 25, 2020

What about cases on a coarse model grid, where a given coastal grid cell may contain both ocean values and land (lake) values? Can these also be separated using the area_type?

@roy-lowry

What you would have to do there is have two variables with the same standard name to indicate water temperature (perfectly legal), one with the area_type 'sea' and one with the area_type 'land' (or 'lake' if that's set up). For sanity's sake I would also use the long_name to store the concatenation of Standard Name and area type.

@GeyerB
Author

GeyerB commented Aug 25, 2020

@roy-lowry I tried to apply your example to our variables because I’m not sure whether my understanding is correct - at the same time it would answer Tobias’ question about the tile approach.
In your example

surface_upward_sensible_heat_flux:cell_methods="area: mean where land_sea";

I do not understand what 'where land_sea' means here.

float FR_LAKE(lat,lon);
  FR_LAKE:standard_name="area_fraction";
  FR_LAKE:long_name="lake area fraction";
  FR_LAKE:coordinates="land_sea";
  FR_LAKE:cell_methods="area: sum where lake";
float DEPTH_LK(lat,lon);
  DEPTH_LK:standard_name="water_floor_depth_below_water_surface";
  DEPTH_LK:long_name="lake depth";
  DEPTH_LK:coordinates="land_sea";
  DEPTH_LK:cell_methods="area: mean where lake";
float FR_RIVER(lat,lon);
  FR_RIVER:standard_name="area_fraction";
  FR_RIVER:long_name="river area fraction";
  FR_RIVER:coordinates="land_sea";
  FR_RIVER:cell_methods="area: sum where river";

char land_sea(ls,maxlen);
  land_sea:standard_name="area_type";
data:
  land_sea="land","sea","lake","river";

@roy-lowry

In the example (from the CF document, not mine!) the area_type is only applicable to the surface_upward_sensible_heat_flux. The surface_temperatures are only for land - hence their dimension is (lat,lon), not (ls,lat,lon), and the area type is given verbatim in the cell method. So, the fluxes are in a 3D array. For cell Y,X there will be one flux in element (1,Y,X) and another flux in element (2,Y,X). The 'where land_sea' tells us that to find out which is land and which is sea we need to look at land_sea(1) for element (1,Y,X) and see that it is the land flux.
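
To make the indexing concrete, here is a small Python sketch, assuming a toy 2 x 2 x 3 array with made-up values in place of the 2 x 73 x 96 array of the example. Python counts from 0, so element (1,Y,X) in the description above corresponds to index 0 here. The helper `fluxes_by_area_type` is hypothetical and only mimics the lookup that 'where land_sea' implies.

```python
# Labels held by the area_type coordinate variable (land_sea in the CF example).
land_sea = ["land", "sea"]

# flux[k][y][x]: a tiny 2 x 2 x 3 stand-in for the (ls, lat, lon) array;
# the numbers are made up for illustration.
flux = [
    [[10.0, 11.0, 12.0], [13.0, 14.0, 15.0]],  # k = 0, labelled by land_sea[0]
    [[20.0, 21.0, 22.0], [23.0, 24.0, 25.0]],  # k = 1, labelled by land_sea[1]
]


def fluxes_by_area_type(flux, labels, y, x):
    """Map each area type label to the flux stored for grid cell (y, x)."""
    return {label: flux[k][y][x] for k, label in enumerate(labels)}


cell = fluxes_by_area_type(flux, land_sea, y=1, x=2)
print(cell)
```

The position along the first dimension carries no meaning by itself; it is the matching entry of the label variable that says which flux belongs to land and which to sea.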

Have a read through section 7.3.3 of the Conventions, which should help you understand. The important point is that there are TWO conventions. One where the area_type controlled term is included as verbatim text in the cell method and the other where several controlled terms are placed in a co-ordinate variable. You could just use the second convention and point it at a scalar variable (see below)

Now for your data. First, packing your two area fractions in a single array with a generic Standard Name (my preference). Note that this requires two new entries in the area_type controlled vocabulary, for lake and river.

dimensions:
  atypes=2;
  maxlen=20;
  lat=50;
  lon=50;
variables:
  float LATITUDE(lat);
  float LONGITUDE(lon);
  float AFRAC(atypes,lat,lon);
    AFRAC:standard_name="area_fraction";
    AFRAC:long_name="area fraction";
    AFRAC:coordinates="area_type LATITUDE LONGITUDE";
    AFRAC:cell_methods="area: sum where area_type";
  float DEPTH_LK(lat,lon);
    DEPTH_LK:standard_name="water_floor_depth_below_water_surface";
    DEPTH_LK:long_name="lake depth";
    DEPTH_LK:coordinates="LATITUDE LONGITUDE";
    DEPTH_LK:cell_methods="area: mean where lake";
  char area_type(atypes,maxlen);
    area_type:standard_name="area_type";
data:
  area_type="lake","river";

Or, as you had them in separate arrays with the area_types verbatim in the cell_method attributes.

dimensions:
  lat=50;
  lon=50;
variables:
  float LATITUDE(lat);
  float LONGITUDE(lon);
  float FR_LAKE(lat,lon);
    FR_LAKE:standard_name="area_fraction";
    FR_LAKE:long_name="lake area fraction";
    FR_LAKE:coordinates="LATITUDE LONGITUDE";
    FR_LAKE:cell_methods="area: sum where lake";
  float DEPTH_LK(lat,lon);
    DEPTH_LK:standard_name="water_floor_depth_below_water_surface";
    DEPTH_LK:long_name="lake depth";
    DEPTH_LK:coordinates="LATITUDE LONGITUDE";
    DEPTH_LK:cell_methods="area: mean where lake";
  float FR_RIVER(lat,lon);
    FR_RIVER:standard_name="area_fraction";
    FR_RIVER:long_name="river area fraction";
    FR_RIVER:coordinates="LATITUDE LONGITUDE";
    FR_RIVER:cell_methods="area: sum where river";

Note that a horde of x_area_fraction Standard Names got through before somebody noticed and came up with the generic method based on area_type, which is what I'm trying to stop happening again.
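
The packed layout and the separate-array layout hold the same information. As a sketch of that equivalence, the Python snippet below stacks per-type 2D fields into a label list plus a (type, lat, lon) nested list. The 2 x 2 fraction values are made up, and `pack_by_area_type` is a hypothetical helper, not part of any CF library.

```python
def pack_by_area_type(fields):
    """Stack per-type 2D fields into (labels, packed) with packed[k] for labels[k]."""
    labels = list(fields)  # dict order is preserved, so labels and slices line up
    packed = [fields[label] for label in labels]
    return labels, packed


# Made-up 2 x 2 area fractions standing in for FR_LAKE and FR_RIVER.
fr_lake = [[0.2, 0.0], [0.1, 0.0]]
fr_river = [[0.0, 0.05], [0.0, 0.0]]

labels, afrac = pack_by_area_type({"lake": fr_lake, "river": fr_river})
print(labels)
print(afrac[labels.index("river")][0][1])
```

Here `labels` plays the role of the area_type variable and `afrac` the role of AFRAC, so a reader can recover either original field by looking up its label.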

@taylor13

I second the suggestion that a read through section 7.3.3 might help. And like Jonathan and Roy I would prefer the first suggested approach in #25 because it is absolutely clear that the same variable is being measured (but for different portions of grid cells).

@GeyerB
Author

GeyerB commented Aug 27, 2020

OK, do I have to open a new issue?
CF Area Type Table: entries for lake and river
An area type of "lake" means a body of (usually fresh) water surrounded by land.
An area type of "river" means a natural stream of water of considerable volume. (see http://glossary.ametsoc.org/wiki/River)

@JonathanGregory
Contributor

JonathanGregory commented Aug 27, 2020 via email

@roy-lowry

They make sense to me as well. @japamment or @feggleton should be able to advise if they can take them forward from this ticket or if a new one is needed. I would also like their opinion on Question 2 in my first posting in this thread.

@feggleton
Collaborator

Hi all,

Thanks for discussing this thoroughly. To confirm, we are scrapping the 3 lake terms and adding lake and river to the area type table instead? In which case this is fine and I will update the editor to reflect this. From the discussion we have this for the area type table:

CF Area Type Table: entries for lake and river
An area type of "lake" means a body of (usually fresh) water surrounded by land.
An area type of "river" means a natural stream of water of considerable volume. (see http://glossary.ametsoc.org/wiki/River)

It should be ok to just take them from this ticket, @japamment can you confirm this is ok and the definitions are ok for you?

Thanks

@taylor13

Can we clarify in our definition what distinguishes a lake from an inland sea? I've seen descriptions that say a sea is at sea level and connected to an ocean while a lake may be above or below sea level and if connected to the ocean is not a "sea". (Under this definition the "Dead Sea" is a lake.)

Also we have "sea" as an area type that includes oceans. (There is no "ocean" area type.) In some AOGCMs some (inland) seas may be resolved as an area of water, but without any dynamical treatment (i.e., treated like a lake). Perhaps these should be described as lakes in CF, not as seas??? There may be some bays that are handled similarly, but perhaps an ocean modeler can verify this and provide some advice on what should be done.

@tobstac

tobstac commented Sep 14, 2020

It seems there is no clear distinction between lakes and inland seas, as inland seas are not a class of their own but rather a different name for very large lakes (e.g. the Caspian Sea) (Bootsma, 2018, Oceans, Lakes, and Inland Seas: A Virtual Issue on the Large Lakes of the World, https://doi.org/10.1002/lob.10230).
From this viewpoint, a distinction between sea and lake seems to be sufficient. There might be inconsistencies in the end, because of the individual land masks the modelling groups design for their models that might disagree in attributing a given water area either to the land or the ocean model. But I guess this cannot be solved by the CF area type definition.

@GeyerB
Author

GeyerB commented Sep 21, 2020

Dear all,
The discussion has led us from the proposed lake variables to general variables for water. I suggest allowing for all existing sea_water variables to avoid the need of extra coordinates for them.

An example:
-Term water_floor_depth_below_water_surface
-Definition The water_floor_depth_below_water_surface is the vertical distance between the water surface and the bed of the water body as measured at a given point in space. To specify which water body is described by a variable with standard name water_floor_depth_below_water_surface, provide a coordinate variable or scalar coordinate variable with standard name area_type. In case of area_type ‘sea’, use the more specific standard name of sea_floor_depth_below_sea_surface.
-Units m

If we need further discussion on this we could transfer it to a new issue and close this one. @taylor13 – is it ok for you when we follow @tobstac and keep the ‘easy’ definition for the new area_types given last by @feggleton?

@feggleton Should I open a new issue for water variables?

Best regards
Beate

@JonathanGregory
Contributor

Dear all

I think the proposal to use area_type to distinguish sea and lake works well for quantities with standard_name such as area_fraction and surface_upward_sensible_heat_flux which are obviously applicable to all area-types. The simplest solution would be to do the same with "sea" quantities. For example if sea_floor_depth_below_sea_surface has an area_type of lake it means lake floor depth below lake surface. Thus we would need no new standard names. Would that be acceptable, even though not ideal?
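
A sketch of how software might read names under this proposal: the stored standard_name stays a "sea" name, and a human-readable meaning is derived from the accompanying area_type. The function `effective_meaning` is hypothetical (CF defines no such operation), and swapping the "sea" token is only an assumption about how the reading could be presented.

```python
def effective_meaning(standard_name, area_type="sea"):
    """Describe a 'sea' standard name in the light of an accompanying area_type."""
    if area_type == "sea":
        return standard_name
    # Replace whole underscore-separated tokens only, so that words which
    # merely contain the letters "sea" are left untouched.
    tokens = standard_name.split("_")
    return "_".join(area_type if token == "sea" else token for token in tokens)


print(effective_meaning("sea_floor_depth_below_sea_surface", "lake"))
print(effective_meaning("sea_water_temperature", "river"))
```

The file itself would still carry sea_floor_depth_below_sea_surface in the standard_name attribute; only the interpretation shown to a user changes with the area_type.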

Quite a few times before we have discussed the alternative of introducing something generic, such as "water body". Personally I think that possibility is unattractive. It is correct, but it's not a common phrase and it's cumbersome, e.g. water_body_water_temperature would sound awkward to me. In many contexts "water" alone would be insufficient since water exists in the atmosphere and on land as well. Moreover if we followed that approach we would probably want to make all the ~400 existing "sea" names, e.g. sea_water_temperature, into aliases for "water body" names.

Jonathan

@roy-lowry

Thanks @GeyerB for pushing this forward. I asked this question back in August and offered two alternatives, which have now been proposed by yourself and @JonathanGregory. From a backwards-compatibility perspective, Jonathan's proposal is the less disruptive as the hundreds of 'sea_water' Standard Names have been extensively used over the past 20-odd years. All that would be required is the addition of a line of text in the definitions along the lines of 'sea_water means the water of sea or ocean unless associated with another type of water body through area_type'.

@GeyerB suggestion requires the creation of hundreds of aliases, but no new Standard Names. There is back-office semantic infrastructure in place (mappings) that has the potential for software agents to automatically realise that 'water' and 'sea_water' are semantically equivalent. Only problem is that I'm not aware of any software with AI that actively uses these mappings. Consequently, there would be pressure to change the Standard Names in existing file stock.

I led the charge to introduce the phrase 'water_body' into Standard Names a long while ago, but failed to get it accepted. It is used extensively in the 'P01' vocabulary that I set up over 20 years ago, which has been a part of my life for so long that I consider 'water_body' to be natural language. However, I appreciate that I may be a long way from the norm here.

Unusually, I find myself sitting firmly on the fence here. My strong belief in backwards compatibility makes the definition change attractive, but its inelegance makes me shudder. The words it brings to my mind are 'fudge' and 'bodge'. Set against this is my fear of the reaction of CF user communities to the introduction of such widescale changes through aliases, potentially invalidating huge numbers of files and possibly breaking application software that uses CF operationally.

I'd love to know how others in the CF community feel.

@StefanHagemann

I suggest clearly separating water-related variables over land from those over the ocean/sea. However, I agree that the term 'water body' is not very common, and the pure meaning of the term refers to any water body, not just those over land. Consequently, I suggest using the term 'inland_water' to separate variables related to lakes (or rivers) from those over the ocean/sea.

Actually, separating ocean water from land water in land cover maps has been a longstanding issue in the remote sensing community, requested by modellers. This, e.g., has been brought forward by Alexander Löw and myself within the ESA CCI projects CMUG and LandCover, respectively. Eventually, this has been realised by the latest version of the ESA LandCover CCI water body product CCI WB v4.0 (Lamarche et al. 2017), for which "a static map with the distinction between ocean and inland water is now available at 150 m spatial resolution" (http://maps.elie.ucl.ac.be/CCI/viewer/download.php).

Having a text somewhere that explains that a sea water variable is not sea-water related under certain aspects is a bad solution. Not everybody will search for such a text (or even expect that such a text exists) that explains a variable is not what its name suggests.

@JonathanGregory
Contributor

JonathanGregory commented Sep 22, 2020 via email

@StefanHagemann

Yes. Currently, I think that this would be sufficient for most of the variables as they are still related to the water in those inland water bodies (such as lakes and rivers). There might be a few variables that do not occur in inland water bodies, where the wording seems wrong. However, I assume that these cases are negligible.

@roy-lowry

I'm getting confused!!! I thought we had agreed in this thread to use area_type constructs and NOT to create new Standard Names for inland water bodies and that the stage we are now at is to decide how we can modify the existing sea_water Standard Names to cover the waters of all area types.

@roy-lowry

@Hag I think you have misunderstood. The intention of introducing generic terminology like 'water' or 'water_body' is to describe ALL water bodies, not just those on land.

@StefanHagemann

Do I? You wrote above that 'sea_water' Standard Names have been extensively used over the past 20-odd years. Hence, a renaming of those sea_water names does not seem to be an acceptable option.
You also wrote "All that would be required is the addition of a line of text in the definitions along the lines of 'sea_water means the water of sea or ocean unless associated with another type of water body through area_type'."
This was the option I was objecting to in my statement above.
Consequently, the introduction of a new set of standard names with respect to inland water seems a practical solution to me. First, it avoids separate names for different types of water bodies (such as lakes, rivers). Second, it is a clear separation of inland water variables from ocean variables. Usually, land and ocean models are clearly separated. They may be connected by couplers but their outputs will still be separated. Hence, providing a set of mixed variables may not be favoured by either of the two (land and ocean) modelling communities.

@japamment
Member

Dear All,

I support the proposal to introduce 'lake' and 'river' as area types. We already have some standard names for river quantities, mostly related to water flow and transport. We also make mention of ice on lakes in the definitions of floating_ice and quantities for ice_and_snow_on_land. It's clear, therefore, that these areas need to be accounted for in land surface and hydrology models and observations.

Regarding the original proposal for lake standard names, Roy's comments 1 and 2 set me thinking:

So, a couple of questions to discuss

1. Do we start down this road of Standard Name propagation for multiple water body types?

2. If not, do we:
   a) Alias large numbers of Standard Names to replace sea_water by something more generic?
   b) Use some form of weasel words in the description field to indicate that by 'sea water' CF means the water in everything from puddles through to oceans?

It's certainly true that we work to avoid unnecessary proliferation of standard names, but I think it is also important to consider that sometimes it makes more sense to introduce more names for reasons of clarity. For example, we have many standard names for different types of precipitation: rain, snow, hail, graupel, solid, liquid, stratiform, convective, etc. We also have the umbrella term 'precipitation' for cases when we mean all of the above. These distinctions are useful and I don't think we'd contemplate redefining 'rain' for example, to mean 'rain and snow and graupel' simply to keep down the number of standard names.

To answer Roy's questions (and apologies for the delay) I don't think it makes sense to do either option a or option b! The sea_water and ocean names are in wide use in the oceanography community and I don't think we would be serving that community well by insisting that they change all their sea names to 'water body' names. Also, I think it would be very convoluted and confusing if we suddenly change the meaning of 'sea' to refer to all water bodies. As with the precipitation names, for clarity and to allow us to accurately describe the variables (which is after all the main aim), I think it is preferable to introduce separate names for sea, lake, river and so on. This would be in addition to introducing the new area_types. But then there is the case that Roy originally raised regarding a dataset that begins in the sea, moves through an estuary and ends up a long way inland in a river. To cope with this particular instance, we should also allow 'water body' names, and this would be an umbrella term for 'sea', 'river', 'lake', etc. in the same way that 'precipitation' is an umbrella term for 'rain', 'snow', etc. I believe this would address everyone's use cases while at the same time keeping the terms used in standard names clear and understandable (and of course we need appropriate definitions for them all).

Best wishes,
Alison

@GeyerB
Author

GeyerB commented Sep 22, 2020

@GeyerB suggestion requires the creation of hundreds of aliases, but no new Standard Names.

was not what I wanted to express. My intention with

I suggest allowing for all existing sea_water variables to avoid the need of extra coordinates for them.

was to keep all sea_water variables and create new (inland_)water variables.

As the comment by @japamment supports the proposal of @StefanHagemann to use inland_water standard_names in addition to the sea_water ones, I would like to discuss the thoughts of @JonathanGregory further.
These are fine with me:
eastward_inland_water_velocity, freezing_temperature_of_inland_water, mass_concentration_of_oxygen_in_inland_water, inland_water_salinity, inland_water_temperature.
But for the water depth we should use:
inland_water_floor_depth_below_water_surface,
and for the height above (or below) geoid I would propose inland_water_surface_altitude
inland_water_mixed_layer_thickness

Do we agree to go ahead with inland_water?

@cothel

cothel commented Sep 22, 2020

Dear all,

After being away for a while, I was a bit behind in this conversation. Looking at all suggestions, I tried to figure out what they would imply for our organization since we are both data providers and data users. I have to agree with @japamment because it gives the flexibility to merge files from different area types into one or to split files according to their area type coordinate. Either way, those needs exist and require clarity. So Alison's suggestion works best for us.

@roy-lowry

@Hag I was considering renaming the sea_water Standard Names through aliasing as a viable, though far from perfect option. I am an advocate of using semantic infrastructure to render what might at first glance seem unacceptable to be acceptable. I think it a little unfair to present my second quote without my comments about how I feel about the proposition ('fudge' and 'bodge'). My aim was to draw sufficient negative feedback to get the idea totally discounted.

@japamment You overlook the area_type discussion. A principle of CF is that Standard Names should wherever possible avoid duplicating the functionality of other constructs in the Conventions. The 'area_type' concept was introduced as a result of a previous Standard Name propagation discussion very similar to this one, only a blind eye seems to have been turned to the semantic shortcomings of using 'sea_water' Standard Names for all area types. I suspect that there are lake temperature data out in the wild with 'sea_water_temperature' Standard Names. What happens to 'area_type' if a set of 'inland_water' Standard Names are set up? There seems to be a parallel with the '2m air temperature' issue here.

Finally, bear in mind the potential scale of proliferation here. I know @Hag gives the impression that just a few new names would be needed. But what if somebody wishes to encode lake water chemistry data into CF? We would end up with two, possibly three, Standard Names per chemical species.

@MathewBiddle

MathewBiddle commented Sep 22, 2020

Here is an archived example of lake temperature data using the CF standard name sea_water_temperature.

Update: here is a list of archived datasets that follow that construct.

@tobstac

tobstac commented Sep 22, 2020

There might be some confusion here because the participants differ somewhat in how deeply they are familiar with the CF convention, its use cases and limitations. In my case - being a humble modeller and data analyst - there is the wish to have readily separated fields, e.g. for water_depth, where I can see at once whether it refers to an ocean or a lake. Whether this is done using an additional dimension like area_type or a prefix to the variable name should probably be decided by the experts for this convention.
However, I would strongly advocate aiming for a consistent solution. There are large numbers of variables out there like sea_ice_fraction, and having the analogues lake_ice_fraction or river_ice_fraction would seem to be consistent. I would also happily accept an area_type, but wouldn't we then have to have a variable called ice_fraction and the area_type=[sea, lake, river] to indicate the specifics? Wouldn't this mean we'd have to remove all prefixes from variables that already contain an area type in its name for sake of consistency?

@roy-lowry

@tobstac Good points. Sadly, inconsistencies have crept into the Conventions - an inevitable consequence of 'design by community'. Trouble is there are very few people (and getting fewer as people retire) with a total overview of the Conventions and those people only have time to get involved with a subset of GitHub threads. I have been around the Conventions for a long time but would only describe my knowledge as patchy. It was the proliferation of 'area_fraction' Standard Names like sea_ice_area_fraction and grounded_ice_sheet_area_fraction that drove the 'area_type' concept with the generic Standard Name 'area_fraction'. Trouble was that it didn't happen before a large number of 'x_area_fraction' Standard Names had been set up, creating a very confusing picture for newbies coming to the Conventions. A golden rule of vocabulary management is never to delete anything, so fixing inconsistencies like that isn't possible.

@dblodgett-usgs

Just scanned the discussion -- this has wandered significantly and it's pretty clear there is a mix of levels of understanding of the convention and use cases here.

There are two semi-overlapping mechanisms at play: standard names and specialized cell methods. The fact that standard names get used for all sorts of things means they tend to be overloaded -- causing overlap in scope with cell methods.

IMHO, we should be cautious about introducing standard names for inland waters which are spatially bounded within the domain of a given dataset. Since such waterbodies are discrete a data variable will either be extremely sparse, a mix of data types with regard to spatial features, or attributed to spatial geometries which would be typed using some other mechanism.

These specialized cell methods feel problematic to me. The area type list is there to be used, but it's a highly specialized yet general tool that is not going to be applicable to all use cases that require categorical typing of cell geometry or polygon geometry.

Based on this discussion, I agree with @tobstac and pile onto the question:

Wouldn't this mean we'd have to remove all prefixes from variables that already contain an area type in its name for sake of consistency?

I think no. Given that "seas" tend to be the entire spatial domain of a given variable, it's not so much an area type as a keyword that helps tell a client how to visualize the quantity (or some other use cases for standard names). I think we could use the "sea_" standard names as inspiration for general standard names that would require an "area_type" to disambiguate the spatial domain type within a given data variable that has mixed "area_types". i.e. if you leave off "sea_" in your standard name and use an "area_type" of "sea" then you get the same behavior from a client.
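
As a rough illustration of that equivalence, here is a minimal sketch of how a client might map both spellings onto the same key. The helper `normalise` and the generic name `water_temperature` are hypothetical and used only for illustration; neither is defined by CF, and a real client would need to handle names where "sea" appears elsewhere than as a prefix.

```python
# Hypothetical client-side helper; the function name and the generic
# "water_temperature" name are illustrative and not part of CF.
SEA_PREFIX = "sea_"


def normalise(standard_name, area_type=None):
    """Return a (generic_name, area_type) pair.

    A name starting with "sea_" is read as the generic name with an implied
    area_type of "sea". Any other name is returned with the area_type given.
    """
    if standard_name.startswith(SEA_PREFIX):
        if area_type not in (None, "sea"):
            raise ValueError(
                f"{standard_name!r} implies area_type 'sea', got {area_type!r}"
            )
        return standard_name[len(SEA_PREFIX):], "sea"
    return standard_name, area_type


# Both spellings resolve to the same key, so a client can treat them alike.
assert normalise("sea_water_temperature") == normalise("water_temperature", "sea")
```

With such a mapping the existing "sea_" names stay untouched, while a variable with mixed area types would carry the generic name plus an explicit area_type.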

@GeyerB
Author

GeyerB commented Sep 29, 2020

@dblodgett-usgs

i.e. if you leave off "sea_" in your standard name...

Does this mean that you agree with my suggestion from #25,
where I proposed to leave all sea_ variables untouched and create new ones where necessary with 'water' instead of 'sea_water', including the hint on the needed coordinate 'area_type'?

At this point all contributors see that the discussion goes round in circles - @japamment @JonathanGregory @feggleton could you convene a panel to take a decision?

@GeyerB
Author

GeyerB commented Jan 19, 2021

Dear @japamment - we would like to use the requested variables, do you see a chance to resolve the discrepancies in the discussion above by resolution?

@martinjuckes

I've just caught up with this discussion today, so please accept my apologies if I've missed anything. I support @japamment 's proposal to retain sea_water_.. names and also add a new set of more general water body names which can be used flexibly with area types.

@GeyerB has suggested a slightly different approach, with new names specifically for inland water rather than the more general water body term. I would prefer Alison's proposal because it allows the terms to be used for datasets which span, for instance, rivers and oceans.

I'm not sure about the use of "floor" to apply to the bed of rivers and lakes. If these terms are intended to apply to all water bodies, a more general term such as "water body bottom" may be better:

  • water_body_area_fraction
  • water_body_depth
  • temperature_at_water_body_bottom
  • water_body_ice_fraction
  • water_body_ice_thickness
  • water_body_mixed_layer_temperature
  • water_body_mixed_layer_thickness

I also support adding river and lake to the area_type table.

There appear to be open questions about exactly what is meant by a "lake" (and perhaps also around questions of when a river stops being a river and becomes a lake or part of the sea). Wikipedia notes that "There is considerable uncertainty about defining the difference between lakes and ponds, and no current internationally accepted definition of either term across scientific disciplines or political boundaries exists." When a term such as surface_temperature is used with ..where lake we would also need to be clear about how this applies to ephemeral lakes. I think this is best dealt with by providing some guidance and advice on how people can include information about their choices, e.g. by making use of the cell_methods comment string as in area: mean where lake (comment: lake here means inland water bodies with an area greater than 40ha). The CMIP6 data uses the comment string to refer to a separate data variable which provides specific information about the masking in the form of the area_fraction occupied by the named area_type (giving people the variable name is useful, because they cannot, with the current generation of search tools, search for the variable with standard name area_fraction and area_type equal to the desired value).
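
To make the cell_methods suggestion concrete, here is a minimal sketch of how a reader of such a file might pull the area type and the free-text comment out of one entry. The function `parse_cell_method` and the regular expression are assumptions made for this illustration; they cover only the single-entry form quoted above, not the full cell_methods grammar.

```python
import re

# Matches one entry of the form
#   "<dim>: <method> where <area_type> (<free text>)"
# The "where" clause and the parenthesised part are both optional.
CELL_METHOD_RE = re.compile(
    r"^(?P<dim>\w+):\s+(?P<method>\w+)"
    r"(?:\s+where\s+(?P<area_type>\w+))?"
    r"(?:\s+\((?P<extra>.*)\))?$"
)


def parse_cell_method(text):
    """Split a single cell_methods entry into its parts (hypothetical helper)."""
    match = CELL_METHOD_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised cell_methods entry: {text!r}")
    parts = match.groupdict()
    extra = parts.pop("extra")
    # Drop the leading "comment:" keyword if the free text starts with it.
    if extra is not None and extra.startswith("comment:"):
        extra = extra[len("comment:"):].strip()
    parts["comment"] = extra
    return parts


example = (
    "area: mean where lake (comment: lake here means inland water "
    "bodies with an area greater than 40ha)"
)
parsed = parse_cell_method(example)
print(parsed["area_type"], "|", parsed["comment"])
```

The comment stays free text, so it can carry either a definition of "lake" as in the example or the name of a companion area_fraction variable, as described for the CMIP6 data.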

@StefanSimis

Hi, data producer here for the Lakes CCI and Copernicus Land Monitoring Service. Glad to have been pointed to this thread by Alison. We would like to start implementing standard names that do not confuse data users, but no 'lake' alternative has been formalized yet. Falling back to 'sea_water' would be a source of confusion, as users would not intuitively accept that naming as relevant to their needs when working with inland water bodies; this seems to be generally accepted in this thread already. Generalised standard names (water_surface_temperature, ....where area_type = lake) seem useful. 'Water_body' does not seem to add much over just 'water' i.m.o. unless there can be confusion over whether the water resides in the body or overhead.. is that a real risk?

It would be wonderful to come to a conclusion on this during 2022 while we start to produce version 2.1 of the Lakes_cci dataset (we call them lakes, but they include reservoirs, river sections pretending to be lakes, and some lagoons). We will then follow with a new Issue to introduce the proposed names for our ECV domain.

@roy-lowry

Reasoning for using 'water body' rather than 'water' when setting up the NERC Parameter Usage Vocabulary was to allow for datasets like chemical analyses of rainwater and sediment pore water. We came across them a lot in BODC but I don't think that they have been encountered to date in CF.

@JonathanGregory
Contributor

JonathanGregory commented May 20, 2022 via email

@StefanSimis

Hi both, nice to see this thread still being read.
Yes @roy-lowry I see your point there and I do not think water_body is offensive. Would be happy to adopt this in our request.
@JonathanGregory without being pedantic, we should be less interested in how easy the solution is and more in what is accurate or at least functional. Not including a standard_name is an easy solution and can be CF compliant, and is in fact the current situation for some of the variables.

On balance, we would opt to provide just a long name rather than a confusing standard name, because we cannot expect every user to assume that sea is jargon for water body.

To consider: GCOS extended the Lakes ECV fairly recently (2016). The appearance of new, global observation datasets seems an awfully good opportunity to introduce an intuitive naming convention rather than force new information into an old jacket..

@JonathanGregory
Contributor

Dear @StefanSimis

I am always in favour of CF conventions being clear and functional! As I said, there are hundreds of existing standard names containing the word sea. Many of these are widely used and familiar. They would be less obvious and self-explanatory than they are now if sea were changed to water_body. I understand that sea isn't a good term for lakes and rivers, though. As I also said, we've discussed this question many times before.

We have an "alias" mechanism in CF standard names. This is for changing names retrospectively while still permitting the old versions, not for synonyms. I wonder whether with this problem we do need to allow synonyms, however. Would it be sensible to allow the word sea to be replaced by lake or river in any standard name, with unchanged meaning? For instance, sea_water_temperature and river_water_temperature would be synonyms, and sea_ice_thickness and lake_ice_thickness would be synonyms. To avoid "green dogs" (Roy's term) we would have to check that this would make sense for all the existing sea standard names, and remember to consider the same question for any new ones in future.
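
A minimal sketch of how such a synonym rule could be applied by software, assuming a client that has the list of existing sea names to hand. The function `canonical_name` and the three-entry table are hypothetical stand-ins for illustration; the rule shown (word-by-word replacement, accepted only if the result is an existing sea name) is one possible reading of the idea, not an agreed mechanism.

```python
# A tiny stand-in for the standard name table. A real client would load the
# published table instead. All three names are mentioned in this issue.
KNOWN_SEA_NAMES = {
    "sea_water_temperature",
    "sea_ice_thickness",
    "sea_floor_depth_below_sea_surface",
}

# Words that would be accepted in place of "sea" under the synonym idea.
SYNONYM_WORDS = {"lake", "river"}


def canonical_name(name):
    """Map a lake/river synonym onto its existing sea name (hypothetical).

    Names that do not correspond to a known sea name are returned unchanged,
    so no new name is created by accident.
    """
    words = ["sea" if word in SYNONYM_WORDS else word for word in name.split("_")]
    candidate = "_".join(words)
    return candidate if candidate in KNOWN_SEA_NAMES else name


for name in ("river_water_temperature", "lake_ice_thickness", "lake_green_dog"):
    print(name, "->", canonical_name(name))
```

Restricting the substitution to names already in the table keeps the check for meaningless combinations on the side of the table maintainers: a synonym only resolves if the corresponding sea name has been accepted.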

Best wishes

Jonathan

@MathewBiddle

@joe-smithe-glos FYI in case the Great Lakes Observing System has some perspectives.

@joe-smithe-glos

Thanks @MathewBiddle. Where GLOS sits currently is something we articulated to a provider recently: 'we actually aren't interested in changing the names to be "fresh_water" because they are in global use and we want our data to be considered part of that and show up on searches for that term. That said-- our front-facing definitions make it clear it can be either or. We use the same technologies (by and large) as the oceanographic communities for real-time data collection.' (CC: @slbrunner)

I think that's in line with the conversation above, upon a quick scan.
