Standard names: lake variables #25

Open
GeyerB opened this issue Aug 21, 2020 · 83 comments
Labels: frequently asked question, standard name

Comments

@GeyerB

GeyerB commented Aug 21, 2020

Proposer's names Beate Geyer and Burkhardt Rockel
Date 2020/08/21

In numerical land surface models lake properties are taken into account. Lakes and reservoirs are included whereas rivers are excluded. Several quantities defined for sea water can be adapted. In #119 @taylor13 already expected such a demand - here we go:

Analogous to sea_floor_depth_below_sea_surface, we want to apply for the name lake_floor_depth_below_lake_surface. The variable is used for modelling the temperature, stratification and evaporation of inland lakes.

-Term lake_floor_depth_below_lake_surface
-Definition The lake_floor_depth_below_lake_surface is the vertical distance between the lake surface and the lake bed as measured at a given point in space.
or
The lake_floor_depth_below_lake_surface is the vertical distance between the lake or reservoir surface and the lake/reservoir bed as measured at a given point in space.
-Units m

Analogous to sea_water_temperature: lake_water_temperature

-Term lake_water_temperature
-Definition Lake water temperature is the in situ temperature of the lake water. To specify the depth at which the temperature applies use a vertical coordinate variable or scalar coordinate variable.
-Units K

Analogous to ocean_mixed_layer_thickness: lake_mixed_layer_thickness

-Term lake_mixed_layer_thickness
-Definition The lake mixed layer is the upper part of the lake, regarded as being well-mixed. Various criteria are used to define the mixed layer; this can be specified by using a standard name of lake_mixed_layer_defined_by_X. "Thickness" means the vertical extent of a layer.
-Units m

If agreement is reached on the wording "lake and reservoirs", all proposed descriptions will have to be adapted.

@GeyerB GeyerB added the standard name (added by template) Requests and discussions for standard names and other controlled vocabulary label Aug 21, 2020
@roy-lowry

I think now is the time to question whether we start down the road of assigning parallel sets of Standard Names to cover different types of water bodies.

Coming from an observational background, I am uncomfortable with the approach of using a different name for each water body type. I have handled data sets where a small vessel measuring temperature, salinity, nutrients, etc. in the Humber Estuary system started in the open North Sea and finished in the River Ouse at York, many miles inland. Were there separate Standard Names for sea_water_temperature and river_water_temperature, would semantic labelling force a single, coherent data set to be split?

Another issue is whether temperature measurements in the Dead Sea would be labelled sea_water_temperature or lake_water_temperature.

This issue affects a huge number of Standard Names - a search for 'sea_water' returns 432 hits and there are many more potential candidates when phrases like sea_surface are considered. The number of new Standard Names in this proposal is relatively small but it sets a precedent for many, many more.

So, a couple of questions to discuss

  1. Do we start down this road of Standard Name propagation for multiple water body types?
  2. If not, do we:
    a) Alias large numbers of Standard Names to replace sea_water by something more generic?
    b) Use some form of weasel words in the description field to indicate that by 'sea water' CF means the water in everything from puddles through to oceans?

@cothel

cothel commented Aug 21, 2020

I am glad that Beate and Burkhardt are bringing up the lake issue.
Some regional climate models are already coupled with lake models. At Ouranos, we are using CRCM5 coupled with FLake.
To define these new variables, we mimicked the "sea" or "ocean" standard names. We are currently using:
lake_area_fraction
lake_depth
water_temperature_at_lake_floor
lake_ice_fraction
lake_ice_thickness
lake_mixed_layer_temperature
lake_mixed_layer_thickness

FLake offers more variables, but these are the ones we decided to archive.
I think we really need a new realm "lake" in order to define where the coupling applies.

I certainly hear Roy's concerns, although I don't think that lake- or river-related standard names will be as numerous as those for the ocean. At this point, I am more comfortable with the "parallel" naming approach. But as climate model resolution increases, modellers will soon face the same dilemmas raised by Roy from the observational community.

Hélène

@roy-lowry

I find the revelation that people are making up their own 'Standard' names without any attempt to get them incorporated into the accepted Standard Names list a concern. When we first attempted Europe-wide metadata interoperability 20 years ago in SEASEARCH every data centre had a copy of the 'standard' vocabulary set as locally-held Excel spreadsheets. Many extended their copy without telling anybody else as that was the 'easy' way. They then wondered why their metadata records were rejected by the ingestion tools developed for the project. Unless we work together and follow the rules ALL of the time the resulting standards are useless!

@cothel

cothel commented Aug 21, 2020

I agree with you Roy.
In my experience in regional climate modelling, variables officially requested in international initiatives (MIPs, CORDEX, etc.) are carefully handled, but for all the others we often have to find a quick solution in order to avoid delays in simulation production. This is especially true for RCMs since CORDEX requires a much shorter list of variables than the CMIPs. By doing so, people in charge of data management are perfectly aware that extra effort will be needed later to reach conformity. So planning ahead saves us precious resources. Interoperability is a journey.
Hélène

@roy-lowry

Can I suggest in future that if a quick fix is needed then place the draft names in the long_name attribute leaving the standard_name null, but not until the submission process for the draft names has been initiated. That way the files will always be CF-compliant (they are not compliant if there is something not in the list in the standard_name attribute). The problem with extra efforts later is that later never comes.

You are spot on about the need for thinking ahead. I have been managing vocabulary content and infrastructure for over 30 years and if I had a penny for every time I made that point I would be rich!

@taylor13

good suggestion to place the proposed standard_name in long_name. As I recall we might have done something like that for one of the phases of CMIP. I'm not sure whether having an "empty" standard_name attribute would be considered compliant. Might be though. I think it's possible with the nco utility to add a global attribute to an existing netCDF file. Again not positive about this.

@roy-lowry

@taylor13 CF rules state that either the standard_name or the long_name needs to be present. By 'null' standard_name I was thinking of no attribute rather than having the attribute there but set blank: maybe that wasn't clear. Some communities using CF have declared the standard_name to be mandatory but with very good reason this has never been endorsed by CF (just think of the pressure it would put on Standard Names maintenance). BTW these are parameter attributes not global attributes.

@taylor13

yes, of course, standard_name and long_name are not global attributes. Thanks for correcting this. To be sure, I thought neither was required by CF (although they're recommended). Do I have this wrong?

@roy-lowry

My understanding is that neither is required but one of them must be present. It doesn't matter which one.
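
The rule as it is read in this exchange can be sketched as a small check. This is only an illustration of the thread's interpretation (a draft name goes in long_name, and no standard_name attribute is written until the name has been accepted). The function name `naming_status`, the attribute dictionaries and the one-entry table are hypothetical stand-ins, not part of any CF tool.

```python
# Hypothetical stand-in for the published standard name table.
ACCEPTED_STANDARD_NAMES = {"sea_water_temperature"}


def naming_status(attrs):
    """Classify a variable's attributes following the reading given in this thread."""
    standard_name = attrs.get("standard_name")
    long_name = attrs.get("long_name")
    if standard_name is not None and standard_name not in ACCEPTED_STANDARD_NAMES:
        # A standard_name that is not in the table makes the file non-compliant.
        return "invalid standard_name"
    if standard_name is None and long_name is None:
        # Neither attribute is present.
        return "no name attribute"
    return "ok"


# Draft name kept in long_name, with no standard_name attribute at all.
draft = {"long_name": "lake_water_temperature"}
# Draft name written straight into standard_name before it has been accepted.
premature = {"standard_name": "lake_water_temperature"}

print(naming_status(draft))      # ok
print(naming_status(premature))  # invalid standard_name
```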

@tobstac

tobstac commented Aug 25, 2020

I’d like to support Hélène's proposal to clearly separate sea and land-based water-related variables. On the one hand, this separation is necessary if data are to be interpolated between different grids (e.g. in the coupling of compartments) so that the origin of the data is still obvious. In satellite-based remote sensing, this separation was not done for a long time. However, after repeated requests by the land surface modelling and observing community, this has now been recognized as important.

Furthermore, it is always easier to merge several datasets into one (if necessary) than to split one, as the latter requires much more additional information. I agree with Roy that splitting a comprehensive dataset into sea and land components is a nuisance, but depending on the analysis users might want to do this anyway. In such cases, prescribing different variable names would require the data producer to do the separation, who is the best qualified person anyway, instead of leaving end users to guess.

@roy-lowry

My understanding is that the mechanism in CF for differentiating between a temperature measurement in the sea or in a lake should be the area_type rather than the Standard Name. This is an ancillary variable with the Standard Name 'area_type', populated from a controlled vocabulary (http://cfconventions.org/Data/area-type-table/current/build/area-type-table.html). A quick eyeball indicates that 'land' and 'sea' are covered, but not 'lake' or 'river'. However, these could easily be added through a GitHub ticket.

The area_type can be a scalar variable for data from a single area type, which would be best for your use case where you clearly want to classify data by area type. However, it can also be a vector as in the example below from the Conventions document. This suits my use case better, providing a mechanism for me to have data from multiple area types.

dimensions:
  lat=73;
  lon=96;
  maxlen=20;
  ls=2;
variables:
  float surface_temperature(lat,lon);
    surface_temperature:cell_methods="area: mean where land";
  float surface_upward_sensible_heat_flux(ls,lat,lon);
    surface_upward_sensible_heat_flux:coordinates="land_sea";
    surface_upward_sensible_heat_flux:cell_methods="area: mean where land_sea";
  char land_sea(ls,maxlen);
    land_sea:standard_name="area_type";
data:
  land_sea="land","sea";

@JonathanGregory
Contributor

JonathanGregory commented Aug 25, 2020 via email

@tobstac

tobstac commented Aug 25, 2020

What about cases on a coarse model grid, where a given coastal grid cell may contain ocean values and land (lake) values. Can this also be separated using the area_type?

@roy-lowry

What you would have to do there is have two variables with the same standard name to indicate water temperature (perfectly legal), one with the area_type 'sea' and one with the area_type 'land' (or 'lake' if that's set up). For sanity's sake I would also use the long_name to store the concatenation of Standard Name and area type.
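
As a rough sketch of that suggestion, the attributes of the two variables could be generated as below. The helper name `labelled_attrs` is hypothetical, `surface_temperature` is used only because it is a generic existing standard name, and 'lake' is assumed to have been added to the area type table.

```python
def labelled_attrs(standard_name, area_type):
    """Attributes for one variable: shared standard name, area type in cell_methods."""
    return {
        "standard_name": standard_name,
        "cell_methods": f"area: mean where {area_type}",
        # Concatenate standard name and area type so a reader can tell the variables apart.
        "long_name": f"{standard_name} {area_type}",
    }


# Two variables for the same coastal grid cells, differing only in area type.
sea_part = labelled_attrs("surface_temperature", "sea")
lake_part = labelled_attrs("surface_temperature", "lake")

print(sea_part["long_name"])      # surface_temperature sea
print(lake_part["cell_methods"])  # area: mean where lake
```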

@GeyerB
Author

GeyerB commented Aug 25, 2020

@roy-lowry I tried to apply your example to our variables because I’m not sure whether my understanding is correct - at the same time it would answer Tobias’ question about the tile approach.
In your example

surface_upward_sensible_heat_flux:cell_methods="area: mean where land_sea";

I do not understand what 'where land_sea' means here.

float FR_LAKE(lat,lon);
  FR_LAKE:standard_name="area_fraction";
  FR_LAKE:long_name="lake area fraction";
  FR_LAKE:coordinates="land_sea";
  FR_LAKE:cell_methods="area: sum where lake";
float DEPTH_LK(lat,lon);
  DEPTH_LK:standard_name="water_floor_depth_below_water_surface";
  DEPTH_LK:long_name="lake depth";
  DEPTH_LK:coordinates="land_sea";
  DEPTH_LK:cell_methods="area: mean where lake";
float FR_RIVER(lat,lon);
  FR_RIVER:standard_name="area_fraction";
  FR_RIVER:long_name="river area fraction";
  FR_RIVER:coordinates="land_sea";
  FR_RIVER:cell_methods="area: sum where river";

char land_sea(ls,maxlen);
  land_sea:standard_name="area_type";
data:
  land_sea="land","sea","lake","river";

@roy-lowry

In the example (from the CF document, not mine!) the area_type is only applicable to the surface_upward_sensible_heat_flux. The surface temperatures are only for land, hence their dimensions are (lat,lon), not (ls,lat,lon), and the area type is given verbatim in the cell method. The fluxes, by contrast, are in a 3D array. For cell Y,X there will be one flux in element (1,Y,X) and another flux in element (2,Y,X). The 'where land_sea' tells us that to find out which is land and which is sea we need to look at the land_sea variable: land_sea(1) applies to element (1,Y,X), so that element is the land flux.
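
The lookup described above can be sketched in a few lines. This is an illustration with made-up values on a tiny (ls,lat,lon) = (2,1,2) grid; the helper `flux_for` is hypothetical, and Python indices start at 0 where the text above counts from 1.

```python
# Labels stored in the land_sea variable (standard_name "area_type").
land_sea = ["land", "sea"]

# Toy flux array with dimensions (ls, lat, lon) = (2, 1, 2); the values are made up.
flux = [
    [[10.0, 11.0]],  # index 0 along ls
    [[20.0, 21.0]],  # index 1 along ls
]


def flux_for(area_type, y, x):
    """Return the flux at cell (y, x) for the requested area type."""
    k = land_sea.index(area_type)  # position of the label along the ls dimension
    return flux[k][y][x]


print(flux_for("land", 0, 1))  # 11.0
print(flux_for("sea", 0, 1))   # 21.0
```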

Have a read through section 7.3.3 of the Conventions, which should help you understand. The important point is that there are TWO conventions. One where the area_type controlled term is included as verbatim text in the cell method and the other where several controlled terms are placed in a co-ordinate variable. You could just use the second convention and point it at a scalar variable (see below)

Now for your data. First, packing your two area fractions into a single array with a generic Standard Name (my preference). Note that this requires two new entries in the area_type controlled vocabulary, for lake and river.

dimensions:
  atypes=2;
  maxlen=20;
  lat=50;
  lon=50;
variables:
  float latitude(lat);
  float longitude(lon);
  // Both area fractions packed into one array, indexed by area type
  float AFRAC(atypes,lat,lon);
    AFRAC:standard_name="area_fraction";
    AFRAC:long_name="area fraction";
    AFRAC:coordinates="area_type latitude longitude";
    AFRAC:cell_methods="area: sum where area_type";
  float DEPTH_LK(lat,lon);
    DEPTH_LK:standard_name="water_floor_depth_below_water_surface";
    DEPTH_LK:long_name="lake depth";
    DEPTH_LK:coordinates="latitude longitude";
    DEPTH_LK:cell_methods="area: mean where lake";
  char area_type(atypes,maxlen);
    area_type:standard_name="area_type";
data:
  area_type="lake","river";

Or, as you had them, in separate arrays with the area types verbatim in the cell_methods attributes.

dimensions:
  lat=50;
  lon=50;
variables:
  float latitude(lat);
  float longitude(lon);
  float FR_LAKE(lat,lon);
    FR_LAKE:standard_name="area_fraction";
    FR_LAKE:long_name="lake area fraction";
    FR_LAKE:coordinates="latitude longitude";
    FR_LAKE:cell_methods="area: sum where lake";
  float DEPTH_LK(lat,lon);
    DEPTH_LK:standard_name="water_floor_depth_below_water_surface";
    DEPTH_LK:long_name="lake depth";
    DEPTH_LK:coordinates="latitude longitude";
    DEPTH_LK:cell_methods="area: mean where lake";
  float FR_RIVER(lat,lon);
    FR_RIVER:standard_name="area_fraction";
    FR_RIVER:long_name="river area fraction";
    FR_RIVER:coordinates="latitude longitude";
    FR_RIVER:cell_methods="area: sum where river";

Note that a horde of x_area_fraction Standard Names got through before somebody noticed and came up with the generic method based on area_type, which is what I'm trying to stop happening again.

@taylor13

I second the suggestion that a read through section 7.3.3 might help. And like Jonathan and Roy I would prefer the first suggested approach in #25 because it is absolutely clear that the same variable is being measured (but for different portions of grid cells).

@GeyerB
Author

GeyerB commented Aug 27, 2020

OK, do I have to open a new issue?
CF Area Type Table: entries for lake and river
An area type of "lake" means a body of (usually fresh) water surrounded by land.
An area type of "river" means a natural stream of water of considerable volume. (see http://glossary.ametsoc.org/wiki/River)

@JonathanGregory
Contributor

JonathanGregory commented Aug 27, 2020 via email

@roy-lowry

They make sense to me as well. @japamment or @feggleton should be able to advise if they can take them forward from this ticket or if a new one is needed. I would also like their opinion on Question 2 in my first posting in this thread.

@feggleton
Collaborator

Hi all,

Thanks for discussing this thoroughly. To confirm, we are scrapping the 3 lake terms and adding lake and river to the area type table instead? In which case this is fine and I will update the editor to reflect this. From the discussion we have this for the area type table:

CF Area Type Table: entries for lake and river
An area type of "lake" means a body of (usually fresh) water surrounded by land.
An area type of "river" means a natural stream of water of considerable volume. (see http://glossary.ametsoc.org/wiki/River)

It should be ok to just take them from this ticket, @japamment can you confirm this is ok and the definitions are ok for you?

Thanks

@taylor13

Can we clarify in our definition what distinguishes a lake from an inland sea? I've seen descriptions that say a sea is at sea level and connected to an ocean while a lake may be above or below sea level and if connected to the ocean is not a "sea". (Under this definition the "Dead Sea" is a lake.)

Also we have "sea" as an area type that includes oceans. (There is no "ocean" area type.) In some AOGCMs some (inland) seas may be resolved as an area of water, but without any dynamical treatment (i.e., treated like a lake). Perhaps these should be described as lakes in CF, not as seas??? There may be some bays that are handled similarly, but perhaps an ocean modeler can verify this and provide some advice on what should be done.

@tobstac

tobstac commented Sep 14, 2020

It seems there is no clear distinction between lakes and inland seas, as inland seas are not a class of their own but rather a different name for very large lakes (e.g. the Caspian Sea) (Bootsma, 2018, Oceans, Lakes, and Inland Seas: A Virtual Issue on the Large Lakes of the World, https://doi.org/10.1002/lob.10230).
From this viewpoint, a distinction between sea and lake seems to be sufficient. There might be inconsistencies in the end, because of the individual land masks the modelling groups design for their models that might disagree in attributing a given water area either to the land or the ocean model. But I guess this cannot be solved by the CF area type definition.

@GeyerB
Author

GeyerB commented Sep 21, 2020

Dear all,
The discussion has led us from the proposed lake variables to general variables for water. I suggest still allowing all existing sea_water variables, to avoid the need for extra coordinates for them.

An example:
-Term water_floor_depth_below_water_surface
-Definition The water_floor_depth_below_water_surface is the vertical distance between the water surface and the bed of the water body as measured at a given point in space. To specify which water body is described by a variable with standard name water_floor_depth_below_water_surface, provide a coordinate variable or scalar coordinate variable with standard name area_type. In case of area_type ‘sea’, use the more specific standard name of sea_floor_depth_below_sea_surface.
-Units m

If we need further discussion on this we could transfer it to a new issue and close this one. @taylor13 – is it ok for you if we follow @tobstac and keep the ‘easy’ definition for the new area_types given last by @feggleton?

@feggleton Should I open a new issue for water variables?

Best regards
Beate

@JonathanGregory
Contributor

Dear all

I think the proposal to use area_type to distinguish sea and lake works well for quantities with standard_name such as area_fraction and surface_upward_sensible_heat_flux which are obviously applicable to all area-types. The simplest solution would be to do the same with "sea" quantities. For example if sea_floor_depth_below_sea_surface has an area_type of lake it means lake floor depth below lake surface. Thus we would need no new standard names. Would that be acceptable, even though not ideal?

Quite a few times before, we have discussed the alternative of introducing something generic, such as "water body". Personally I think that possibility is unattractive. It is correct, but it's not a common phrase and it's cumbersome, e.g. water_body_water_temperature would sound awkward to me. In many contexts "water" alone would be insufficient since water exists in the atmosphere and on land as well. Moreover, if we followed that approach we would probably want to make all the ~400 existing "sea" names, e.g. sea_water_temperature, into aliases for "water body" names.

Jonathan

@roy-lowry

Thanks @GeyerB for pushing this forward. I asked this question back in August and offered two alternatives, which have now been proposed by yourself and @JonathanGregory. From a backwards-compatibility perspective, Jonathan's proposal is the less disruptive, as the hundreds of 'sea_water' Standard Names have been extensively used over the past 20-odd years. All that would be required is the addition of a line of text in the definitions along the lines of 'sea_water means the waters of the sea or ocean unless associated with another type of water body through area_type'.

@GeyerB's suggestion requires the creation of hundreds of aliases, but no new Standard Names. There is back-office semantic infrastructure in place (mappings) that has the potential to let software agents automatically realise that 'water' and 'sea_water' are semantically equivalent. The only problem is that I'm not aware of any software with AI that actively uses these mappings. Consequently, there would be pressure to change the Standard Names in existing file stock.

I led the charge to introduce the phrase 'water_body' into Standard Names a long while ago, but failed to get it accepted. It is used extensively in the 'P01' vocabulary that I set up over 20 years ago, which has been a part of my life for so long that I consider 'water_body' to be natural language. However, I appreciate that I may be a long way from the norm here.

Unusually, I find myself sitting firmly on the fence here. My strong belief in backwards compatibility makes the definition change attractive, but its inelegance makes me shudder. The words it brings to my mind are 'fudge' and 'bodge'. Set against this is my fear of the reaction of CF user communities to the introduction of such wide-scale changes through aliases, potentially invalidating huge numbers of files and possibly breaking application software that uses CF operationally.

I'd love to know how others in the CF community feel.

@StefanHagemann

I suggest to clearly separate water related variables over land from those over the ocean/sea. However, I agree that using the term 'water body' is not very common, and the pure meaning of the term refers to any water body, not just those over land. Consequently, I suggest to use the term 'inland_water' to separate variables related to lakes (or rivers) from those over the ocean/sea.

Actually, separating ocean water from land water in land cover maps has been a longstanding issue in the remote sensing community, requested by modellers. This has been brought forward, for example, by Alexander Löw and myself within the ESA CCI projects CMUG and LandCover, respectively. Eventually, this has been realised by the latest version of the ESA LandCover CCI water body product CCI WB v4.0 (Lamarche et al. 2017), which now comprises "a static map with the distinction between ocean and inland water is now available at 150 m spatial resolution." (http://maps.elie.ucl.ac.be/CCI/viewer/download.php).

Having a text somewhere that explains that a sea water variable is not sea-water related under certain aspects is a bad solution. Not everybody will search for such a text (or even expect that such a text exists) that explains a variable is not what its name suggests.

@JonathanGregory
Contributor

JonathanGregory commented Sep 22, 2020 via email

@StefanHagemann

Yes. Currently, I think that this would be sufficient for most of the variables, as they are still related to the water in those inland water bodies (such as lakes and rivers). There might be a few variables that may not occur in inland water bodies, where the wording seems wrong. However, I assume that these cases are negligible.

@JonathanGregory
Contributor

Dear Luke @lhmarsden

Regarding the main point, perhaps we could propose to change the standard name search to enable "Also search help text" initially, like "Also search aliases text". I can imagine that might be helpful in other situations as well, where a user guesses that a word might appear, but actually we have used a different word in the standard name they're looking for. Often we mention alternatives in the descriptions for that reason.

Regarding your side point, I think it would be very useful to have an AI tool which could answer the question, "What standard name should I use for such-and-such?" Perhaps ChatGPT can already give useful answers to that question?

Best wishes

Jonathan

@JonathanGregory
Contributor

Dear all

Prompted by Luke's question, I reread this issue, which is concerned with the problem we have had for many years that most of our water-body names are for sea_water, but there are use-cases for similar sets of standard names for inland water bodies (lakes, rivers, reservoirs, etc.) Points that have been raised include

  • If we can avoid it, we don't want to proliferate standard names by duplicating the sea_water names, or more than duplicating if we need distinct river names, lake names, etc. as well.

  • We would rather not replace sea_water names with aliases in the standard name table, because there are so many of them, and they're very widely used.

  • area_type can be used to specify lake, river etc. and thus make use of sea_water names, if we agree that it's OK to generalise sea_water names in that way, but although that would be workable, it would be liable to cause confusion.

  • Some applications may have water in more than one of these kinds of water body, and shouldn't be forced to make arbitrary distinctions between them if they're actually connected continuously.

  • There isn't a handy obvious word in English with the general meaning. "Water body" is possible and correct, but not familiar and perhaps cumbersome.

Given all the above, I'd like to make a slightly different suggestion, which would require a new mechanism, but not a difficult one, I think. We could introduce inland_water (previously discussed) as an acceptable alternative to sea_water in any standard name, as if there were an alias, but not actually defining the 528 aliases that would currently require. You could regard exchanging sea_water and inland_water as a translation process.

Some practical implications would be:

  • The standard name HTML page would need a statement that sea_water and inland_water are exchangeable, and a switch for the user to choose which one ought to be presented.

  • It wouldn't matter which variant you used when writing a dataset, since they'd be synonymous, but you could distinguish among different sea or inland water bodies with the area_type, as previously discussed.

  • Any program which checks a given standard name against the table to see if it was valid would have to replace inland_water with sea_water (converting it to a "canonical form") before making the check. Similarly that would be necessary before compiling a list of unique standard names in a dataset. For use by any such applications, we could add this "translation" specification as an extra tag in the XML version of the table.

  • Any new sea_water standard name which is added to the table would automatically be available for inland_water as well.
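
The "canonical form" step in the bullets above could look roughly like this. It is a minimal sketch under assumptions: the function names are hypothetical, the two-entry set stands in for the full standard name table, and a plain string replacement is assumed to be an adequate translation.

```python
SEA = "sea_water"
INLAND = "inland_water"

# Hypothetical excerpt standing in for the standard name table.
STANDARD_NAME_TABLE = {"sea_water_temperature", "sea_water_salinity"}


def canonical_form(name):
    """Translate the inland_water variant of a name to its sea_water form."""
    return name.replace(INLAND, SEA)


def is_valid(name):
    """Check a name against the table after converting it to canonical form."""
    return canonical_form(name) in STANDARD_NAME_TABLE


def unique_names(names):
    """Unique standard names in a dataset, treating both variants as one name."""
    return sorted({canonical_form(n) for n in names})


print(is_valid("inland_water_temperature"))  # True
print(unique_names(["inland_water_temperature", "sea_water_temperature"]))
# ['sea_water_temperature']
```

Because the translation is applied before the lookup, a sea_water name added to the table later becomes available in its inland_water form without any further change.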

Is this workable? Is it too complicated? Would it deal with the problems?

Best wishes

Jonathan

@lhmarsden

I think using inland_water is a neat solution. I suppose in some cases (e.g. the Caspian "Sea") people might be unsure which to use, but as you say, the terms could be used interchangeably so there would be no wrong answer.

Practically speaking, people should always be including geospatial coordinates anyway.

I think this solution would solve a lot of problems.

Cheers

Luke

@lhmarsden

What about ice?

There are many sea_ice terms. There are fewer land_ice terms. Could the same logic be applied here?

One criticism I have heard of the land_ice terms is what do you do when a glacier extends over the sea?

Your idea of being able to use these interchangeably could also address this issue.

Luke

@DocOtak
Member

DocOtak commented Jan 22, 2025

I'd be resistant to a solution that requires software changes such as the one proposed, especially if we aren't maintaining that software for the community. We should just use the alias system that already exists. There is nothing really technically wrong with adding 500 aliases, but I wouldn't want to add 500+ aliases in anticipation of some need, as that doesn't seem very "CF" philosophically to me. I would prefer a case-by-case addition as limnologists request names.

For my own understanding, is "sea" or "inland" actually important here? Why not just plain water? In my own work, the fact that the water came from the ocean is less important than the fact that I am measuring things that are dissolved (or suspended) in it as water. The specific where is encoded in coordinates.

sea_ice vs land_ice refers to the formation process. sea_ice being formed from freezing seawater (haha... ambiguous in light of above), glaciers, shelves, even icebergs are all "land_ice"

@lhmarsden

I wouldn't want to add 500+ aliases in anticipation of some need, as that doesn't seem very "CF" philosophically to me.

In my opinion, CF needs to be more proactive here, and this sounds like low hanging fruit. Whilst CF has been well adopted in physical oceanography, it is less used in the terrestrial realm. Someone new to CF is unlikely to propose a new standard_name. This process is not straightforward and perhaps data managers like me should play the middle man between scientists and the conventions in proposing these terms, which I currently do on a need-to-do basis. A more proactive approach would be a much more efficient use of time and resources.

I don't really mind whether we have 500+ aliases or whether the terms are reworded to 'water' instead of 'sea_water'. I do, however, think it should be easier for new users to use CF. I think this should be a key part of the CF philosophy, even if it requires some change from how things have traditionally been done.

sea_ice vs land_ice refers to the formation process. sea_ice being formed from freezing seawater (haha... ambiguous in light of above), glaciers, shelves, even icebergs are all "land_ice"

There are a few caveats to this.

  • If someone is creating a gridded dataset of sea ice concentration, for example, they will include the ice regardless of where it formed, bergs and all.
  • Sea ice can thicken due to precipitation falling on its surface and freezing, so this portion of the ice is not formed from freezing sea water - and you see this in salinity profiles.
  • There may be some scenarios where someone's dataset encompasses both land and sea ice.

I am being purposely pedantic here, but my point is there is always going to be some grey area, so I would be in support of simplifying.

@JonathanGregory
Contributor

Dear Luke and Barna

From previous discussions, I tend to agree with Luke that the insistence on sea_water for inland water is an obstacle to the adoption of standard names by terrestrial hydrologists, and we should do something about it. "water" alone wouldn't be specific enough, because there is water in the atmosphere and below ground too. Inland water and sea water are contained in "water bodies" (seas, oceans, lakes, rivers), by which we mean large volumes of usually liquid water beneath the atmosphere, whose area and volume are delimited by the topography of their solid lower surface. Since lakes, rivers and the ocean are connected, there are applications in which you really don't want to distinguish them. I think that this fact also implies that we don't need to distinguish them.

If there was in English a convenient and obvious term for "water body water", I expect we would have replaced sea_water by this term a long time ago, as soon as we started to need to name inland water properties. In the absence of such a term, what I'm suggesting is that we recognise that sea_water and inland_water are synonyms for the same geophysical concept.

CF aliases are not synonyms; they're needed when we change our mind about the best choice of words, and we don't provide them just to allow variety. That's why my suggestion would be a new mechanism, that would permit sea_water and inland_water as equivalent phrases in standard names. You can tell that this is different from aliases because it would also apply, without any specific decision, to any new sea_water name that we add to the table. That is, we're confident that replacing sea_water with inland_water in the new name will not create nonsense. If we do find cases in future where sea_water and inland_water have to be distinguished as different substances, we will have to use another term which isn't either of those.

I don't think land ice and sea ice are synonymous in the same way. They are distinguished in their composition, how they are formed, and how they move, not just by where they are (like seas and lakes). However, there are some circumstances in which they are aggregated, as Luke says, in a melange for instance, or because icebergs and sea-ice floes behave similarly. We have two existing standard names for floating_ice for that purpose, and of course more could be added if needed.

I agree that sea ice, lake ice and river ice are similar, like sea water and inland water. This hasn't been raised as a difficulty as often as sea_water has been. If my suggestion for sea_water is adopted, I agree we could consider a similar thing with sea_ice.

Best wishes

Jonathan

@taylor13

(Not sure I've followed all of the above, but I have concerns.) If someone proposed mass_concentration_of_plastic_in_inland_water in order to characterize pollution of lakes, then we would likely approve it and for that case the user would include "where lake" in the cell_methods. Now if I wanted to characterize plastic pollution in the ocean, my variable would have the standard name and cell methods mass_concentration_of_plastic_in_inland_water and "where sea". I find that very confusing because I've always thought of "where sea" to be synonymous with "where ocean and any large body of salt water directly connected to it" (like the Mediterranean Sea). I don't think of any body of water included in "where sea" as being "inland_water".

I also think it would be difficult for the average user to grasp that "sea_water" includes fresh water as well as salt water, and they would be surprised that sea_water_pressure_at_sea_floor might refer to the pressure at the bottom of a lake.

I looked but did not find where we define what is meant by "where sea". I'm pretty sure that originally it was meant to be limited to those bodies of water that at some resolution would be included in an ocean general circulation model (OGCM). Inland lakes and rivers were always thought to be part of "land", not "sea". If that is correct, then I don't think defining "sea_water" and "inland_water" as being synonymous is a good idea. If we wanted to define a new area_type to be used to specify "where inland_water" (including rivers, lakes, ponds, etc.), I don't think anyone would want to include the oceans.

Regarding "substances", as opposed to "bodies", I think sea water as a substance might best be defined as salty water and that fresh water as not salty water. I don't think that considering "sea water" as a substance provides any clarity. I wouldn't include the substance comprising a fresh water lake in "sea water".

Unfortunately, although I strongly oppose making sea_water and inland_water synonyms, I can't think of some alternative I like.

@JonathanGregory
Contributor

Dear Karl

My idea in suggesting synonyms is that you would use whichever variant you thought appropriate. We would put only one canonical form (the sea_water one, I guess) in the XML, but allow it to be displayed as inland_water as an option in the HTML, and make searches on the table show both of them. Therefore, if we accepted someone's proposal for mass_concentration_of_plastic_in_inland_water, we would add it to the XML as mass_concentration_of_plastic_in_sea_water, and this would at once define both of them, since they're synonymous. Presumably you would prefer the inland_water version for lakes, and the sea_water version for oceans. The substance of which a lake is composed would most naturally be called inland_water. If there's an appropriate name with sea_water defined already, it could be used for the lake, either as it is, or with inland_water substituted for sea_water. If you have a dataset with rivers discharging into the ocean, you could use either, with area_type to distinguish if you needed to.

I'm sure you're right that initially we were happy with sea_water alone because we were dealing with GCM data. But even fairly low-resolution GCMs might include active ocean gridboxes for the Great Lakes, which are fresh and flow into the ocean, or the Caspian Sea, which is a brackish lake with no outlet. We are stretching a point to use sea_water for these, perhaps, and the higher the resolution, the more numerous the problems!

Best wishes

Jonathan

@DocOtak
Member

DocOtak commented Jan 24, 2025

@JonathanGregory What, in your proposal here, would go into the actual netCDF files under the standard_name attribute?

@JonathanGregory
Contributor

The standard_name attribute could contain (for instance) either sea_water_temperature or inland_water_temperature, according to what the data-writer thought was more appropriate for the dataset. Only sea_water_temperature would appear in the XML of the standard name table, and software checking for valid standard names would need to be aware that any standard name containing the phrase inland_water is synonymous with the standard name you get by replacing inland_water with sea_water. Therefore inland_water_temperature would be validated because sea_water_temperature is valid.
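For illustration only, here is a minimal sketch of the extra check a validator would need under this suggestion. The function name `is_valid_standard_name` and the tiny in-memory `TABLE` are hypothetical stand-ins for a real checker and the real standard name table; only the substitution rule is taken from the paragraph above.

```python
# Hypothetical stand-in for the entry ids found in the standard name table XML.
TABLE = {"sea_water_temperature", "sea_water_density", "air_temperature"}


def is_valid_standard_name(name, table=TABLE):
    """Accept a name if it, or its sea_water form, appears in the table."""
    if name in table:
        return True
    # Synonym rule: a name containing inland_water is valid when the name
    # obtained by substituting sea_water is valid.
    return "inland_water" in name and name.replace("inland_water", "sea_water") in table


print(is_valid_standard_name("inland_water_temperature"))
print(is_valid_standard_name("inland_water_colour"))
```

The point of the sketch is that the table lookup alone is no longer sufficient: the checker needs one extra piece of logic that is not recorded in the XML.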

You could also regard this synonym mechanism as a sort of implicit alias. It differs from an alias because

  • The single rule that inland_water and sea_water are synonymous would affect all standard names containing sea_water. One synonym to rule them all, instead of one alias for each affected standard name.
  • It would affect any future sea_water name as well, whereas we do not create aliases in advance of standard names.

Best wishes

Jonathan

@JonathanGregory
Contributor

There's another difference from aliases:

  • Synonyms are equally valid, whereas aliases have been superseded.

I fear I've made this sound too complicated by proposing a technical way to do it at the same time as the idea to be achieved. I'd like to distinguish these.

The idea is more important. I propose that we should regard any pair of standard names as synonymous if they are the same except that sea_water in one is replaced by inland_water in the other. If two netCDF variables have all the same CF metadata except for the standard name, and their standard names are a pair like this, then the metadata should be treated as exactly the same.
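As a rough sketch of what "treated as exactly the same" could mean in practice, the comparison below maps both names onto one form before comparing attributes. The helper names `canonical_name` and `same_metadata`, and the use of plain dictionaries for variable attributes, are assumptions made for the example.

```python
def canonical_name(standard_name):
    """Map the inland_water variant of a name onto its sea_water form."""
    return standard_name.replace("inland_water", "sea_water")


def same_metadata(attrs_a, attrs_b):
    """Compare two attribute dicts, treating synonym pairs as equal."""
    a = dict(attrs_a, standard_name=canonical_name(attrs_a["standard_name"]))
    b = dict(attrs_b, standard_name=canonical_name(attrs_b["standard_name"]))
    return a == b


lake = {"standard_name": "inland_water_temperature", "units": "K"}
ocean = {"standard_name": "sea_water_temperature", "units": "K"}
print(same_metadata(lake, ocean))
```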

Is this idea reasonable and acceptable? It means checking that it's OK for all existing sea_water names, and not creating any in future for which it wouldn't be OK. If accepted, would it solve this long-running problem?

The way I proposed to implement it is not essential to the idea. Here's another way it could be done, which would mean more work when adding new names to the standard name table, but no change to the way we use or present the table:

  1. In Appendix B on the format of the standard name table, define a new attribute e.g. purpose of the <alias> element in the standard name table.

  2. Give all existing <alias> elements in the standard name table the attribute e.g. purpose="replacement". For example

<alias id="specific_potential_energy" purpose="replacement">
<entry_id>specific_gravitational_potential_energy</entry_id>
</alias>
  3. Create an inland_water alias with purpose="synonym" for all existing sea_water entries e.g.
<entry id="sea_water_density">
<canonical_units>kg m-3</canonical_units>
<description>Sea water density is the in-situ density (not the potential density). If 1000 kg m-3 is subtracted, the standard name sea_water_sigma_t should be chosen instead.</description>
</entry>
<alias id="inland_water_density" purpose="synonym">
<entry_id>sea_water_density</entry_id>
</alias>
  4. Modify the <description> text of the existing names to make it suitable for both inland water and sea water.

  5. Every time a sea_water name is added in future, also add a corresponding inland_water alias with purpose="synonym" and a description suitable for both synonyms. If a new inland_water name is needed, add it as a sea_water entry with an inland_water alias.

Any program which treats aliases as valid standard names would thus accept inland_water synonyms without modification. They would be found by searching the standard name table with aliases included, which is the default. The purpose attribute indicates that the synonyms are equally valid, not preferred replacements like the existing aliases.
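To make this concrete, here is a small sketch of a reader for a table in the proposed format. The `purpose` attribute is the suggestion made above and is not part of the published standard name table; the `<standard_name_table>` wrapper, the function name `load_names`, and the choice to treat an alias without the attribute as a replacement are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed fragment in the proposed format (descriptions omitted).
XML = """
<standard_name_table>
  <entry id="sea_water_density">
    <canonical_units>kg m-3</canonical_units>
  </entry>
  <entry id="specific_gravitational_potential_energy"/>
  <alias id="inland_water_density" purpose="synonym">
    <entry_id>sea_water_density</entry_id>
  </alias>
  <alias id="specific_potential_energy" purpose="replacement">
    <entry_id>specific_gravitational_potential_energy</entry_id>
  </alias>
</standard_name_table>
"""


def load_names(xml_text):
    """Return {name: (entry_id, status)} for all entries and aliases."""
    root = ET.fromstring(xml_text)
    names = {e.get("id"): (e.get("id"), "entry") for e in root.iter("entry")}
    for alias in root.iter("alias"):
        # Assumption: an alias without the attribute is a replacement.
        purpose = alias.get("purpose", "replacement")
        names[alias.get("id")] = (alias.findtext("entry_id"), purpose)
    return names


names = load_names(XML)
print(names["inland_water_density"])
print(names["specific_potential_energy"])
```

A program that ignores the new attribute still resolves both aliases to their entries, which is the backward-compatible behaviour described above.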

Happy weekend.

Jonathan

@DocOtak
Member

DocOtak commented Jan 24, 2025

I really don't like this. Having "valid" CF standard names that do not actually appear in the xml as either an entry or alias is not a thing that should be supported or done. It needs to remain a simple lookup table without requiring logic or substitutions.

Where is the idea that aliases aren't still valid coming from? I do see in the conventions document that:

It is not intended that the alias elements be used to accommodate the use of local naming conventions in the standard_name attribute strings.

But I'm not sure what "local naming conventions" means here. As far as I can tell, aliases are just synonyms with no explicit intent (e.g. aliases are superseded by the non alias), they are simply a pointer to the name that has the definition.


There is a lot of discussion about area type above. Has that idea been abandoned? It seems to have quite a bit of support (e.g. #25 (comment))

@StefanHagemann

Dear All
after reading the more-than-4-years-lasting discussion again, I think that the following seems to be a valid solution.

  1. Using the area type (incl. sea, lake, river) is a sensible approach as discussed in 2020/21.
  2. In order to allow users to expect these area types also in variables with 'sea' in the standard names, having sea_water and inland_water as synonyms would be great. (Jonathan's suggestion).
  3. Only if the latter is not feasible, then the alias approach is suitable, where inland_water is made equivalent to sea_water. However, the synonym approach seems to be more straightforward.

I sincerely hope that this issue finally comes to a viable solution.

Best regards
Stefan

@lhmarsden

lhmarsden commented Jan 27, 2025

2. having sea_water and inland_water as synonyms would be great. (Jonathan's suggestion)

What about having water_body as a synonym/alias instead of inland_water? Then we have a term that can be used in all cases.

This at least gives us the chance of having most people using the same term for all water bodies in 10, 20, 30 years time.

@JonathanGregory
Contributor

@DocOtak. I understand why you don't like the idea of valid standard names that don't appear in the table. I didn't like it either on reflection! That was my first suggestion of how to implement synonyms, to make sea_water and inland_water equivalent. What do you think of my later suggestion to use aliases to implement the same idea? This would be a bit more work for the maintainers, but a lot less work for the users, so it's better overall, I think.

I think the important explanation is just before the bit you quoted from Appendix B:

The purpose of the alias elements are to provide a means for maintaining the table in a backwards compatible fashion. For example, if more than one id string was found to correspond to identical definitions, then the redundant definitions can be converted into aliases.

The aliases are valid for use as standard names, but they're supported only for backwards compatibility. This implies they're deprecated for new data, although we don't explicitly say that. I agree that the following sentence, about "local naming conventions", is not clear enough. I'd suggest it should be a more general statement, such as

It is not intended that the alias elements be used to accommodate alternative and equivalent standard names, for instance for use in different datasets or by different communities of users.

However, the second suggestion I made is exactly to use aliases as synonyms. Because it's a new use for aliases, I'd propose to distinguish them with the extra attribute, and obviously the text would have to be changed. Does that make sense?

@JonathanGregory
Contributor

@lhmarsden

What about having water_body as a synonym instead of inland_water? Then we have a term that can be used in all cases.

This at least gives us the chance of having most people using the same term for all water bodies in 10, 20, 30 years time.

I almost wrote a postscript on Friday suggesting we could allow water_body_water as well as inland_water and sea_water. It might be suitable to use "water body" in an application which includes both seas and inland water bodies. If you only have "sea", I feel it would be less self-explanatory.

@StefanHagemann

Yes, I fully agree with having both synonyms. Having only the synonym water_body but not inland_water was previously somewhat disregarded because many people may not be familiar with the term water body. However, by allowing both synonyms, we should be able to take everyone on board.

@DocOtak
Member

DocOtak commented Jan 31, 2025

@JonathanGregory I agree with some clarification in the conventions document, but I cannot see the need for modifying how the standard name xml works technically. That is, once you do sort out what terms you want to add, just add them as an alias, or, if not actually synonymous to some existing one, as a new standard name.

I'm also kinda torn here. Standard names are supposed to answer the question of "are these values comparable" and I imagine right now that is currently possible (mostly) without access to the full standard name table, i.e. only having two data files. To that end, I wouldn't actually care if the standard names are opaque strings as long as the string values indicate that the variable data values are comparable. I don't know if lake and ocean (*_sea_water) variables are actually comparable from a scientific perspective. The imperfections and ambiguities feel somewhat "normal" to me given things like the confusion caused by canonical units.

@davidhassell
Collaborator

Hello,

I'm also resistant to a change that says "sea_water" and "inland_water" are in some sense the same. That is clearly not always the case to a human reading, and can be confusing to software (as @DocOtak highlighted).

I do favour the creation of a general term to replace "sea_water" in new standard names ("water_body" is fine for me), which comes along with a requirement of using a cell method with area_type.

For existing datasets we could preserve backwards compatibility for "sea_water" names by introducing a new area type of "unknown": if the dataset has not set the area_type for a "sea_water" standard name (i.e. probably most existing datasets), then the area type will default to "unknown". This is similar to the mechanism that we devised to preserve backwards compatibility for temperature quantities for which old datasets could not differentiate between on-scale and difference measurements (discussed at some length in #125). The principles at stake in the temperature issue were different from those being discussed here (e.g. we are currently talking about "medium" rather than "quantity", units are not a concern here, etc.), but many of the ideas on how to address the ambiguity of standard names in a backwards compatible manner are the same.
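A minimal sketch of that default, assuming the area type is given by the first `where` clause of `cell_methods`. The function name `area_type` is hypothetical, and "unknown" is the area type proposed in this comment, not one that exists in the current area type table.

```python
import re


def area_type(standard_name, cell_methods=None):
    """Return the area type from a 'where' clause, else the proposed default."""
    match = re.search(r"\bwhere\s+(\w+)", cell_methods or "")
    if match:
        return match.group(1)
    # Proposed default for sea_water names written without an area type.
    if "sea_water" in standard_name:
        return "unknown"
    return None


print(area_type("sea_water_temperature"))
print(area_type("sea_water_temperature", "area: mean where sea"))
```

Existing files need no rewriting under this scheme; the default is applied by the reader.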

New uses of the "sea_water" should still be allowed, with their current definitions, but with the addition of a note that sea water may include all sorts of things, and a suggestion (not a requirement, nor a recommendation) to set the area_type.

Thanks,
David

@JonathanGregory
Contributor

Dear Barna and David

Thanks for your comments.

My suggestion depends on our deciding that "inland water body water" and "sea water" are truly the same, except for their geographical location, or area_type. (For "inland water body water", the area_type could be more specific e.g. river or lake.) To make that decision, we would have to

  • Check that all existing sea_water names would make sense and have the same geophysical meaning with inland_water instead.

  • Ensure that any new name we subsequently add with either phrase likewise works with the other phrase just as well.

With those conditions, the answer will be Yes to the question of "Are these values comparable", for any sea_water/inland_water pair of standard names. I think that it is actually the case, but we should check carefully.

The standard names themselves are not the same, because one is an alias. To know that, you have to inspect the standard name table, but that is the same as for any alias. It is already the case that you can't depend on two standard name strings being identical as a test that they identify the same geophysical quantity, because of the existing aliases.

I think that water_body_water would be a far less obvious phrase to use for sea_water in any existing or new "sea water" standard name, and I wouldn't want to propose such a name. Likewise, water_body_water is less self-explanatory than inland_water (though not as obvious as river_water or lake_water would be). However, water_body_water would be useful in any dataset that involves both sea water and inland water, to avoid unintentionally implying a distinction by using different standard names for them in the same dataset.

If my suggestion of aliases as synonyms isn't acceptable, then I think we should add an inland_water standard name as an independent entry corresponding to every sea_water standard name in all cases that are requested now and in future. However, I prefer the synonyms because I think they really are synonymous, and we can save ourselves some work and improve clarity by not making that unnecessary distinction of meaning.

Best wishes

Jonathan

@taylor13

taylor13 commented Feb 3, 2025

Like @davidhassell, I really don't think the average user would think that "inland water" and "sea water" are synonymous. Rather, I think most folks think of sea water as being quite salty (as opposed to the "fresh water" found in many inland water bodies), and sea water, as a body of water, is usually thought of as an ocean (or sea connecting to an ocean and at sea level), not a relatively small inland reservoir of water surrounded by considerable land.

My guess is that in the past, standard names including the term "sea water" were rarely used to describe freshwater or any other inland water body (probably because it was the best we had to offer). In the cases where "sea water" standard names were used to describe lake conditions, for gridded datasets, it would be obvious from the location of the data itself that "sea water" had been interpreted to include "lakes". I think we should therefore not be overly concerned if for future datasets we specifically recommend against lake water conditions being described with standard names that include "sea water".

So, I'm against Jonathan's suggestion to define a new element inland_water as being synonymous with sea_water. Instead I would suggest that we

  • indicate in the "description" of any current standard name that includes "sea_water" that although in the past this standard name could apply to any body of water (including inland lakes), it is now recommended that the generic "water_body" version of the name be used instead (if it exists). If no generic name exists, one should be proposed if it is needed.
  • for the generic version of "sea water" standard names, we might in some cases have to restructure the name a bit. For example, if a generic version of "sea_water_temperature" were needed, it might be reworded as "water_temperature_of_water_body" (or perhaps even "temperature of water body").
  • the generic version of the "sea water" standard names would apply to all water bodies (including oceans), so to include only one type of water body, the user should specify in cell_methods the type of body to include (e.g., "where lake" or "where river" or "where sea"). [By the way, I don't see either "lake" or "river" in the list of area types. I thought they were supposed to have been added a few years ago.]

@JonathanGregory
Contributor

JonathanGregory commented Feb 4, 2025

Dear Karl et al.

You suggest that instead of new sea_water names we should instead add a generic name which could apply to both sea water and inland water. Doesn't that mean that you (like me) think there isn't really a distinction to draw between them as regards the substance, and that it's just location or area_type which distinguishes them?

As you know, I think they're the same stuff. People may think sea water is salty, but the Baltic Sea, which is part of the world ocean, is quite fresh, for instance (salinity <10/1000, cf. 35/1000 for the ocean). The Dead Sea is a lake, but ten times saltier than the world ocean (about 340/1000). The Caspian Sea is a very large body of water, but actually a lake. There are fjords in Scotland usually referred to as "sea lochs" e.g. Loch Fyne, indicating they're not clearly differentiated from freshwater lakes e.g. Loch Ness.

We have suggested in the past that sea_water names should be used for inland water bodies, but that's evidently and understandably a barrier to some users adopting CF standard names, so we ought to do something about it. However, sea_water is the obvious phrase for the world ocean and marginal seas. I feel that adopting a generic term instead of sea_water (e.g. water_temperature_of_water_body instead of sea_water_temperature), as you suggest, would make the names less self-explanatory and more obscure in meaning. Perhaps my view is unusual.

Luke @lhmarsden and @StefanHagemann like the idea of synonyms, while you (Karl) @taylor13, Barna @DocOtak and @davidhassell dislike it. Here are two more options to consider:

  • The simplest option: add inland_water standard names as independent entries, whenever requested, with the same form as existing sea_water standard names, and vice-versa. By using inland_water we avoid needing to define distinct standard names for lakes, rivers, reservoirs and other shapes and sizes of inland water body.

  • Devise names that indicate both uses more explicitly than water_body_water does. The simplest option of this kind would be sea_or_inland_water_temperature, I think. Would that work? Could we adopt some abbreviation that would still be clear enough? Unfortunately we shouldn't use /, | or & because all of those characters would be problematic if users have filenames containing standard names (in Linux, at least). Perhaps sea+inland_water_temperature, sea%inland_water_temperature, sea!inland_water_temperature or sea@inland_water_temperature might be sufficiently obvious?

Best wishes

Jonathan

@roy-lowry

roy-lowry commented Feb 4, 2025

Dear All,

Been off e-mail for 4 weeks but thought I should reiterate my views on this issue. I adopted the term 'water body' to cover any volume of fresh or salt water on the Earth's surface for the measurement description vocabulary (P01) being set up for BODC usage in the 1990s and subsequently adopted by many other projects in the oceanographic domain such as SeaDataNet. The reason for this was a use case from the NERC Land Ocean Interaction (LOIS) project where a small research vessel sailed a course from the fresh waters of the River Ouse along the Humber Estuary and then out into the open North Sea. The vessel included a system that continuously pumped surface water through a number of instruments and the data were logged every couple of minutes. I needed to provide a single label for each parameter measured without splitting the data and terms like 'Temperature of the water body' provided a solution that worked well.

I tried to sell this idea to CF in the early 2000s, but after much discussion (I particularly remember debates at a meeting in Seattle in 2008) it was rejected.

There are three concepts - inland_water, sea_water and water_body that have arisen in the discussion. To me these have the following semantic relationships:

water_body broader than sea_water
water_body broader than inland_water
sea_water different to inland_water (NOT a synonym)
sea_water + inland_water = water_body
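One way to write these relationships down so that software can use them is sketched below. The `BROADER` mapping and the `covers` helper are hypothetical names for the example; the links themselves are the four statements listed above.

```python
# Broader-than links as listed above. sea_water and inland_water are
# siblings under water_body, not synonyms of each other.
BROADER = {"sea_water": "water_body", "inland_water": "water_body"}


def covers(general, specific):
    """True if `general` is the same concept as, or broader than, `specific`."""
    return general == specific or BROADER.get(specific) == general


print(covers("water_body", "inland_water"))
print(covers("sea_water", "inland_water"))
```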

Jonathan's latest post has added 'sea_or_inland_water' with various suggestions for syntax (why bother to save three characters per Standard Name?). This I really like because it is an exact synonym for 'water body' as used in P01. Having an exact match eases semantic interoperability between P01 and Standard Names.

The $64,000 question is 'How should change be implemented?'. At the moment, there is a large number of 'sea_water' Standard Names. One approach would be to consider these as being applicable to both sea and inland waters (the thinking behind using 'sea_water' for inland waters mooted in the past) and create a set of 'sea_or_inland_water' Standard Name aliases for the existing 'sea_water' names. 'sea_or_inland_water' would then be used for new names created from now on. Should anybody require Standard Names for inland waters that specifically exclude open sea then this could be accommodated by creating new 'inland_water' Standard Names. However, if people have used the 'sea_water' names in existing data sets with a specific requirement to exclude inland waters then we have a problem that I can't see how to solve. That said, in 30 years managing the P01 vocabulary in the oceanographic domain I can never recall anybody objecting to 'water body' because it didn't exclude inland waters, so maybe we can get away with it.

Cheers, Roy.

@taylor13

taylor13 commented Feb 5, 2025

Hi all,

Both of the above comments are quite helpful I think. First regarding Jonathan's:

I agree we shouldn't attempt to distinguish water as a substance based on salinity.

In the standard names, we want Roy's definition of a 'water_body' to refer to only the liquid found in bodies of liquid water lying atop an underlying solid surface and interfacing with either the atmosphere or a layer of ice above; we exclude water in the atmosphere or underground aquifers (I think). I like the suggestion to use the term 'sea_or_inland_water' for these water bodies, and if only bodies of water associated with a certain area type should be considered, that should be specified using a "where directive" in the cell_methods (e.g., "where land" would indicate only inland bodies are considered and "where sea" would indicate only areas of the sea should be considered).

Regarding Roy's last paragraph above: in CMIP, sea_water was meant to refer to only the water that an ocean component of a coupled climate model simulated. Of the 2062 variables requested from models in CMIP6, about 17% included sea_water in their standard name, and only a few of those omitted specifying "where" the variable should be reported; most indicated "where sea", but for river transports into the sea from land "where land" was specified and for certain fluxes into and out of sea ice, "where sea_ice" was specified. And in a few cases there was no "where" specification given, but this will likely be corrected in CMIP7.

So, I'm coming around to the notion that we can interpret "sea_water" in all datasets written prior to version 1.13 as possibly referencing both inland water and sea water. Going forward, I would favor:

  1. for datasets compliant with versions of CF prior to 1.13, "sea_water" appearing in a standard name should by default be interpreted as a water body (found either on land or sea)
  2. for datasets compliant with versions of CF post 1.12,
  • "sea_water" in a standard name refers to water "where sea"
  • "sea_or_inland_water" in a standard name refers to water anywhere. (The old "sea_water" name should no longer be used for this purpose. Instead, a new standard name should be proposed replacing "sea_water" with "sea_or_inland_water".)
  • no standard names should be created with only "inland_sea_water"; if the quantity described by the standard name should be reported only over land areas, then the standard name including the phrase "sea_or_inland_water" should be used and the cell_methods should include "where land".
  • it is recommended that when considering a variable "where sea", a standard name with "sea_water" be used in preference to one with "sea_or_inland_water".
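The version-dependent reading in the list above could be sketched as follows. This assumes the `Conventions` attribute has the simple form `CF-1.12`; the function names and the returned labels are hypothetical, and the 1.13 cut-off is the one proposed in this comment.

```python
def parse_version(conventions):
    """Turn a string such as 'CF-1.12' into the tuple (1, 12)."""
    major, minor = conventions.removeprefix("CF-").split(".")
    return int(major), int(minor)


def water_scope(standard_name, conventions):
    """Describe which water a name refers to under the proposed rules."""
    if "sea_or_inland_water" in standard_name:
        return "any water body"
    if "sea_water" in standard_name:
        if parse_version(conventions) < (1, 13):
            return "any water body (by default)"
        return "where sea"
    return "not a water-body name"


print(water_scope("sea_water_temperature", "CF-1.8"))
print(water_scope("sea_water_temperature", "CF-1.13"))
```

Versions are compared as integer tuples rather than floats, because 1.8 precedes 1.13 as a CF version but not as a decimal number.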

There are two reasons I'm so keen to retire the current interpretation that "sea_water" can refer to water found anywhere on earth:

  1. I would claim that the typical person reading CF variable standard names will assume that "sea_water" roughly refers to "water in the oceans" (and excludes inland bodies of water).
  2. I am attempting to clarify the definition of area_type "sea" here and in that context limit "sea" to bodies of water (roughly at sea level) that interact directly with an ocean. This would exclude lakes and rivers over land and so would only be consistent with the use of "sea_water" in standard names if the standard names also were limited in this way.

@JonathanGregory
Contributor

Dear Roy and Karl

I'm glad you think sea_or_inland_water would be acceptable; thanks to @StefanHagemann for suggesting "inland water". I hope that others agree that this would make sense.

I suggested the abbreviations less for reasons of length (as Roy says, it saves only three characters), but more to make it clearer that sea_or_inland is to be understood as a unit. Of the ones I suggested, I think sea+inland_water would look most sensible, since we can't use sea/inland_water.

I agree with these points of Karl's:

  • sea_or_inland_water in a standard name refers to any body of liquid water whose lower boundary is the ground and whose upper boundary is the atmosphere or floating ice. What about meltwater ponds on sea ice and meltwater lakes on ice sheets?

  • no standard names should be created with only inland_water. (Karl wrote inland_sea_water, which should also not be allowed, I think.)

  • The typical person reading CF variable standard names will assume that "sea_water" roughly refers to "water in the oceans" (and excludes inland bodies of water).

However, although I agree with this last point, I think we should be cautious with declaring that for datasets compliant with versions of CF prior to 1.13, sea_water appearing in a standard name should by default be interpreted as sea_or_inland_water. That's most likely what the data-writer meant, but we don't know for sure. It depends what you mean by "default". It would be reasonable, as guidance, to say that this is the most likely interpretation, in the absence of any other information.

Regarding a transition to the new situation, I would award the $64,000 to Roy for his plan, that we should make all 528 existing sea_water names into aliases for sea_or_inland_water names. After that, we should not create any more sea_water standard names. I think this would be the cleanest approach. If in future we need standard names specifically for sea water or inland water, and area_type isn't suitable, we ought to use different phrases to avoid confusion.

Best wishes

Jonathan

@taylor13

taylor13 commented Feb 5, 2025

I'll add my 2 cents to the $64,000 and support downgrading sea_water to alias status meaning sea_or_inland_water. If it is an alias, does that mean the sea_water versions of standard names become deprecated in favor of the sea_or_inland_water versions, or may they continue to be used as in the past?

If we expect sea_water to continue to be used in new datasets, I think we should recommend that "sea_water" should only be used going forward when the cell_methods attribute indicates that the water is constrained to be "where sea" or "where sea_ice" or "where floating_ice_shelf", and the like (i.e. excluding inland water). We should also note that if a dataset has already been written with a standard name that includes "sea_water", one cannot assume that the water is not located inland (unless a "where" directive in cell_methods indicates otherwise).
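As a rough illustration of that recommendation, the check could look something like the sketch below. The function name `restricted_to_sea` and the set of area types are assumptions made for this example (the comment above says "and the like", so the real list would need to be agreed); it only looks for a `where <area_type>` clause in a cell_methods string.

```python
# Area types assumed, for this sketch only, to exclude inland water.
SEA_ONLY_AREA_TYPES = {"sea", "sea_ice", "floating_ice_shelf"}


def restricted_to_sea(cell_methods):
    """Return True if cell_methods has a 'where <type>' clause naming a sea-only area type."""
    tokens = cell_methods.split()
    # Look at each token that has a successor and test for "where" followed by a sea-only type.
    for i, token in enumerate(tokens[:-1]):
        if token == "where" and tokens[i + 1] in SEA_ONLY_AREA_TYPES:
            return True
    return False


if __name__ == "__main__":
    for example in ("area: mean where sea", "area: mean where land", "time: mean"):
        print(example, "->", restricted_to_sea(example))
```

Under this sketch, a variable with a sea_water name and no such clause would be treated as possibly including inland water.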

I think we can clarify in the descriptions that sea_or_inland is meant to refer to all water bodies on earth (including oceans and inland seas, rivers, and lakes). I think we should generally avoid special characters (like +, which often means "and" anyway).

@JonathanGregory
Contributor

I believe (as in a previous comment to @DocOtak) that making a standard name into an alias implies that it should not be used for new data. We haven't said that explicitly, but we have said that aliases are defined for backward compatibility. I think we ought to decide this in another issue, and clarify it one way or the other.

If my understanding is correct, that an alias does imply deprecation, and if we would like to suggest that sea_water names continue to be used for new data, then I would reintroduce my suggestion: Define a new attribute for aliases, to indicate those aliases which aren't deprecated, but regarded as equally acceptable to the main entry for new data.

@roy-lowry

A quick watchpoint if creating sea_water aliases. Some Standard Names would require more changes than a simple replacement of 'sea_water' by 'sea_or_inland_water'. For example 'sea_or_inland_water_salinity_at_sea_floor' would maybe have to go to something like 'sea_or_inland_water_salinity_at_bed'.
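To make the watchpoint concrete, here is a minimal sketch of how candidate aliases might be generated and the awkward cases flagged for manual review. The sample names, the list of residual sea-specific terms, and the function `propose_alias` are all assumptions for illustration; the real input would be the full set of sea_water names from the standard name table, and the flagged names would still need a human decision (e.g. whether "sea_floor" becomes "bed").

```python
# Hypothetical sample; the real list would come from the CF standard name table.
SAMPLE_NAMES = [
    "sea_water_temperature",
    "sea_water_salinity_at_sea_floor",
]

# Sea-specific terms that a plain sea_water substitution leaves behind (assumed list).
RESIDUAL_TERMS = ("sea_floor", "sea_surface", "sea_ice")


def propose_alias(name):
    """Return (proposed_name, needs_review) for a sea_water standard name."""
    proposed = name.replace("sea_water", "sea_or_inland_water")
    # Flag names that still contain a sea-specific term after the substitution.
    needs_review = any(term in proposed for term in RESIDUAL_TERMS)
    return proposed, needs_review


if __name__ == "__main__":
    for name in SAMPLE_NAMES:
        proposed, needs_review = propose_alias(name)
        flag = "REVIEW" if needs_review else "ok"
        print(f"{name} -> {proposed} [{flag}]")
```

The simple substitution is deliberately left unmodified for flagged names, since the replacement wording for terms like "sea_floor" has not been agreed in this discussion.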

@JonathanGregory
Contributor

Thanks, @roy-lowry. That's a good point. We should check carefully. Seas have floors and rivers have beds. What do lakes lie on?
