fix table/freq ambiguity for multi-frequency variables #252

TonyB9000 · 2024-03-22T19:13:15Z

Description

Closes [Bug]: E2C info mode returns ambiguous results (mode 2). #251
In summary, we have 3 "info modes":

(mode 1) exists largely to save the user typing -f mon, supplying cmor tables, and returns ALL of monthly handlers, but also occasional "day" handlers when a variable has multiple frequencies - and the user must be prepared to deal with it.
(mode 2) that works except where ambiguous formulas for the SAME variable and frequency exist, requiring one apply mode 3.
(mode 3) which involves supplying a native file so that the available variables can be examined. Still possible to have ambiguity if the native data contains multiple "solutions".

We need to eliminate ambiguity up-front. Specifying "freq mon" should not additionally return results for "freq day", and the user should not need to know which modes will fail on which variables. If "model-version" is a distinguishing criterion, it should be an attribute of the handlers.

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my own code
My changes generate no new warnings
Any dependent changes have been merged and published in downstream modules

tomvothecoder · 2024-03-25T16:03:18Z

Thanks @TonyB9000. I'll try reviewing this at some point in the next week or two.

chengzhuzhang · 2024-03-25T16:29:21Z

I'm glad this change can get the data post-processing working. Since I'm will be ooo next week, I removed myself as a reviewer and will defer to Tom.

tomvothecoder

Here is my initial review. I left some questions and comments. I think @chengzhuzhang will need to verify some of the changes because I'm not sure about them.

tomvothecoder · 2024-03-25T20:51:34Z

e3sm_to_cmip/cmor_handlers/handlers.yaml

- name: rsut
-  units: W m-2
-  raw_variables: [SOLIN, FSNTOA]
-  table: CMIP6_Amon.json
-  unit_conversion: null
-  formula: SOLIN - FSNTOA
-  positive: up
-  levels: null


rsut was added through this recent PR by @chengzhuzhang in Sep/2023.

#210

FSUTOA and FSUTOAC are used in e3sm_to_cmip for directly getting rsut and rsutcs. For v3 output, both FSUTOA and FSUTOAC are removed. This PR is to update formula to use available variables, i.e., rsut = SOLIN - FSNTOA, and rsutcs = SOLIN - FSNTOAC

Not sure if it should be removed?

@TonyB9000 comment on the reason for this change: #251 (comment)

@tomvothecoder @chengzhuzhang The issue here is that, for the very same table (CMIP6_Amon.json), there are two distinct formulas for computing the variable rsut:

- CMIP6 Name: rsut CMIP6 Table: CMIP6_Amon.json CMIP6 Units: W m-2 E3SM Variables: SOLIN, FSNTOA

and

- CMIP6 Name: rsut CMIP6 Table: CMIP6_Amon.json CMIP6 Units: W m-2 E3SM Variables: FSUTOA

If you need to generate Amon.rsut, which native variables will you select and why?

If model version (v1 vs v2 vs v3) is a criterion, it needs to be an attribute of the handler.

If both formulas are required, we cannot use only the "E3SM variables" to distinguish them, since we rely upon these handlers to know which E3SM variables to seek for a given CMIP6 variable (eg rsut). If we proceed by waiting until the native data is examined to see which of several combinations of variable may be appropriate, then we are effectively reducing "info mode" to mode 3 alone. We cannot have modes that work only for some variables and not others. And if more than one formula could be satisfied, does one toss a coin to decide which to apply?

I used https://acme-climate.atlassian.net/wiki/spaces/DOC/pages/925304036/Amon+variable+conversion+table as my guide in which to keep. It listed only the latter formula.

I somehow missed these comments. @tomvothecoder thank you for catching this. We should not remove these additions in order to support v3 output. Because

FSUTOA and FSUTOAC are used in e3sm_to_cmip for directly getting rsut and rsutcs. For v3 output, both FSUTOA and FSUTOAC are removed. This PR is to update formula to use available variables, i.e., rsut = SOLIN - FSNTOA, and rsutcs = SOLIN - FSNTOAC

@TonyB9000 I updated the confluence page for Amon to reflect this change for v3

@chengzhuzhang OK, but we have a problem if there is no way to know which formula to select.

e3sm_to_cmip --info mode queries the handlers. If it is asked "What native vars are needed to generate Amon.rsut", and the answer is "Either (FSUTOA) or (SOLIN, FSNTOA), take your pick", then we have a problem. How does e2c know which handler to use? (v2 versus v3 of course, but where is this determined?)

The process of how handlers are selected is documented here:https://github.com/E3SM-Project/e3sm_to_cmip/blob/0f91fb4abc839e2ec04fe8e770ad7308a41c6620/docs/source/var-handlers.rst#how-e3sm_to_cmip-derives-atmosphereland-handler. This is a process to loop over handlers of a same variable is involved.

Yes, I see now. It says:

derives the appropriate variable handlers to use based on the available E3SM variables in the input datasets.

What this means is that we cannot use "info mode 1" or "info mode 2" to select the native variables to employ for a given CMIP6 variable, (when doing "variable at a time") since they provide no "input datasets". We MUST supply a raw dataset as well (mode 3).

ASIDE: What would happen if the native data had variables that satisfied more than one formula (handler)? Seem the outcome would be indeterminate.

I will have to conduct thorough "info mode 3" testing, to see if ambiguous returns still occur.

tomvothecoder · 2024-03-25T20:51:46Z

e3sm_to_cmip/cmor_handlers/handlers.yaml

-  unit_conversion: null
-  formula: SOLIN - FSNTOAC
-  positive: up
-  levels: null


rsutcs was added through this recent PR by @chengzhuzhang in Sep/2023.

#210

FSUTOA and FSUTOAC are used in e3sm_to_cmip for directly getting rsut and rsutcs. For v3 output, both FSUTOA and FSUTOAC are removed. This PR is to update formula to use available variables, i.e., rsut = SOLIN - FSNTOA, and rsutcs = SOLIN - FSNTOAC

Not sure if it should be removed?

@TonyB9000 comment on the reason for this change: #251 (comment)

No, we should not remove this entry.

@chengzhuzhang Same issue: If there are multiple ways (v2 versus v3) to calculate rsutcs how is the correct handler selected?

I cannot believe the user (batch process) must seek a selection of variable-sets in the native data, just to discover if there is a match. That is just wrong. If the handlers are "model-version specific", they need to have the model-version as a searchable attribute.

tomvothecoder · 2024-03-25T20:53:47Z

e3sm_to_cmip/__main__.py

+                    if self.freq == "mon" and handler['table'] == "CMIP6_day.json":
+                        continue
+                    if ( self.freq == "day" or self.freq == "3hr" ) and handler['table'] == "CMIP6_Amon.json":
+                        continue


From what I understand, this skips trying to print handlers that have CMIP tables that don't match the selected freq? If so, what is the purpose of skipping (is it intentional and/or should we print some message?).

We skip it because of the nature of the search method (look at them all, see which is a match). If you allow a user to specify freq, you should not report the non-matching values. Otherwise, why allow the user to specify freq?

If you wrote "grep", and look for lines in a file that contain "ERROR", you don't print a message regarding the lines that don't contain "ERROR".

tomvothecoder · 2024-03-25T20:54:44Z

e3sm_to_cmip/__main__.py

+        # TONY SAYS: Why does this mode even exist?  And why does it depend upon "freq = mon"?
+        if self.freq == "mon" and not self.input_path and not self.tables_path: # info mode 1


Suggested change

# TONY SAYS: Why does this mode even exist? And why does it depend upon "freq = mon"?

if self.freq == "mon" and not self.input_path and not self.tables_path: # info mode 1

if self.freq == "mon" and not self.input_path and not self.tables_path:

Carried over from original, non-refactored code (with poor documentation, shown below).

e3sm_to_cmip/e3sm_to_cmip/util.py

Lines 288 to 301 in e95a74b

# if the user just asked for the handler info

if freq == "mon" and not inpath and not tables:

for handler in handlers:

msg = {

"CMIP6 Name": handler["name"],

"CMIP6 Table": handler["table"],

"CMIP6 Units": handler["units"],

"E3SM Variables": ", ".join(handler["raw_variables"]),

}

if handler.get("unit_conversion"):

msg["Unit conversion"] = handler["unit_conversion"]

if handler.get("levels"):

msg["Levels"] = handler["levels"]

messages.append(msg)

I'm guessing this is the most basic version of "info" mode using the default freq=mon" that just prints out all of the matching defined handlers in e3sm_to_cmip based on the user-selected variable list. If the user passes a custom freq (e.g., "day" or "3hr"), this statement gets ignored.

e3sm_to_cmip/e3sm_to_cmip/util.py

Lines 159 to 164 in e95a74b

parser.add_argument(

"-f",

"--freq",

help="The frequency of the data (default is monthly). Accepted values are mon, day, 6hrLev, 6hrPlev, 6hrPlevPt, 3hr, 1hr.",

default="mon",

)

Also you can make comments related to PR reviews directly in the file diff under the Files changed tab, rather than in the code.

@tomvothecoder You guess correctly (the most basic version of "info" mode using the default freq = mon). The problem (I believe) is that if your variable-list contains (say) "pr", it returns multiple handlers, including "day". This is an entire "mode" dedicated to the cause of saving the user the great burden of having to type "freq = mon".

# noqa: C901

@tomvothecoder @chengzhuzhang

In summary, we have 3 "info modes":

(mode 1) exists largely to save the user typing -f mon, supplying cmor tables, and returns ALL of monthly handlers, but also occasional "day" handlers when a variable has multiple frequencies - and the user must be prepared to deal with it.

(mode 2) that works except where ambiguous formulas for the SAME variable and frequency exist, requiring one apply mode 3.

(mode 3) which involves supplying a native file so that the available variables can be examined. Still possible to have ambiguity if the native data contains multiple "solutions".

We need to eliminate ambiguity up-front. Specifying "freq mon" should not additionally return results for "freq day", and the user should not need to know which modes will fail on which variables. If "model-version" is a distinguishing criterion, it should be an attribute of the handlers.

SHEESH. By "committing the suggested change" - the entire conversation was eliminated!

@chengzhuzhang @tomvothecoder I have restored the ambiguous definitions for Amon rsut and rsutcs. I will conducts extensive "info mode 3" testing to see if ambiguous results remains per native source data. This means testing the "info mode" for different model versions and experiments, providing a source datafile for each. Not hard to program, but will take several hours to run. This process will need to be repeated when v3 data becomes available.

The script will be:

/p/user_pub/e3sm/bartoletti1/Operations/9_Testing/e3sm_to_cmip/exhaustive_info_mode_3_testing.sh

Thank you for restoring the variables..

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

fix table/freq ambiguity for multi-frequency variables

af271a3

TonyB9000 requested review from chengzhuzhang and tomvothecoder March 22, 2024 19:14

chengzhuzhang removed their request for review March 25, 2024 16:27

tomvothecoder reviewed Mar 25, 2024

View reviewed changes

TonyB9000 and others added 3 commits March 26, 2024 09:35

Update e3sm_to_cmip/__main__.py

ec5f217

Co-authored-by: Tom Vo <tomvothecoder@gmail.com>

Restored ambiguous definitions for Amon rsut and rsutcs

87138b6

Restore handler defs to original ordering for ease of diff

112bedd

TonyB9000 merged commit 3b06eee into master Apr 16, 2024
3 checks passed

TonyB9000 deleted the address_info_mode_ambiguity branch April 16, 2024 16:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix table/freq ambiguity for multi-frequency variables #252

fix table/freq ambiguity for multi-frequency variables #252

TonyB9000 commented Mar 22, 2024 •

edited by chengzhuzhang

Loading

tomvothecoder commented Mar 25, 2024

chengzhuzhang commented Mar 25, 2024

tomvothecoder left a comment

tomvothecoder Mar 25, 2024

tomvothecoder Mar 25, 2024

TonyB9000 Mar 26, 2024 •

edited

Loading

chengzhuzhang Mar 28, 2024 •

edited

Loading

TonyB9000 Mar 28, 2024

chengzhuzhang Mar 29, 2024

TonyB9000 Mar 29, 2024 •

edited

Loading

tomvothecoder Mar 25, 2024

tomvothecoder Mar 25, 2024

chengzhuzhang Mar 28, 2024

TonyB9000 Mar 28, 2024

tomvothecoder Mar 25, 2024

TonyB9000 Mar 26, 2024

tomvothecoder Mar 25, 2024 •

edited

Loading

tomvothecoder Mar 25, 2024

TonyB9000 Mar 26, 2024

TonyB9000 Mar 26, 2024

TonyB9000 Mar 26, 2024

TonyB9000 Mar 29, 2024

chengzhuzhang Mar 29, 2024

		# TONY SAYS: Why does this mode even exist? And why does it depend upon "freq = mon"?
		if self.freq == "mon" and not self.input_path and not self.tables_path: # info mode 1

	# if the user just asked for the handler info
	if freq == "mon" and not inpath and not tables:
	for handler in handlers:
	msg = {
	"CMIP6 Name": handler["name"],
	"CMIP6 Table": handler["table"],
	"CMIP6 Units": handler["units"],
	"E3SM Variables": ", ".join(handler["raw_variables"]),
	}
	if handler.get("unit_conversion"):
	msg["Unit conversion"] = handler["unit_conversion"]
	if handler.get("levels"):
	msg["Levels"] = handler["levels"]
	messages.append(msg)

	parser.add_argument(
	"-f",
	"--freq",
	help="The frequency of the data (default is monthly). Accepted values are mon, day, 6hrLev, 6hrPlev, 6hrPlevPt, 3hr, 1hr.",
	default="mon",
	)

fix table/freq ambiguity for multi-frequency variables #252

fix table/freq ambiguity for multi-frequency variables #252

Conversation

TonyB9000 commented Mar 22, 2024 • edited by chengzhuzhang Loading

Description

Checklist

tomvothecoder commented Mar 25, 2024

chengzhuzhang commented Mar 25, 2024

tomvothecoder left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TonyB9000 Mar 26, 2024 • edited Loading

Choose a reason for hiding this comment

chengzhuzhang Mar 28, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TonyB9000 Mar 29, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tomvothecoder Mar 25, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TonyB9000 commented Mar 22, 2024 •

edited by chengzhuzhang

Loading

TonyB9000 Mar 26, 2024 •

edited

Loading

chengzhuzhang Mar 28, 2024 •

edited

Loading

TonyB9000 Mar 29, 2024 •

edited

Loading

tomvothecoder Mar 25, 2024 •

edited

Loading