Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix table/freq ambiguity for multi-frequency variables #252

Merged
merged 4 commits into from
Apr 16, 2024

Conversation

TonyB9000
Copy link
Contributor

@TonyB9000 TonyB9000 commented Mar 22, 2024

Description

  • Closes [Bug]: E2C info mode returns ambiguous results (mode 2). #251
    In summary, we have 3 "info modes":

    (mode 1) exists largely to save the user typing -f mon, supplying cmor tables, and returns ALL of monthly handlers, but also occasional "day" handlers when a variable has multiple frequencies - and the user must be prepared to deal with it.
    (mode 2) that works except where ambiguous formulas for the SAME variable and frequency exist, requiring one apply mode 3.
    (mode 3) which involves supplying a native file so that the available variables can be examined. Still possible to have ambiguity if the native data contains multiple "solutions".

We need to eliminate ambiguity up-front. Specifying "freq mon" should not additionally return results for "freq day", and the user should not need to know which modes will fail on which variables. If "model-version" is a distinguishing criterion, it should be an attribute of the handlers.

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

@tomvothecoder
Copy link
Collaborator

Thanks @TonyB9000. I'll try reviewing this at some point in the next week or two.

@chengzhuzhang chengzhuzhang removed their request for review March 25, 2024 16:27
@chengzhuzhang
Copy link
Collaborator

I'm glad this change can get the data post-processing working. Since I'm will be ooo next week, I removed myself as a reviewer and will defer to Tom.

Copy link
Collaborator

@tomvothecoder tomvothecoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is my initial review. I left some questions and comments. I think @chengzhuzhang will need to verify some of the changes because I'm not sure about them.

Comment on lines 657 to 664
- name: rsut
units: W m-2
raw_variables: [SOLIN, FSNTOA]
table: CMIP6_Amon.json
unit_conversion: null
formula: SOLIN - FSNTOA
positive: up
levels: null
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rsut was added through this recent PR by @chengzhuzhang in Sep/2023.

#210

FSUTOA and FSUTOAC are used in e3sm_to_cmip for directly getting rsut and rsutcs. For v3 output, both FSUTOA and FSUTOAC are removed. This PR is to update formula to use available variables, i.e., rsut = SOLIN - FSNTOA, and rsutcs = SOLIN - FSNTOAC

Not sure if it should be removed?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TonyB9000 comment on the reason for this change: #251 (comment)

Copy link
Contributor Author

@TonyB9000 TonyB9000 Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomvothecoder @chengzhuzhang The issue here is that, for the very same table (CMIP6_Amon.json), there are two distinct formulas for computing the variable rsut:

- CMIP6 Name: rsut
  CMIP6 Table: CMIP6_Amon.json
  CMIP6 Units: W m-2
  E3SM Variables: SOLIN, FSNTOA

and

- CMIP6 Name: rsut
  CMIP6 Table: CMIP6_Amon.json
  CMIP6 Units: W m-2
  E3SM Variables: FSUTOA		

If you need to generate Amon.rsut, which native variables will you select and why?

If model version (v1 vs v2 vs v3) is a criterion, it needs to be an attribute of the handler.

If both formulas are required, we cannot use only the "E3SM variables" to distinguish them, since we rely upon these handlers to know which E3SM variables to seek for a given CMIP6 variable (eg rsut). If we proceed by waiting until the native data is examined to see which of several combinations of variable may be appropriate, then we are effectively reducing "info mode" to mode 3 alone. We cannot have modes that work only for some variables and not others. And if more than one formula could be satisfied, does one toss a coin to decide which to apply?

I used https://acme-climate.atlassian.net/wiki/spaces/DOC/pages/925304036/Amon+variable+conversion+table as my guide in which to keep. It listed only the latter formula.

Copy link
Collaborator

@chengzhuzhang chengzhuzhang Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I somehow missed these comments. @tomvothecoder thank you for catching this. We should not remove these additions in order to support v3 output. Because

FSUTOA and FSUTOAC are used in e3sm_to_cmip for directly getting rsut and rsutcs. For v3 output, both FSUTOA and FSUTOAC are removed. This PR is to update formula to use available variables, i.e., rsut = SOLIN - FSNTOA, and rsutcs = SOLIN - FSNTOAC

@TonyB9000 I updated the confluence page for Amon to reflect this change for v3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chengzhuzhang OK, but we have a problem if there is no way to know which formula to select.

e3sm_to_cmip --info mode queries the handlers. If it is asked "What native vars are needed to generate Amon.rsut", and the answer is "Either (FSUTOA) or (SOLIN, FSNTOA), take your pick", then we have a problem. How does e2c know which handler to use? (v2 versus v3 of course, but where is this determined?)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The process of how handlers are selected is documented here:https://github.com/E3SM-Project/e3sm_to_cmip/blob/0f91fb4abc839e2ec04fe8e770ad7308a41c6620/docs/source/var-handlers.rst#how-e3sm_to_cmip-derives-atmosphereland-handler. This is a process to loop over handlers of a same variable is involved.

Copy link
Contributor Author

@TonyB9000 TonyB9000 Mar 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I see now. It says:

derives the appropriate variable handlers to use based on the available E3SM variables in the input datasets.

What this means is that we cannot use "info mode 1" or "info mode 2" to select the native variables to employ for a given CMIP6 variable, (when doing "variable at a time") since they provide no "input datasets". We MUST supply a raw dataset as well (mode 3).

ASIDE: What would happen if the native data had variables that satisfied more than one formula (handler)? Seem the outcome would be indeterminate.

I will have to conduct thorough "info mode 3" testing, to see if ambiguous returns still occur.

unit_conversion: null
formula: SOLIN - FSNTOAC
positive: up
levels: null
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rsutcs was added through this recent PR by @chengzhuzhang in Sep/2023.

#210

FSUTOA and FSUTOAC are used in e3sm_to_cmip for directly getting rsut and rsutcs. For v3 output, both FSUTOA and FSUTOAC are removed. This PR is to update formula to use available variables, i.e., rsut = SOLIN - FSNTOA, and rsutcs = SOLIN - FSNTOAC

Not sure if it should be removed?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TonyB9000 comment on the reason for this change: #251 (comment)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we should not remove this entry.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chengzhuzhang Same issue: If there are multiple ways (v2 versus v3) to calculate rsutcs how is the correct handler selected?

I cannot believe the user (batch process) must seek a selection of variable-sets in the native data, just to discover if there is a match. That is just wrong. If the handlers are "model-version specific", they need to have the model-version as a searchable attribute.

Comment on lines +670 to +673
if self.freq == "mon" and handler['table'] == "CMIP6_day.json":
continue
if ( self.freq == "day" or self.freq == "3hr" ) and handler['table'] == "CMIP6_Amon.json":
continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understand, this skips trying to print handlers that have CMIP tables that don't match the selected freq? If so, what is the purpose of skipping (is it intentional and/or should we print some message?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We skip it because of the nature of the search method (look at them all, see which is a match). If you allow a user to specify freq, you should not report the non-matching values. Otherwise, why allow the user to specify freq?

If you wrote "grep", and look for lines in a file that contain "ERROR", you don't print a message regarding the lines that don't contain "ERROR".

Comment on lines 654 to 655
# TONY SAYS: Why does this mode even exist? And why does it depend upon "freq = mon"?
if self.freq == "mon" and not self.input_path and not self.tables_path: # info mode 1
Copy link
Collaborator

@tomvothecoder tomvothecoder Mar 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# TONY SAYS: Why does this mode even exist? And why does it depend upon "freq = mon"?
if self.freq == "mon" and not self.input_path and not self.tables_path: # info mode 1
if self.freq == "mon" and not self.input_path and not self.tables_path:

Carried over from original, non-refactored code (with poor documentation, shown below).

# if the user just asked for the handler info
if freq == "mon" and not inpath and not tables:
for handler in handlers:
msg = {
"CMIP6 Name": handler["name"],
"CMIP6 Table": handler["table"],
"CMIP6 Units": handler["units"],
"E3SM Variables": ", ".join(handler["raw_variables"]),
}
if handler.get("unit_conversion"):
msg["Unit conversion"] = handler["unit_conversion"]
if handler.get("levels"):
msg["Levels"] = handler["levels"]
messages.append(msg)

I'm guessing this is the most basic version of "info" mode using the default freq=mon" that just prints out all of the matching defined handlers in e3sm_to_cmip based on the user-selected variable list. If the user passes a custom freq (e.g., "day" or "3hr"), this statement gets ignored.

parser.add_argument(
"-f",
"--freq",
help="The frequency of the data (default is monthly). Accepted values are mon, day, 6hrLev, 6hrPlev, 6hrPlevPt, 3hr, 1hr.",
default="mon",
)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also you can make comments related to PR reviews directly in the file diff under the Files changed tab, rather than in the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomvothecoder You guess correctly (the most basic version of "info" mode using the default freq = mon). The problem (I believe) is that if your variable-list contains (say) "pr", it returns multiple handlers, including "day". This is an entire "mode" dedicated to the cause of saving the user the great burden of having to type "freq = mon".

# noqa: C901

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomvothecoder @chengzhuzhang

In summary, we have 3 "info modes":

  • (mode 1) exists largely to save the user typing -f mon, supplying cmor tables, and returns ALL of monthly handlers, but also occasional "day" handlers when a variable has multiple frequencies - and the user must be prepared to deal with it.
  • (mode 2) that works except where ambiguous formulas for the SAME variable and frequency exist, requiring one apply mode 3.
  • (mode 3) which involves supplying a native file so that the available variables can be examined. Still possible to have ambiguity if the native data contains multiple "solutions".

We need to eliminate ambiguity up-front. Specifying "freq mon" should not additionally return results for "freq day", and the user should not need to know which modes will fail on which variables. If "model-version" is a distinguishing criterion, it should be an attribute of the handlers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SHEESH. By "committing the suggested change" - the entire conversation was eliminated!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chengzhuzhang @tomvothecoder I have restored the ambiguous definitions for Amon rsut and rsutcs. I will conducts extensive "info mode 3" testing to see if ambiguous results remains per native source data. This means testing the "info mode" for different model versions and experiments, providing a source datafile for each. Not hard to program, but will take several hours to run. This process will need to be repeated when v3 data becomes available.

The script will be:

/p/user_pub/e3sm/bartoletti1/Operations/9_Testing/e3sm_to_cmip/exhaustive_info_mode_3_testing.sh

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for restoring the variables..

@TonyB9000 TonyB9000 merged commit 3b06eee into master Apr 16, 2024
3 checks passed
@TonyB9000 TonyB9000 deleted the address_info_mode_ambiguity branch April 16, 2024 16:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: E2C info mode returns ambiguous results (mode 2).
3 participants