-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix table/freq ambiguity for multi-frequency variables #252
Conversation
Thanks @TonyB9000. I'll try reviewing this at some point in the next week or two. |
I'm glad this change can get the data post-processing working. Since I'm will be ooo next week, I removed myself as a reviewer and will defer to Tom. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is my initial review. I left some questions and comments. I think @chengzhuzhang will need to verify some of the changes because I'm not sure about them.
- name: rsut | ||
units: W m-2 | ||
raw_variables: [SOLIN, FSNTOA] | ||
table: CMIP6_Amon.json | ||
unit_conversion: null | ||
formula: SOLIN - FSNTOA | ||
positive: up | ||
levels: null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rsut
was added through this recent PR by @chengzhuzhang in Sep/2023.
FSUTOA and FSUTOAC are used in e3sm_to_cmip for directly getting rsut and rsutcs. For v3 output, both FSUTOA and FSUTOAC are removed. This PR is to update formula to use available variables, i.e., rsut = SOLIN - FSNTOA, and rsutcs = SOLIN - FSNTOAC
Not sure if it should be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TonyB9000 comment on the reason for this change: #251 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomvothecoder @chengzhuzhang The issue here is that, for the very same table (CMIP6_Amon.json), there are two distinct formulas for computing the variable rsut
:
- CMIP6 Name: rsut
CMIP6 Table: CMIP6_Amon.json
CMIP6 Units: W m-2
E3SM Variables: SOLIN, FSNTOA
and
- CMIP6 Name: rsut
CMIP6 Table: CMIP6_Amon.json
CMIP6 Units: W m-2
E3SM Variables: FSUTOA
If you need to generate Amon.rsut, which native variables will you select and why?
If model version (v1 vs v2 vs v3) is a criterion, it needs to be an attribute of the handler.
If both formulas are required, we cannot use only the "E3SM variables" to distinguish them, since we rely upon these handlers to know which E3SM variables to seek for a given CMIP6 variable (eg rsut). If we proceed by waiting until the native data is examined to see which of several combinations of variable may be appropriate, then we are effectively reducing "info mode" to mode 3 alone. We cannot have modes that work only for some variables and not others. And if more than one formula could be satisfied, does one toss a coin to decide which to apply?
I used https://acme-climate.atlassian.net/wiki/spaces/DOC/pages/925304036/Amon+variable+conversion+table as my guide in which to keep. It listed only the latter formula.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I somehow missed these comments. @tomvothecoder thank you for catching this. We should not remove these additions in order to support v3 output. Because
FSUTOA and FSUTOAC are used in e3sm_to_cmip for directly getting rsut and rsutcs. For v3 output, both FSUTOA and FSUTOAC are removed. This PR is to update formula to use available variables, i.e., rsut = SOLIN - FSNTOA, and rsutcs = SOLIN - FSNTOAC
@TonyB9000 I updated the confluence page for Amon to reflect this change for v3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chengzhuzhang OK, but we have a problem if there is no way to know which formula to select.
e3sm_to_cmip --info mode
queries the handlers. If it is asked "What native vars are needed to generate Amon.rsut", and the answer is "Either (FSUTOA) or (SOLIN, FSNTOA), take your pick", then we have a problem. How does e2c know which handler to use? (v2 versus v3 of course, but where is this determined?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The process of how handlers are selected is documented here:https://github.com/E3SM-Project/e3sm_to_cmip/blob/0f91fb4abc839e2ec04fe8e770ad7308a41c6620/docs/source/var-handlers.rst#how-e3sm_to_cmip-derives-atmosphereland-handler. This is a process to loop over handlers of a same variable is involved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I see now. It says:
derives the appropriate variable handlers to use based on the available E3SM variables in the input datasets.
What this means is that we cannot use "info mode 1" or "info mode 2" to select the native variables to employ for a given CMIP6 variable, (when doing "variable at a time") since they provide no "input datasets". We MUST supply a raw dataset as well (mode 3).
ASIDE: What would happen if the native data had variables that satisfied more than one formula (handler)? Seem the outcome would be indeterminate.
I will have to conduct thorough "info mode 3" testing, to see if ambiguous returns still occur.
unit_conversion: null | ||
formula: SOLIN - FSNTOAC | ||
positive: up | ||
levels: null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rsutcs
was added through this recent PR by @chengzhuzhang in Sep/2023.
FSUTOA and FSUTOAC are used in e3sm_to_cmip for directly getting rsut and rsutcs. For v3 output, both FSUTOA and FSUTOAC are removed. This PR is to update formula to use available variables, i.e., rsut = SOLIN - FSNTOA, and rsutcs = SOLIN - FSNTOAC
Not sure if it should be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TonyB9000 comment on the reason for this change: #251 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we should not remove this entry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chengzhuzhang Same issue: If there are multiple ways (v2 versus v3) to calculate rsutcs
how is the correct handler selected?
I cannot believe the user (batch process) must seek a selection of variable-sets in the native data, just to discover if there is a match. That is just wrong. If the handlers are "model-version specific", they need to have the model-version as a searchable attribute.
if self.freq == "mon" and handler['table'] == "CMIP6_day.json": | ||
continue | ||
if ( self.freq == "day" or self.freq == "3hr" ) and handler['table'] == "CMIP6_Amon.json": | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I understand, this skips trying to print handlers that have CMIP tables that don't match the selected freq
? If so, what is the purpose of skipping (is it intentional and/or should we print some message?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We skip it because of the nature of the search method (look at them all, see which is a match). If you allow a user to specify freq
, you should not report the non-matching values. Otherwise, why allow the user to specify freq
?
If you wrote "grep", and look for lines in a file that contain "ERROR", you don't print a message regarding the lines that don't contain "ERROR".
e3sm_to_cmip/__main__.py
Outdated
# TONY SAYS: Why does this mode even exist? And why does it depend upon "freq = mon"? | ||
if self.freq == "mon" and not self.input_path and not self.tables_path: # info mode 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# TONY SAYS: Why does this mode even exist? And why does it depend upon "freq = mon"? | |
if self.freq == "mon" and not self.input_path and not self.tables_path: # info mode 1 | |
if self.freq == "mon" and not self.input_path and not self.tables_path: |
Carried over from original, non-refactored code (with poor documentation, shown below).
e3sm_to_cmip/e3sm_to_cmip/util.py
Lines 288 to 301 in e95a74b
# if the user just asked for the handler info | |
if freq == "mon" and not inpath and not tables: | |
for handler in handlers: | |
msg = { | |
"CMIP6 Name": handler["name"], | |
"CMIP6 Table": handler["table"], | |
"CMIP6 Units": handler["units"], | |
"E3SM Variables": ", ".join(handler["raw_variables"]), | |
} | |
if handler.get("unit_conversion"): | |
msg["Unit conversion"] = handler["unit_conversion"] | |
if handler.get("levels"): | |
msg["Levels"] = handler["levels"] | |
messages.append(msg) |
I'm guessing this is the most basic version of "info" mode using the default freq=mon"
that just prints out all of the matching defined handlers in e3sm_to_cmip
based on the user-selected variable list. If the user passes a custom freq
(e.g., "day" or "3hr"), this statement gets ignored.
e3sm_to_cmip/e3sm_to_cmip/util.py
Lines 159 to 164 in e95a74b
parser.add_argument( | |
"-f", | |
"--freq", | |
help="The frequency of the data (default is monthly). Accepted values are mon, day, 6hrLev, 6hrPlev, 6hrPlevPt, 3hr, 1hr.", | |
default="mon", | |
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also you can make comments related to PR reviews directly in the file diff under the Files changed tab, rather than in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomvothecoder You guess correctly (the most basic version of "info" mode using the default freq = mon
). The problem (I believe) is that if your variable-list contains (say) "pr", it returns multiple handlers, including "day". This is an entire "mode" dedicated to the cause of saving the user the great burden of having to type "freq = mon".
# noqa: C901
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In summary, we have 3 "info modes":
- (mode 1) exists largely to save the user typing
-f mon
, supplying cmor tables, and returns ALL of monthly handlers, but also occasional "day" handlers when a variable has multiple frequencies - and the user must be prepared to deal with it. - (mode 2) that works except where ambiguous formulas for the SAME variable and frequency exist, requiring one apply mode 3.
- (mode 3) which involves supplying a native file so that the available variables can be examined. Still possible to have ambiguity if the native data contains multiple "solutions".
We need to eliminate ambiguity up-front. Specifying "freq mon" should not additionally return results for "freq day", and the user should not need to know which modes will fail on which variables. If "model-version" is a distinguishing criterion, it should be an attribute of the handlers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SHEESH. By "committing the suggested change" - the entire conversation was eliminated!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chengzhuzhang @tomvothecoder I have restored the ambiguous definitions for Amon rsut and rsutcs. I will conducts extensive "info mode 3" testing to see if ambiguous results remains per native source data. This means testing the "info mode" for different model versions and experiments, providing a source datafile for each. Not hard to program, but will take several hours to run. This process will need to be repeated when v3 data becomes available.
The script will be:
/p/user_pub/e3sm/bartoletti1/Operations/9_Testing/e3sm_to_cmip/exhaustive_info_mode_3_testing.sh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for restoring the variables..
Co-authored-by: Tom Vo <tomvothecoder@gmail.com>
Description
Closes [Bug]: E2C info mode returns ambiguous results (mode 2). #251
In summary, we have 3 "info modes":
(mode 1) exists largely to save the user typing -f mon, supplying cmor tables, and returns ALL of monthly handlers, but also occasional "day" handlers when a variable has multiple frequencies - and the user must be prepared to deal with it.
(mode 2) that works except where ambiguous formulas for the SAME variable and frequency exist, requiring one apply mode 3.
(mode 3) which involves supplying a native file so that the available variables can be examined. Still possible to have ambiguity if the native data contains multiple "solutions".
We need to eliminate ambiguity up-front. Specifying "freq mon" should not additionally return results for "freq day", and the user should not need to know which modes will fail on which variables. If "model-version" is a distinguishing criterion, it should be an attribute of the handlers.
Checklist