
A bug in calculating accumulated fields (24/240 hours averaged) when using a smaller timestep #1789

Closed
Duseong opened this issue Jun 29, 2022 · 18 comments · Fixed by #1802
Labels: bug (something is working incorrectly), science (Enhancement to or bug impacting science)

Duseong commented Jun 29, 2022

Brief summary of bug

It looks like there's a bug in calculating accumulated fields (e.g., TV24, TV240, PAR24_sun, PAR240_sun, etc.) when using a smaller timestep.

General bug information

CTSM version you are using:
ctsm5.1.dev019 (cam6_3_018)

Does this bug cause significantly incorrect results in the model's science?
Yes

Configurations affected:
Regional refinement simulations or any configurations that use a much smaller timestep than the global 1-degree model.

Details of bug

Please see the attached file for time series of the variables (TV, TV24, TV240, PAR_sun, PAR24_sun, PAR240_sun) at the Manaus point over the Amazon. Black dots are from the ne30 simulation, and red dots are from the same ne30 simulation but with a different timestep (3.75 minutes -> ATM_NCPL = LND_NCPL of 384).
There are unexpectedly large fluctuations in the 24/240-hour averaged fields in the smaller-timestep run, so I suspect a bug in the calculation of those fields.
Since regional refinement simulations use a smaller timestep, their results are heavily affected by these fields. I found that global biogenic emissions, OH, and other chemical fields such as CO changed substantially. Other atmospheric fields may also be affected.
Accumulated_fields_time_series.pdf

Important details of your setup / configuration so we can reproduce the bug

I changed ATM_NCPL (=LND_NCPL) from 48 to 384 to see the effects of the timestep on those fields. The spatial resolution was the same for both (ne30).

billsacks added the "bug (something is working incorrectly)", "tag: bug - impacts science", and "next (this should get some attention in the next week or two)" labels Jun 29, 2022
billsacks (Member) commented Jun 29, 2022

Thank you for opening this issue. I agree that this appears to be a significant bug!

From looking through the code, I think I see what's happening here: the PERIOD of accumulated fields is read from the initial conditions file, but I don't think it needs to be, and doing so is problematic if the run you're doing uses a different time step than the time step used in creating the initial conditions file originally.

Can you try redoing your test after rebuilding the code with this block of code deleted:

if (accum(nf)%old_name /= "") then
   varname = trim(accum(nf)%name) // '_PERIOD:' // trim(accum(nf)%old_name) // '_PERIOD'
else
   varname = trim(accum(nf)%name) // '_PERIOD'
end if
call restartvar(ncid=ncid, flag=flag, varname=varname, xtype=ncd_int, &
     long_name='', units='time steps', &
     imissing_value=ispval, ifill_value=huge(1), &
     interpinic_flag='copy', &
     data=accum(nf)%period, readvar=readvar)

If that gives other problems, then a simpler but safer experiment would be to redo your test after setting ./xmlchange CLM_FORCE_COLDSTART=on so that the model doesn't use a restart file at all. (That won't be good for science, but if my hypothesis is right, then it should get around this issue. I'd like to see if that's true.)
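
To make the failure mode concrete, here is a minimal sketch of the arithmetic (plain standalone Fortran, not CTSM code; all names are illustrative) showing why a PERIOD stored in time steps goes wrong once dtime changes:

program period_demo
  implicit none
  integer, parameter :: secs_per_day = 86400
  integer :: dtime_old, dtime_new, period_steps
  real :: effective_days

  ! A 10-day average at the original 30-minute step is stored as 480 steps.
  dtime_old = 1800
  period_steps = 10 * secs_per_day / dtime_old

  ! Reusing those 480 steps at a 3.75-minute step (ATM_NCPL = 384)
  ! shrinks the averaging window from 10 days to 1.25 days.
  dtime_new = 225
  effective_days = real(period_steps * dtime_new) / real(secs_per_day)

  print *, 'PERIOD read from restart (time steps):', period_steps
  print *, 'effective window at new dtime (days): ', effective_days
end program period_demo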

billsacks (Member) commented:

(Note to self and @samsrabin: usually my experience is that it's a bad idea to get side-tracked going down tangential rabbit holes, but apparently this one #1684 (comment) was worth following up on...)

Duseong (Author) commented Jun 29, 2022

> Can you try redoing your test after rebuilding the code with this block of code deleted: [...] If that gives other problems, then a simpler but safer experiment would be to redo your test after setting ./xmlchange CLM_FORCE_COLDSTART=on so that the model doesn't use a restart file at all.

Thanks a lot for the quick response! I will give it a spin and will let you know if it's corrected!

Duseong (Author) commented Jun 30, 2022

Thanks again for your comment!
Accumulated_fields_time_series_bugfix.pdf

Please see the attached file, which now includes another simulation (blue dots) with your suggested bug fix applied.

I think it solved the problem; the accumulated fields are now much closer to those from the base simulation.

These are only 5-day simulation results, so I will do a longer run for final confirmation, but I don't expect a longer run to show different results.

billsacks removed the "next" label Jun 30, 2022
billsacks (Member) commented:

Thanks a lot for trying that out and posting the results! I'm not too surprised to see that it takes some time for the accumulation fields to adjust to the new time step. It looks to me like after this initial period they appear to be doing the right thing. I'm planning to move ahead with this fix, but please do let us know if you see other problems.

billsacks self-assigned this Jun 30, 2022
adamrher (Contributor) commented Jul 5, 2022

@Duseong says

> Since regional refinement simulations use a smaller timestep, their results are heavily affected by these fields. I found that global biogenic emissions, OH, and other chemical fields such as CO changed substantially. Other atmospheric fields may also be affected.

I just want to verify that this issue is purely a problem with diagnostic output, and not something that impacts other fields, as the comment above indicates.

lkemmons commented Jul 5, 2022 via email

adamrher (Contributor) commented Jul 5, 2022

oy vey. OK thanks.

Duseong (Author) commented Jul 5, 2022

> These are only 5-day simulation results, so I will do a longer run for final confirmation, but I don't expect a longer run to show different results.

Louisa and I have since conducted several-month simulations, and again we confirmed that Bill's method solved the problem.

lkemmons commented Jul 6, 2022

The accumulMod routines are used in many other parts of CLM as well. We have not investigated the impact on other CLM fields.

adamrher (Contributor) commented Jul 8, 2022

I'd be interested to know whether our standard CAM6 low-top configuration (not CAM-CHEM) is impacted by this bug. Which species have MEGAN emissions? Does that refer to aerosol species or chemical species?

In the CAM configuration I'm referring to, these chemical species are dynamically active (i.e., dycore tracers):

DMS, H2O2, H2SO4, SO2, SOAG

... and the aerosol species:

bc_aX, dst_aX, ncl_aX, num_aX, pom_aX, so4_aX, soa_aX

lkemmons commented Jul 8, 2022

CAM alone does not use MEGAN. However, these accumulated averages are used throughout CLM for a variety of parameters. I have not yet had a chance to look at the impact on physical parameters in CLM, nor the impact on T, Q, etc. in CAM.

billsacks (Member) commented:

@adamrher - just to echo what @lkemmons said, I would (unfortunately) expect this bug to have some impact on any simulation with a non-30-minute time step, but I don't have a sense of how large the impact will be.

dlawrenncar (Contributor) commented Jul 11, 2022 via email

billsacks added a commit to billsacks/ctsm that referenced this issue Jul 11, 2022
This doesn't need to be on the restart file, and having it read from the
restart file causes wrong behavior when changing the model time step.

See ESCOMP#1789 for details

Resolves ESCOMP#1789
billsacks (Member) commented:

I have done some more investigation of this. Based on a quick search through the code (i.e., I may have missed something), it looks like, in an SP (i.e., non-BGC) case, the only accumulation field that impacts aspects of the model other than VOC emissions is T10 - the 10-day running mean air temperature - which appears in a few places in the photosynthesis calculations. In a run whose time step isn't too different from 30 minutes – e.g., a 15-minute time step – I wouldn't expect this bug to make much difference in an SP case. But in a run with a much shorter time step, the differences would become larger. I don't have a feeling for how important this T10 variable is in the photosynthesis calculation, but I do see that it has some impact on the evolution of the model.

More accumulation fields come into play in BGC simulations, e.g., impacting various phenology calculations.

billsacks added a commit that referenced this issue Jul 13, 2022
Fix accumulation variables when changing model time step

Accumulation variables (e.g., 1-day or 10-day averages) were writing and
reading their accumulation period (expressed in time steps) to the
restart file. This caused incorrect behavior when changing the model
time step relative to what was used to create the initial conditions
file (typically a 30-minute time step). So, for example, if you are
using a 15-minute time step with an initial conditions file that
originated from a run with a 30-minute time step (at some point in its
history), then an average that was supposed to be 10-day instead becomes
5-day; an average that was supposed to be 1-day becomes 12-hour, etc.
(The issue is that the number of time steps in the averaging period was
staying fixed rather than the actual amount of time staying fixed.)

For our out-of-the-box initial conditions files, this only impacts runs
that use something other than a 30-minute time step. Typically this
situation arises in configurations with an active atmospheric model that
is running at higher resolution than approximately 1 degree. It appears
that the biggest impacts are on VOC emissions and in BGC runs; we expect
the impact to be small (but still non-zero) in prescribed phenology (SP)
runs that don't use VOC emissions.

This tag fixes this issue by no longer writing or reading accumulation
variables' PERIOD to / from the restart file: this isn't actually needed
on the restart file.

See some discussion in #1789 for
more details, and see
#1802 (comment) for
some discussion of outstanding weirdness that can result for
accumulation variables when changing the model time step. The summary of
that comment is: There could be some weirdness at the start of a run,
but at least for a startup or hybrid run, that weirdness should work
itself out within about the first averaging period. A branch or restart
run could have some longer-term potential weirdness, so for now I think
we should recommend that people NOT change the time step on a branch or
restart run. With (significant?) additional work, we could probably
avoid this additional weirdness, but my feeling is that it isn't worth
the effort right now. In any case, I feel like my proposed fix will
bring things much closer to being correct than they currently are when
changing the time step.

Resolves #1789 (A bug in calculating accumulated fields
(24/240 hours averaged) when using a smaller timestep)
slevis-lmwg added a commit to slevis-lmwg/ctsm that referenced this issue Jul 14, 2022
Fix accumulation variables when changing model time step
slevis-lmwg added a commit to slevis-lmwg/ctsm that referenced this issue Jul 14, 2022
Fix accumulation variables when changing model time step
ekluzek added a commit to ekluzek/CTSM that referenced this issue Jul 14, 2022
Fix accumulation variables when changing model time step
ekluzek added a commit to ekluzek/CTSM that referenced this issue Jul 14, 2022
Fix accumulation variables when changing model time step
glemieux added a commit to glemieux/ctsm that referenced this issue Jul 15, 2022
Fix accumulation variables when changing model time step
ManYue07 commented:

> From looking through the code, I think I see what's happening here: the PERIOD of accumulated fields is read from the initial conditions file, but I don't think it needs to be, and doing so is problematic if the run you're doing uses a different time step than the time step used in creating the initial conditions file originally. [...]

Hi Bill (@billsacks), I have a quick question about this bug. Can it be interpreted as a time step of less than 30 minutes resulting in inconsistent time steps between CLM and CAM, which then affects the simulation results? If so, how should the impact of this inconsistency be understood?

billsacks (Member) commented:

> Can this bug be interpreted as a time step of less than 30 minutes resulting in inconsistent time steps between CLM and CAM?

Not exactly. The issue is more subtle: CTSM has a number of accumulation fields that accumulate averages over some period. These accumulation fields weren't properly handling a change in time step (relative to what was used to generate the initial conditions file). So, for example, if you are using a 15-minute time step with an initial conditions file that originated from a run with a 30-minute time step (at some point in its history), then an average that was supposed to be 10-day instead becomes 5-day; an average that was supposed to be 1-day becomes 12-hour, etc. (The issue is that the number of time steps in the averaging period was staying fixed rather than the actual amount of time staying fixed.) It appears that the biggest impacts are on VOC emissions and in BGC runs; we expect the impact to be small (but still non-zero) in prescribed phenology (SP) runs that don't use VOC emissions.
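
For intuition on the direction of the fix (an illustrative sketch only, not the actual CTSM change): the period in time steps should be derived from the run's own dtime at initialization, so the averaging window stays fixed in wall-clock time rather than in step count:

program period_fix_demo
  implicit none
  integer, parameter :: secs_per_day = 86400
  integer :: dtime(2), period_steps, i

  dtime = [1800, 900]   ! a 30-minute and a 15-minute time step
  do i = 1, 2
     ! Deriving the period from the current dtime keeps the window at 10 days
     ! regardless of which time step the run uses.
     period_steps = 10 * secs_per_day / dtime(i)
     print *, 'dtime =', dtime(i), 's ->', period_steps, 'steps per 10-day window'
  end do
end program period_fix_demo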

ManYue07 commented:
> Not exactly. The issue is more subtle: CTSM has a number of accumulation fields that accumulate averages over some period. These accumulation fields weren't properly handling a change in time step (relative to what was used to generate the initial conditions file). [...]

Thank you so much for the explanation, it helped a lot.

samsrabin added the "science (Enhancement to or bug impacting science)" label Aug 8, 2024
ekluzek moved this to Done (non release/external) in CTSM: Upcoming tags Aug 21, 2024