Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hourly output broken by recent time manager update #589

Closed
phil-blain opened this issue Apr 1, 2021 · 10 comments · Fixed by #586 or #610
Closed

Hourly output broken by recent time manager update #589

phil-blain opened this issue Apr 1, 2021 · 10 comments · Fixed by #586 or #610

Comments

@phil-blain
Copy link
Member

I've just noticed that hourly outputs does not work anymore since the update to the time manager.

I'm comparing b720380 (Update Time Manager (#566), 2021-03-16) and 5a0a559 (Modified k1 value and some of the doc (#574), 2021-03-11) (the commit just before).

With these modification to the namelist:

diff --git 1/../before-time-manager-update2/ice_in 2/./ice_in
index 1dfb0b6..684c8ed 100644
--- 1/../before-time-manager-update2/ice_in
+++ 2/./ice_in
@@ -34,7 +33,7 @@ &setup_nml
     latpnt(2)      = -65.
     lonpnt(2)      = -45.
     dbug           = .false.
-    histfreq       = 'm','x','x','x','x'
+    histfreq       = 'h','x','x','x','x'
     histfreq_n     =  1 , 1 , 1 , 1 , 1
     hist_avg       = .true.
     history_dir    = './history/'
@@ -413,9 +412,9 @@ &icefields_nml
     f_VGRDb        = .false.
     f_VGRDa        = .true.
     f_bounds       = .false.
-    f_aice         = 'm' 
+    f_aice         = 'h' 
     f_hi           = 'm'
     f_hs           = 'm' 

With 5a0a559:

$ ll -rt $(cice_rundir)/history
total 12M
-rw-r--r-- 1 me mygroup 503K Apr  1 12:40 iceh_ic.2005-01-01-03600.nc
-rw-r--r-- 1 me mygroup 503K Apr  1 12:40 iceh_01h.2005-01-01-32400.nc
-rw-r--r-- 1 me mygroup 503K Apr  1 12:40 iceh_01h.2005-01-01-28800.nc
-rw-r--r-- 1 me mygroup 503K Apr  1 12:40 iceh_01h.2005-01-01-25200.nc
-rw-r--r-- 1 me mygroup 503K Apr  1 12:40 iceh_01h.2005-01-01-21600.nc
-rw-r--r-- 1 me mygroup 503K Apr  1 12:40 iceh_01h.2005-01-01-18000.nc
-rw-r--r-- 1 me mygroup 503K Apr  1 12:40 iceh_01h.2005-01-01-14400.nc
-rw-r--r-- 1 me mygroup 503K Apr  1 12:40 iceh_01h.2005-01-01-10800.nc
-rw-r--r-- 1 me mygroup 503K Apr  1 12:40 iceh_01h.2005-01-01-07200.nc
-rw-r--r-- 1 me mygroup 503K Apr  1 12:40 iceh_01h.2005-01-01-75600.nc
-rw-r--r-- 1 me mygroup 503K Apr  1 12:40 iceh_01h.2005-01-01-72000.nc
-rw-r--r-- 1 me mygroup 503K Apr  1 12:40 iceh_01h.2005-01-01-68400.nc
-rw-r--r-- 1 me mygroup 503K Apr  1 12:40 iceh_01h.2005-01-01-64800.nc
-rw-r--r-- 1 me mygroup 503K Apr  1 12:40 iceh_01h.2005-01-01-61200.nc
-rw-r--r-- 1 me mygroup 503K Apr  1 12:40 iceh_01h.2005-01-01-57600.nc
-rw-r--r-- 1 me mygroup 503K Apr  1 12:40 iceh_01h.2005-01-01-54000.nc
-rw-r--r-- 1 me mygroup 503K Apr  1 12:40 iceh_01h.2005-01-01-50400.nc
-rw-r--r-- 1 me mygroup 503K Apr  1 12:40 iceh_01h.2005-01-01-46800.nc
-rw-r--r-- 1 me mygroup 503K Apr  1 12:40 iceh_01h.2005-01-01-43200.nc
-rw-r--r-- 1 me mygroup 503K Apr  1 12:40 iceh_01h.2005-01-01-39600.nc
-rw-r--r-- 1 me mygroup 503K Apr  1 12:40 iceh_01h.2005-01-01-36000.nc
-rw-r--r-- 1 me mygroup 503K Apr  1 12:40 iceh_01h.2005-01-02-00000.nc
-rw-r--r-- 1 me mygroup 503K Apr  1 12:40 iceh_01h.2005-01-01-82800.nc
-rw-r--r-- 1 me mygroup 503K Apr  1 12:40 iceh_01h.2005-01-01-79200.nc

with b720380:

$ ll -rt $(cice_rundir)/history
total 504K
-rw-r--r-- 1 me mygroup 503K Apr  1 12:41 iceh_ic.2005-01-01-03600.nc

/cc @apcraig

@phil-blain
Copy link
Member Author

According to my cursory reading of ice_calendar, it would mean new_hour is always false, so that would mean hour == hourp (always) and so maybe hour is not set by the new calendar subroutines...

@apcraig
Copy link
Contributor

apcraig commented Apr 1, 2021

I'll do some debugging.

@apcraig
Copy link
Contributor

apcraig commented Apr 1, 2021

I found the problem and will fix it in #586

apcraig added a commit to apcraig/CICE that referenced this issue Apr 1, 2021
Update CI wget implementation CICE-Consortium#588
Create variable in ice_forcing.F90 called mixed_layer_depth_default and use it instead of c20
Fix bug in hourly output, created in time manager update CICE-Consortium#589
Set start year for all runs to 2005 and turn leap years on by default
Update documentation in history output section to add information about hist_avg namelist, noted in CICE-Consortium#566
apcraig added a commit that referenced this issue Apr 6, 2021
…efault value (#586)

* Update gx1 ic

- update set_nml.gx1prod to match current production system
- add apr 1 test case for gx1
- update landice tests, use gx1
- fix hmix default #585
- delete old code in ice_forcing.F90 (should have been done in earlier PR)

* update gx1coreii initial condition

* update icepack

* add gx1 debug test, expected to fail

* Update version number to 6.2.0

* Update gx3 and gx1 input filenames
Update CI wget implementation #588
Create variable in ice_forcing.F90 called mixed_layer_depth_default and use it instead of c20
Fix bug in hourly output, created in time manager update #589
Set start year for all runs to 2005 and turn leap years on by default
Update documentation in history output section to add information about hist_avg namelist, noted in #566

* update ic filenames, tests, and documentation

* update color links in test results wiki

* update hmix initialization
@phil-blain
Copy link
Member Author

Hi Tony, I found another related bug:

If histfreq_n = m is used with histfreq = 'h' and m != 1 , we get output every hour, not every m hours. I think that is due to the computation of elapsed_hours that does not take into account the current hour here:

elapsed_hours = elapsed_days * hours_per_day

This patch seems to do the trick:

diff --git i/cicecore/shared/ice_calendar.F90 w/cicecore/shared/ice_calendar.F90
index 4d7ae37..ebac664 100644
--- i/cicecore/shared/ice_calendar.F90
+++ w/cicecore/shared/ice_calendar.F90
@@ -352,7 +352,7 @@ subroutine calendar()
       hour = (msec+1)/(seconds_per_hour)
       elapsed_months = (myear - year_init)*months_per_year + mmonth - month_init
       elapsed_days = compute_days_between(year_init,month_init,day_init,myear,mmonth,mday)
-      elapsed_hours = elapsed_days * hours_per_day
+      elapsed_hours = elapsed_days * hours_per_day + hour - sec_init / seconds_per_hour
       call calendar_date2time(myear,mmonth,mday,msec,timesecs)
 
       !--- compute other stuff

However, I'm not sure of the +1 in the computation of hour. This would mean that at msec=3599 we would get hour=1 instead of 0, no? why is it computed this way ?


I also noticed that for histfreq = 'y', the variable myear is used in the modulo, but it should be the number of years elapsed since the beginning of the run, no ? Something like this :

diff --git i/cicecore/shared/ice_calendar.F90 w/cicecore/shared/ice_calendar.F90
index 4d7ae37..22b1f74 100644
--- i/cicecore/shared/ice_calendar.F90
+++ w/cicecore/shared/ice_calendar.F90
@@ -328,6 +328,7 @@ subroutine calendar()
       integer (kind=int_kind) :: &
          ns                         , & ! loop index
          yearp,monthp,dayp,hourp    , & ! previous year, month, day, hour
+         elapsed_years              , & ! since beginning this run
          elapsed_days               , & ! since beginning this run
          elapsed_months             , & ! since beginning this run
          elapsed_hours                  ! since beginning this run
@@ -350,7 +351,8 @@ subroutine calendar()
       idate = (myear)*10000 + mmonth*100 + mday ! date (yyyymmdd) 
       yday = daycal(mmonth) + mday            ! day of the year
       hour = (msec+1)/(seconds_per_hour)
-      elapsed_months = (myear - year_init)*months_per_year + mmonth - month_init
+      elapsed_years = myear - year_init
+      elapsed_months = elapsed_years*months_per_year + mmonth - month_init
       elapsed_days = compute_days_between(year_init,month_init,day_init,myear,mmonth,mday)
       elapsed_hours = elapsed_days * hours_per_day
       call calendar_date2time(myear,mmonth,mday,msec,timesecs)
@@ -373,7 +375,7 @@ subroutine calendar()
          select case (histfreq(ns))
          case ("y", "Y")
             if (new_year  .and. histfreq_n(ns)/=0) then
-               if (mod(myear, histfreq_n(ns))==0) &
+               if (mod(elapsed_years, histfreq_n(ns))==0) &
                    write_history(ns) = .true.
             endif
          case ("m", "M")

If you agree I can submit a PR.

@phil-blain phil-blain reopened this May 6, 2021
@apcraig
Copy link
Contributor

apcraig commented May 6, 2021

I'll take a look at the multiple hourly output frequency, thanks @phil-blain.

The other suggestions are a separate question. Should the model output frequencies be relative to the init time or the basic clock. So, if you want output every 3 days, do you get output every 3 days relative to 0000-01-01-00000 or relative to year_init-month_init-day_init-sec_init. My feeling is that it should be relative to the basic clock. I think you are proposing it be relative to the initial time, so that's partly where you want code changes.

The problem with using the initial time is that it creates challenges across a run or multiple runs. Lets say you want output every 3 days and have an initial condition at the first timestep of each month. If you initialize each run and ask for output every 3 days, the output will be generated on the same dates for all runs if you reference the basic clock. If you reference the initial date, the output will be written on different dates depending on the run making clean comparisons more difficult.

Another example. If you are generating monthly averages and you start a run on Jan 15. Do you want the Jan mean, Feb mean, March mean. Or do you want the mean Jan 15-Feb 15, Feb 15-Mar 15, etc.

Another example, if your sec_init is non-zero, do you really want everything (hourly, daily, monthly, yearly) frequency offset by some partial day?

I think the default should be frequencies relative to the basic clock, which is what's implemented. Again, this provides some level of consistency with respect to the output. I guess we could create an option that would allow the frequencies to be relative to the initial date/time, but this is actually much more complicated to do and probably has implications for how we create history filenames and maybe even the basic history implementation among other things. It also may impact how restarts frequencies are defined.

I'm open to discussion about this. @eclare108213 @dabail10

@phil-blain
Copy link
Member Author

I understand what you are saying. Currently the code does this:

hour = (msec+1)/(seconds_per_hour)
elapsed_months = (myear - year_init)*months_per_year + mmonth - month_init
elapsed_days = compute_days_between(year_init,month_init,day_init,myear,mmonth,mday)
elapsed_hours = elapsed_days * hours_per_day

and then uses myear, elapsed_days, elapsed_months and elapsed_hours in the modulo computation with histfreq_n(ns) a few lines below. So from what I understand:

  • for monthly outputs: it does not take into account starting the run mid-month, but is relative to the starting month of the run
  • for daily outputs: it's relative to the starting date. So in your example above with initial conditions on the first of each month and outputs each three days, each run would possibly have outputs on different days. (unless I'm missing something). In fact I think this is the correct behaviour.
  • for yearly output: as I wrote above it's not relative since it uses myear directly.
  • for hourly outputs: with my modification above it would be relative to the beginning of the run. That is exactly what was the behaviour before the update to the time manager, since the old computation was:
    elapsed_hours = int(ttime/3600)

    and ttime held the number of seconds since the beginning of the run.

I think the current behaviour is mostly correct (apart from the multi-hourly output). For yearly output I was just noting what I was reading of the code; the behaviour also changed (I think) with the new time manager since the old computation was using nyr, which at that point held the number of year since the beginning of the run:

if (mod(nyr, histfreq_n(ns))==0) &

I'm not suggesting major modifications to the history implementation.

@apcraig
Copy link
Contributor

apcraig commented May 7, 2021

Ok, thanks @phil-blain for clarifying. I agree that we're mostly matching up with the prior implementation. Maybe that's not so good. Again, in the prior implementation, we didn't really have the ability to start on anything other than Jan 1 0 seconds very easily. So you can almost argue month_init, day_init, and sec_init were more or less fixed as 1, 1, 0 which then puts us in the position that relative time and absolute time are almost the same thing.

If anything, I favor modifying the current implementation so the frequencies are all relative to absolute time, so it's very predictable and consistent for all runs. There really are many ways this could be done. Lets assume you wanted a daily average every 3 days (I'm not sure the code handles that specifically, but lets use it as it's interesting). The 3 day frequency could be computed

  • relative to the 0001-01-01-00000
  • relative to the start year, start month, and start day
  • always relative to 0s regardless of sec_init
  • relative to the second init which would mean that a daily average would be an average across 24 hours over 2 distinct days which might impact how we name the history file

I think you are advocating for the second bullet, I'm advocating for the first, but there really are many ways this could be implemented. We probably need to clean up the implementation and clearly document it. I'd like to hear what others think @eclare108213 @dabail10. We also can support multiple methods of computing the output frequency which might be good too. What would those be and how do we specify them in namelist?

This probably should have been ironed out during the time manager development. Sorry about that, it didn't really show up on my radar. It's obviously not completely correct or documented now, so we'll need to come up with a design and I'll fix it.

@TillRasmussen
Copy link
Contributor

From DMI's point of view I think that both of the two options would work. Here we mainly use snapshots. When running longer test runs I would almost always start from the beginning of a month if I output averages. If option two is used then I would also average according to a none zero sec_init.

@eclare108213
Copy link
Contributor

I vote for relative to the absolute calendar rather than the initial time. The latter could be made an option but I think it will be extremely finicky and difficult to manage.

apcraig added a commit to apcraig/CICE that referenced this issue Jun 19, 2021
- Fix bugs in history/restart frequency associated with new calendar (CICE-Consortium#589)
- Define frequency in absolute terms relative to 0000-01-01-00000 and document (CICE-Consortium#589)
- Update set_nml.histall to include hourly output (CICE-Consortium#589)
- Update test scripts to cleanly abort if run fails where possible (CICE-Consortium#608)
- Update decomp test so it's rerunable, remove restart at start of run (CICE-Consortium#601)
- Add ability to do bfbcomp tests with additional options set on command line (CICE-Consortium#569)
- Update documentation of calendar frequency computation, calendar types, and closed boundaries (CICE-Consortium#541)
- Add optional doabort flag to abort_ice to control whether the method aborts.  This
  is useful for testing and code coverage statistics, although doabort=.false.
  will not call the actual abort method, but we can test the interfaces and
  rest of the code.
@apcraig
Copy link
Contributor

apcraig commented Jun 19, 2021

#610 should take care of this.

In the end, I implemented two new namelist, histfreq_base and dumpfreq_base. That allows the history and restart frequency reference date to be set for history and restart files separately. By default, histfreq_base = 'zero' and dumpfreq_base = 'init'. 'zero' means the reference date is 0000-01-01. 'init' means the reference date is year_init-month_init-day_init.

It turns out, for restarts, we really do need to reference the 'init' time for testing. For instance, writing restarts after 5 days creates chaos in our testing if we use 'zero' because then restarts are at relatively "unpredicatable" times. So for testing sanity, I set the dumpfreq_base default to 'init'.

However, for history output, I think we do want an absolute reference data for consistency across runs as the default. So I've set the default to 'zero' for histfreq_base.

Users can quickly switch either or both to operate in the other mode. There still is no way to adjust the phase of the output. Yearly, monthly, daily, and hourly outputs are always written at the start of each "period". This is a key aspect of the implementation. For instance, you cannot create monthly data on the 10th of each month. Even in "init" mode starting on the 10th of the month (day_init=10), monthly data is always written on the first of the month. I think this was always the intent.

So I think we address the main issues for now. The user guide has been updated in #610. There are so many other things we could consider if we want to extend the capabilities. We could introduce phasing, alternative ways to control the relative date, and we could also introduce separate controls for yearly, monthly, daily, and hourly data. Until we have a request to do so, I think we're good.

Feel free to suggest alternative names for histfreq_base, dumpfreq_base, 'zero', and 'init' if you feel these are not clear, probably in #610.

apcraig added a commit that referenced this issue Jul 2, 2021
* Fix history/restart frequency bugs and update scripts

- Fix bugs in history/restart frequency associated with new calendar (#589)
- Update set_nml.histall to include hourly output (#589)
- Update test scripts to cleanly abort if run fails where possible (#608)
- Update decomp test so it's rerunable, remove restart at start of run (#601)
- Add ability to do bfbcomp tests with additional options set on command line (#569)
- Update documentation of calendar frequency computation, calendar types, and closed boundaries (#541)
- Add optional doabort flag to abort_ice to control whether the method aborts.  This
  is useful for testing and code coverage statistics, although doabort=.false.
  will not call the actual abort method, but we can test the interfaces and
  rest of the code.
- Add histfreq_base and dumpfreq_base ('init' or 'zero') to specify
  reference data for history and restart output.  Defaults are
  'zero' and 'init' respectively for hist and dump.
  Setting histfreq_base to 'zero' allows for consistent output
  across multiple runs.  Setting dumpfreq_base to 'init' allows
  the standard testing which requires restarts be written,
  for example, 5 days after the start of the run.
- Remove extra abort calls in bcstchk and sumchk on runs that
  complete fine but don't pass checks.  These aborts should never
  have been there.
- Update documentation.

- Clean up some of the unit tests to better support regression testing

- modify initial/restart implementation
- restart namelist is deprecated, now computed internally
- modify initial/continue init checks and set restart and use_restart_time as needed
- create compute_relative_elapsed method in ice_calendar to improve code reuse
- update documentation with regard to initial/continue modes
- Set default use_restart_time to false
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment