Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ncdump reports times like "2015-03-12 12:19:60.000000" #1928

Closed
rkouznetsov opened this issue Jan 7, 2021 · 18 comments
Closed

Ncdump reports times like "2015-03-12 12:19:60.000000" #1928

rkouznetsov opened this issue Jan 7, 2021 · 18 comments

Comments

@rkouznetsov
Copy link
Contributor

These times are difficult to read for human, and are perfectly capable of breaking many time-parsing routines..
The issue is reproducible with the latest master as of now (08da640) .

Way to reproduce:
$ cat timedump.ncd

netcdf timedump {
dimensions:
        time = UNLIMITED ; // (61 currently)
variables:
        int time(time) ;
                time:units = "seconds since 2015-03-12 12:00:00 UTC" ;
                time:long_name = "time" ;
                time:calendar = "standard" ;
                time:standard_name = "time" ;

// global attributes:
data:

 time = 0, 600, 1200, 1800, 2400, 3000, 3600, 4200, 4800, 5400, 6000, 6600,
    7200, 7800, 8400, 9000, 9600, 10200, 10800, 11400, 12000, 12600, 13200,
    13800, 14400, 15000, 15600, 16200, 16800, 17400, 18000, 18600, 19200,
    19800, 20400, 21000, 21600, 22200, 22800, 23400, 24000, 24600, 25200,
    25800, 26400, 27000, 27600, 28200, 28800, 29400, 30000, 30600, 31200,
    31800, 32400, 33000, 33600, 34200, 34800, 35400, 36000 ;
}

$ ncgen -o timedump.nc timedump.ncd
$ ncdump -t timedump.nc

The output is

....
 time = "2015-03-12", "2015-03-12 00:10:0.000000", 
    "2015-03-12 00:19:60.000000", "2015-03-12 00:30", 
    "2015-03-12 00:40:0.000000", "2015-03-12 00:49:60.000000", 
    "2015-03-12 01", "2015-03-12 01:10:0.000000", 
    "2015-03-12 01:19:60.000000", "2015-03-12 01:30", 
    "2015-03-12 01:40:0.000000", "2015-03-12 01:49:60.000000", 
    "2015-03-12 02", "2015-03-12 02:10:0.000000", 
    "2015-03-12 02:19:60.000000", "2015-03-12 02:30", 
....

Thank you!

@DennisHeimbigner
Copy link
Collaborator

What should it look like?

@rkouznetsov
Copy link
Contributor Author

Sorry, I would say it should look like

....
 time = "2015-03-12", "2015-03-12 00:10",     "2015-03-12 00:20", "2015-03-12 00:30",    "2015-03-12 00:40", "2015-03-12 00:50", 
....

The current representation with "00:49:60.000000" is an invalid specification of time. Consider:

$ date -u -d "2015-03-12 00:10:0.000000"
Thu 12 Mar 2015 12:10:00 AM UTC
$ date -u -d "2015-03-12 00:49:60.000000"
date: invalid date ‘2015-03-12 00:49:60.000000’
$

@czender
Copy link
Contributor

czender commented Jan 13, 2021

There appears to be a second issue not mentioned yet: time:units = "seconds since 2015-03-12 12:00:00 UTC" so all times should be 12 hours later than ncdump reports. This is what ncks produces:

ncks --cal timedump.nc
....
time = "2015-03-12 12:00:00", "2015-03-12 12:10:00", "2015-03-12 12:20:00", "2015-03-12 12:30:00", "2015-03-12 12:40:00", "2015-03-12 12:50:00", "2015-03-12 13:00:00", "2015-03-12 13:10:00",....

Am I missing something?

@Dave-Allured
Copy link
Contributor

@czender, that is a different problem, a bug in parsing of the units string. It is already fixed for next release. See #1417.

@Dave-Allured
Copy link
Contributor

This current problem, printing 60 seconds, was previously reported in e-support WSC-538180. There is some analysis in those messages, identifying a problem with roundoff error.
https://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg14699.html

@rkouznetsov
Copy link
Contributor Author

The reason is the same, but in this case the problem can be solved properly.
In case of WSC-538180 the guys use floating-point for time, so rounding errors is something unavoidable. The issue can only be mitigated by various ad-hoc tricks (like 'time == (int) time"). As, probably, this issue will be treated as well. In the example above the time variable is integer, and therefore allows for exact arithmetics. Unfortunately, this fact is neglected by many developers around netcdf, and I have not had yet time to implement it properly e.g. in the netcdftime python module (now acquired by netCDF4).

@DennisHeimbigner
Copy link
Collaborator

DennisHeimbigner commented Jan 22, 2021

The analysis in the cited support thread is correct:

The actual value of sec is (on my machine) 59.999999986030161
and apparently this is being rounded up to 60.00000... by sprintf.

But apparently the proposed fix was never implemented.
However, my take on the proper solution is different.
Assuming a value of sec = 59.999999986030161 as a double,
I would have said that the correct output is "59.9999999"
So, which solution do you want?

@czender
Copy link
Contributor

czender commented Jan 25, 2021

Actually 59.999999986030161 rounds to 60.0000000 not 59.9999999 (because the next digit in the unrounded double is 8 which exceeds 5 so standard rules dictate rounding up). In other words, sprintf() rounds correctly (no surprise), and the rounded value must be checked for equality to 60.0, and if that condition is true, reset to 0.0 and carry one to the minute value.

@DennisHeimbigner
Copy link
Collaborator

I guess I was not clear. Do we want rounding or do we want truncation at the nth digit
(where n is about 6)?

@czender
Copy link
Contributor

czender commented Jan 25, 2021

I vote for rounding.

@DennisHeimbigner
Copy link
Collaborator

In order to check for rounding, I will need to define a constant
defining at what digit I do the round. e.g. sec+0.5, sec+0.05, etc
The above example rounded at digit 8 of the fraction part, but that was
just an example.
I am guessing that the value should be 0.0000005 (round at digit 7).

@czender
Copy link
Contributor

czender commented Jan 25, 2021

I'm not sure I understood the previous question, and I think there may be a better solution. Please see nco_cln_fmt_dt() in NCO here. The ttx structure was previously filled-in by UDUnits ut_decode_time(). Note the use of modf() in the crucial three lines:

      frc_sec=modf(ttx->sec,&m_sec);
      isec=(int)m_sec;      
      if(frc_sec == 0.0) sprintf(btime," %02d:%02d:%02d",ttx->hour,ttx->min,isec); else sprintf(btime," %02d:%02d:%02.7f",ttx->hour,ttx->min,ttx->sec);

This seems to work fine, and I'm unsure whether you would consider this truncation (because of casting to int) or rounding (because of sprintf()).

@DennisHeimbigner
Copy link
Collaborator

AFAIK libdispatch/nctime.c does not appear to use udunits.

@DennisHeimbigner
Copy link
Collaborator

I just noticed that PR #1864
had some side effects. Specifically for the test data cdl above, the first entry
was
time = "2015-03-12",...
and is now
time = "2015-03-12 12",...

Was this intentional?

rkouznetsov pushed a commit to rkouznetsov/netcdf-c that referenced this issue Jan 26, 2021
Ncdump reports times like "2015-03-12 12:19:60.000000" Unidata#1928
Imposes a microsecond accuracy on dumped time representation
@rkouznetsov
Copy link
Contributor Author

@DennisHeimbigner, It is not a side effect, but rather a fix. time = "2015-03-12",... was wrong. It has been fixed already.

I have filed a pull request that enforces a microsecond accuracy of time dump.
It should fix the issue with minimal harm..

@WardF
Copy link
Member

WardF commented Jan 26, 2021

Fixed in #1937

@WardF WardF closed this as completed Jan 26, 2021
@rkouznetsov
Copy link
Contributor Author

Fixed in #1937

Thank you! Indeed, it is kind-of a dirty fix. I can, probably, construct a case when it fails. The core reason is floating-point data behind the time variable. That is why in Silam we use integers for time output.

I believe the ultimate way would be to use two separate times for two different use cases:

  • Properly-implemented integer-based time, similar to the Python datetime or POSIX time, for Earth calendars (standard, proleptic georgian, julian if someone needs it etc). With these calendars one can make an exact time arithmetics, handle days of week etc..
  • Another one, double-precision-based for alien calendars used by climatologists (360day, 365day, 366day, all_leap, etc) to cover millions of years but without exact time arithmetics, without days of week.

That is definitely a major work, but having it done properly one does not have to deal with time-rounding issues anymore. Half-solution would be to use a double-precision-based time for seconds (currently it is for days, if i got it right), then any not-too-distant time with whole seconds can be represented exactly.

Any thoughts on that would be appreciated.

@DennisHeimbigner
Copy link
Collaborator

The libdispatch/nctime code came from somewhere else. Perhaps there is a newer package
we could use?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants