Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

getTimeStringForCapabilities() can't handle yearly or monthly intervals properly #145

Open
WeatherGod opened this issue Jul 22, 2022 · 8 comments

Comments

@WeatherGod
Copy link

original bug report at Unidata/ncWMS#41

Not exactly sure if the bug is in this repo, or unidata's fork, but here goes. Note that this report is in the context of using the WMS service through thredds.

The algorithm for getTimeStringForCapabilities() seems to work by joining together contiguous intervals of equal time as measured by milliseconds. This results in "weird" lists for monthly datasets because only December-January and July-August are equal in time.

As an example, for my monthly dataset, I get the following in my capabilities document, which is mostly a list rather than a single time interval like I configured for, which is likely to cause confusion.

<Dimension name="time" units="unknown" multipleValues="true" current="true" default="2022-05-01T00:00:00.000Z">
 2020-01-01T00:00:00.000Z,2020-02-01T00:00:00.000Z,2020-03-01T00:00:00.000Z,2020-04-01T00:00:00.000Z,2020-05-01T00:00:00.000Z,2020-06-01T00:00:00.000Z,
2020-07-01T00:00:00.000Z/2020-09-01T00:00:00.000Z/P31D,2020-10-01T00:00:00.000Z,2020-11-01T00:00:00.000Z,
2020-12-01T00:00:00.000Z/2021-02-01T00:00:00.000Z/P31D,2021-03-01T00:00:00.000Z,2021-04-01T00:00:00.000Z,2021-05-01T00:00:00.000Z,2021-06-01T00:00:00.000Z,
2021-07-01T00:00:00.000Z/2021-09-01T00:00:00.000Z/P31D,2021-10-01T00:00:00.000Z,2021-11-01T00:00:00.000Z,
2021-12-01T00:00:00.000Z/2022-02-01T00:00:00.000Z/P31D,2022-03-01T00:00:00.000Z,2022-04-01T00:00:00.000Z,2022-05-01T00:00:00.000Z
 </Dimension>

I was expecting

<Dimension name="time" units="unknown" multipleValues="true" current="true" default="2022-05-01T00:00:00.000Z">
2020-01-01T00:00:00.000Z/2022-05-01T00:00:00.000Z/P1M
</Dimensions>
@jonblower
Copy link
Member

Yeah, it doesn't surprise me that this happens with monthly data. I'm not sure the logic is built in to detect that a set of intervals (expressed in ms) correspond with calendar month boundaries. And AFAIK there isn't a way to record "calendar months" in a CF-NetCDF file.

@guygriffiths might have some more info on this.

@WeatherGod
Copy link
Author

Assuming that the algorithm used for getTimeStringForCapabilities() is similar to what it was in ncWMS, an idea I had for a possible solution to this is to scan through the list of time diffs and see if all members of the list are integer multiples of the set {28 * ms/day, 29 * ms/day, 30 * ms/day, 31 * ms/day}, and if so, then treat them as monthly intervals. A similar idea could be done for yearly periods by checking the set of {365 * ms/day, 366 ms/day}. Additional sanity logic might be needed to make sure that each millisecond timediff are for the appropriate months, but that might be a bit trickier to do in the general case of N monthly periods.

@jonblower
Copy link
Member

Yes, and the algorithm would also have to be calendar-aware (also strictly I think the ISO "P1M" syntax is probably only valid for the Gregorian calendar)

@guygriffiths
Copy link
Contributor

Yes, we deliberately avoided dealing with months and years precisely because they do not have constant lengths (it's mentioned in the API docs for getPeriodString http://reading-escience-centre.github.io/edal-java/apidocs/uk/ac/rdg/resc/edal/util/TimeUtils.html#getPeriodString(long)).

I'd be reluctant to add months/years unless the algorithm specifically checked every month for the correct length, and as @jonblower says, it would only be valid for the Gregorian calendar (i.e. the ISOChronology). I also think that it's unlikely to add much clarity, since the capabilities document is primarily designed to be machine-read, and the time dimension is generally used to generate a pre-populated list of concrete time values (for either a drop-down or a calendar widget).

It's worth noting that it would not make any difference in this specific case, since the chronology (i.e. <Dimension units=) is specified as "unknown" (which means that the calendar system is not ISO8601, julian, 360_day, noleap, or all_leap)

@WeatherGod
Copy link
Author

I'm fine with a restriction to the gregorian calendar; that makes sense to do. Given that it is the most common calendar, I doubt that restriction is going to be noticed much.

As for clarity, the issue for me was because I knew my data was monthly and I was working to add this dataset to THREDDS, which I was fairly new to using. Given that I was specifically setting a config that was documented to make the time in the capabilities document be reported as an interval, I was quite confused by the result. It seemed like a bug to me.

I have gone back to the WMS Capabilities document we are currently using generated by GeoServer for our full dataset stretching back to 1950. It is interesting to see what it does.

<Dimension name="time" default="2022-05-01T00:00:00Z" units="ISO8601">
1950-12-01T00:00:00.000Z/2022-05-01T00:00:00.000Z/P71Y5M2W2DT22H
</Dimension>

I see that either I've been misunderstanding what "period" means (silly scientist!), or GeoServer is. Also curious that it came up 2 hours short of a full day. But, in any case, GeoServer was able to recognize a single interval.

Thanks for noticing that the units was set to "unknown"; that might be a bug in THREDDS, as my original data has the time units set as "proleptic_gregorian", which I presume should have been translated to "ISO8601".

@jonblower
Copy link
Member

But, in any case, GeoServer was able to recognize a single interval.

I'd be really interested to see the algorithm that came up with an interval of P71Y5M2W2DT22H!

@WeatherGod
Copy link
Author

We would need to tread carefully, as GeoServer is GPL'ed. I could see if I can figure out where it is done in their codebase and provide a high-level description of the algorithm, perhaps?

@jonblower
Copy link
Member

Looking at this again, it seems that Geoserver is only recognising the start and the end date, and the bizarre-looking interval (71 years, 5 months, 2 weeks etc) is maybe just the difference between December 1950 and May 2022?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants