Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

load_collection: Clarification on temporal_extent #331

Closed
Ardweaden opened this issue Mar 4, 2022 · 18 comments · Fixed by #394
Closed

load_collection: Clarification on temporal_extent #331

Ardweaden opened this issue Mar 4, 2022 · 18 comments · Fixed by #394

Comments

@Ardweaden
Copy link
Contributor

Process ID: load_collection

Describe the issue:
Documentation states:

The second element is the end of the temporal interval. The specified instance in time is excluded from the interval.

Would e.g ["2017-01-01", "2017-01-01"] or ["2017-01-01T00:00:00Z", "2017-01-01T00:00:00Z"] be valid or invalid temporal_extents? If it's valid, how would ["2017-01-01", "2017-01-01"] be converted to a closed datetime interval?

Proposed solution:
Add examples to clarify behaviour when values in temporal_extent are the same.

Additional context:
We noticed VITO's backend considers aforementioned temporal_extents valid.

@m-mohr
Copy link
Member

m-mohr commented Mar 4, 2022

Hey @Ardweaden,

good point. Quickly going through it for clarity. Here is the definition of the parameter and I tried to highlight the relevant parts:
image

The date schema allows e.g. 2017-01-01 and the date-time allows e.g. 2017-01-01T00:00:00Z.
The first instance is included, the last instance is excluded. That means ["2017-01-01", "2017-01-01"] would basically be equivalent to ["2017-01-01T00:00:00Z", "2017-01-01T00:00:00Z"].

Now the issue is that the same instance should be included and excluded, indeed. Strictly speaking, I'd say they are invalid right now and should probably be two distinct dates. Nevertheless, I see that people use the notion of providing the same date times pretty often. So we should probably consider mentioning that "including" beats "excluding" ;-) or something like that. What would be your preference?

@m-mohr m-mohr self-assigned this Mar 4, 2022
@Ardweaden
Copy link
Contributor Author

This was the suggestion from @mkadunc

  • treat each date year and date-time string as a temporal interval, not a time instant ("2018" is commonly interpreted as the whole year, not just midnight when the year starts)
  • example: "2017-01-01" becomes [2017-01-01, 2017-01-02) (note open right boundary)
  • in µs precision, the above example means [2017-01-01T00:00:00.000000, 2017-01-01T23:59:59.999999] (closed right boundary)
  • temporal_extent can then be parsed as [[start.start, start.end), [end.start, end.end))
  • we can compose the effective temporal_extent interval to be used in SH as [start.start, Math.max(start.end, end.start)) (with open right boundary)

I don't really have a preference, we just want to implement the standard as closely as possible to how it was intended so that we don't end up with unnecessary backend-specific quirks. I do think that if this is somewhat unclear, it would probably best to clarify it using the interpretation of VITO's backend, given it is by far the most advanced,

@m-mohr
Copy link
Member

m-mohr commented Mar 7, 2022

For some context, here's the discussion that lead to the current definition: #7

A good argument here is that this follows ISO 8601 wrt the interpretation of time intervals. As such, I'd expect that this is also how libraries implement it and we probably shouldn't derive from it too far. Unfortunately, I can't find any information in 8601 about this special case yet. So following ISO 8601: the duration of giving the same start and end time would always be 0.

All changes here would effectively break the existing definitions, which is not possible. What we could do is that we actually clarify the documentation. Should specifying the same start and end date result in an exception? Not very satisfying though.

By the way, this also influences filter_temporal, load_result, and potentially aggregate_temporal and climatological_normal.

We noticed VITO's backend considers aforementioned temporal_extents valid.

What does valid mean here? What does it result in?

@Ardweaden
Copy link
Contributor Author

Ardweaden commented Mar 7, 2022

What we could do is that we actually clarify the documentation

Of, course, this is what we were asking to begin with. The idea, however, was constructed in a way that it shouldn't violate the current wording of the process, but allows for time intervals with identical start/end.

Should specifying the same start and end date result in an exception? Not very satisfying though.

True, but it sounds to me like it's the most consistent with the current wording. But the most practically useful would probably be "including" beats "excluding" suggestion (which is kind of similar to Miha's interpretation).

What does valid mean here? What does it result in?

It means that e.g. the following doesn't raise an error but returns a valid result:

curl 'https://openeo.vito.be/openeo/result' \
  -H 'content-type: application/json' \
  -H 'authorization: Bearer oidc/egi/token' \
  --data-raw '{"plan":null,"budget":null,"process":{"process_graph":{"1":{"process_id":"load_collection","arguments":{"id":"SENTINEL2_L2A","spatial_extent":{"west":13.891875860776567,"south":46.00883030857273,"east":13.914738275405409,"north":46.03940618224049},"temporal_extent":["2022-02-24T00:00:00Z","2022-03-24T00:00:00Z"],"bands":["B01"]}},"2":{"process_id":"save_result","arguments":{"data":{"from_node":"1"},"format":"PNG"},"result":true}},"parameters":[]}}' \
  --compressed

or

curl 'https://openeo.vito.be/openeo/result' \
  -H 'content-type: application/json' \
  -H 'authorization: Bearer oidc/egi/token' \
  --data-raw '{"plan":null,"budget":null,"process":{"process_graph":{"1":{"process_id":"load_collection","arguments":{"id":"SENTINEL2_L2A","spatial_extent":{"west":13.891875860776567,"south":46.00883030857273,"east":13.914738275405409,"north":46.03940618224049},"temporal_extent":["2022-02-24","2022-03-24"],"bands":["B01"]}},"2":{"process_id":"save_result","arguments":{"data":{"from_node":"1"},"format":"PNG"},"result":true}},"parameters":[]}}' \
  --compressed

@m-mohr
Copy link
Member

m-mohr commented Mar 7, 2022

Should specifying the same start and end date result in an exception? Not very satisfying though.

True, but it sounds to me like it's the most consistent with the current wording. But the most practically useful would probably be "including" beats "excluding" suggestion (which is kind of similar to Miha's interpretation).

I think the "including beats excluding" is also compliant with ISO8601, but it may not exactly do what you expect here.
["2017-01-01", "2017-01-01"] is basically equivalent to ["2017-01-01T00:00:00Z", "2017-01-01T00:00:00Z"] according to ISO8601, which means ["2017-01-01", "2017-01-01"] does not cover the whole day, it's just the first (milli)second basically.
So effectively, users should do [2017-01-01, 2017-01-02] to get what they usually mean with ["2017-01-01", "2017-01-01"]. So in that sense an error would actually be less confusing as we point them at the issue directly...

It means that e.g. the following doesn't raise an error but returns a valid result: [...]

To add to what I wrote in the previous paragraph, this should only return data for the first (milli)second of the day. It seems that there's an implementation detail about VITOs data cubes that may actually return a "composite" of the full day though. I just learned about it today: https://discuss.eodc.eu/t/add-a-time-acquisition/256 So by just merging to the first second of the day, you actually get what you are looking for although the original acquisitions are captured throughout the day. Not really verified that, but it looks like this is true.

@mkadunc
Copy link
Member

mkadunc commented Mar 7, 2022

To add to what I wrote in the previous paragraph, this should only return data for the first (milli)second of the day

Millisecond or second? If the interval bounaries are interpreted as instants, then the precision shouldn't matter and we should assume infinite precision. (i.e. we would only return data that are timestamped with the exact same UTC time, i.e. "2017" or "2017-01" or "2017-01-01" or "2017-01-01T00:00:00Z")

Re ISO 8601 - a copy of the standard that I found has this to say about time interval:

2.1.3 time interval

part of the time axis limited by two instants

[IEC 60050-111]

Note to entry: A time interval comprises all instants between the two limiting instants and, unless otherwise
stated, the limiting instants themselves.

@m-mohr
Copy link
Member

m-mohr commented Mar 7, 2022

Yes, I meant instants and agree with you. The "(milli)seconds" were referring to the "representation" here as in ISO8601 you usually provide second or millisecond precision in the textual representation.

Interesting. Which copy/revision/version is this from @mkadunc? I'll also check our version again...

@mkadunc
Copy link
Member

mkadunc commented Mar 7, 2022

The text is from https://www.sis.se/api/document/preview/905564/

@soxofaan
Copy link
Member

soxofaan commented Mar 8, 2022

FYI: it's a known issue that the VITO back-end indeed does not follow the [include, exclude] rules strictly:

It's not trivial to change/fix the implementation because we don't want to break existing usage that depends on the current behavior, as noted above:

Nevertheless, I see that people use the notion of providing the same date times pretty often.

@jdries knows the history and context better, but I don't think we have a concrete plan of how to get out of this pickle, yet.

Also as noted above, we work with day-composite time resolution for most of our internal collections, practically meaning that observations always have time 00:00:00.0Z. Interestingly, this means that the VITO implementation is accidentally valid for "same day" ranges (e.g. ["2017-01-01", "2017-01-01"]) under the "including beats excluding" interpretation I think.

@m-mohr
Copy link
Member

m-mohr commented Mar 8, 2022

I can't find a full-text of the ISO 8601 PDFs anymore, so the only source is @edzer in #7 for now. Asked for his source.

Anyway, we can't really change the semantics in the processes completely. We may only clarify behavior that is ambiguous. The "including beats excluding" doesn't seem very useful though as it's still just an instance, e.g. ["2017", "2017"] would then simply be "2017-01-01T00:00:00.0000Z". That's not intuitive to users and I think they would expect something different than they get. So I think it would be better to let them know and throw an error in this case, telling them to increase the range. VITO would probably stay incompatible for now until we move to v2 or so. I don't really have a better solution for now.

@Ardweaden
Copy link
Contributor Author

Thank you. If this decisions is final and everyone agrees, we'll implement it like that. But I'd ask for this clarification to be put into the definition of the load_collection process (and others, if necessary).

@m-mohr
Copy link
Member

m-mohr commented Mar 9, 2022

This decision is not final, I'd still like hear the opinions of more implementors. Let me drag some more people in:
@jdries @edzer @mmacata @ValentinaHutter @clausmichele

@Ardweaden
Copy link
Contributor Author

Any progress on this?

@m-mohr
Copy link
Member

m-mohr commented Apr 8, 2022

No, I did not hear any further feedback (e.g. from the people tagged above).

@clausmichele
Copy link
Member

In my opinion from the process description, ["2017-01-01", "2017-01-01"] or ["2017-01-01T00:00:00Z", "2017-01-01T00:00:00Z"] are invalid. However, if we want to specify a different behavior for me it's also fine.

When parsing the load_collection process I compute the "useful" time range with [time_a, time_b - delta_t], were delta_t could be 1 day or 1 ms, depending on the data that we are considering. But if we have time_a = time_b, then subtracting a delta would cause to have a time_b < time_a, which doesn't make much sense.

@Ardweaden
Copy link
Contributor Author

Just letting you know that we implemented in a very similar to what @clausmichele described, except we always subtract a millisecond.

@m-mohr m-mohr added this to the 2.0.0 milestone Oct 12, 2022
m-mohr added a commit that referenced this issue Nov 28, 2022
…ance in time must be after the first instance in time. #331
@m-mohr
Copy link
Member

m-mohr commented Nov 28, 2022

It seems the implementations tend more towards the approach that the two elements in the intervals should not be the same, so I created a PR that proposes this breaking change for v2.0.0: #331

@m-mohr
Copy link
Member

m-mohr commented Mar 31, 2023

  • Stick with left-closed intervals
  • Agreed on replacing the missing time component with 0, and the timezone is UTC (= "Z") for date-only and time-only
  • drop date (time) support from between and add a new process date_between.
  • drop year subtype whenever feasible, check whether the subtype should only allow 4 digits (RFC3339/ISO860) or allow less digits and a negative sign.
  • 24 as the hour is forbidden
  • drop time support from sort and order
  • times in aggregate_spatial can go over night
  • Is there a format time in JSON Schema?

m-mohr added a commit that referenced this issue Apr 29, 2023
* The temporal intervals must always be non-empty, i.e. the second instance in time must be after the first instance in time. #331

* Add uniqueItems, remove mention of 24 as the hour, remove temporal-interval subtype from climatological_normal (as it has inclusive upper boundaries)

* Improved terminology

* Update load_stac

* Updates according to recent discussions #331

* Apply suggestions from code review

Co-authored-by: Stefaan Lippens <soxofaan@users.noreply.github.com>

* Remove timezone from times

---------

Co-authored-by: Stefaan Lippens <soxofaan@users.noreply.github.com>
@m-mohr m-mohr closed this as completed Apr 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants