clarify error states for time-related error codes 1402 through 1405 #97

Closed
jvandegriff opened this issue Aug 24, 2020 · 19 comments · Fixed by #163
Labels
documentation NovHackathon to be resolved during Nov 2021 session
Comments

@jvandegriff
Collaborator

The error messages have ambiguities in terms of syntax or times being outside the valid ranges. Especially 1405 - what if there is some overlap, for example.

@rweigel
Contributor

rweigel commented Aug 24, 2020

For reference, this is what is in the body of the spec

400 | 1402 | Bad request - error in start time
400 | 1403 | Bad request - error in stop time
400 | 1404 | Bad request - start time equal to or after stop time
400 | 1405 | Bad request - time outside valid range

and in the Appendix

"1402": {"status":{"code": 1402, "message": "HAPI error 1402: error in time.min"}},
"1403": {"status":{"code": 1403, "message": "HAPI error 1403: error in time.max"}},
"1404": {"status":{"code": 1404, "message": "HAPI error 1404: time.min equal to or after time.max"}},
"1405": {"status":{"code": 1405, "message": "HAPI error 1405: time outside valid range"}},

@rweigel
Contributor

rweigel commented Aug 24, 2020

I think that there is general agreement that we should change the first three to be

1402 - Syntax error in time.min
1403 - Syntax error in time.max
1404 - time.min equal to or after time.max (no change from current)

My interpretation was "time outside valid range" meant "a time was outside of valid range":

1405 - time.min < startDate and/or time.max > stopDate

I'll let others state what their preference is. @aharonroberts mentioned a different interpretation that I'm not sure I can reproduce.

@aharonroberts

aharonroberts commented Aug 24, 2020 via email

@rweigel
Contributor

rweigel commented Aug 24, 2020

We have 1201 for no data in the requested time range. The response should be empty and ideally, 1201 appears in the HTTP header if a headerless response is requested.

@aharonroberts

aharonroberts commented Aug 24, 2020 via email

@ericthewizard

I agree that we don't need 1405; this would require users who are looking at the boundaries of the time series to know exactly where those boundaries are (or do multiple requests to find those boundaries). e.g., if someone requests data for 1March2015-31March2015, and the dataset starts on 3March2015, the server should return the data for March 3-March 31 and not a 400 error.

@rweigel
Contributor

rweigel commented Aug 24, 2020

From a user perspective, @supervised's comment is a good argument for removing 1405.

A disadvantage is that it will make caching slightly more complex. Probably not so much that it justifies keeping it.

@jvandegriff jvandegriff added this to the Version 3.0+ milestone Nov 9, 2020
@jvandegriff jvandegriff modified the milestones: Version 3.0+, Version 3.1 May 19, 2021
@jvandegriff jvandegriff added the NovHackathon to be resolved during Nov 2021 session label Nov 1, 2021
@jbfaden
Contributor

jbfaden commented Jun 17, 2022

I just implemented the server-side checks for these, and found them clear enough. I agree with @supervised, in that I feel guilty putting in these precise checks that are going to be annoying for humans talking to the server. (It would be easy to replace a start which is before the startDate limit with the startDate, rather than throwing an exception, for example.)

@jvandegriff jvandegriff modified the milestones: Version 3.2, Version 3.x Dec 20, 2022
@rweigel
Contributor

rweigel commented Dec 22, 2022

Here is a proposal for clarification:

Replace

400 | 1402 | Bad request - error in start time
400 | 1403 | Bad request - error in stop time
400 | 1404 | Bad request - start time equal to or after stop time
400 | 1405 | Bad request - time outside valid range

with

1402 - Syntax error in start
1403 - Syntax error in stop
1404 - start equal to or after stop
1405 - start < startDate and/or stop > stopDate

I think that we should keep 1405 as-is for now. It was placed there with the intention of meaning start < startDate and/or stop > stopDate with the justification "specific is better." I'm no longer sure about the use case of the user looking for boundaries, given that start/stop is in the metadata. I'm not opposed to removing 1405, but I'd prefer to do so after there is a compelling need.
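As a rough illustration of the proposed wording, the four checks could be applied in order as in the sketch below. This is not taken from any HAPI server; `parse_time` and `check_request_times` are hypothetical names, and the parser accepts only a small subset of ISO 8601 rather than the full HAPI time syntax. Code 1200 is used here as the "OK" status.

```python
from datetime import datetime, timezone


def parse_time(value):
    """Parse a small ISO 8601 subset (assumption: not the full HAPI syntax)."""
    text = value[:-1] if value.endswith("Z") else value
    return datetime.fromisoformat(text).replace(tzinfo=timezone.utc)


def check_request_times(start, stop, start_date, stop_date):
    """Return a status code using the strict reading of 1405."""
    try:
        t_start = parse_time(start)
    except ValueError:
        return 1402  # syntax error in start
    try:
        t_stop = parse_time(stop)
    except ValueError:
        return 1403  # syntax error in stop
    if t_start >= t_stop:
        return 1404  # start equal to or after stop
    if t_start < parse_time(start_date) or t_stop > parse_time(stop_date):
        return 1405  # start < startDate and/or stop > stopDate
    return 1200  # request is well formed and inside the valid range


# Dataset range below is made up for the example.
print(check_request_times("2015-03-01T00:00:00Z", "2015-03-31T00:00:00Z",
                          "2015-03-03T00:00:00Z", "2020-01-01T00:00:00Z"))
```

Checking syntax before ordering, and ordering before range, means each code is only reported when the earlier conditions already hold, so the codes do not overlap.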

I've tried a request to the following servers with start < startDate. Here are the results:

1405

  • SSCWeb
  • AMDA
  • VirES-for-Swarm

No error; serves data starting at startDate

  • LISIRD
  • CDAWeb
  • CCMC_ISWA

@jvandegriff
Collaborator Author

jvandegriff commented Jan 9, 2023

The main discussion here is about 1405:
400 | 1405 | Bad request - time outside valid range

Its current form, "time outside valid range", is ambiguous. Does this mean the whole requested time range is outside the valid range, or just that one of the request times (start or stop) is outside? Half the servers interpret this one way, half the other way.

To fix the ambiguity, we have to decide what is the least surprising behavior. We will do a poll to see what hapi-dev people think.

Option 1: make 1405 very strict, so that all requests must always fall within the known time ranges of the dataset. If the start time of the request is before the data start time (as advertised by the info), or if the stop time of the request is after the data end time, then the server must issue a 1405 error.
Implications:
-clients are responsible for knowing the available range and limiting their
-client programming is harder, since clients have to do extra work of keeping requests in bounds
-client debugging is maybe easier, since error messages are more informative (rather than clients just getting empty data, they would get an informative error message)

Option 2: make 1405 be only for when the entire requested time range is outside the valid range for the data; if a user request has any overlap with the valid range, just return the data that is present within the overlap.
Implications:
-could be easier for clients since they don't have to line up requests exactly; this is most helpful when you want the most recent data, since a dataset that is being appended to with the latest data will have a moving end date
-caching code is harder, since requests for different time ranges could result in the same data being returned
-if an end time of a dataset is a moving target, it makes it easier to hit
-since being non-strict on the end time seems somewhat desirable, it would then make more sense to also be non-strict about the start time
-this interacts with the server's requirement to not allow large time requests; if a user asks for 10 years of data, but only 2 days of the request overlaps with the valid time range for the dataset, will the server throw a "too much time requested" error? Or should it go ahead and return the data it does have? Servers need to be a little more complex to handle this.

Option 3: Remove 1405 altogether. Don't throw any errors if the time range is non-overlapping or has start or stop outside the valid range. If there is no data, just return an empty response; there is a no-data response already:
200 1201 OK - no data for time range
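To make the difference between the three options concrete, here is a minimal sketch of how a server might decide what to do with a request under each one. The function name `resolve_request` is hypothetical, times are plain `datetime` objects, and the dataset stopDate in the example is made up; treating a request that starts exactly at stopDate as non-overlapping is also an assumption.

```python
from datetime import datetime


def resolve_request(option, start, stop, start_date, stop_date):
    """Return (status_code, served_start, served_stop) under option 1, 2, or 3."""
    outside = start < start_date or stop > stop_date       # any part out of range
    no_overlap = stop <= start_date or start >= stop_date  # nothing in range
    if option == 1 and outside:
        return 1405, None, None
    if option == 2 and no_overlap:
        return 1405, None, None
    if option == 3 and no_overlap:
        return 1201, None, None  # empty response, no error
    # Serve the part of the request inside the valid range. A real server
    # would still answer 1201 if that part turns out to hold no records.
    return 1200, max(start, start_date), min(stop, stop_date)


# Request from earlier in the thread: 1 March to 31 March 2015,
# for a dataset that starts on 3 March 2015.
request = (datetime(2015, 3, 1), datetime(2015, 3, 31))
dataset = (datetime(2015, 3, 3), datetime(2016, 1, 1))

for option in (1, 2, 3):
    print(option, resolve_request(option, *request, *dataset))
```

Under option 1 this request is rejected with 1405, while options 2 and 3 both serve 3 March through 31 March; the options only differ from each other when the request has no overlap at all.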

Principles to follow:

  1. With lots of people implementing their own servers, it's harder for us to make it easy for them if we change the spec. Since we control the clients, we can fix things there and it sort of fixes it for everyone.
  2. cause the least surprise - what will users expect?
  3. being loose about inputs (be forgiving with what users expect), and strict on what you emit
  4. be strict with the spec at first, since it's easier to loosen up later if needed

@rweigel
Contributor

rweigel commented Jan 24, 2023

Voting results

  1. (3) make 1405 strict so that if either start or stop time (requested) is outside dataset valid range, it is an error
  2. (1) throw the 1405 error only when there is no overlap of requested time range and valid data range
  3. (1) remove 1405 and just return empty data if there is no data inside any requested time range
  4. (3) remove 1405 but add a 1200 code to let users know clipping occurred

My preference is still for 1. It is simpler, has far fewer side effects that we'll need to deal with, and is what was originally intended for 1405. We can have the verifier check the error message and warn if it does not return the allowed time range.

We can always loosen things in the future if there is a user request, which there has not been.

@VoyagerPWS

Sorry I didn't notice this thread earlier. Is an internal gap part of a dataset valid range? How precise are the begin and end times required to be for a dataset valid range? If a request begins for hour 00 of a day, but the beginning of the valid range is actually 00:00:00.003, then is that an error? (I guess this is reiterating some of the above comments.) My strong opinion is that option 2 is the only possible way to handle errors. For any time request, a server should be expected to return all available data within the requested interval and, arguably, some reasonable number of samples beyond the boundaries of the request. I see no compelling reason to treat the dataset boundaries as special cases requiring different behavior (than, say, internal gaps). The definition of 1405 should be, imho, "no data in requested interval".

@jvandegriff
Collaborator Author

HAPI specifies that servers should strictly only return records that fall within the time requested, with the start time being inclusive and the stop time exclusive. This allows for content from multiple requests to be stitched together seamlessly.
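A small sketch of why the inclusive-start, exclusive-stop rule makes stitching seamless. The records here are toy `(time, value)` pairs with integer time tags, and `select` is a hypothetical helper, not part of any HAPI library.

```python
def select(records, start, stop):
    """Keep records whose time tag t satisfies start <= t < stop."""
    return [(t, v) for t, v in records if start <= t < stop]


records = [(t, t * 0.5) for t in range(0, 20, 5)]  # time tags 0, 5, 10, 15

first = select(records, 0, 10)
second = select(records, 10, 20)

# The record at t = 10 appears only in the second request, so concatenating
# the two responses gives exactly the single-request result.
assert first + second == select(records, 0, 20)
```

If the stop time were inclusive, the record at t = 10 would be returned by both requests and the client would have to de-duplicate.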

For a dataset that starts at 00:00:00.003, I think the server could go ahead and report a start time of 00:00 exactly.

@jvandegriff
Collaborator Author

There was a lot of discussion on this Monday's HAPI developer's telecon. The main problem with option 2 or option 4 is that server behavior is different from server to server. If some servers are allowed to not report an error for a request outside the valid time range, then if a request is way outside (start time is off by years), users will get confused about getting no data, since the server can't report an error on the input.

Having all servers strictly enforce the requirement that all time requests must be fully inside will ensure that there is consistent behavior across all servers.

It does mean that clients must clip any requests to fit inside the advertised availability range of each dataset. They need to do this anyway once they get the data, so it just shifts where the careful clipping needs to be done.
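For illustration, client-side clipping against the advertised range might look like the sketch below. `clip_request` is a hypothetical helper; it assumes the client has already read startDate and stopDate from the info response and parsed them into `datetime` objects.

```python
from datetime import datetime


def clip_request(start, stop, start_date, stop_date):
    """Clip [start, stop) to the advertised range; None if nothing remains."""
    clipped_start = max(start, start_date)
    clipped_stop = min(stop, stop_date)
    if clipped_start >= clipped_stop:
        return None  # no overlap, so there is nothing valid to request
    return clipped_start, clipped_stop


# Made-up dataset range; the request starts two days too early.
print(clip_request(datetime(2015, 3, 1), datetime(2015, 3, 31),
                   datetime(2015, 3, 3), datetime(2016, 1, 1)))
```

Returning `None` for the no-overlap case lets the client skip the request entirely instead of sending one that a strict server would reject with 1405.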

One concern is the moving end date situation which is common for active missions.

We probably also need a way to specify RelativeStopDate that is an ISO interval (PT1D for one day, for example) to indicate that data is being added.

There was also the suggestion that servers be allowed to be sloppy up to a point. If there is a cadence value given, servers could decide not to throw an error if the request was outside by only N * cadence, where N is small (maybe 5?).
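A sketch of what such a tolerance check could look like. This is only an illustration of the suggestion, not agreed behavior; `within_tolerance` is a hypothetical name, and the cadence is assumed to be already converted from its ISO 8601 duration string into a `timedelta`.

```python
from datetime import datetime, timedelta


def within_tolerance(start, stop, start_date, stop_date, cadence, n=5):
    """True if the request overshoots the valid range by at most n * cadence."""
    slack = n * cadence
    return start >= start_date - slack and stop <= stop_date + slack


start_date = datetime(2015, 3, 3)
stop_date = datetime(2016, 1, 1)
cadence = timedelta(minutes=1)

# Three minutes early is tolerated with n = 5; ten minutes early is not.
print(within_tolerance(start_date - timedelta(minutes=3), stop_date,
                       start_date, stop_date, cadence))
print(within_tolerance(start_date - timedelta(minutes=10), stop_date,
                       start_date, stop_date, cadence))
```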

(We would need a separate ticket for the RelativeStopDate.)

@rweigel
Contributor

rweigel commented Feb 1, 2023

The server responds with an empty body if a request is made for a time range within start/stop with no data.


(I was writing the following when I saw Jon's summary above.)

I think that it would be useful for servers to allow start before the actual start and stop after actual stop. But on Monday we discussed several complications.

I recall one of the issues arises when the server does not update the stop date in the metadata, but data are returned for a requested stop date after the one given in the metadata because the database is being continuously updated. A user may notice that they can just set the stop date to a time in the distant future. Someone who looks at the stop date in the info response will conclude the data are not being updated.

If servers do allow start = 00:00:00 when the actual start date is 00:00:30, what happens if the data are on a 1 ms cadence and the maxRequestDuration is 1 second? The client may conclude that the requested time range is too large unless additional logic is implemented to handle this case. Same for the server (or maybe not - the point is that there are many cases that would need documentation.)

If someone sends me a URL that returns data for a dataset with a start=1970-01-01 and stop=2000-01-01 but data only exists in 1999, I'd be confused.

Do we allow start dates of 0001-01-01T00:00:00Z and stop dates of 4000-01-01T00:00:00Z? If not, what should be allowed? This would require additional documentation and discussion as it may need to depend on the nominal cadence.

We also only touched on the implications for caching.

Given how long this discussion is taking and that the original intent was option 1., I suggest we go with option 1. and consider a modification if there is a strong demand for an alternative. This could be considered with relativeStopDate that we also discussed, which would need to be made in a major release. Both features would be useful, but many additional issues would need to be addressed.

@VoyagerPWS

(Just my opinions here.) Be careful not to impose human expectations onto time handling. Human expectations are frequently self-inconsistent and just plain wrong. (E.g., Midnight New Year's Eve occurs at the beginning of Dec 31.) We have instruments that sample at 200000 samples per second; what if I want the second 50 samples? This is complicated by the occasional practice of listing the cadence as that of the repeating records, not of the waveform capture. Since time intervals necessarily must be specified as Tbegin <= t < Tend, then Tend must be allowed to be beyond the last time tag of the last record of a data set. I don't understand why allowing requested Tbegin and Tend outside the range of a data set should be confusing any more than internal gaps would be confusing to a user. You can only return the data that exist.

@jvandegriff
Collaborator Author

If the stopDate is changing frequently, the spec could / should recommend that the expiry date in the HTTP response indicate how long the info response would be valid for.

@jvandegriff
Collaborator Author

Add to the spec about how to handle real-time data that is being updated. You can have the server set the stopDate forward by a day. Or just have a stopDate that is in the future, so that you don't have to keep updating it.

A request (even a HEAD request) would include the expiry time for the info response.
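One way a server could express that expiry time is the standard HTTP `Expires` header. The sketch below only shows how such a header value could be built; the helper name `expires_header` and the one-hour validity are assumptions, and nothing here is prescribed by the HAPI spec.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_header(now, valid_for):
    """Build an Expires header saying how long the info response stays valid."""
    # format_datetime with usegmt=True requires a UTC-aware datetime.
    return {"Expires": format_datetime(now + valid_for, usegmt=True)}


now = datetime(2023, 1, 9, 0, 0, tzinfo=timezone.utc)
print(expires_header(now, timedelta(hours=1)))
```

A client that honors this header would re-fetch the info response, and with it the current stopDate, once the stated time has passed.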

Eventually allow for a relativeStopDate; this would be for version 4.0, since it is a new feature that affects the info response significantly.

@rweigel
Contributor

rweigel commented Feb 7, 2023

Also add suggestion that 1405 errors should include what valid start/stops are.
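A sketch of what such an error body could look like, following the status object format quoted from the Appendix earlier in this thread. The wording after the standard message and the helper name `error_1405` are hypothetical; the spec text that came out of this issue may phrase it differently.

```python
import json


def error_1405(start_date, stop_date):
    """Build a 1405 status body that also reports the valid time range."""
    message = (
        "HAPI error 1405: time outside valid range; "
        f"valid range is {start_date} to {stop_date}"
    )
    return json.dumps({"status": {"code": 1405, "message": message}})


# Dataset range below is made up for the example.
print(error_1405("2015-03-03T00:00:00Z", "2016-01-01T00:00:00Z"))
```

Keeping the standard message as a prefix means existing clients that match on "HAPI error 1405" keep working, while users see the range they need to stay within.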

rweigel added a commit that referenced this issue Feb 20, 2023
@rweigel rweigel linked a pull request Feb 20, 2023 that will close this issue
jvandegriff pushed a commit that referenced this issue Feb 20, 2023
Co-authored-by: Bob Weigel <rweigel@gmu.edu>
7 participants