Incorrect swagger output for timeseries values #905

MikeNeilson · 2024-10-08T14:03:55Z

Originally from HydrologicEngineeringCenter/cwms-data-api-client-javascript#3 and reported by @jbkolze

https://github.com/HydrologicEngineeringCenter/cwms-data-api-client-javascript/blob/4e1c0c8066986fb55882bdddb1fd978b6021c836/cwms-swagger.json#L9308-L9334

The "values" key in the TimeSeries return object is defined in the OpenAPI as an array of arrays of objects. The actual returned object is just an array of arrays of numbers (with no actual definition/structure). This is causing typing issues when using the cwmsjs library with TypeScript.

MikeNeilson · 2024-10-08T14:06:24Z

So the question here is what is the best way to correct this?

Options

Actually make the output an object?
Properly enforce the array concept

2 is the least breaking, 1 might arguably be the "better" solution.

@DanielTOsborne @krowvin @rma-psmorris @rma-psmorris

rma-rripken · 2024-10-18T17:50:02Z

If you're unhappy with the current objects and serialization the sooner it gets fixed the better. Its only going to get harder to change it.
As I understand what has been described so far the issue is that TimeSeries.values field serializes as TimeSeries.Record[] and TimeSeries.Record is configured to serialize as Object[]. The complaint is that serializing TimeSeries.Record as Object[] isn't correct and that it should be Number[] - is that an accurate summary?
I think that mapping TimeSeries.Record as an array was done because the serialized output of the "values" field looks compact and efficient probably a question for Mike and Daniel b/c I don't remember being a part of those discussions.

For example:
"values": [
[
1624287600000,
0.0,
0
],
[
1624288500000,
1.0,
0
],
[
1624289400000,
2.0,
0
],
[
1624290300000,
3.0,
0
]
]

Is less verbose than:
"values": [
{
"date-time": 1624287600000,
"value": 0.0,
"quality": 0
},
{
"date-time": 1624288500000,
"value": 1.0,
"quality": 0
},
{
"date-time": 1624289400000,
"value": 2.0,
"quality": 0
},
{
"date-time": 1624290300000,
"value": 3.0,
"quality": 0
}
]

Mapping TimeSeries.Record to Object[] isn't right but if the schema was changed to Number[] it wouldn't be right either. The Record object has three fields - Long, Double, int. Forcing those into a JSON array and expecting the clients to get the types right from the schema alone isn't going to work. At least as I understand things.

MikeNeilson · 2024-10-18T20:28:40Z

It also dawns on me that this may just be a documentation problem and people wanting automation to always be perfect. The JSON return includes a nested object that defines the columns. Seen here: https://github.com/HydrologicEngineeringCenter/cwms-data-api-client-javascript/blob/4e1c0c8066986fb55882bdddb1fd978b6021c836/cwms-swagger.json#L9296-L9301

And since we will likely optionally expand that to include the data entry date, version, and text notes, we can't directly pin down the data types.

alternatively we could just have a fixed content-type for each but given the use of the simple array of arrays, that's still on the implementer end to sort out.

@jbkolze @krowvin Does the above statement from me make sense. Basically "the information required is presented, just not directly on those columns" due to the nature of that data.

jbkolze · 2024-10-22T18:56:15Z

Thanks for the responses. Yes, @MikeNeilson, that makes sense. It sounds like there's probably not a "quick fix" to the issue, so I imagine the path of least resistance for now is to just edit the spec in cwmsJS as needed and fall back in line with CDA if/when the spec gets worked out.

I'm not worried too much about what the final answer is, but I do think it would be nice for the OpenAPI spec to be consistent with the data. I wonder if using JSON compression (e.g. gzip) for the response would allow for better data clarity (objects) while maintaining the space savings. That'd obviously be a big change to the response format, though.

MikeNeilson · 2024-10-22T19:02:04Z

Hmm, gziping the data itself could maybe be done. Definitely a different content-type I would think.

I also wonder if there's just a better way in the OpenAPI spec to define this. Sometimes it's the annotations we use in java that cause the issue vs what openapi itself can actually represented.

MikeNeilson · 2024-10-22T19:03:58Z

There's issue with gzipping the entire HTTP request packet due to security concerns, but a field or "the data" should be doable. But that's not going to let us get more data in because the main issue is the speed of the database query, not the return back to the user once it's built.

jbkolze · 2024-10-22T19:10:48Z

My understanding is you can still use Content-Type: application/json, but you can add Content-Encoding: gzip and most modern clients handle it properly. But that's probably a moot point if encoding the whole packet is a security concern, and especially if the bottleneck is more on the database side -- I was assuming it was more of a bandwidth limitation.

As you mentioned, I also wonder if there's a more "elegant" way to handle it in the OpenAPI spec. That's unfortunately out of my bailiwick so I don't have any helpful suggestions there. Seems like something that would likely be supported in some way or another, but I suppose the question at that point will be whether the generator for cwmsJS can interpret it.

MikeNeilson · 2024-10-22T20:10:13Z

That itself is helpful, hard to do anything but assume things work unless people speak up. And honestly that one I didn't even think about. Might be worth looking at https://github.com/HydrologicEngineeringCenter/cwms-data-api-client/tree/main/cwms-radar-model to see how it was dealt with in Java.

rma-rripken · 2024-10-22T20:19:22Z

If you want the types to be correct it would make more sense to have a times array and a quality array and a values array. Then each array could have its own type.

MikeNeilson · 2024-10-22T20:21:40Z

Yeah, that's another way to do it. Given either form has performance implications when you need the other one, perhaps it makes sense to support both.

@Enovotny would it be easier to map a timeseries data set to a pandas frame if it was already different arrays or more or less indifferent?

rma-rripken · 2024-10-22T20:22:59Z

Wrt compression - Most clients and servers transparently support gzip which is more general purpose so I didn't think too hard about it.
The times[] is easy to delta-encode and then run-length-encode the deltas. I had a version of that working in Riverstatus and it works well for times and quality but not so well for values.

Enovotny · 2024-10-22T20:52:12Z

the way I map it right now is a single function pd.Dataframe(json['values']) not sure it can get easier that that. I think separate arrays would make it more difficult. but it probably is just a zip call to it. If you could send me an example I could see if it works, but I would prefer that it stay as is at the moment. If you mess with that format it would likely mess with anyone outside who is grabbing data from CDA.

MikeNeilson · 2024-10-22T21:05:31Z

We'll not worry about it for now. Were just discussing possible formats but they're all complimentary so only things that would be added, not replaced.

MikeNeilson · 2024-10-22T21:18:08Z

I've marked this as low priority. the situation at issue has a current conclusion but there's always room for improvement. If any one comes up with new ideas or finally has a specific need please comment more here.

MikeNeilson mentioned this issue Oct 8, 2024

TimeSeries "values" structured incorrectly HydrologicEngineeringCenter/cwms-data-api-client-javascript#3

Closed

MikeNeilson added priority:medium We care, and will try and get to it soon. approved-W192HQ23F0232-task4 Only valid if set by MikeNeilson, DanielO, CharlesG labels Oct 8, 2024

MikeNeilson added this to CWMS-Data-Api Maintanance Tracking Oct 8, 2024

github-project-automation bot moved this to To triage in CWMS-Data-Api Maintanance Tracking Oct 8, 2024

MikeNeilson added documentation Improvements or additions to documentation priority:low Location within the general TODO list. planned Something we've decided to do after discussion. and removed priority:medium We care, and will try and get to it soon. labels Oct 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incorrect swagger output for timeseries values #905

Incorrect swagger output for timeseries values #905

MikeNeilson commented Oct 8, 2024

MikeNeilson commented Oct 8, 2024

rma-rripken commented Oct 18, 2024

MikeNeilson commented Oct 18, 2024

jbkolze commented Oct 22, 2024

MikeNeilson commented Oct 22, 2024

MikeNeilson commented Oct 22, 2024

jbkolze commented Oct 22, 2024

MikeNeilson commented Oct 22, 2024

rma-rripken commented Oct 22, 2024

MikeNeilson commented Oct 22, 2024

rma-rripken commented Oct 22, 2024

Enovotny commented Oct 22, 2024 •

edited

Loading

MikeNeilson commented Oct 22, 2024

MikeNeilson commented Oct 22, 2024

Incorrect swagger output for timeseries values #905

Incorrect swagger output for timeseries values #905

Comments

MikeNeilson commented Oct 8, 2024

MikeNeilson commented Oct 8, 2024

rma-rripken commented Oct 18, 2024

MikeNeilson commented Oct 18, 2024

jbkolze commented Oct 22, 2024

MikeNeilson commented Oct 22, 2024

MikeNeilson commented Oct 22, 2024

jbkolze commented Oct 22, 2024

MikeNeilson commented Oct 22, 2024

rma-rripken commented Oct 22, 2024

MikeNeilson commented Oct 22, 2024

rma-rripken commented Oct 22, 2024

Enovotny commented Oct 22, 2024 • edited Loading

MikeNeilson commented Oct 22, 2024

MikeNeilson commented Oct 22, 2024

Enovotny commented Oct 22, 2024 •

edited

Loading