Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect swagger output for timeseries values #905

Open
MikeNeilson opened this issue Oct 8, 2024 · 14 comments
Open

Incorrect swagger output for timeseries values #905

MikeNeilson opened this issue Oct 8, 2024 · 14 comments
Labels
approved-W192HQ23F0232-task4 Only valid if set by MikeNeilson, DanielO, CharlesG documentation Improvements or additions to documentation planned Something we've decided to do after discussion. priority:low Location within the general TODO list.

Comments

@MikeNeilson
Copy link
Contributor

Originally from HydrologicEngineeringCenter/cwms-data-api-client-javascript#3 and reported by @jbkolze

https://github.com/HydrologicEngineeringCenter/cwms-data-api-client-javascript/blob/4e1c0c8066986fb55882bdddb1fd978b6021c836/cwms-swagger.json#L9308-L9334

The "values" key in the TimeSeries return object is defined in the OpenAPI as an array of arrays of objects. The actual returned object is just an array of arrays of numbers (with no actual definition/structure). This is causing typing issues when using the cwmsjs library with TypeScript.

@MikeNeilson
Copy link
Contributor Author

So the question here is what is the best way to correct this?

Options

  1. Actually make the output an object?
  2. Properly enforce the array concept

2 is the least breaking, 1 might arguably be the "better" solution.

@DanielTOsborne @krowvin @rma-psmorris @rma-psmorris

@MikeNeilson MikeNeilson added priority:medium We care, and will try and get to it soon. approved-W192HQ23F0232-task4 Only valid if set by MikeNeilson, DanielO, CharlesG labels Oct 8, 2024
@rma-rripken
Copy link
Collaborator

If you're unhappy with the current objects and serialization the sooner it gets fixed the better. Its only going to get harder to change it.
As I understand what has been described so far the issue is that TimeSeries.values field serializes as TimeSeries.Record[] and TimeSeries.Record is configured to serialize as Object[]. The complaint is that serializing TimeSeries.Record as Object[] isn't correct and that it should be Number[] - is that an accurate summary?
I think that mapping TimeSeries.Record as an array was done because the serialized output of the "values" field looks compact and efficient probably a question for Mike and Daniel b/c I don't remember being a part of those discussions.

For example:
"values": [
[
1624287600000,
0.0,
0
],
[
1624288500000,
1.0,
0
],
[
1624289400000,
2.0,
0
],
[
1624290300000,
3.0,
0
]
]

Is less verbose than:
"values": [
{
"date-time": 1624287600000,
"value": 0.0,
"quality": 0
},
{
"date-time": 1624288500000,
"value": 1.0,
"quality": 0
},
{
"date-time": 1624289400000,
"value": 2.0,
"quality": 0
},
{
"date-time": 1624290300000,
"value": 3.0,
"quality": 0
}
]

Mapping TimeSeries.Record to Object[] isn't right but if the schema was changed to Number[] it wouldn't be right either. The Record object has three fields - Long, Double, int. Forcing those into a JSON array and expecting the clients to get the types right from the schema alone isn't going to work. At least as I understand things.

@MikeNeilson
Copy link
Contributor Author

It also dawns on me that this may just be a documentation problem and people wanting automation to always be perfect. The JSON return includes a nested object that defines the columns. Seen here: https://github.com/HydrologicEngineeringCenter/cwms-data-api-client-javascript/blob/4e1c0c8066986fb55882bdddb1fd978b6021c836/cwms-swagger.json#L9296-L9301

And since we will likely optionally expand that to include the data entry date, version, and text notes, we can't directly pin down the data types.

alternatively we could just have a fixed content-type for each but given the use of the simple array of arrays, that's still on the implementer end to sort out.

@jbkolze @krowvin Does the above statement from me make sense. Basically "the information required is presented, just not directly on those columns" due to the nature of that data.

@jbkolze
Copy link

jbkolze commented Oct 22, 2024

Thanks for the responses. Yes, @MikeNeilson, that makes sense. It sounds like there's probably not a "quick fix" to the issue, so I imagine the path of least resistance for now is to just edit the spec in cwmsJS as needed and fall back in line with CDA if/when the spec gets worked out.

I'm not worried too much about what the final answer is, but I do think it would be nice for the OpenAPI spec to be consistent with the data. I wonder if using JSON compression (e.g. gzip) for the response would allow for better data clarity (objects) while maintaining the space savings. That'd obviously be a big change to the response format, though.

@MikeNeilson
Copy link
Contributor Author

Hmm, gziping the data itself could maybe be done. Definitely a different content-type I would think.

I also wonder if there's just a better way in the OpenAPI spec to define this. Sometimes it's the annotations we use in java that cause the issue vs what openapi itself can actually represented.

@MikeNeilson
Copy link
Contributor Author

There's issue with gzipping the entire HTTP request packet due to security concerns, but a field or "the data" should be doable. But that's not going to let us get more data in because the main issue is the speed of the database query, not the return back to the user once it's built.

@jbkolze
Copy link

jbkolze commented Oct 22, 2024

My understanding is you can still use Content-Type: application/json, but you can add Content-Encoding: gzip and most modern clients handle it properly. But that's probably a moot point if encoding the whole packet is a security concern, and especially if the bottleneck is more on the database side -- I was assuming it was more of a bandwidth limitation.

As you mentioned, I also wonder if there's a more "elegant" way to handle it in the OpenAPI spec. That's unfortunately out of my bailiwick so I don't have any helpful suggestions there. Seems like something that would likely be supported in some way or another, but I suppose the question at that point will be whether the generator for cwmsJS can interpret it.

@MikeNeilson
Copy link
Contributor Author

That itself is helpful, hard to do anything but assume things work unless people speak up. And honestly that one I didn't even think about. Might be worth looking at https://github.com/HydrologicEngineeringCenter/cwms-data-api-client/tree/main/cwms-radar-model to see how it was dealt with in Java.

@rma-rripken
Copy link
Collaborator

If you want the types to be correct it would make more sense to have a times array and a quality array and a values array. Then each array could have its own type.

@MikeNeilson
Copy link
Contributor Author

Yeah, that's another way to do it. Given either form has performance implications when you need the other one, perhaps it makes sense to support both.

@Enovotny would it be easier to map a timeseries data set to a pandas frame if it was already different arrays or more or less indifferent?

@rma-rripken
Copy link
Collaborator

Wrt compression - Most clients and servers transparently support gzip which is more general purpose so I didn't think too hard about it.
The times[] is easy to delta-encode and then run-length-encode the deltas. I had a version of that working in Riverstatus and it works well for times and quality but not so well for values.

@Enovotny
Copy link

Enovotny commented Oct 22, 2024

the way I map it right now is a single function pd.Dataframe(json['values']) not sure it can get easier that that. I think separate arrays would make it more difficult. but it probably is just a zip call to it. If you could send me an example I could see if it works, but I would prefer that it stay as is at the moment. If you mess with that format it would likely mess with anyone outside who is grabbing data from CDA.

@MikeNeilson
Copy link
Contributor Author

We'll not worry about it for now. Were just discussing possible formats but they're all complimentary so only things that would be added, not replaced.

@MikeNeilson MikeNeilson added documentation Improvements or additions to documentation priority:low Location within the general TODO list. planned Something we've decided to do after discussion. and removed priority:medium We care, and will try and get to it soon. labels Oct 22, 2024
@MikeNeilson
Copy link
Contributor Author

I've marked this as low priority. the situation at issue has a current conclusion but there's always room for improvement. If any one comes up with new ideas or finally has a specific need please comment more here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved-W192HQ23F0232-task4 Only valid if set by MikeNeilson, DanielO, CharlesG documentation Improvements or additions to documentation planned Something we've decided to do after discussion. priority:low Location within the general TODO list.
Projects
Status: To triage
Development

No branches or pull requests

4 participants