Proposal: New response structure #8450

Closed
jsternberg opened this issue Jun 3, 2017 · 3 comments

Comments

@jsternberg
Contributor

This issue is directly tied to #8330 and really can't be separated from it. If accepted, both of these would be implemented at the same time.

Feature Request

I think we should reconsider the response structure we want clients to use. I do not think the current one is friendly to either users or API writers. To be clear, when I say response structure, I mean the way the data is organized as it comes back; I do not mean the serialization format like JSON, MessagePack, Protocol Buffers, etc. The structure itself is broken and based on concepts that haven't existed in the query engine since before 1.0 (or at least not how I think of them while working on the query engine). Changing the serialization format alone will not fix the underlying issues you run into when programming clients.

I've seen issues indicating that people have difficulty with large amounts of data, and I have experienced the problem myself while trying to work on the experimental client.

Large amounts of data mean I want to be able to process the stream iteratively, but the current format makes this difficult: we return "results" that contain "series" (which we call "rows" in the code and "series" everywhere that is user facing), and inside of those series are the "values" (which should be called rows).
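
For reference, here is a minimal sketch of that nesting as Go types, with the JSON tags matching the field names as they appear on the wire (the type names themselves are just illustrative):

```go
// Illustrative types only; the JSON tags match the wire format described above.
package response

// Series is what the API calls "series" and the server code calls "rows".
type Series struct {
	Name    string          `json:"name"`
	Columns []string        `json:"columns"`
	Values  [][]interface{} `json:"values"` // the actual rows
}

// Result groups the series produced by one statement.
type Result struct {
	StatementID int      `json:"statement_id"`
	Series      []Series `json:"series"`
}

// Response is the top-level envelope: three levels of nesting before a row.
type Response struct {
	Results []Result `json:"results"`
}
```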

When we need to break the data down so it gets returned in chunks, we use the concept of a partial result or a partial series. A partial result says that the next result is actually part of the current result, and a partial series means the next series is part of the current series. But once you get into the nesting and actually read from the cursor, you end up with absurdly complicated code just to correctly determine which result the next series belongs to. As an example, here is the code I created for the experimental client that fully handles partials so the user does not have to worry about them: https://github.com/influxdata/influxdb-client/blob/2d000abc81dd6588d94ee5578810e9fc64bec5ab/json_cursor.go

I consider this code unreadable, and I believe it would hamper adoption since you cannot realistically use chunking. I say this as the person who added that feature as an answer to large amounts of data.

But the idea behind chunking was to make it a default so data would be streamed rather than forcing a potentially large amount of memory to be allocated on the server. It was supposed to solve problems. While it has not solved those problems, that doesn't mean the original problems went away. We still need a way to stream points back to the user instead of returning them as one JSON blob.

With that, I think we should discuss ways we can potentially change the format. Internal to the query engine, we already return the data in a way that would work, but we then aggregate it and return it in our monster format that is difficult, if not impossible, to read correctly. We will obviously have to maintain the same format for JSON and CSV, but I think we should consider a structure that works better with streams for MessagePack before we merge that, and then start using it for any future clients.

@Sineos

Sineos commented Jun 3, 2017

Maybe handle this according to the HTTP standard in RFC 2616, which leaves it up to the client. See e.g. https://benramsey.com/blog/2008/05/206-partial-content-and-range-requests/

@jsternberg
Contributor Author

Thank you @Sineos, but I don't think the problem is as simple as something that can be solved within the HTTP standard. Partial content and range requests are a good idea, and I've considered them before, but they don't necessarily match the query workflow.

To expand, the issue is mostly with how things are encoded. Each chunk that is sent back to the client has to be parseable on its own, so you can't just split a JSON response between two chunks, or you won't be able to read it iteratively. Because of that, we need to send complete JSON chunks instead. But we send back the chunks with nested structures like {"results":[{"statement_id":0,"series":[...]}]}. If a series inside of a result is split, we don't have the ability to split an array within a map in JSON, so we had to add "partial": true to signal to the client that we were doing that. While that makes it possible to reconstruct the full set of results, it doesn't make actually iterating through those series or rows any easier.
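
To make that concrete, here is a minimal Go sketch of what a client sees when a series is split across two chunks. The payloads are invented for illustration, but the "partial" flags follow the semantics described above:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

type series struct {
	Name    string          `json:"name"`
	Columns []string        `json:"columns"`
	Values  [][]interface{} `json:"values"`
	Partial bool            `json:"partial"`
}

type result struct {
	StatementID int      `json:"statement_id"`
	Series      []series `json:"series"`
	Partial     bool     `json:"partial"`
}

type response struct {
	Results []result `json:"results"`
}

func main() {
	// Chunk 1 marks both the result and the series as partial, so the client
	// must remember that chunk 2 continues them rather than starting anew.
	stream := `{"results":[{"statement_id":0,"series":[{"name":"cpu","columns":["time","value"],"values":[[1,0.5]],"partial":true}],"partial":true}]}
{"results":[{"statement_id":0,"series":[{"name":"cpu","columns":["time","value"],"values":[[2,0.6]]}]}]}`

	dec := json.NewDecoder(strings.NewReader(stream))
	for {
		var resp response
		if err := dec.Decode(&resp); err == io.EOF {
			break
		} else if err != nil {
			panic(err)
		}
		for _, r := range resp.Results {
			for _, s := range r.Series {
				fmt.Println(s.Name, s.Values, "partial:", s.Partial)
			}
		}
	}
}
```

Tracking those flags across results and series simultaneously is exactly where the cursor code linked above becomes hard to follow.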

So the idea is to instead find some way to split the array within results and the array within series so that they can be divided between different chunks within the HTTP response without causing any issues. The working example I have in my head at the moment is something like this (mostly stolen from how Transfer-Encoding: chunked is implemented). Each of the following would be sent as a separate JSON or MessagePack blob (the format doesn't matter):

response header: {} or {"error": "error message"}
|- number of results in chunk (0 terminates this block)
|- result header: {"statement_id":0, "columns":["time", ...], "messages": [...]}
|--- number of series in this chunk
...

While the idea isn't fully fleshed out, the point is that the server doesn't have to close the nested structure for the result, because a nested structure doesn't exist. Instead, there's a sequence of blobs in a specified order that is meant to be read iteratively. That makes it easy to read each entry one by one.
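
As a rough illustration of how a client might consume such a stream, here is a sketch in Go that reads the response header followed by count-prefixed result headers. Since the format isn't finalized, every detail here (the field names, the zero terminator, the sample blobs) is an assumption taken from the outline above:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

type responseHeader struct {
	Error string `json:"error,omitempty"`
}

type resultHeader struct {
	StatementID int      `json:"statement_id"`
	Columns     []string `json:"columns"`
}

func main() {
	// One response header, a result count, one result header, and a zero
	// count terminating the block, each sent as its own blob.
	stream := `{}
1
{"statement_id":0,"columns":["time","value"]}
0`

	dec := json.NewDecoder(strings.NewReader(stream))

	var hdr responseHeader
	if err := dec.Decode(&hdr); err != nil {
		panic(err)
	}
	if hdr.Error != "" {
		panic(hdr.Error)
	}

	for {
		var n int
		if err := dec.Decode(&n); err != nil {
			if err == io.EOF {
				break
			}
			panic(err)
		}
		if n == 0 {
			break // zero terminates the results block
		}
		for i := 0; i < n; i++ {
			var rh resultHeader
			if err := dec.Decode(&rh); err != nil {
				panic(err)
			}
			fmt.Println("result", rh.StatementID, rh.Columns)
			// series counts and row blobs would follow here in the full format
		}
	}
}
```

Note how the reader never has to close or reconstruct a nested structure; each blob stands on its own.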

I hope that helps explain the idea behind the issues well enough. I haven't fully fleshed out the format, but the way I currently envision it would work especially well with MessagePack.

@Sineos

Sineos commented Jun 6, 2017

I was thinking more about how the client side handles the chunks, like HTTP range requests but built into Influx:

  1. Client: Runs a query, requesting ranged results
  2. Server: For that query I will return 10,000 results with this {...} structure
  3. Client: Send me the first 1,000 results
  4. Server: Sends the first 1,000 as fully contained JSON or blobs
  5. Client: Processes the first 1,000, then requests the next 1,000

This means the client can decide (depending on its processing power and the type of application) at what speed it requests the data.
Whether this is encapsulated in a chunked HTTP stream or not is up to the HTTP protocol layer and mostly transparent to the client.
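
InfluxDB has no such range negotiation today, but the client loop could be approximated with InfluxQL's LIMIT and OFFSET. A minimal sketch, where the host, database, measurement, and page size are all assumptions:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

type response struct {
	Results []struct {
		Series []struct {
			Values [][]interface{} `json:"values"`
		} `json:"series"`
	} `json:"results"`
}

func main() {
	const pageSize = 1000
	for offset := 0; ; offset += pageSize {
		// The client controls the pace: it asks for the next page only when
		// it has finished processing the current one.
		q := fmt.Sprintf("SELECT * FROM cpu LIMIT %d OFFSET %d", pageSize, offset)
		resp, err := http.Get("http://localhost:8086/query?db=mydb&q=" + url.QueryEscape(q))
		if err != nil {
			panic(err)
		}
		var r response
		if err := json.NewDecoder(resp.Body).Decode(&r); err != nil {
			panic(err)
		}
		resp.Body.Close()

		rows := 0
		for _, res := range r.Results {
			for _, s := range res.Series {
				rows += len(s.Values)
			}
		}
		fmt.Printf("processed %d rows at offset %d\n", rows, offset)
		if rows < pageSize {
			break // short page means we've reached the end
		}
	}
}
```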
