Proposal: New response structure #8450
Maybe handle it according to the HTTP standard in RFC 2616, which leaves this up to the client. See e.g. https://benramsey.com/blog/2008/05/206-partial-content-and-range-requests/
Thank you @Sineos, but I don't think the problem is as simple as something that can be solved as part of the HTTP standard. Partial content and range requests are a good idea and I've considered them before, but they don't necessarily match the query workflow. To expand, the issue is mostly with how things are encoded. Each chunk that is sent back to the client has to be parseable on its own, so you can't just split a single JSON response between two chunks; you wouldn't be able to read it iteratively. Because of that, we need to send complete JSON chunks instead. But we send back the chunks with nested structures like `{"results": [{"series": [{"values": [...]}]}]}`, so the idea is to instead find some way to split the array within that nested structure.
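To make the constraint concrete, here is a minimal Go sketch of reading a chunked response iteratively (the `Response` type is a simplified stand-in for the real client types, not the actual API): a `json.Decoder` can only yield complete JSON values, which is why each chunk must be a self-contained object rather than a fragment of one.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Response is a simplified stand-in for the nested structure the server
// returns; the real client types carry more fields.
type Response struct {
	Results []struct {
		Series []struct {
			Name   string          `json:"name"`
			Values [][]interface{} `json:"values"`
		} `json:"series"`
	} `json:"results"`
}

func main() {
	// Two complete JSON objects, the way a chunked response streams them.
	// Splitting one object across chunk boundaries would make both
	// chunks unparseable on their own.
	body := `{"results":[{"series":[{"name":"cpu","values":[[1,0.5]]}]}]}
{"results":[{"series":[{"name":"cpu","values":[[2,0.6]]}]}]}`

	dec := json.NewDecoder(strings.NewReader(body))
	for dec.More() {
		var r Response
		if err := dec.Decode(&r); err != nil {
			fmt.Println("decode error:", err)
			return
		}
		fmt.Printf("chunk: %+v\n", r)
	}
}
```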
While the idea isn't fully fleshed out, the gist is that the server doesn't have to close a nested structure for the result, because a nested structure doesn't exist. Instead, there's a sequence of blobs in a specified structure that is meant to be read iteratively, which makes it easy to read each entry one by one. I hope that helps explain the idea behind the issue well enough. I haven't fleshed out the format fully, but the current way I envision it would work well with MessagePack.
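For concreteness, here is one possible shape such a stream could take, shown as JSON lines for readability (the actual encoding would be MessagePack, none of this is a settled format, and the field names are illustrative):

```
{"name": "cpu", "columns": ["time", "value"]}      <- series header
["2017-06-01T00:00:00Z", 0.5]                      <- row
["2017-06-01T00:00:10Z", 0.6]                      <- row
{"name": "mem", "columns": ["time", "used"]}       <- next series header
["2017-06-01T00:00:00Z", 1024]                     <- row
```

Because there is no enclosing array or object, a reader can decode one value at a time and never has to wait for a closing bracket.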
I was more thinking about the client side handling the chunks, like the HTTP Range Request but built into Influx.
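A hypothetical sketch of that idea from the client side (the `chunk` parameter is invented for illustration and is not an existing InfluxDB parameter; `chunk_size` is merely modeled on the existing query parameter of that name):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	base := "http://localhost:8086/query"
	q := url.QueryEscape("SELECT * FROM cpu")
	for chunk := 0; ; chunk++ {
		// The client controls the pace: it requests the next chunk only
		// when it is ready to process more data.
		resp, err := http.Get(fmt.Sprintf("%s?q=%s&chunk=%d&chunk_size=10000", base, q, chunk))
		if err != nil {
			fmt.Println("request failed:", err)
			return
		}
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			fmt.Println("read failed:", err)
			return
		}
		if len(body) == 0 {
			return // no more chunks
		}
		fmt.Printf("chunk %d: %d bytes\n", chunk, len(body))
	}
}
```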
This means the client can decide (depending on its processing power and the type of application) at which speed it will request the data.
This issue is directly tied to #8330 and really can't be separated from it. If accepted, both of these would be implemented at the same time.
Feature Request
I think we should reconsider the response structure that we want clients to use. I do not think the current one is friendly towards either users or API writers. To be clear, when I say response structure, I mean the way the data is organized as it comes back; I do not mean the serialization format like JSON, MessagePack, Protocol Buffers, etc. I think the structure itself is broken and based on concepts that haven't existed in the query engine since before 1.0 (or at least not in how I think of them while working on the query engine). Changing the serialization format will not fix the underlying issues that get run into when programming clients.
I've seen issues indicating that people have difficulty with large amounts of data, and I have experienced the problem myself while working on the experimental client.
Large amounts of data mean I want to be able to process the stream iteratively, but the current structure makes this difficult: we return "results" that contain "series" (which we call "rows" in the code and "series" everywhere user-facing), and inside of those series are the "values" (which should really be called rows).
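For reference, the current nested structure looks roughly like this (illustrative values):

```json
{
  "results": [
    {
      "series": [
        {
          "name": "cpu",
          "columns": ["time", "value"],
          "values": [
            ["2017-06-01T00:00:00Z", 0.5],
            ["2017-06-01T00:00:10Z", 0.6]
          ]
        }
      ]
    }
  ]
}
```

Everything of interest is buried two levels deep, and none of it can be read until the enclosing arrays are closed.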
When we need to break the data down so it gets returned in chunks, we use the concept of a partial result or a partial series. A partial result says that the next result is actually part of the current result, and a partial series means the next series is part of the current series. But when you get into nesting and actually reading from the cursor, you end up with absurdly complicated code just to correctly find which result the next series belongs to. As an example, here is the code that I created for the experimental client to fully handle partials so the user does not have to worry about them: https://github.com/influxdata/influxdb-client/blob/2d000abc81dd6588d94ee5578810e9fc64bec5ab/json_cursor.go
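For context, a chunked response marks continuation at both levels with `partial` flags, roughly like this (illustrative values):

```json
{
  "results": [
    {
      "series": [
        {
          "name": "cpu",
          "columns": ["time", "value"],
          "values": [["2017-06-01T00:00:00Z", 0.5]],
          "partial": true
        }
      ],
      "partial": true
    }
  ]
}
```

A reader has to track both flags across chunk boundaries to decide whether the next chunk's series continues the current series, continues the current result, or starts a new result entirely.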
I consider this code to be unreadable, and I believe it would hamper adoption since you cannot realistically use chunking. I say this as the person who added that feature as an answer to the problem of large amounts of data.
But the idea behind chunking was to make it a default so data would be streamed rather than forcing a potentially large amount of memory to be allocated on the server. It was supposed to solve problems, and while it has not solved them, that doesn't mean the original problems went away. We still need a way to stream points back to the user instead of returning them as a single JSON blob.
With that, I think we should discuss ways we can potentially change the structure. Internal to the query engine, we already return the data in a way that would work, but then we aggregate it and return it in our monster format that is difficult to impossible to read correctly. We will obviously have to maintain the same structure for JSON and CSV, but I think we should consider a structure that works better with streams for MessagePack before we merge that support and then start using it for any future clients.
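As a sketch of what reading such a stream might look like for a client, assuming the flat layout sketched earlier and using github.com/vmihailenco/msgpack/v5 (one of several Go MessagePack libraries; the stream layout itself is not a settled format):

```go
package stream

import (
	"fmt"
	"io"

	"github.com/vmihailenco/msgpack/v5"
)

// readStream decodes a flat sequence of MessagePack values: series
// headers (maps) interleaved with rows (arrays). Nothing is nested, so
// the client never has to buffer the whole response.
func readStream(r io.Reader) error {
	dec := msgpack.NewDecoder(r)
	for {
		var v interface{}
		if err := dec.Decode(&v); err != nil {
			if err == io.EOF {
				return nil // clean end of stream
			}
			return err
		}
		switch val := v.(type) {
		case map[string]interface{}:
			fmt.Println("series header:", val)
		case []interface{}:
			fmt.Println("row:", val)
		default:
			return fmt.Errorf("unexpected value in stream: %T", val)
		}
	}
}
```

The body of the loop is the entire complexity of the reader; compare that with the partial-handling code in json_cursor.go linked above.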