Proposal: Stop using JSON as the standard response format #8330
Comments
/cc @stuartcarnie @rbetts
@jsternberg Kapacitor suffers from this because it cannot tell whether queried data is an int64 or a float64. I do not have a strong opinion on which serialization format to use, just that it can correctly communicate type information. My preference would be a self-describing format, since that makes writing clients easier: no need for an IDL or generated code. Of the formats in your list, I believe only MessagePack is self-describing. As for changing the schema, I don't think we should. Let's tackle one problem at a time, and the current problem is type information loss in our serialization format. NOTE: It is possible for the columns per group to be different if the fields are scoped to a tag set. I added a test case for this to Kapacitor this morning; see influxdata/kapacitor#1320.
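To illustrate why a self-describing format avoids the int64/float64 ambiguity, here is a minimal Go sketch that prefixes each value with a one-byte type marker. The two markers are borrowed from the MessagePack specification (0xd3 for "int 64" and 0xcb for "float 64", each followed by 8 big-endian bytes); everything else, including the names `encodeValue` and `decodeValue`, is hypothetical and is not the encoder from #8897 or any msgpack library.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// Type markers taken from the MessagePack spec's fixed-width families:
// 0xd3 is "int 64" and 0xcb is "float 64", both followed by 8 big-endian bytes.
const (
	markerInt64   byte = 0xd3
	markerFloat64 byte = 0xcb
)

// encodeValue appends v to buf behind a one-byte type marker so the decoder
// never has to guess. Only int64 and float64 are handled in this sketch.
func encodeValue(buf []byte, v interface{}) ([]byte, error) {
	var raw [8]byte
	switch x := v.(type) {
	case int64:
		binary.BigEndian.PutUint64(raw[:], uint64(x))
		buf = append(buf, markerInt64)
	case float64:
		binary.BigEndian.PutUint64(raw[:], math.Float64bits(x))
		buf = append(buf, markerFloat64)
	default:
		return nil, fmt.Errorf("unsupported type %T", v)
	}
	return append(buf, raw[:]...), nil
}

// decodeValue reads one tagged value, returns it with its original Go type,
// and returns the remaining bytes.
func decodeValue(buf []byte) (interface{}, []byte, error) {
	if len(buf) < 9 {
		return nil, nil, errors.New("short buffer")
	}
	bits := binary.BigEndian.Uint64(buf[1:9])
	switch buf[0] {
	case markerInt64:
		return int64(bits), buf[9:], nil
	case markerFloat64:
		return math.Float64frombits(bits), buf[9:], nil
	}
	return nil, nil, fmt.Errorf("unknown marker 0x%x", buf[0])
}

func main() {
	var buf []byte
	// The same numeric value 2, once as an integer and once as a float.
	for _, v := range []interface{}{int64(2), float64(2)} {
		var err error
		if buf, err = encodeValue(buf, v); err != nil {
			panic(err)
		}
	}
	for len(buf) > 0 {
		v, rest, err := decodeValue(buf)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%T(%v)\n", v, v)
		buf = rest
	}
}
```

Both values decode back with their original types, which is exactly the information JSON drops. A real MessagePack encoder would usually choose the smallest representation (for example a positive fixint for 2); this sketch always uses the fixed-width forms for simplicity.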
I'm in favor of this. We moved away from JSON in the write path for similar reasons. I like that we can add additional formats while maintaining backwards compatibility.
If we want a quicker fix for this, and since we can support multiple formats anyway, we could merge #7154 (after a rebase) and have it in by 1.3. I think FlatBuffers is likely the way to go in the future, but MessagePack would definitely be an improvement, and I don't think supporting both is a negative.
+1 for adding msgpack sooner rather than later. It is just a serialization protocol change, so it doesn't affect existing clients or behavior. Clients (like Kapacitor) that want to take advantage of it can.
There is a set of related issues that we have gathered for shortly after the 1.3.0 milestone; this is in that list.
The project I'm currently working on involves an in-memory IoT analysis tool tightly coupled to a single-instance, disk-based database. InfluxDB provides features that fit most of my requirements, but the better memory usage and speed of one of the proposed formats would be very useful. I hadn't seen the msgpack implementation, so I also ran some tests with a protobuf encoder to get an overall idea of the possible improvements, and the results were similar to the ones previously obtained. I'd be interested in having a similar option too.
@jsternberg added a msgpack response format: #8897
Feature Request
I think we should stop using JSON as the standard recommended response format. JSON is just not a suitable technology for our use case and provides some rather negative limitations that prevent the system from operating in all of the cases we claim to support.
Proposal: Support an alternative format and recommend that all clients use it instead of JSON. JSON will always be supported, but that doesn't mean we have to actively use it.
Current behavior: JSON is the only practically usable format. While CSV exists, it is intended for exporting data, not as a general-purpose marshaling/unmarshaling format.
Desired behavior: Support an alternative format that acts as the standard marshaling format used by client libraries.
JSON will still be the default, but not the recommended format. If a client doesn't ask for a specific format, it gets JSON with all of its flaws, but client libraries will be modified and encouraged to use the alternative format instead.
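One way to keep JSON as the default while letting client libraries opt in is ordinary HTTP content negotiation on the `Accept` header. This is a hedged Go sketch: the media type `application/x-msgpack` and the function `negotiateFormat` are assumptions for illustration, not necessarily what the server registers, and q-values are ignored (the first recognized type in the header wins).

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// Hypothetical media types for this sketch; the real server may use others.
const (
	mediaJSON    = "application/json"
	mediaMsgpack = "application/x-msgpack"
)

// negotiateFormat picks the response format from an HTTP Accept header.
// JSON stays the default: an empty, unknown, or wildcard Accept header gets
// JSON, and only a client that explicitly asks gets the alternative format.
func negotiateFormat(accept string) string {
	for _, part := range strings.Split(accept, ",") {
		// ParseMediaType strips parameters such as ";q=0.9" and lowercases the type.
		mediaType, _, err := mime.ParseMediaType(strings.TrimSpace(part))
		if err != nil {
			continue // skip empty or malformed entries
		}
		switch mediaType {
		case mediaMsgpack:
			return mediaMsgpack
		case mediaJSON:
			return mediaJSON
		}
	}
	return mediaJSON
}

func main() {
	for _, h := range []string{"", "*/*", "application/x-msgpack", "text/csv, application/json"} {
		fmt.Printf("%q -> %s\n", h, negotiateFormat(h))
	}
}
```

Because old clients never send the new media type, they keep receiving JSON unchanged, which is the backwards-compatibility property the proposal relies on.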
Use case: JSON imposes a number of limitations on clients that unmarshal the response (which is all of them).
JSON has no separate integer and float types. If you write `cpu value=2i`, the client will have no idea whether it's dealing with integers or floats, so it will return `[float64(2)]`. If you try to use the decimal point to determine whether something is an integer or a float, you can end up with `cpu value=2` (where it should be a float) being returned as an integer. That means the client could return something like `[int64(2), float64(2.5)]`. For a user-friendly client, we need to tell the client which type each value is instead of forcing it to guess.
The last limitation, performance, isn't that big of a concern: while performance is always a concern, we have not found marshaling to be a limiting factor. Developer usability comes first.
Some possible formats that need to be investigated:
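The ambiguity can be reproduced with Go's standard `encoding/json` package alone. The sketch below assumes the server serializes `float64(2)` as the bare literal `2` (which is what `encoding/json` does) and compares a generic decode with the decimal-point heuristic; both helper names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeGeneric decodes a JSON array the way a generic client would, into
// []interface{}; encoding/json turns every number into float64.
func decodeGeneric(body string) []interface{} {
	var vals []interface{}
	if err := json.Unmarshal([]byte(body), &vals); err != nil {
		panic(err)
	}
	return vals
}

// guessByDecimalPoint is the heuristic warned against above: json.Number
// keeps the raw number text, and anything that parses as a base-10 integer
// is treated as int64, everything else as float64.
func guessByDecimalPoint(body string) []interface{} {
	var nums []json.Number
	if err := json.Unmarshal([]byte(body), &nums); err != nil {
		panic(err)
	}
	out := make([]interface{}, 0, len(nums))
	for _, n := range nums {
		if i, err := n.Int64(); err == nil {
			out = append(out, i)
		} else {
			f, _ := n.Float64()
			out = append(out, f)
		}
	}
	return out
}

func main() {
	// Both values were written as floats (cpu value=2, cpu value=2.5),
	// but float64(2) is serialized as plain 2.
	body := `[2, 2.5]`
	for _, v := range decodeGeneric(body) {
		fmt.Printf("%T ", v)
	}
	fmt.Println()
	for _, v := range guessByDecimalPoint(body) {
		fmt.Printf("%T ", v)
	}
	fmt.Println()
}
```

The generic decode prints `float64` for both values (so a real integer field would be reported as floats), and the heuristic prints `int64` then `float64` for a field that holds only floats, which is the `[int64(2), float64(2.5)]` case described above.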
All of these should meet the minimum requirements. Since JSON is still supported and remains the global interchange format, we can afford more complex unmarshaling here, because this format is aimed at supported clients.
Additionally, we should consider whether the current schema for returned values is still appropriate. While the JSON format should never change, the current schema doesn't make much sense. One of the big issues is that the columns for the statement are returned with each series. So if you have something like `GROUP BY *`, you will get the `columns` repeated for every series, but the columns should be included with the result and not the series. We should also evaluate whether there's a better way to represent the response output than chunking, since chunking is fairly confusing and also hard to implement on the client side.
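A hypothetical sketch of the schema change: hoist the shared `columns` list from each series up to the result, and keep a per-series list only where it differs, since (as noted in the Kapacitor comment) fields scoped to a tag set can give different columns per group. The `Row` and `HoistedResult` types and the `hoistColumns` function are simplified illustrations, not the server's actual response types.

```go
package main

import (
	"fmt"
	"reflect"
)

// Row mirrors one series in the current response shape:
// every series carries its own copy of Columns.
type Row struct {
	Name    string
	Tags    map[string]string
	Columns []string
	Values  [][]interface{}
}

// HoistedResult is a hypothetical alternative shape: Columns lives on the
// result, and a series keeps its own Columns only when they differ.
type HoistedResult struct {
	Columns []string
	Series  []Row
}

// hoistColumns uses the first series' columns as the shared list and drops
// every per-series copy that is identical to it.
func hoistColumns(series []Row) HoistedResult {
	var res HoistedResult
	if len(series) == 0 {
		return res
	}
	res.Columns = series[0].Columns
	for _, s := range series { // s is a copy, so the input slice is not modified
		if reflect.DeepEqual(s.Columns, res.Columns) {
			s.Columns = nil
		}
		res.Series = append(res.Series, s)
	}
	return res
}

func main() {
	cols := []string{"time", "value"}
	series := []Row{
		{Name: "cpu", Tags: map[string]string{"host": "a"}, Columns: cols, Values: [][]interface{}{{int64(0), 1.5}}},
		{Name: "cpu", Tags: map[string]string{"host": "b"}, Columns: cols, Values: [][]interface{}{{int64(0), 2.5}}},
		{Name: "cpu", Tags: map[string]string{"host": "c"}, Columns: []string{"time", "value", "extra"}, Values: [][]interface{}{{int64(0), 3.5, "x"}}},
	}
	res := hoistColumns(series)
	fmt.Println("result columns:", res.Columns)
	for _, s := range res.Series {
		fmt.Println(s.Tags["host"], "own columns:", s.Columns)
	}
}
```

Taking the first series' columns as the shared list is the simplest rule; a real implementation might instead pick the most common list so that fewer series need an override.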