Proposal: Stop using JSON as the standard response format #8330

Closed
jsternberg opened this issue Apr 27, 2017 · 8 comments
Comments

@jsternberg
Contributor

Feature Request

I think we should stop using JSON as the standard recommended response format. JSON is not a suitable technology for our use case and imposes limitations that prevent the system from operating correctly in all of the cases we claim to support.

Proposal: Support an alternative format and recommend that all clients use that format instead of JSON. JSON will still be supported, and always will be, but that doesn't mean we have to actively use it.

Current behavior: JSON is the only practically usable format. While CSV exists, it is intended for exporting, not for general-purpose marshaling/unmarshaling.

Desired behavior: Support an alternative format that acts as the standard marshaling format used by client libraries.

JSON will still be the default, but not the recommended format. So if a client doesn't ask for a specific format, it gets JSON along with all of its flaws, but client libraries will be modified and encouraged to use the alternative format instead.

Use case: JSON imposes a number of limitations on clients that unmarshal the response (which is all of them).

  1. JSON cannot reliably represent integers above 2^53 because most implementations parse every number as an IEEE-754 double, and doubles cannot exactly represent integers beyond 2^53. Storing values as large integers will therefore commonly lead to the database returning the wrong value. This isn't the fault of the storage engine or the query engine, but of the marshaling format, which loses the precision (see the sketch after this list).
  2. JSON cannot tell the difference between a float and an integer because it represents them the same way. After an insert like cpu value=2i, the client has no idea whether it's dealing with integers or floats, so it will return [float64(2)]. If you instead try to use the decimal point to decide, then cpu value=2 (which should be a float) will come back as an integer, and the client could return something like [int64(2), float64(2.5)] for a single float field. For a user-friendly client, we need to tell the client which type it is instead of forcing it to guess.
  3. JSON has poor performance characteristics. Marshaling and unmarshaling commonly require a lot of memory allocations, and the structure itself adds a lot of extra bytes. While not as bad as XML, it's not nearly as compact or fast as an interchange format should be. Even the JSON serializers that generate marshaling code ahead of time aren't nearly as fast as serializers for some of the other formats.
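
A minimal sketch, using only the Go standard library, that demonstrates the first two problems (the values are illustrative):

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Problem 1: an int64 above 2^53 cannot be represented exactly by a
	// float64, which is what JSON decoders typically produce for numbers.
	big := int64(9007199254740993) // 2^53 + 1
	b, _ := json.Marshal(big)

	var v interface{}
	json.Unmarshal(b, &v)
	fmt.Printf("%T %.0f\n", v, v) // float64 9007199254740992 -- off by one

	// Problem 2: an integer and a float with the same value produce
	// identical bytes on the wire; the type information is simply gone.
	bi, _ := json.Marshal(int64(2))
	bf, _ := json.Marshal(float64(2))
	fmt.Println(string(bi), string(bf)) // 2 2
}
```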

The last one isn't that big of a concern. While performance is always a consideration, we have not found marshaling to be a limiting factor; developer usability comes first. Some possible formats that need to be investigated:

  • MessagePack
  • Protocol Buffers
  • Thrift
  • Flat Buffers

All of these should meet the minimum requirements. Since JSON is still supported and still acts as the global interchange format, we can afford a more complex unmarshaling story here, since this format is aimed at supported clients.
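
As an illustration of what a typed format buys us, here is a sketch using MessagePack via the third-party github.com/vmihailenco/msgpack library (the library choice is mine for the example, not part of the proposal; any MessagePack implementation behaves similarly):

```go
package main

import (
	"fmt"

	"github.com/vmihailenco/msgpack/v5"
)

func main() {
	// Integers and floats get distinct wire encodings, so the type survives.
	bi, _ := msgpack.Marshal(int64(2))   // encoded as an integer
	bf, _ := msgpack.Marshal(float64(2)) // encoded as a float: a different wire type
	fmt.Printf("int:   % x\nfloat: % x\n", bi, bf)

	// Large integers round-trip exactly; no precision is lost above 2^53.
	in := int64(9007199254740993) // 2^53 + 1
	b, _ := msgpack.Marshal(in)
	var out int64
	msgpack.Unmarshal(b, &out)
	fmt.Println(in == out, out) // true 9007199254740993
}
```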

Additionally, we should consider whether the current schema for returned values is still appropriate. While the JSON format itself should never change, the current schema format doesn't make much sense. One of the big issues is that the columns for a statement are returned with each series, so with something like GROUP BY * the columns are repeated for every series even though they belong with the result, not the series (see the abridged example below). We should also evaluate whether there's a better way to represent the response output than chunking, since chunking is fairly confusing and also hard to implement from the client side.
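
An abridged example of the current response shape, showing the identical columns array repeated in every series under GROUP BY *:

```json
{
  "results": [{
    "statement_id": 0,
    "series": [
      {"name": "cpu", "tags": {"host": "a"}, "columns": ["time", "value"], "values": [["2017-04-27T00:00:00Z", 2]]},
      {"name": "cpu", "tags": {"host": "b"}, "columns": ["time", "value"], "values": [["2017-04-27T00:00:00Z", 3]]}
    ]
  }]
}
```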

@jsternberg
Contributor Author

/cc @stuartcarnie @rbetts

@nathanielc
Contributor

nathanielc commented Apr 27, 2017

@jsternberg Kapacitor suffers from this because it cannot tell whether queried data is an int64 or a float64.
See influxdata/kapacitor#1338

I do not have a strong opinion on which serialization format to use, just that it must correctly communicate type information. My preference would be a self-describing format, since that makes writing clients easier: no need for an IDL or generated code. Of the list you provide, I believe only MessagePack is self-describing.
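
To make "self-describing" concrete, a MessagePack payload can be decoded with no schema at all; a small sketch, again assuming the github.com/vmihailenco/msgpack library:

```go
package main

import (
	"fmt"

	"github.com/vmihailenco/msgpack/v5"
)

func main() {
	// Encode a map, then decode it knowing nothing about its shape.
	// Protocol Buffers, Thrift, and FlatBuffers would all require an
	// IDL definition and generated code to do the same thing.
	b, _ := msgpack.Marshal(map[string]interface{}{"value": int64(2)})

	var v interface{}
	msgpack.Unmarshal(b, &v)
	fmt.Printf("%#v\n", v) // a map with typed values, recovered without a schema
}
```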

As for changing the schema, I don't think we should. Let's tackle one problem at a time, and the current problem is type-information loss in our serialization format.

NOTE: It is possible for the columns per group to differ if the fields are scoped to a tag set. I actually added a test case for this to Kapacitor this morning; see influxdata/kapacitor#1320

@jwilder
Contributor

jwilder commented Apr 27, 2017

I'm in favor of this. We moved away from JSON in the write path for similar reasons. I like that we can add additional formats while maintaining backwards compatibility.

@jsternberg
Contributor Author

If we want a quicker fix for this, and since we can support multiple formats anyway, we can always merge #7154 (after a rebase) and have it in by 1.3. I think FlatBuffers is likely the way to go in the future, but MessagePack would definitely be an improvement, and I don't think supporting both is a negative.

@nathanielc
Contributor

+1 for adding msgpack sooner rather than later. It is just a serialization protocol change, so it doesn't affect existing clients or behavior. Clients (like Kapacitor) that want to take advantage of it can.
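
For instance, opting in could happen through ordinary HTTP content negotiation. A hypothetical sketch of a client requesting msgpack on the query endpoint; the application/x-msgpack media type here is an assumption about how the server would advertise the format:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	// Ask for a msgpack-encoded response; a server that doesn't support
	// the media type would simply keep answering with JSON.
	u := "http://localhost:8086/query?db=mydb&q=" + url.QueryEscape("SELECT * FROM cpu")
	req, _ := http.NewRequest("GET", u, nil)
	req.Header.Set("Accept", "application/x-msgpack")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Header.Get("Content-Type"), len(body))
}
```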

@rbetts rbetts self-assigned this Apr 27, 2017
@rbetts
Contributor

rbetts commented Apr 27, 2017

There is a set of related issues that we have gathered for shortly after the 1.3.0 milestone; this is on that list.

@mcbortolozzo

The project I'm currently working on involves an in-memory IoT analysis tool tightly coupled to a single-instance disk database. InfluxDB provided good features for most of my requirements, but the improved performance of one of the proposed formats, in terms of both memory usage and time, would be very useful. I hadn't seen the msgpack implementation, so I also ran some tests with a protobuf encoder just to get an overall idea of the possible improvements, and the results were similar. I'd be interested in having a similar option too.

@rbetts
Contributor

rbetts commented Oct 26, 2017

@jsternberg added a msgpack response format: #8897
