
Investigate json-iterator as a replacement for encoding/json #3254

Open
tsandall opened this issue Mar 11, 2021 · 2 comments
Labels: design, optimization, requires-investigation (issues not under active investigation but requiring one)

Comments

@tsandall (Member)

https://github.com/json-iterator/go looks like a promising replacement for the encoding/json package from the standard library (which we use all over the place). It would be worthwhile to investigate how we could incorporate it into OPA and what kind of performance wins it could provide.

@charlieegan3 (Contributor)

I ran two k6 tests with this library after modifying the implementation of readInputPostV1 to use json-iterator.

diff --git a/server/server.go b/server/server.go
index 0b60c2a7c..57152f554 100644
--- a/server/server.go
+++ b/server/server.go
@@ -25,6 +25,8 @@ import (
        "sync"
        "time"
 
+       jsoniter "github.com/json-iterator/go"
+
        serverEncodingPlugin "github.com/open-policy-agent/opa/plugins/server/encoding"
 
        "github.com/gorilla/mux"
@@ -2806,7 +2808,8 @@ func readInputPostV1(r *http.Request) (ast.Value, error) {
                        }
                }
        } else {
-               dec := util.NewJSONDecoder(body)
+               var j = jsoniter.ConfigCompatibleWithStandardLibrary
+               dec := j.NewDecoder(body)
                if err := dec.Decode(&request); err != nil && err != io.EOF {
                        return nil, fmt.Errorf("body contains malformed input document: %w", err)

Test 1: Large Input Example

I generated some large input JSON to see if the library was particularly suited to large JSON files:

$ opa eval '{key: value| value := numbers.range(1,100000)[_]; key := sprintf("%d", [value])}' --format=raw > data.json
test.js
import { check } from 'k6';
import http from 'k6/http';
import { Trend, Gauge } from 'k6/metrics';

const endpoint = `http://localhost:8181/v1/data/foobar?metrics`;
const endpointMetrics = `http://localhost:8181/metrics`;

const parsedData = JSON.parse(open('data.json'))

const handlerTimer = new Trend('timer_server_handler_ns');
const handlerInputParse = new Trend('timer_rego_input_parse_ns');
const heapInuseBytes = new Gauge('heap_inuse_bytes');

var initialMeasurement = false;
export default function () {
    const query = Math.floor(Math.random() * Object.keys(parsedData).length);
    const input = {
        "query": query.toString(),
        "numbers": parsedData,
    };

    const r = http.post(endpoint, JSON.stringify({input}), {
        headers: { 'Content-Type': 'application/json' },
    });

    handlerTimer.add(r.json().metrics.timer_server_handler_ns);
    handlerInputParse.add(r.json().metrics.timer_rego_input_parse_ns);

    if (Math.random() <= 0.1 || !initialMeasurement) {
        const met = http.get(endpointMetrics);
        for (const line of met.body.split(/\n/)) {
            if (line.startsWith("go_memstats_heap_inuse_bytes")) {
                const [key, val] = line.split(" ");
                heapInuseBytes.add(val);
            }
        }
        initialMeasurement = true;
    }

    check(r, {
        'expected result': r.json().result[0] === query
    });
}

export function handleSummary(data) {
    const rps = data.metrics.iterations.values.rate;
    const handlerTime = data.metrics.timer_server_handler_ns.values.avg/1000/1000/1000;
    const handlerInputParse = data.metrics.timer_rego_input_parse_ns.values.avg/1000/1000/1000;
    const heapSizeMax = data.metrics.heap_inuse_bytes.values.max/1024/1024/1024
    const output = `
Results:
  total requests:             ${data.metrics.iterations.values.count}
  requests per second (mean): ${round(rps)}
  server handler time (mean): ${round(handlerTime)}s
  server input parse time (mean): ${round(handlerInputParse)}s
  server heap size (max):     ${round(heapSizeMax)}GB
`
    return {
        stdout: output,
    };
}

function round(num) {
    return +(Math.round(num + "e+2")  + "e-2");
}
policy.rego
package foobar

result[value] {
  value := input.numbers[input.query]
}

I didn't see any change in the results:

$ opa run -s --addr=localhost:8181 policy.rego
# vs with changes
$ go run main.go run -s --addr=localhost:8181 policy.rego
$ k6 run -u 1 -d 5s test.js
Results:
  total requests:             32
  requests per second (mean): 6.39
  server handler time (mean): 0.11s
  server input parse time (mean): 0.11s
  server heap size (max):     0.06GB

vs 

$ k6 run -u 1 -d 5s test.js
Results:
  total requests:             33
  requests per second (mean): 6.45
  server handler time (mean): 0.11s
  server input parse time (mean): 0.11s
  server heap size (max):     0.06GB

Test 2: K8s Deployment

This test was meant to represent a 'typical' use case with more realistic input data.

deployment.json
{
  "apiVersion": "apps/v1",
  "kind": "Deployment",
  "metadata": {
    "name": "example-app",
    "labels": {
      "app": "example-app"
    }
  },
  "spec": {
    "replicas": 1,
    "selector": {
      "matchLabels": {
        "app": "example-app"
      }
    },
    "template": {
      "metadata": {
        "labels": {
          "app": "example-app"
        }
      },
      "spec": {
        "initContainers": [
          {
            "name": "proxy-init",
            "image": "openpolicyagent/proxy_init:v5",
            "args": [
              "-p",
              "8000",
              "-u",
              "1111",
              "-w",
              "8282"
            ],
            "securityContext": {
              "capabilities": {
                "add": [
                  "NET_ADMIN"
                ]
              },
              "runAsNonRoot": false,
              "runAsUser": 0
            }
          }
        ],
        "containers": [
          {
            "name": "app",
            "image": "openpolicyagent/demo-test-server:v1",
            "ports": [
              {
                "containerPort": 8080
              }
            ]
          },
          {
            "name": "envoy",
            "image": "envoyproxy/envoy:v1.20.0",
            "env": [
              {
                "name": "ENVOY_UID",
                "value": "1111"
              }
            ],
            "volumeMounts": [
              {
                "readOnly": true,
                "mountPath": "/config",
                "name": "proxy-config"
              },
              {
                "readOnly": false,
                "mountPath": "/run/opa/sockets",
                "name": "opa-socket"
              }
            ],
            "args": [
              "envoy",
              "--log-level",
              "debug",
              "--config-path",
              "/config/envoy.yaml"
            ]
          },
          {
            "name": "opa-envoy",
            "image": "openpolicyagent/opa:latest-envoy",
            "securityContext": {
              "runAsUser": 1111
            },
            "volumeMounts": [
              {
                "readOnly": true,
                "mountPath": "/policy",
                "name": "opa-policy"
              },
              {
                "readOnly": true,
                "mountPath": "/config",
                "name": "opa-envoy-config"
              },
              {
                "readOnly": false,
                "mountPath": "/run/opa/sockets",
                "name": "opa-socket"
              }
            ],
            "args": [
              "run",
              "--server",
              "--config-file=/config/config.yaml",
              "--addr=localhost:8181",
              "--diagnostic-addr=0.0.0.0:8282",
              "--ignore=.*",
              "/policy/policy.rego"
            ],
            "livenessProbe": {
              "httpGet": {
                "path": "/health?plugins",
                "scheme": "HTTP",
                "port": 8282
              },
              "initialDelaySeconds": 5,
              "periodSeconds": 15
            },
            "readinessProbe": {
              "httpGet": {
                "path": "/health?plugins",
                "scheme": "HTTP",
                "port": 8282
              },
              "initialDelaySeconds": 5,
              "periodSeconds": 15
            }
          }
        ],
        "volumes": [
          {
            "name": "proxy-config",
            "configMap": {
              "name": "proxy-config"
            }
          },
          {
            "name": "opa-policy",
            "configMap": {
              "name": "opa-policy"
            }
          },
          {
            "name": "opa-envoy-config",
            "configMap": {
              "name": "opa-envoy-config"
            }
          },
          {
            "name": "opa-socket",
            "emptyDir": {
            }
          }
        ]
      }
    }
  }
}
test.js
import { check } from 'k6';
import http from 'k6/http';
import { Trend, Gauge } from 'k6/metrics';

const endpoint = `http://localhost:8181/v1/data/foobar?metrics`;
const endpointMetrics = `http://localhost:8181/metrics`;

const parsedData = JSON.parse(open('data.json'))

const handlerTimer = new Trend('timer_server_handler_ns');
const handlerInputParse = new Trend('timer_rego_input_parse_ns');
const heapInuseBytes = new Gauge('heap_inuse_bytes');

var initialMeasurement = false;
export default function () {
    const input = {
        "deployment": parsedData,
    };

    const r = http.post(endpoint, JSON.stringify({input}), {
        headers: { 'Content-Type': 'application/json' },
    });

    handlerTimer.add(r.json().metrics.timer_server_handler_ns);
    handlerInputParse.add(r.json().metrics.timer_rego_input_parse_ns);

    if (Math.random() <= 0.1 || !initialMeasurement) {
        const met = http.get(endpointMetrics);
        for (const line of met.body.split(/\n/)) {
            if (line.startsWith("go_memstats_heap_inuse_bytes")) {
                const [key, val] = line.split(" ");
                heapInuseBytes.add(val);
            }
        }
        initialMeasurement = true;
    }

    check(r, {
        'expected result': r.json().result[0] === "envoyproxy/envoy:v1.20.0"
    });
}

export function handleSummary(data) {
    const rps = data.metrics.iterations.values.rate;
    const handlerTime = data.metrics.timer_server_handler_ns.values.avg/1000/1000;
    const handlerInputParse = data.metrics.timer_rego_input_parse_ns.values.avg/1000/1000;
    const heapSizeMax = data.metrics.heap_inuse_bytes.values.max/1024/1024/1024
    const output = `
Results:
  total requests:             ${data.metrics.iterations.values.count}
  requests per second (mean): ${round(rps)}
  server handler time (mean): ${round(handlerTime)}ms
  server input parse time (mean): ${round(handlerInputParse)}ms
  server heap size (max):     ${round(heapSizeMax)}GB
`
    return {
        stdout: output,
    };
}

function round(num) {
    return +(Math.round(num + "e+2")  + "e-2");
}
policy.rego
package foobar

result[value] {
  value := input.deployment.spec.template.spec.containers[1].image
}

I didn't see any change in the results:

$ opa run -s --addr=localhost:8181 policy.rego
# vs with changes
$ go run main.go run -s --addr=localhost:8181 policy.rego
$ k6 run -u 1 -d 5s test.js
Results:
  total requests:             12867
  requests per second (mean): 2573.1
  server handler time (mean): 0.11ms
  server input parse time (mean): 0.08ms
  server heap size (max):     0.01GB

vs 

$ k6 run -u 1 -d 5s test.js
Results:
  total requests:             12868
  requests per second (mean): 2573.46
  server handler time (mean): 0.1ms
  server input parse time (mean): 0.08ms
  server heap size (max):     0.01GB

This seems unexpected, particularly that server input parse time doesn't change... Could someone else sanity check my approach here? 😕

@srenatus (Contributor)

Thanks for investigating this! 🙌

One thing I could imagine is that the performance improvements could depend on type hints of some sort -- if you unmarshal into a struct, the unmarshaler knows something about your data. If you unmarshal into map[string]any, or any, there's no help whatsoever.

Unrelated to that concern, however, perhaps we could see benefits from what's mentioned here?

reuse the underlying Stream or Iterator instance. jsoniter.ConfigFastest.BorrowIterator or jsoniter.ConfigFastest.BorrowStream. Just remember to return them when done.

But from a distance, that might mean we need to sacrifice float precision (which could be acceptable) -- but then we could also benchmark main vs jsoniter.ConfigFastest. 🤔

Projects: Status: Backlog