
Investigate json-iterator as a replacement for encoding/json #3254

Open
tsandall opened this issue Mar 11, 2021 · 2 comments
Labels: design, optimization, requires-investigation (issues not under active investigation but requiring one)

Comments

@tsandall (Member)

https://github.com/json-iterator/go looks like a promising replacement for the encoding/json package from the standard library (which we use all over the place). It would be worthwhile to investigate how we could incorporate it into OPA and what kind of performance wins it could provide.

@charlieegan3 (Contributor)

I ran two k6 tests with this library after modifying the implementation of readInputPostV1 to use json-iterator.

diff --git a/server/server.go b/server/server.go
index 0b60c2a7c..57152f554 100644
--- a/server/server.go
+++ b/server/server.go
@@ -25,6 +25,8 @@ import (
        "sync"
        "time"
 
+       jsoniter "github.com/json-iterator/go"
+
        serverEncodingPlugin "github.com/open-policy-agent/opa/plugins/server/encoding"
 
        "github.com/gorilla/mux"
@@ -2806,7 +2808,8 @@ func readInputPostV1(r *http.Request) (ast.Value, error) {
                        }
                }
        } else {
-               dec := util.NewJSONDecoder(body)
+               var j = jsoniter.ConfigCompatibleWithStandardLibrary
+               dec := j.NewDecoder(body)
                if err := dec.Decode(&request); err != nil && err != io.EOF {
                        return nil, fmt.Errorf("body contains malformed input document: %w", err)

Test 1: Large Input Example

I generated some large input JSON to see if the library was particularly suited to large JSON files:

$ opa eval '{key: value| value := numbers.range(1,100000)[_]; key := sprintf("%d", [value])}' --format=raw > data.json
test.js
import { check } from 'k6';
import http from 'k6/http';
import { Trend, Gauge } from 'k6/metrics';

const endpoint = `http://localhost:8181/v1/data/foobar?metrics`;
const endpointMetrics = `http://localhost:8181/metrics`;

const parsedData = JSON.parse(open('data.json'))

const handlerTimer = new Trend('timer_server_handler_ns');
const handlerInputParse = new Trend('timer_rego_input_parse_ns');
const heapInuseBytes = new Gauge('heap_inuse_bytes');

var initialMeasurement = false;
export default function () {
    const query = Math.floor(Math.random() * Object.keys(parsedData).length);
    const input = {
        "query": query.toString(),
        "numbers": parsedData,
    };

    const r = http.post(endpoint, JSON.stringify({input}), {
        headers: { 'Content-Type': 'application/json' },
    });

    handlerTimer.add(r.json().metrics.timer_server_handler_ns);
    handlerInputParse.add(r.json().metrics.timer_rego_input_parse_ns);

    if (Math.random() <= 0.1 || !initialMeasurement) {
        const met = http.get(endpointMetrics);
        for (const line of met.body.split(/\n/)) {
            if (line.startsWith("go_memstats_heap_inuse_bytes")) {
                const [key, val] = line.split(" ");
                heapInuseBytes.add(val);
            }
        }
        initialMeasurement = true;
    }

    check(r, {
        'expected result': r.json().result[0] === query
    });
}

export function handleSummary(data) {
    const rps = data.metrics.iterations.values.rate;
    const handlerTime = data.metrics.timer_server_handler_ns.values.avg/1000/1000/1000;
    const handlerInputParse = data.metrics.timer_rego_input_parse_ns.values.avg/1000/1000/1000;
    const heapSizeMax = data.metrics.heap_inuse_bytes.values.max/1024/1024/1024
    const output = `
Results:
  total requests:             ${data.metrics.iterations.values.count}
  requests per second (mean): ${round(rps)}
  server handler time (mean): ${round(handlerTime)}s
  server input parse time (mean): ${round(handlerInputParse)}s
  server heap size (max):     ${round(heapSizeMax)}GB
`
    return {
        stdout: output,
    };
}

function round(num) {
    return +(Math.round(num + "e+2")  + "e-2");
}
policy.rego
package foobar

result[value] {
  value := input.numbers[input.query]
}

I didn't see any change in the results:

$ opa run -s --addr=localhost:8181 policy.rego
# vs with changes
$ go run main.go run -s --addr=localhost:8181 policy.rego
$ k6 run -u 1 -d 5s test.js
Results:
  total requests:             32
  requests per second (mean): 6.39
  server handler time (mean): 0.11s
  server input parse time (mean): 0.11s
  server heap size (max):     0.06GB

vs 

$ k6 run -u 1 -d 5s test.js
Results:
  total requests:             33
  requests per second (mean): 6.45
  server handler time (mean): 0.11s
  server input parse time (mean): 0.11s
  server heap size (max):     0.06GB

Test 2: K8s Deployment

This test was meant to represent a 'typical' use case with more realistic input data.

deployment.json
{
  "apiVersion": "apps/v1",
  "kind": "Deployment",
  "metadata": {
    "name": "example-app",
    "labels": {
      "app": "example-app"
    }
  },
  "spec": {
    "replicas": 1,
    "selector": {
      "matchLabels": {
        "app": "example-app"
      }
    },
    "template": {
      "metadata": {
        "labels": {
          "app": "example-app"
        }
      },
      "spec": {
        "initContainers": [
          {
            "name": "proxy-init",
            "image": "openpolicyagent/proxy_init:v5",
            "args": [
              "-p",
              "8000",
              "-u",
              "1111",
              "-w",
              "8282"
            ],
            "securityContext": {
              "capabilities": {
                "add": [
                  "NET_ADMIN"
                ]
              },
              "runAsNonRoot": false,
              "runAsUser": 0
            }
          }
        ],
        "containers": [
          {
            "name": "app",
            "image": "openpolicyagent/demo-test-server:v1",
            "ports": [
              {
                "containerPort": 8080
              }
            ]
          },
          {
            "name": "envoy",
            "image": "envoyproxy/envoy:v1.20.0",
            "env": [
              {
                "name": "ENVOY_UID",
                "value": "1111"
              }
            ],
            "volumeMounts": [
              {
                "readOnly": true,
                "mountPath": "/config",
                "name": "proxy-config"
              },
              {
                "readOnly": false,
                "mountPath": "/run/opa/sockets",
                "name": "opa-socket"
              }
            ],
            "args": [
              "envoy",
              "--log-level",
              "debug",
              "--config-path",
              "/config/envoy.yaml"
            ]
          },
          {
            "name": "opa-envoy",
            "image": "openpolicyagent/opa:latest-envoy",
            "securityContext": {
              "runAsUser": 1111
            },
            "volumeMounts": [
              {
                "readOnly": true,
                "mountPath": "/policy",
                "name": "opa-policy"
              },
              {
                "readOnly": true,
                "mountPath": "/config",
                "name": "opa-envoy-config"
              },
              {
                "readOnly": false,
                "mountPath": "/run/opa/sockets",
                "name": "opa-socket"
              }
            ],
            "args": [
              "run",
              "--server",
              "--config-file=/config/config.yaml",
              "--addr=localhost:8181",
              "--diagnostic-addr=0.0.0.0:8282",
              "--ignore=.*",
              "/policy/policy.rego"
            ],
            "livenessProbe": {
              "httpGet": {
                "path": "/health?plugins",
                "scheme": "HTTP",
                "port": 8282
              },
              "initialDelaySeconds": 5,
              "periodSeconds": 15
            },
            "readinessProbe": {
              "httpGet": {
                "path": "/health?plugins",
                "scheme": "HTTP",
                "port": 8282
              },
              "initialDelaySeconds": 5,
              "periodSeconds": 15
            }
          }
        ],
        "volumes": [
          {
            "name": "proxy-config",
            "configMap": {
              "name": "proxy-config"
            }
          },
          {
            "name": "opa-policy",
            "configMap": {
              "name": "opa-policy"
            }
          },
          {
            "name": "opa-envoy-config",
            "configMap": {
              "name": "opa-envoy-config"
            }
          },
          {
            "name": "opa-socket",
            "emptyDir": {
            }
          }
        ]
      }
    }
  }
}
test.js
import { check } from 'k6';
import http from 'k6/http';
import { Trend, Gauge } from 'k6/metrics';

const endpoint = `http://localhost:8181/v1/data/foobar?metrics`;
const endpointMetrics = `http://localhost:8181/metrics`;

const parsedData = JSON.parse(open('data.json'))

const handlerTimer = new Trend('timer_server_handler_ns');
const handlerInputParse = new Trend('timer_rego_input_parse_ns');
const heapInuseBytes = new Gauge('heap_inuse_bytes');

var initialMeasurement = false;
export default function () {
    const input = {
        "deployment": parsedData,
    };

    const r = http.post(endpoint, JSON.stringify({input}), {
        headers: { 'Content-Type': 'application/json' },
    });

    handlerTimer.add(r.json().metrics.timer_server_handler_ns);
    handlerInputParse.add(r.json().metrics.timer_rego_input_parse_ns);

    if (Math.random() <= 0.1 || !initialMeasurement) {
        const met = http.get(endpointMetrics);
        for (const line of met.body.split(/\n/)) {
            if (line.startsWith("go_memstats_heap_inuse_bytes")) {
                const [key, val] = line.split(" ");
                heapInuseBytes.add(val);
            }
        }
        initialMeasurement = true;
    }

    check(r, {
        'expected result': r.json().result[0] === "envoyproxy/envoy:v1.20.0"
    });
}

export function handleSummary(data) {
    const rps = data.metrics.iterations.values.rate;
    const handlerTime = data.metrics.timer_server_handler_ns.values.avg/1000/1000;
    const handlerInputParse = data.metrics.timer_rego_input_parse_ns.values.avg/1000/1000;
    const heapSizeMax = data.metrics.heap_inuse_bytes.values.max/1024/1024/1024
    const output = `
Results:
  total requests:             ${data.metrics.iterations.values.count}
  requests per second (mean): ${round(rps)}
  server handler time (mean): ${round(handlerTime)}ms
  server input parse time (mean): ${round(handlerInputParse)}ms
  server heap size (max):     ${round(heapSizeMax)}GB
`
    return {
        stdout: output,
    };
}

function round(num) {
    return +(Math.round(num + "e+2")  + "e-2");
}
policy.rego
package foobar

result[value] {
  value := input.deployment.spec.template.spec.containers[1].image
}

I didn't see any change in the results:

$ opa run -s --addr=localhost:8181 policy.rego
# vs with changes
$ go run main.go run -s --addr=localhost:8181 policy.rego
$ k6 run -u 1 -d 5s test.js
Results:
  total requests:             12867
  requests per second (mean): 2573.1
  server handler time (mean): 0.11ms
  server input parse time (mean): 0.08ms
  server heap size (max):     0.01GB

vs 

$ k6 run -u 1 -d 5s test.js
Results:
  total requests:             12868
  requests per second (mean): 2573.46
  server handler time (mean): 0.1ms
  server input parse time (mean): 0.08ms
  server heap size (max):     0.01GB

This seems unexpected, particularly that server input parse time doesn't change... Could someone else sanity check my approach here? 😕

@srenatus (Contributor)

Thanks for investigating this! 🙌

One thing I could imagine is that the performance improvements could depend on type hints of some sort -- if you unmarshal into a struct, the unmarshaler knows something about your data. If you unmarshal into map[string]any, or any, there's no help whatsoever.

Unrelated to that concern, however, perhaps we could see benefits from what's mentioned here?

reuse the underlying Stream or Iterator instance. jsoniter.ConfigFastest.BorrowIterator or jsoniter.ConfigFastest.BorrowStream. Just remember to return them when done.

But from a distance, that might mean we need to sacrifice float precision (which could be acceptable) -- but then we could also benchmark main vs jsoniter.ConfigFastest. 🤔

Projects: Status: Backlog