json.match_schema performance #7011
Comments
Hi there! And thanks for filing this issue. I looked into this briefly, and almost all of that time is spent loading the JSON schema, not actually validating.

    package main

    import (
        "fmt"
        "os"
        "time"

        "github.com/xeipuuv/gojsonschema"
    )

    func main() {
        // First timing: read and load the schema, then validate one document.
        now := time.Now()
        bs, err := os.ReadFile("schema.json")
        if err != nil {
            panic(err)
        }
        sl := gojsonschema.NewBytesLoader(bs)
        schema, err := gojsonschema.NewSchema(sl)
        if err != nil {
            panic(err)
        }
        dl := gojsonschema.NewStringLoader(`{"name": "John", "age": 30}`)
        result, err := schema.Validate(dl)
        if err != nil {
            panic(err)
        }
        fmt.Println(result.Valid())
        fmt.Println(time.Since(now))

        // Second timing: validate another document against the already loaded schema.
        now = time.Now()
        dl = gojsonschema.NewStringLoader(`{"another": "object", "x": 1}`)
        result, err = schema.Validate(dl)
        if err != nil {
            panic(err)
        }
        fmt.Println(result.Valid())
        fmt.Println(time.Since(now))
    }

Output
I guess using the inter-query cache for this built-in, storing loaded schemas across decisions, would be the way to go. It wouldn't make your single
I figured I'd test this out anyway, and this seemed like a good case given that there was an actual issue on this. Testing response times with OPA running as a server, the first request takes ~800 ms while the following ones take ~10 ms.
Fixes open-policy-agent#7011
Signed-off-by: Anders Eknert <anders@styra.com>
Caching this now as described above. Note that, as I mentioned, the first hit will still be expensive, since the schema must be loaded at some point. But subsequent requests are now instantaneous.
@anderseknert, I accidentally figured out why loading takes so long. The CycloneDX schema has external I think caching is still useful in the cases where remote references must be used. I just wanted to share this new finding as it may help others in the future.
Ah, yeah, that certainly explains a lot. Thanks for letting me know! Being able to cache the schema is a good change either way, as recomputing that per request is just wasting resources 🙂
Short description
The json.match_schema function takes much longer when the JSON schema is significantly large.
I created a simple reproducer here: https://github.com/lcarva/opa-json-schema-perf
(Schema too large for rego playground)
The reproducer validates a small object against the CycloneDX SBOM JSON Schema (about 5k lines long).
main.rego:10 is the json.match_schema call where the CycloneDX schema is being used. main.rego:12 uses a much smaller schema. That's 288,827 vs 63 microseconds.
Steps To Reproduce
See description.
Expected behavior
Validation of the object should not take longer than 1 ms.