Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cue: json encoding can use O(n^2) space #2470

Open
rogpeppe opened this issue Jul 3, 2023 · 2 comments
Open

cue: json encoding can use O(n^2) space #2470

rogpeppe opened this issue Jul 3, 2023 · 2 comments
Assignees

Comments

@rogpeppe
Copy link
Member

rogpeppe commented Jul 3, 2023

What version of CUE are you using (cue version)?

$ cue version
cue version v0.0.0-20230628162133-7be6224cbc4f

go version devel go1.21-f90b4cd655 Fri May 26 03:21:41 2023 +0000
      -buildmode exe
       -compiler gc
  DefaultGODEBUG panicnil=1
     CGO_ENABLED 1
          GOARCH amd64
            GOOS linux
         GOAMD64 v1
             vcs git
    vcs.revision 7be6224cbc4f99e9caee308a9218f11654dbb621
        vcs.time 2023-06-28T16:21:33Z
    vcs.modified false

Does this issue reproduce with the latest stable release?

Yes

What did you do?

I ran this benchmark:

package main

import (
	"strings"
	"testing"

	"cuelang.org/go/cue/cuecontext"
)

func BenchmarkJSONMarshal(b *testing.B) {
	b.ReportAllocs()
	n := 10000
	c := strings.Repeat("[", n) + "0" + strings.Repeat("]", n)
	ctx := cuecontext.New()
	val := ctx.CompileString(c)
	if err := val.Err(); err != nil {
		b.Fatal(err)
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		data, err := val.MarshalJSON()
		if err != nil {
			b.Fatal(err)
		}
		_ = data
	}
}

What did you expect to see?

Memory usage roughly proportional to the size of the data processed (20KB).

What did you see instead?

cpu: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
BenchmarkJSONMarshal-8   	       2	 846455788 ns/op	330375888 B/op	  180538 allocs/op

The encoding is allocating 315MB for each JSON marshal operation, and it takes 845ms to run.

The reason is that the internal CUE logic for marshaling does not write to a single buffer, but instead
it creates a buffer for each nested value which is then discarded after appending to the outer buffer.
See cue.marshalList for an example of this kind of behaviour.

Although the example above is somewhat pathological in nature, all CUE values with nested structure will suffer from this issue to some extent.

@rogpeppe rogpeppe added NeedsInvestigation Triage Requires triage/attention labels Jul 3, 2023
@rogpeppe
Copy link
Member Author

rogpeppe commented Jul 3, 2023

Here's a graph illustrating allocated bytes vs n for json and yaml encoders:

allocated bytes for list of depth n

Here it is with a log y-axis, which seems to tend towards linear, showing the n^2 behaviour.

allocated bytes for list of depth n (1)

@mvdan
Copy link
Member

mvdan commented Jul 4, 2023

On one hand we're limited by the encoding/json API, since it doesn't allow reusing buffers in any way. On the other hand, most of this is our own fault, since we encode structs and lists via recursive calls to the MarshalJSON method, forcing the use of new buffers. So we could definitely do a lot better, although right now we're working within the constraints of the encoding/json API.

I also think it would be worthwhile to have a benchmark with relatively large cases for each value kind (e.g. long string, long list, big struct, deeply nested list, deeply nested struct) and get memory use numbers for each of our supported encodings. Not just for YAML, but for others we'll likely support in the future.

@mvdan mvdan self-assigned this May 16, 2024
@mvdan mvdan removed the Triage Requires triage/attention label May 16, 2024
cueckoo pushed a commit that referenced this issue Sep 25, 2024
Adapted from a smaller benchmark that Roger provided in
https://cuelang.org/issue/2470 to also include nesting of structs
as well as a long string, a long list, and a long struct.

As can be seen by the current results, we allocate nearly 50MiB,
which is very wasteful for the current benchmark size of 2000.

    cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics
                            │     old     │
                            │   sec/op    │
    LargeValueMarshalJSON-8   106.2m ± 1%

                            │     old      │
                            │     B/op     │
    LargeValueMarshalJSON-8   48.24Mi ± 8%

                            │     old     │
                            │  allocs/op  │
    LargeValueMarshalJSON-8   106.2k ± 2%

For #2470.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I437cc8fb644dd33dd0327860ca04ff66b7ab9275
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1201803
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Reviewed-by: Roger Peppe <rogpeppe@gmail.com>
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
cueckoo pushed a commit that referenced this issue Sep 25, 2024
cue.Value implements json.Marshaler, so any call of json.Marshal
with a cue.Value is a roundabout way to just call cue.Value.MarshalJSON.
Moreover, we can call the internal API directly, which gives us
more flexibility such as reusing buffers later on.

                            │     old      │                new                 │
                            │    sec/op    │   sec/op     vs base               │
    LargeValueMarshalJSON-8   106.23m ± 1%   12.19m ± 1%  -88.52% (p=0.002 n=6)

                            │     old      │                 new                 │
                            │     B/op     │     B/op      vs base               │
    LargeValueMarshalJSON-8   48.24Mi ± 8%   23.51Mi ± 0%  -51.27% (p=0.002 n=6)

                            │     old      │                new                 │
                            │  allocs/op   │  allocs/op   vs base               │
    LargeValueMarshalJSON-8   106.17k ± 2%   78.36k ± 0%  -26.19% (p=0.002 n=6)

Updates #2470.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I1b0713489a5e888ab997389257772d2249d4389b
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1201807
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
Reviewed-by: Roger Peppe <rogpeppe@gmail.com>
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
cueckoo pushed a commit that referenced this issue Sep 25, 2024
So that we can reuse buffers where possible rather than making new ones
only to copy the data over to the final buffer.

Note that this doesn't happen everywhere yet, as the underlying API
we use to marshal literals is json.Marshal, which is not append-like.

                            │     old      │                new                 │
                            │    sec/op    │   sec/op     vs base               │
    LargeValueMarshalJSON-8   12.240m ± 1%   7.772m ± 1%  -36.50% (p=0.002 n=6)

                            │      old      │                 new                 │
                            │     B/op      │     B/op      vs base               │
    LargeValueMarshalJSON-8   23.508Mi ± 0%   7.158Mi ± 0%  -69.55% (p=0.002 n=6)

                            │     old     │                new                 │
                            │  allocs/op  │  allocs/op   vs base               │
    LargeValueMarshalJSON-8   78.36k ± 0%   70.28k ± 0%  -10.32% (p=0.002 n=6)

Updates #2470.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: Idd806ce52f9726ffd87d6bf71c2759131d2553a5
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1201808
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
Reviewed-by: Roger Peppe <rogpeppe@gmail.com>
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
cueckoo pushed a commit that referenced this issue Sep 30, 2024
Create the context once in MarshalJSON and reuse it in appendJSON calls.

                            │     old     │                new                 │
                            │   sec/op    │   sec/op     vs base               │
    LargeValueMarshalJSON-8   7.664m ± 1%   6.484m ± 1%  -15.39% (p=0.002 n=6)

                            │     old      │                 new                 │
                            │     B/op     │     B/op      vs base               │
    LargeValueMarshalJSON-8   7.158Mi ± 0%   4.224Mi ± 0%  -40.98% (p=0.002 n=6)

                            │     old     │                new                 │
                            │  allocs/op  │  allocs/op   vs base               │
    LargeValueMarshalJSON-8   70.28k ± 0%   62.27k ± 0%  -11.40% (p=0.002 n=6)

Updates #2470.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: Id24eb441c19e5d6028cc7fa8e1eaf0d1551623e9
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1201824
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Reviewed-by: Roger Peppe <rogpeppe@gmail.com>
Reviewed-by: Marcel van Lohuizen <mpvl@gmail.com>
cueckoo pushed a commit that referenced this issue Sep 30, 2024
It called getStruct, which built an entirely new adt.OpContext
rather than reusing the one the parent already had.
The method was used only once too, so inline it to simplify.

                            │     old     │                new                │
                            │   sec/op    │   sec/op     vs base              │
    LargeValueMarshalJSON-8   6.370m ± 1%   6.083m ± 1%  -4.50% (p=0.002 n=6)

                            │     old      │                 new                 │
                            │     B/op     │     B/op      vs base               │
    LargeValueMarshalJSON-8   4.178Mi ± 0%   3.445Mi ± 0%  -17.56% (p=0.002 n=6)

                            │     old     │                new                │
                            │  allocs/op  │  allocs/op   vs base              │
    LargeValueMarshalJSON-8   60.26k ± 0%   58.26k ± 0%  -3.33% (p=0.002 n=6)

Updates #2470.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I28bd8b820bf022e48257b445aa2e6d6e628dfe16
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1201826
Reviewed-by: Roger Peppe <rogpeppe@gmail.com>
Reviewed-by: Marcel van Lohuizen <mpvl@gmail.com>
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
cueckoo pushed a commit that referenced this issue Oct 2, 2024
Value.appendJSON already switches on the value kind and applies defaults
so repeating that work in Value.List is a waste.
Moreover, we already have an OpContext we can reuse.

                            │     old     │                new                │
                            │   sec/op    │   sec/op     vs base              │
    LargeValueMarshalJSON-8   6.092m ± 1%   5.488m ± 1%  -9.91% (p=0.002 n=6)

                            │     old      │                 new                 │
                            │     B/op     │     B/op      vs base               │
    LargeValueMarshalJSON-8   3.445Mi ± 0%   2.360Mi ± 0%  -31.49% (p=0.002 n=6)

                            │     old     │                new                 │
                            │  allocs/op  │  allocs/op   vs base               │
    LargeValueMarshalJSON-8   58.26k ± 0%   52.25k ± 0%  -10.31% (p=0.002 n=6)

Updates #2470.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: Ie180170d277c99a996a8e38a8a9e1c0b13e28154
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1202101
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Reviewed-by: Roger Peppe <rogpeppe@gmail.com>
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
vanhtuan0409 pushed a commit to anduintransaction/cue that referenced this issue Oct 15, 2024
Value.appendJSON already switches on the value kind and applies defaults
so repeating that work in Value.List is a waste.
Moreover, we already have an OpContext we can reuse.

                            │     old     │                new                │
                            │   sec/op    │   sec/op     vs base              │
    LargeValueMarshalJSON-8   6.092m ± 1%   5.488m ± 1%  -9.91% (p=0.002 n=6)

                            │     old      │                 new                 │
                            │     B/op     │     B/op      vs base               │
    LargeValueMarshalJSON-8   3.445Mi ± 0%   2.360Mi ± 0%  -31.49% (p=0.002 n=6)

                            │     old     │                new                 │
                            │  allocs/op  │  allocs/op   vs base               │
    LargeValueMarshalJSON-8   58.26k ± 0%   52.25k ± 0%  -10.31% (p=0.002 n=6)

Updates cue-lang#2470.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: Ie180170d277c99a996a8e38a8a9e1c0b13e28154
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1202101
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Reviewed-by: Roger Peppe <rogpeppe@gmail.com>
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants