-
Notifications
You must be signed in to change notification settings - Fork 302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cue: json encoding can use O(n^2) space #2470
Comments
On one hand we're limited by the encoding/json API, since it doesn't allow reusing buffers in any way. On the other hand, most of this is our own fault, since we encode structs and lists via recursive calls to the MarshalJSON method, forcing the use of new buffers. So we could definitely do a lot better, although right now we're working within the constraints of the encoding/json API. I also think it would be worthwhile to have a benchmark with relatively large cases for each value kind (e.g. long string, long list, big struct, deeply nested list, deeply nested struct) and get memory use numbers for each of our supported encodings. Not just for YAML, but for others we'll likely support in the future. |
Adapted from a smaller benchmark that Roger provided in https://cuelang.org/issue/2470 to also include nesting of structs as well as a long string, a long list, and a long struct. As can be seen by the current results, we allocate nearly 50MiB, which is very wasteful for the current benchmark size of 2000. cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics │ old │ │ sec/op │ LargeValueMarshalJSON-8 106.2m ± 1% │ old │ │ B/op │ LargeValueMarshalJSON-8 48.24Mi ± 8% │ old │ │ allocs/op │ LargeValueMarshalJSON-8 106.2k ± 2% For #2470. Signed-off-by: Daniel Martí <mvdan@mvdan.cc> Change-Id: I437cc8fb644dd33dd0327860ca04ff66b7ab9275 Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1201803 TryBot-Result: CUEcueckoo <cueckoo@cuelang.org> Reviewed-by: Roger Peppe <rogpeppe@gmail.com> Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
cue.Value implements json.Marshaler, so any call of json.Marshal with a cue.Value is a roundabout way to just call cue.Value.MarshalJSON. Moreover, we can call the internal API directly, which gives us more flexibility such as reusing buffers later on. │ old │ new │ │ sec/op │ sec/op vs base │ LargeValueMarshalJSON-8 106.23m ± 1% 12.19m ± 1% -88.52% (p=0.002 n=6) │ old │ new │ │ B/op │ B/op vs base │ LargeValueMarshalJSON-8 48.24Mi ± 8% 23.51Mi ± 0% -51.27% (p=0.002 n=6) │ old │ new │ │ allocs/op │ allocs/op vs base │ LargeValueMarshalJSON-8 106.17k ± 2% 78.36k ± 0% -26.19% (p=0.002 n=6) Updates #2470. Signed-off-by: Daniel Martí <mvdan@mvdan.cc> Change-Id: I1b0713489a5e888ab997389257772d2249d4389b Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1201807 Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com> Reviewed-by: Roger Peppe <rogpeppe@gmail.com> TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
So that we can reuse buffers where possible rather than making new ones only to copy the data over to the final buffer. Note that this doesn't happen everywhere yet, as the underlying API we use to marshal literals is json.Marshal, which is not append-like. │ old │ new │ │ sec/op │ sec/op vs base │ LargeValueMarshalJSON-8 12.240m ± 1% 7.772m ± 1% -36.50% (p=0.002 n=6) │ old │ new │ │ B/op │ B/op vs base │ LargeValueMarshalJSON-8 23.508Mi ± 0% 7.158Mi ± 0% -69.55% (p=0.002 n=6) │ old │ new │ │ allocs/op │ allocs/op vs base │ LargeValueMarshalJSON-8 78.36k ± 0% 70.28k ± 0% -10.32% (p=0.002 n=6) Updates #2470. Signed-off-by: Daniel Martí <mvdan@mvdan.cc> Change-Id: Idd806ce52f9726ffd87d6bf71c2759131d2553a5 Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1201808 Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com> Reviewed-by: Roger Peppe <rogpeppe@gmail.com> TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Create the context once in MarshalJSON and reuse it in appendJSON calls. │ old │ new │ │ sec/op │ sec/op vs base │ LargeValueMarshalJSON-8 7.664m ± 1% 6.484m ± 1% -15.39% (p=0.002 n=6) │ old │ new │ │ B/op │ B/op vs base │ LargeValueMarshalJSON-8 7.158Mi ± 0% 4.224Mi ± 0% -40.98% (p=0.002 n=6) │ old │ new │ │ allocs/op │ allocs/op vs base │ LargeValueMarshalJSON-8 70.28k ± 0% 62.27k ± 0% -11.40% (p=0.002 n=6) Updates #2470. Signed-off-by: Daniel Martí <mvdan@mvdan.cc> Change-Id: Id24eb441c19e5d6028cc7fa8e1eaf0d1551623e9 Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1201824 Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com> TryBot-Result: CUEcueckoo <cueckoo@cuelang.org> Reviewed-by: Roger Peppe <rogpeppe@gmail.com> Reviewed-by: Marcel van Lohuizen <mpvl@gmail.com>
It called getStruct, which built an entirely new adt.OpContext rather than reusing the one the parent already had. The method was used only once too, so inline it to simplify. │ old │ new │ │ sec/op │ sec/op vs base │ LargeValueMarshalJSON-8 6.370m ± 1% 6.083m ± 1% -4.50% (p=0.002 n=6) │ old │ new │ │ B/op │ B/op vs base │ LargeValueMarshalJSON-8 4.178Mi ± 0% 3.445Mi ± 0% -17.56% (p=0.002 n=6) │ old │ new │ │ allocs/op │ allocs/op vs base │ LargeValueMarshalJSON-8 60.26k ± 0% 58.26k ± 0% -3.33% (p=0.002 n=6) Updates #2470. Signed-off-by: Daniel Martí <mvdan@mvdan.cc> Change-Id: I28bd8b820bf022e48257b445aa2e6d6e628dfe16 Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1201826 Reviewed-by: Roger Peppe <rogpeppe@gmail.com> Reviewed-by: Marcel van Lohuizen <mpvl@gmail.com> TryBot-Result: CUEcueckoo <cueckoo@cuelang.org> Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
Value.appendJSON already switches on the value kind and applies defaults so repeating that work in Value.List is a waste. Moreover, we already have an OpContext we can reuse. │ old │ new │ │ sec/op │ sec/op vs base │ LargeValueMarshalJSON-8 6.092m ± 1% 5.488m ± 1% -9.91% (p=0.002 n=6) │ old │ new │ │ B/op │ B/op vs base │ LargeValueMarshalJSON-8 3.445Mi ± 0% 2.360Mi ± 0% -31.49% (p=0.002 n=6) │ old │ new │ │ allocs/op │ allocs/op vs base │ LargeValueMarshalJSON-8 58.26k ± 0% 52.25k ± 0% -10.31% (p=0.002 n=6) Updates #2470. Signed-off-by: Daniel Martí <mvdan@mvdan.cc> Change-Id: Ie180170d277c99a996a8e38a8a9e1c0b13e28154 Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1202101 TryBot-Result: CUEcueckoo <cueckoo@cuelang.org> Reviewed-by: Roger Peppe <rogpeppe@gmail.com> Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
Value.appendJSON already switches on the value kind and applies defaults so repeating that work in Value.List is a waste. Moreover, we already have an OpContext we can reuse. │ old │ new │ │ sec/op │ sec/op vs base │ LargeValueMarshalJSON-8 6.092m ± 1% 5.488m ± 1% -9.91% (p=0.002 n=6) │ old │ new │ │ B/op │ B/op vs base │ LargeValueMarshalJSON-8 3.445Mi ± 0% 2.360Mi ± 0% -31.49% (p=0.002 n=6) │ old │ new │ │ allocs/op │ allocs/op vs base │ LargeValueMarshalJSON-8 58.26k ± 0% 52.25k ± 0% -10.31% (p=0.002 n=6) Updates cue-lang#2470. Signed-off-by: Daniel Martí <mvdan@mvdan.cc> Change-Id: Ie180170d277c99a996a8e38a8a9e1c0b13e28154 Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1202101 TryBot-Result: CUEcueckoo <cueckoo@cuelang.org> Reviewed-by: Roger Peppe <rogpeppe@gmail.com> Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
What version of CUE are you using (
cue version
)?Does this issue reproduce with the latest stable release?
Yes
What did you do?
I ran this benchmark:
What did you expect to see?
Memory usage roughly proportional to the size of the data processed (20KB).
What did you see instead?
The encoding is allocating 315MB for each JSON marshal operation, and it takes 845ms to run.
The reason is that the internal CUE logic for marshaling does not write to a single buffer, but instead
it creates a buffer for each nested value which is then discarded after appending to the outer buffer.
See
cue.marshalList
for an example of this kind of behaviour.Although the example above is somewhat pathological in nature, all CUE values with nested structure will suffer from this issue to some extent.
The text was updated successfully, but these errors were encountered: