Uses custom json-iter decoder for log entries. #3163
Previously we were calling json.Unmarshal for each line. However, json-iter takes an iterator from a Pool on every call, and I believe this can increase memory usage.

For each line we would put the iterator back in the pool for reuse, and once it is in the pool it retains the last data it saw. Since we handle millions of lines, this can cause problems. With a custom extension we keep using a pool, but only at the root object, not for each line.

On top of that, we now process the JSON payload about 50% faster.

```
❯ benchcmp before.txt after.txt2
benchmark                          old ns/op     new ns/op     delta
Benchmark_DecodePushRequest-16     13509236      6677037       -50.57%

benchmark                          old allocs     new allocs     delta
Benchmark_DecodePushRequest-16     106149         38719          -63.52%

benchmark                          old bytes     new bytes     delta
Benchmark_DecodePushRequest-16     10350362      5222989       -49.54%
```

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Codecov Report
@@ Coverage Diff @@
## master #3163 +/- ##
==========================================
- Coverage 63.05% 62.90% -0.16%
==========================================
Files 188 189 +1
Lines 16218 16249 +31
==========================================
- Hits 10226 10221 -5
- Misses 5051 5090 +39
+ Partials 941 938 -3
@mizeng I believe you raised an issue about JSON and distributor memory. I think this might help greatly, if not fix it.
LGTM. Do you have benchmark stats for the encoding, too?
LGTM
I used to, but I don't anymore. Encoding is also faster and uses less memory, but it was less of a problem.
#3163 introduced a bug by removing the default marshalling while the tail API was still using the default json package. This fixes it by forcing the use of the jsoniter package. Added a missing test so that it doesn't happen again. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
**What this PR does / why we need it**:

This allows reusing stream memory allocations between responses. Related to #3163, but this time for the encoding.

```
❯ benchstat before.txt after.txt
name        old time/op    new time/op    delta
_Encode-16    29.2µs ±12%    25.2µs ± 1%  -13.85%  (p=0.016 n=5+4)

name        old alloc/op   new alloc/op   delta
_Encode-16    24.9kB ± 6%    16.4kB ± 8%  -34.20%  (p=0.008 n=5+5)

name        old allocs/op  new allocs/op  delta
_Encode-16       145 ± 0%       129 ± 0%  -11.03%  (p=0.008 n=5+5)
```