Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory Leak ? #654

Closed
Laichj opened this issue Apr 21, 2021 · 1 comment
Closed

Memory Leak ? #654

Laichj opened this issue Apr 21, 2021 · 1 comment

Comments

@Laichj
Copy link

Laichj commented Apr 21, 2021

bash-4.4# jmap -histo 12 | head -50

num #instances #bytes class name

1: 397712 1597164184 [D
2: 483607 295217552 [C
3: 21586 243952824 [B
4: 397700 15908000 io.prometheus.client.CKMSQuantiles
5: 399649 12788768 java.util.LinkedList
6: 469368 11264832 java.lang.String
7: 261265 8360480 java.util.concurrent.ConcurrentHashMap$Node1
8: 26458 7742712 [I
9: 238620 7635840 io.prometheus.client.DoubleAdder
10: 166874 5378600 [Ljava.lang.String;
11: 50353 4025008 [Ljava.lang.Object;
12: 159382 3825168 java.util.Arrays$ArrayList
13: 38119 3354472 java.lang.reflect.Method
14: 79540 3181600 [Lio.prometheus.client.CKMSQuantiles;
15: 79540 3181600 io.prometheus.client.TimeWindowQuantiles
16: 79540 2545280 io.prometheus.client.Summary$Child
17: 19583 2162248 java.lang.Class
18: 688 1995592 [Ljava.util.concurrent.ConcurrentHashMap$Node;
19: 79541 1908976 [Lio.prometheus.client.CKMSQuantiles$Quantile;
20: 18809 1482992 [Ljava.util.HashMap$Node;
21: 79540 1272640 io.prometheus.client.Counter$Child
22: 56075 1251976 [Ljava.lang.Class;
23: 29710 1188400 java.util.LinkedHashMap$Entry
24: 22828 1095744 org.aspectj.weaver.reflect.ShadowMatchImpl
25: 32926 1053632 java.util.HashMap$Node
26: 17473 838704 java.util.HashMap
27: 13394 750064 java.util.LinkedHashMap
28: 22828 730496 org.aspectj.weaver.patterns.ExposedState
29: 44205 707280 java.lang.Object
30: 25162 603888 java.util.ArrayList

@fstab
Copy link
Member

fstab commented Apr 26, 2021

Little background of Quantiles: Calculating quantiles takes a lot of memory. In order to calculate the precise 0.95 quantile of a series of observations, you would need to keep a sorted array of all observations in memory, and then pick the one at the array.size()*0.95th position.

The CKMSQuantiles implements the "Effective Computation of Biased Quantiles over Data Streams" algorithm by Cormode, Korn, Muthukrishnan, and Srivastava. This is basically a trade-off between memory usage and precision: The algorithm throws away some observations (saving memory), but accepts a certain error margin (as configured in the Quantile.Builder.quantile() method). This saves memory, but it's still not a low memory data structure.

A better way to keep memory usage limited is to use a Histogram metric instead of a Summary metric, and then use the histogram_quantile function in Prometheus to calculate the quantiles.

The CKMSQuantiles look at a time window of maxAgeSeconds, which defaults to 10 minutes. If you have a constant rate of observations, the memory usage should be constant after 10 minutes. If you experience that memory usage increases after maxAgeSeconds even if you have a constant rate of observations then let me know how, because that would hint at an actual memory leak.

@Laichj Laichj closed this as completed May 17, 2021
DieBauer added a commit to DieBauer/client_java that referenced this issue Jan 22, 2022
CKMSQuantiles is copied from an implementation of 2012, where it states that a ‘HACK’ was done, admitting a space leak.
This leak has been noticed several times (prometheus#422, prometheus#550, prometheus#654).
By correctly applying the algorithm from the paper we fix the leak.

I have added unit-tests to show that the behaviour is correct.
I have also added a Benchmark in the benchmark module showing the difference with the old and current
implementation.

According to my benchmarks, is in the new implementation a `get` of a quantile that has ‘seen’ 1 million elements 440 times faster.
Inserting 1 million elements is 3.5 times faster.

While going through the CKMS paper and the Java implementation I have added remarks and snippets
from the paper, to clarify why certain choices are made.
DieBauer added a commit to DieBauer/client_java that referenced this issue Jan 22, 2022
CKMSQuantiles is copied from an implementation of 2012, where it states that a ‘HACK’ was done, admitting a space leak.
This leak has been noticed several times (prometheus#422, prometheus#550, prometheus#654).
By correctly applying the algorithm from the paper we fix the leak.

I have added unit-tests to show that the behaviour is correct.
I have also added a Benchmark in the benchmark module showing the difference with the old and current
implementation.

According to my benchmarks, is in the new implementation a `get` of a quantile that has ‘seen’ 1 million elements 440 times faster.
Inserting 1 million elements is 3.5 times faster.

While going through the CKMS paper and the Java implementation I have added remarks and snippets
from the paper, to clarify why certain choices are made.

Signed-off-by: Jens <jenskat@gmail.com>
fstab pushed a commit that referenced this issue Jan 30, 2022
CKMSQuantiles is copied from an implementation of 2012, where it states that a ‘HACK’ was done, admitting a space leak.
This leak has been noticed several times (#422, #550, #654).
By correctly applying the algorithm from the paper we fix the leak.

I have added unit-tests to show that the behaviour is correct.
I have also added a Benchmark in the benchmark module showing the difference with the old and current
implementation.

According to my benchmarks, is in the new implementation a `get` of a quantile that has ‘seen’ 1 million elements 440 times faster.
Inserting 1 million elements is 3.5 times faster.

While going through the CKMS paper and the Java implementation I have added remarks and snippets
from the paper, to clarify why certain choices are made.

Signed-off-by: Jens <jenskat@gmail.com>
fstab pushed a commit that referenced this issue Jan 30, 2022
CKMSQuantiles is copied from an implementation of 2012, where it states that a ‘HACK’ was done, admitting a space leak.
This leak has been noticed several times (#422, #550, #654).
By correctly applying the algorithm from the paper we fix the leak.

I have added unit-tests to show that the behaviour is correct.
I have also added a Benchmark in the benchmark module showing the difference with the old and current
implementation.

According to my benchmarks, is in the new implementation a `get` of a quantile that has ‘seen’ 1 million elements 440 times faster.
Inserting 1 million elements is 3.5 times faster.

While going through the CKMS paper and the Java implementation I have added remarks and snippets
from the paper, to clarify why certain choices are made.

Signed-off-by: Jens <jenskat@gmail.com>

Fix assertion in HistogramTest

Median of 1..11 = 6

Signed-off-by: Jens <jenskat@gmail.com>

Address PR remarks

Signed-off-by: Jens <jenskat@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants