Store: consumes lots of memory at startup and loop restart cause OOM #6643

Closed

chalut01 opened this issue Aug 24, 2023 · 30 comments
@chalut01

chalut01 commented Aug 24, 2023

Thanos Version v0.32.0
Everything is set up the same way; only the version is different. Is anyone else seeing this?

[Screenshot attached: 2023-08-24 13:33:49]

@rgarcia89
Contributor

I have experienced exactly the same issue.

I closed my ticket (#6644) as this one covers the same issue.

@pahaeanx

Same here. Can't get it to start up with 6GB of RAM when it used to run with ~3GB.

@yeya24
Contributor

yeya24 commented Aug 24, 2023

The only change I can think of that might affect the startup memory usage of the store gateway is #6509, but ideally that change should improve memory usage.

Do you have the lazy index header enabled? Can you share your store gateway config?

@pahaeanx

The only change I can think of that might affect the startup memory usage of the store gateway is #6509, but ideally that change should improve memory usage.

Do you have the lazy index header enabled? Can you share your store gateway config?

No, I use a pretty vanilla config I'd say. Failing config (formatted for readability):

thanos store --max-time=-1w --grpc-address=localhost:15000 --http-address=localhost:15001 \
  --data-dir=/var/lib/thanos-cache/store --objstore.config-file=/etc/thanos/s3.yml \
  --grpc-server-tls-cert=/etc/thanos/thanos.cer --grpc-server-tls-key=/etc/thanos/thanos.key \
  --sync-block-duration=30m

This is currently running with 0.30.2 and fails with 0.32.0.
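
Since lazy index-header loading came up above and the relevant flag names differ between Thanos versions, one safe way to check which index-header options a given build actually exposes (rather than assuming a specific flag) is to grep the help output:

# List index-header related flags supported by this thanos build.
# Flag names vary across versions, so inspect --help instead of guessing.
thanos store --help 2>&1 | grep -i "index-header"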

@rgarcia89
Contributor

rgarcia89 commented Aug 24, 2023

@yeya24 I deploy the thanos store using kube-thanos. It starts with the following args in the manifest. Currently on v0.31.0

        - store
        - --log.level=info
        - --log.format=logfmt
        - --data-dir=/var/thanos/store
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --objstore.config=$(OBJSTORE_CONFIG)
        - --ignore-deletion-marks-delay=24h

@antikilahdjs

I have the same issue and upgraded to v0.31. All the other components work perfectly on the 0.32 version, but the store consumes 1 TB.

@GiedriusS
Member

Maybe it would be possible for you to take a pprof memory profile during bootup of Thanos Store and share it here? Thanos Store exposes a pprof endpoint on /debug/pprof. 🤔
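
For the bare-VM setup shared earlier (the config above binds the HTTP endpoint to localhost:15001), capturing a heap profile during startup is just a curl against that address; a minimal sketch, assuming the process stays up long enough to answer:

# Capture a heap profile from the store gateway's HTTP endpoint
# (localhost:15001 is the --http-address from the config shared above).
curl http://localhost:15001/debug/pprof/heap > heap.pprof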

@bboysoulcn

Same here.

@chalut01
Author

chalut01 commented Aug 26, 2023

@GiedriusS I'm sorry, but I can't capture a pprof profile on the production environment.
In my development environment the data set is small and everything works well with v0.32.0.

Can someone share a pprof?

@rgarcia89
Contributor

Maybe it would be possible for you to take a pprof memory profile during bootup of Thanos Store and share it here? Thanos Store exposes a pprof endpoint on /debug/pprof. 🤔

@GiedriusS I have Thanos running as a container in Kubernetes using kube-thanos. How can I reach this endpoint?

@yeya24
Contributor

yeya24 commented Aug 28, 2023

@rgarcia89 Can you port-forward one of your store gateway pods using its HTTP port? For example, say it is forwarded to local port 8080:

curl http://localhost:8080/debug/pprof/heap > heap.pprof

You can get the heap profile by running the command above. Make sure to do it right when the store gateway starts...
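
For the kube-thanos deployment mentioned above, a minimal sketch of that port-forward (the namespace and pod name are placeholders; the container HTTP port 10902 is taken from the manifest args shared earlier):

# Forward local port 8080 to the store gateway pod's HTTP port 10902.
kubectl -n <namespace> port-forward pod/<store-gateway-pod> 8080:10902

# Then, shortly after the pod starts, grab the heap profile.
curl http://localhost:8080/debug/pprof/heap > heap.pprof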

@rgarcia89
Contributor

@yeya24 here you go, including the heap of a running 0.31.0 Thanos Store vs. the heap of a just-started 0.32.0 Thanos Store, which then crashed

heap-pprof.zip
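
For anyone comparing profiles like these locally, the standard Go pprof tooling can read them; a quick sketch, assuming a local Go toolchain (the file names below are placeholders for the two heaps in the archive):

# Show the top allocations in a single heap profile.
go tool pprof -top heap-0.32.0.pprof

# Or diff the new heap against the old one to see what grew.
go tool pprof -top -diff_base=heap-0.31.0.pprof heap-0.32.0.pprof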

@MichaHoffmann
Contributor

Oh, that's entirely the cuckoo filter.

@MichaHoffmann
Contributor

Do you have some figures around your cardinality?

@rgarcia89
Contributor

@MichaHoffmann anything specific you are looking for? Thanos is currently managing 1,792,044 time series.

@MichaHoffmann
Contributor

This filter should (theoretically) scale with the number of label names you have, but to reach 128 GB it would need something like hundreds of millions of them (again, theoretically; maybe there is a bug somewhere).

@rgarcia89
Contributor

rgarcia89 commented Aug 28, 2023

Good question. Everything runs smoothly with v0.31.0, and I haven't seen any alerts or issues regarding high cardinality.

@MichaHoffmann
Contributor

Good question. Everything runs smoothly with v0.31.0, and I haven't seen any alerts or issues regarding high cardinality.

That cuckoo filter was introduced in 0.32.0, so that makes sense.

@rgarcia89
Contributor

Maybe something is being counted wrong in that filter. Otherwise I'm not sure how 128 GB can be justified.

@saswatamcode
Member

#6669 should address this

@GiedriusS
Member

How does it look with v0.32.1?

@rgarcia89
Contributor

Very good. The issue is gone on my clusters.

@pahaeanx

Unfortunately it's still crashing with 0.32.1. I can't do any more digging today, so all I can offer for now is the crash log, but it seems like it OOM-crashed again. I can try scaling the VM up tomorrow so I can maybe get it to start.

crash.log.tar.gz

@farodin91
Contributor

It looks like the store uses around 50% more memory than before, but there's no runaway growth.

@MichaHoffmann
Contributor

That's not an OOM crash!

Aug 28 15:38:40 440019-prod-observer01 thanos[75687]: fatal error: ts=2023-08-28T13:38:40.784103104Z caller=bucket.go:688 level=info msg="loaded new block" elapsed=261.655268ms id=01H8GGHPJEQJ5XWESMV004DW8J
Aug 28 15:38:40 440019-prod-observer01 thanos[75687]: concurrent map iteration and map write

@mateuszdrab

Works for me now, I can check memory metrics later.

@MichaHoffmann
Contributor

#6675

@antikilahdjs

Works perfectly. Thank you guys for working hard to deliver the best to us.

@MichaHoffmann
Contributor

@chalut01 can we close the issue?

@chalut01
Author

Fixed in v0.32.2!
Thank you everyone for your hard work.
