downsampling: Optimize downsample memory usage #297
Comments
20GB is not enough either.
My block stats:
This guy is... HEAVY
I've allocated 50GB for mine and it's dying from OOM :/
It looks like this has been improved in the latest releases? I don't know if it's related, but I haven't seen any memory spikes since one of the last upgrades.
I am still getting OOMs on large compacted blocks. We have blocks that are 207GB, and after grabbing such a block the process eventually eats all system memory until it OOMs. Server memory:
The server also has 32 cores that are dramatically underutilized.
Yup, thanks for the feedback. Nothing has changed in this area, so this is becoming an important issue.
Reproduced with >200GB blocks: after being killed by OOM it fetches the block again and wastes S3 traffic in a loop. Is there a way to disable downsampling until the memory usage is fixed?
Fixes #297 Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
I can volunteer. What are you suggesting to look at first? Writing chunk data to a tmp file instead of eating away all the heap?
Thanks @xjewer!
I tested on my 9GB block. We put all the chunks in memory and keep them there until writing all that data back to the file, so even bytes.Pool won't help in the suggested place.
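To make the pattern concrete, here is a minimal sketch of the accumulate-everything approach being described. The types are hypothetical, not the actual Thanos downsampling code:

```go
// Hypothetical types only; this sketches the shape of the problem: every
// downsampled series is kept in RAM until the whole output block is written.
package sketch

type series struct {
	lset   map[string]string
	chunks [][]byte // encoded chunk payloads
}

// memBlock accumulates the entire output block in memory.
type memBlock struct {
	series []series
}

func (b *memBlock) addSeries(s series) {
	// Nothing is flushed here; the encoded chunks stay referenced until
	// writeToDisk runs, so heap usage grows with the whole output block.
	b.series = append(b.series, s)
}

func (b *memBlock) writeToDisk(dir string) error {
	// Only at this point could the chunk buffers be released or returned
	// to a pool, which is why a bytes.Pool earlier on doesn't help much.
	// ... write chunks, index and meta files under dir ...
	return nil
}
```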
9GB for the source block or for the downsampled block? I guess for the source block, right? Can you check the size of the downsampled block? It should be much smaller. If so, why do we still have 9GB in memory? I don't think the *output* block matters that much, but I might be wrong. It was clear from the beginning that we keep the output block in memory. (: But the data says we keep the input... am I missing something?
Sure, the source block. And yes, we're keeping the output block there. What I see is that we initially load the XOR-encoded chunks of the source block, which means the size of the samples decoded and aggregated to 5m resolution would be comparable to the source. Here is my result:
where: So for an extremely large source block we would waste a huge amount of RAM.
Nice, thanks for those numbers. It seems we are getting somewhere. I agree the output block can definitely be significant, so we could avoid this by flushing things to a file if possible (let's name it
From the discussion with @bwplotka in Slack:

1. There was a misunderstanding: the diff wasn't exactly 4GB, but some memory has indeed gone, roughly 1-1.5GB.
2. An interesting point: in https://github.com/improbable-eng/thanos/blob/3c8546ceef9cf13856d91b9897fa816303fc05b6/cmd/thanos/downsample.go#L228 and https://github.com/prometheus/tsdb/blob/master/chunks/chunks.go#L371 we read chunks using a pool, but don't put them back 🤔
3. memBlock is the target for optimisation anyway: it has to be handled series by series to avoid the memory consumption.

The next thing I found: https://github.com/improbable-eng/thanos/blob/master/cmd/thanos/downsample.go#L228 uses mmap to open chunk files (https://github.com/prometheus/tsdb/blob/master/chunks/chunks.go#L334). In my case, the ByteSlices have a capacity of 536781921, which means a series' data points into the array behind this slice and won't be swept out by the GC until the following series reaches the next data file.

$ chunks ls -l
total 19980256
-rw-r--r-- 1 xjewer staff 512M 6 Sep 03:36 000001
-rw-r--r-- 1 xjewer staff 512M 6 Sep 03:36 000002
-rw-r--r-- 1 xjewer staff 512M 6 Sep 03:37 000003
-rw-r--r-- 1 xjewer staff 512M 6 Sep 03:37 000004
-rw-r--r-- 1 xjewer staff 511M 6 Sep 03:37 000005
-rw-r--r-- 1 xjewer staff 512M 6 Sep 03:37 000006
-rw-r--r-- 1 xjewer staff 512M 6 Sep 03:37 000007
-rw-r--r-- 1 xjewer staff 512M 6 Sep 03:37 000008
-rw-r--r-- 1 xjewer staff 512M 6 Sep 03:38 000009
-rw-r--r-- 1 xjewer staff 512M 6 Sep 03:38 000010
-rw-r--r-- 1 xjewer staff 512M 6 Sep 03:38 000011
-rw-r--r-- 1 xjewer staff 512M 6 Sep 03:38 000012
-rw-r--r-- 1 xjewer staff 512M 6 Sep 03:38 000013
-rw-r--r-- 1 xjewer staff 512M 6 Sep 03:39 000014
-rw-r--r-- 1 xjewer staff 512M 6 Sep 03:39 000015
-rw-r--r-- 1 xjewer staff 512M 6 Sep 03:39 000016
-rwxr-xr-x 1 xjewer staff 512M 6 Sep 03:39 000017
-rwxr-xr-x 1 xjewer staff 512M 6 Sep 03:39 000018
-rwxr-xr-x 1 xjewer staff 459M 6 Sep 03:39 000019

Summary: for the time being, the only way is to handle the data series by series, not to keep the whole output downsampled block in memory.
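To illustrate the retention effect described above: in Go, a small sub-slice keeps its entire backing array reachable, so a chunk sliced out of a ~512MB segment pins that whole segment until no reference to it remains; copying the bytes breaks that link. This is a generic Go sketch of that behaviour, not the actual TSDB mmap reader code:

```go
package main

import "fmt"

// loadSegment stands in for one ~512MB chunk segment. Here it is a plain
// heap allocation so the retention effect is easy to reason about; the real
// reader mmaps the file, but the referencing behaviour is the same idea.
func loadSegment() []byte {
	return make([]byte, 512<<20)
}

func main() {
	seg := loadSegment()

	// A chunk is a tiny window into the segment. While this slice is alive,
	// the whole 512MB backing array stays reachable as well.
	chunkView := seg[1024:1160]

	// Copying the few bytes we actually need detaches them from the large
	// backing array, so the segment can be reclaimed once seg and chunkView
	// are no longer referenced.
	chunkCopy := append([]byte(nil), chunkView...)

	fmt.Println(len(chunkView), cap(chunkView)) // 136, ~512MB
	fmt.Println(len(chunkCopy), cap(chunkCopy)) // 136, small and independent of seg
}
```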
Agreed, good stuff!
Would we be able to keep this issue open until the fix? :)
Add instant writer implementation to shrink memory consumption during the downsampling stage. Encoded chunks are written to chunk blob files right away after each series is handled. The Flush method closes the chunk writer and syncs all symbols, series, labels, postings and meta data to files. It still works in one thread, hence operates only on one core. Estimated memory consumption is unlikely to be more than 1GB, but depends on the data set, label sizes and series density: chunk data size (512MB) + encoded buffers + index data. Fixes #297

* compact: clarify purpose of streamed block writer. Add comments and close resources properly.
* downsample: fix postings index. Use the proper postings index to fetch series data with the label set and chunks.
* Add to the stream writer the ability to write index data right during the downsampling process. One of the trade-offs is having to preserve symbols from the raw blocks, as we have to write them before preserving the series. The stream writer allows downsampling huge data blocks with no need to keep all series in RAM; the only need is to preserve label values and postings references.
* fix nitpicks
* downsampling: simplify the StreamedBlockWriter interface. Reduce use of the public Flush method to finalize index and meta files. In case of error, the caller has to remove the block directory with the preserved garbage inside. Get rid of tmp directories and renaming; sync the final block on disk before upload.
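As a rough illustration of the streamed-writer idea described in that commit message, here is a hedged sketch; the type and method names (streamedBlockWriter, WriteSeries, Flush, the chunkWriter/indexWriter interfaces) are illustrative assumptions, not the actual Thanos API:

```go
// Illustrative sketch only; hypothetical types, not the real StreamedBlockWriter.
package sketch

import "errors"

type labels map[string]string

// chunkWriter appends encoded chunks to segment files on disk.
type chunkWriter interface {
	WriteChunks(encoded ...[]byte) error
	Close() error
}

// indexWriter records per-series index entries and finalizes symbols,
// postings and metadata on Close.
type indexWriter interface {
	AddSeries(ref uint64, lset labels) error
	Close() error
}

type streamedBlockWriter struct {
	chunks   chunkWriter
	index    indexWriter
	nextRef  uint64
	finished bool
}

// WriteSeries persists one downsampled series right away, so only the small
// per-series index information (not the chunk data) accumulates in memory.
func (w *streamedBlockWriter) WriteSeries(lset labels, encodedChunks [][]byte) error {
	if w.finished {
		return errors.New("writer already flushed")
	}
	if err := w.chunks.WriteChunks(encodedChunks...); err != nil {
		return err
	}
	w.nextRef++
	return w.index.AddSeries(w.nextRef, lset)
}

// Flush finalizes the block: it closes the chunk writer and syncs symbols,
// series, labels, postings and meta data to files.
func (w *streamedBlockWriter) Flush() error {
	w.finished = true
	if err := w.chunks.Close(); err != nil {
		return err
	}
	return w.index.Close()
}
```

The trade-off noted in the commit message still applies in such a design: symbols have to be written before the series, so label values and postings references are the only things that stay in memory.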
Is there any update on this improvement?
Downsampling can OOM easily with 16GB... wonder if we can improve that in any way.
It uses a lot of memory when downsampling my 2-week block (on top of other memory usage after lots of compactions, maybe). Wonder if that's not some lazy GC issue.