Memory Usage Explosion by zstd #675
6 comments · 11 replies
-
Decompressing takes memory. Decompressing 9000 streams at once takes a huge amount of memory.
-
Alternatively, content must be encoded with a very small window, but that requires you to be in control of the encoding as well. You can enforce a max decoder window, but decoding will of course fail if that limit cannot be satisfied.
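For example, a minimal sketch assuming both sides use klauspost/compress: the encoder is capped to a small window, and the decoder enforces that same cap via `WithDecoderMaxWindow` (available in recent releases). The 1 MiB figure and the helper names are illustrative choices, not values from this thread.

```go
package example

import (
	"io"

	"github.com/klauspost/compress/zstd"
)

// compressSmallWindow encodes with a 1 MiB window, so any decoder
// needs at most ~1 MiB of history per stream.
func compressSmallWindow(dst io.Writer, src io.Reader) error {
	enc, err := zstd.NewWriter(dst, zstd.WithWindowSize(1<<20))
	if err != nil {
		return err
	}
	if _, err := io.Copy(enc, src); err != nil {
		enc.Close()
		return err
	}
	return enc.Close() // flushes and finalizes the frame
}

// decompressCapped rejects frames that would need more than a
// 1 MiB window instead of silently allocating for them.
func decompressCapped(dst io.Writer, src io.Reader) error {
	dec, err := zstd.NewReader(src, zstd.WithDecoderMaxWindow(1<<20))
	if err != nil {
		return err
	}
	defer dec.Close()
	_, err = io.Copy(dst, dec)
	return err
}
```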
-
I switched to using https://github.com/DataDog/zstd, and it works fine, so the memory issue is specific to this implementation of zstd. In the reduce step, the loader has to read through all the map files because each file contains sorted data, so it's essentially 9000 streams that the reducer merges to achieve a global sort. With DataDog zstd, each stream takes around 1 MiB, so the whole thing still easily fits in memory.
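For context, a minimal sketch of that fan-in setup, assuming the cgo-based DataDog/zstd streaming reader; `wrapStreams` is a hypothetical helper, and the ~1 MiB-per-stream figure comes from the observation above.

```go
package example

import (
	"bufio"
	"io"

	"github.com/DataDog/zstd"
)

// wrapStreams wraps each already-open map file in a streaming decoder
// for the k-way merge in the reduce phase. Every returned reader must
// be Closed once its stream is exhausted; the underlying files are
// closed separately by the caller.
func wrapStreams(files []io.Reader) []io.ReadCloser {
	streams := make([]io.ReadCloser, len(files))
	for i, f := range files {
		streams[i] = zstd.NewReader(bufio.NewReader(f))
	}
	return streams
}
```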
-
I am still unable to reproduce the issue. Are you calling Close when you are done with a decoder? I added a benchmark that tests allocations on the decoder. Once you have Read from it, it will keep the buffer around until you call Close.
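In other words (a minimal sketch, assuming klauspost/compress; `drainFile` is a hypothetical helper), close each decoder as soon as its stream is done so its window buffer can be released:

```go
package example

import (
	"io"
	"os"

	"github.com/klauspost/compress/zstd"
)

func drainFile(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	dec, err := zstd.NewReader(f, zstd.WithDecoderConcurrency(1))
	if err != nil {
		return err
	}
	// Close releases the decoder's buffers; without it, every finished
	// stream keeps its window in memory.
	defer dec.Close()

	_, err = io.Copy(io.Discard, dec)
	return err
}
```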
-
I am also facing the same issue. I am trying to decompress Reddit zstd archives from pushshift.io, and they require a 2 GB window size. Decoding works fine with this repo, but memory usage goes up to 4 GB just for decompression.
-
@klauspost Thank you for the suggestion. I will recompress it.
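For reference, a minimal sketch of what that recompression pass could look like, assuming klauspost/compress: decode the large-window archive once, then re-encode it with a small window so later reads stay cheap. The 8 MiB target window and the `recompress` helper are illustrative assumptions, not values from the thread.

```go
package example

import (
	"io"
	"os"

	"github.com/klauspost/compress/zstd"
)

func recompress(srcPath, dstPath string) error {
	in, err := os.Open(srcPath)
	if err != nil {
		return err
	}
	defer in.Close()
	out, err := os.Create(dstPath)
	if err != nil {
		return err
	}
	defer out.Close()

	// Allow the oversized 2 GB window for this one-off pass.
	dec, err := zstd.NewReader(in,
		zstd.WithDecoderConcurrency(1),
		zstd.WithDecoderMaxWindow(2<<30))
	if err != nil {
		return err
	}
	defer dec.Close()

	// Re-encode with an 8 MiB window; decoding this output later needs
	// only ~8 MiB of history instead of 2 GB.
	enc, err := zstd.NewWriter(out, zstd.WithWindowSize(8<<20))
	if err != nil {
		return err
	}
	if _, err := io.Copy(enc, dec); err != nil {
		enc.Close()
		return err
	}
	return enc.Close()
}
```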
-
Hey @klauspost,
I just switched to using zstd compression in the Outserv boot (/ Dgraph bulk) loader. My map phase produces ~9000 files, which are read concurrently by the reduce phase.
I was using snappy before, and it was working just fine. When I try to read these files via zstd, my memory usage just blows up (to 90 GiB).
I'm opening my reader like so:

```go
dec, err := zstd.NewReader(fd, zstd.WithDecoderConcurrency(1), zstd.WithDecoderLowmem(true))
```
So I'm already trying my best to decrease memory usage. Any suggestions?
P.S. Tangential, but perhaps consider using jemalloc for such big allocations instead of Go memory.