
Investigate potential memory leak #522

Open
muety opened this issue Sep 19, 2023 · 4 comments
muety commented Sep 19, 2023


The plot shows memory usage (actual, not virtual) of the wakapi process on wakapi.dev over 7 days, with a Prometheus scrape interval of 5 minutes. We can see how memory accumulates during the day and then suddenly drops precisely at 02:20 am (or within the 5 minutes before that, given the scrape interval). The default for WAKAPI_AGGREGATION_TIME is 02:15 am. To check whether it's related, I changed the aggregation time to 01:45 am on Sep 19, and we can observe the drop happening around 01:50 am now. So apparently running the daily aggregation job causes memory to be freed. Let's investigate why that is and why memory accumulates during the day in the first place.

@muety muety self-assigned this Sep 19, 2023
@muety muety added the help wanted Extra attention is needed label Sep 19, 2023

muety commented Sep 20, 2023

Found the cause.

Every time a user requests a summary for a relative time interval (such as today, past_6_months, etc.), the summary's to parameter (sometimes also from) will be time.Now(), which results in a new cache key and thus a new cache entry here. Once the nightly aggregation job runs, new summaries are inserted, as a side effect of which the user's cache is flushed - even before its normal expiration after 24 hours.

That is, the more often you hit refresh on the dashboard page, the faster memory fills up. This way, you could technically bring Wakapi down 😉.

Options to fix this include:

  • Don't cache summaries for relative time intervals. E.g. skip the cache if to.Second() != 0, i.e. the summary likely doesn't correspond to a clear-cut date (such as midnight) and thus will most likely never be retrieved again anyway.
  • Drop the cache in the first place, as it's only useful for "absolute" time intervals (such as yesterday) anyway, and retrieving a summary isn't terribly expensive since we introduced the concept of durations.
  • Just ignore it for now, as it doesn't really hurt

Cool side-effect of this investigation: I learned a lot about profiling in Go and added a profiler web endpoint to Wakapi.

@muety muety closed this as completed in 37ac895 Sep 20, 2023

muety commented Sep 21, 2023

As expected, the pattern can still be observed after my latest commit; however, the delta is a lot smaller now (~170 MB as opposed to ~450 MB before). We could get around this completely by disabling the cache (as discussed above), but I don't think it hurts much the way it is now.


muety commented Jan 2, 2024

Found a new potential memory leak. Became aware of this because Wakapi has occasionally been OOM-killed lately.


@muety muety reopened this Jan 2, 2024
@muety muety closed this as completed Feb 22, 2024

muety commented Oct 20, 2024

Reopening this, as Wakapi is still getting OOM-killed every now and then. It currently runs with GOMEMLIMIT=3000MiB on a machine with 8 GB of memory. From a first peek, I can't see any tremendous increase in used heap space shortly before it gets killed, but I'll have to investigate further.

@muety muety reopened this Oct 20, 2024