Fix memory leaks when the circular buffer is full #1521
Comments
1) If the event circular buffer fills up, we're not fully freeing the events that get dropped on the datapath side. I have a fix for this on the bug/1521-mem-leaks branch.

2) On the reporting side, we A) read from the event circular buffer, B) generate both events and metrics, and C) send both events and metrics. Right or wrong, I expected that things would stay on the event circular buffer until an event destination is connected, but this isn't true. If events and metrics are routed to different locations and the metric destination is configured to be udp, we'll read from the event circular buffer for the sake of metrics. From my perspective, our behavior implies that we've decided that metric data is more important than event data. Right or wrong, I feel like the opposite is true. I don't have a suggestion here, except to double check that we want the behavior the way we have it now.

3) We have one thread reading off of the event circular buffer on the reporting side. If A) the datapath tries to put things on this queue faster than the reporting side can consume them, and B) the event circular buffer fills up, then we find ourselves in an ugly state, particularly w.r.t. http. Why http? It's unique in that we keep some state from an http request (in g_maplist) that is only cleaned up when we marry it to its paired http response.

Say we have a sequence of events: http request A, http response A, http request B, http response B. If the event circular buffer were full and the reporting thread weren't actively trying to read, all four of these would end up on the floor. Ideally we wouldn't lose data, but this scenario isn't the ugly one. Since we lost each request along with its paired response, we won't accumulate cruft on the reporting side.

Where it gets ugly is when the circular buffer is full, but the reporting thread consumes events so that there are gaps: the datapath can intermittently add one event to the event buffer but not the next. In this state the event circular buffer can end up with http requests without their corresponding responses, and responses without their corresponding requests. When this happens we'll accumulate cruft: stuff allocated and added to the g_maplist structure and never freed, causing a second kind of leak.

Possible improvements that can fix half of the problem for http1:

- On the datapath side, if we can't add an http request to the circular buffer, don't even attempt to add the associated http response.
- Somewhat redundant with this, on the reporting side, don't create a map object for http responses, only requests.
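The first suggested improvement (skip the response when its request could not be queued) might look roughly like the sketch below. This is not the library's code: `ring_t`, `conn_state_t`, `post_http_event()` and the event types are hypothetical names invented for illustration, and the sketch assumes HTTP/1 with at most one outstanding request per connection.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event kinds and event record; not the library's types. */
typedef enum { EVT_HTTP_REQ, EVT_HTTP_RESP } evt_kind_t;

typedef struct {
    evt_kind_t kind;
    int conn_id;
} evt_t;

#define RING_CAP 4

/* Minimal fixed-size circular buffer of event pointers. */
typedef struct {
    evt_t *slots[RING_CAP];
    size_t head;   /* index of the oldest queued event */
    size_t count;  /* number of queued events */
} ring_t;

/* Hypothetical per-connection state kept on the datapath side. */
typedef struct {
    bool req_dropped;  /* true if the last request did not fit */
} conn_state_t;

static bool ring_put(ring_t *r, evt_t *e)
{
    if (r->count == RING_CAP) return false;  /* full: caller decides what to do */
    r->slots[(r->head + r->count) % RING_CAP] = e;
    r->count++;
    return true;
}

static evt_t *ring_get(ring_t *r)
{
    if (r->count == 0) return NULL;
    evt_t *e = r->slots[r->head];
    r->head = (r->head + 1) % RING_CAP;
    r->count--;
    return e;
}

/*
 * Queue an http event. If a request was dropped, its response is
 * dropped too, so the reporting side never sees half of a pair.
 * Returns true if the event was queued; on false the caller still
 * owns the event and must free it.
 */
static bool post_http_event(ring_t *r, conn_state_t *c, evt_t *e)
{
    if (e->kind == EVT_HTTP_REQ) {
        bool ok = ring_put(r, e);
        c->req_dropped = !ok;
        return ok;
    }
    if (c->req_dropped) {
        c->req_dropped = false;  /* pair fully discarded; start fresh */
        return false;
    }
    return ring_put(r, e);
}
```

Note that this only covers one direction: a request that was queued can still lose its response if the buffer is full when the response arrives, which is why the comment above calls it half of the problem.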
With the first commit, made sure we're doing the same free logic when the event circular buffer is full on the datapath side as when we're processing events on the reporting side.
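As an illustration of "the same free logic" on both sides, here is a minimal sketch in which one destructor frees the event and the data it owns, and both the drop path and the reporting path call it. All names (`event_t`, `cbuf_t`, `datapath_post()`, `report_drain()`) are hypothetical and not taken from the library; the `g_destroyed` counter exists only so the behavior can be checked.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical event that owns a heap-allocated payload. */
typedef struct {
    char *data;
} event_t;

#define CBUF_CAP 2

typedef struct {
    event_t *slots[CBUF_CAP];
    size_t head;
    size_t count;
} cbuf_t;

static int g_destroyed;  /* demo-only counter of destroyed events */

static event_t *event_new(const char *payload)
{
    event_t *e = malloc(sizeof(*e));
    if (!e) return NULL;
    size_t n = strlen(payload) + 1;
    e->data = malloc(n);
    if (!e->data) { free(e); return NULL; }
    memcpy(e->data, payload, n);
    return e;
}

/* Single destructor: frees the payload first, then the event itself. */
static void event_destroy(event_t *e)
{
    if (!e) return;
    free(e->data);  /* freeing only the event would leak this */
    free(e);
    g_destroyed++;
}

static bool cbuf_put(cbuf_t *b, event_t *e)
{
    if (b->count == CBUF_CAP) return false;
    b->slots[(b->head + b->count) % CBUF_CAP] = e;
    b->count++;
    return true;
}

static event_t *cbuf_get(cbuf_t *b)
{
    if (b->count == 0) return NULL;
    event_t *e = b->slots[b->head];
    b->head = (b->head + 1) % CBUF_CAP;
    b->count--;
    return e;
}

/* Datapath side: a dropped event is destroyed with the full destructor. */
static bool datapath_post(cbuf_t *b, event_t *e)
{
    if (!e) return false;
    if (cbuf_put(b, e)) return true;
    event_destroy(e);
    return false;
}

/* Reporting side: consumed events go through the same destructor. */
static size_t report_drain(cbuf_t *b)
{
    size_t n = 0;
    event_t *e;
    while ((e = cbuf_get(b)) != NULL) {
        /* ... format and send e->data here ... */
        event_destroy(e);
        n++;
    }
    return n;
}
```

Routing both paths through one destructor means a later change to what an event owns only has to be handled in one place.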
With the second commit, updated the Makefile so unit tests can build successfully.
With the fourth commit, added code to break out of the doEvent() loop so other processing cannot get starved indefinitely. Also refined the logic to make sure that we are not processing http protocol data if SCOPE_EVENT_ENABLE=false and SCOPE_EVENT_HTTP=true, or if SCOPE_METRIC_ENABLE=false and SCOPE_METRIC_HTTP=true. (Basically, the "master enable" for events and metrics was being ignored before. Now it will be honored.)
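Both parts of that commit can be sketched in a few lines. This is only an illustration under stated assumptions: the struct, the function names, and the per-pass budget are hypothetical, and the comment above does not say whether the real doEvent() change uses a count, a time limit, or something else to break out of the loop. Only the four SCOPE_* setting names come from the thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical config snapshot mirroring the four settings named above. */
typedef struct {
    bool event_enable;   /* SCOPE_EVENT_ENABLE */
    bool event_http;     /* SCOPE_EVENT_HTTP */
    bool metric_enable;  /* SCOPE_METRIC_ENABLE */
    bool metric_http;    /* SCOPE_METRIC_HTTP */
} cfg_t;

/*
 * http protocol data is processed only if at least one consumer has
 * BOTH its master enable and its http setting turned on.
 */
static bool http_processing_wanted(const cfg_t *c)
{
    return (c->event_enable && c->event_http) ||
           (c->metric_enable && c->metric_http);
}

/* Assumed budget: the most events handled before yielding to other work. */
#define MAX_EVENTS_PER_PASS 3

/*
 * Drain at most MAX_EVENTS_PER_PASS events per call. `pending` stands
 * in for the circular buffer's fill level. Returns how many events
 * were handled in this pass.
 */
static size_t drain_bounded(size_t *pending)
{
    size_t handled = 0;
    while (*pending > 0 && handled < MAX_EVENTS_PER_PASS) {
        (*pending)--;  /* ... process one event here ... */
        handled++;
    }
    return handled;
}
```

With a budget like this, a datapath that keeps the buffer full can delay other reporting work by at most one pass, rather than indefinitely.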
http protocol stuff if not configured to do so.
the 'Keep event circbuf from starving other work' commit.
Just to document the final state of #1521 (comment) above...
When scoping nginx without a connection to send events out, the library's circular buffer will eventually fill up.
When this happens, we free events but not the event data they contain. Hitting nginx with "ab" is an easy way to put the library in this stressed position where we leak memory.