-
Notifications
You must be signed in to change notification settings - Fork 483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
5.6 - Fixed PS-7346 - Huge memory usage for audit plugin when large queries… #4073
Conversation
… are executed https://jira.percona.com/browse/PS-7346 Function audit_log_general_record can be called twice for each record. 1st time it is called with a 4096k buffer. It will estimate via query_length what can be the maximum size of incoming query if every character has to be converted to utf8mb4(basically 4 times its size). If this size fits the buffer, it will convert event->general_query.str to utf8mb4, use the current buffer to store the converted query and advance the return pointer by query_length bytes. After we have the real query, we escape it. Otherwise (if the buffer is not big enough to convert the query), we will fall into the else branch. Here we calculate the size of the current query after escaping it and multiply by 4 (since we have not converted the query to utf8mb4) + the maximum size of incoming query if every character has to be converted to utf8mb4. We continue by escaping user / host / external_user / ip and db storing the required bytes into buflen_estimated. After this point we do a check to validate if the current buffer is big enough to hold buflen_estimated (the converted query + the full output of audit log entry). In case we don't have a big enough buffer, we set outlen to buflen_estimated and return NULL. Then audit_log_notify will allocate outlen bytes into its buffer and call audit_log_general_record for the 2nd time, this time with the resized buffer. Issue: Before 9fd57d3 we estimated buflen_estimated as twice of the output. If it was not enough we would increase the size of buffer by 4 times before calling audit_log_general_record the second time. After 9fd57d3 the else branch has been enhance to make a better estimate of the size of the output + size of the query, taking into consideration utf8mb4, however it missed two points: 1) buflen_estimated was doubling the size. This is no longer required since the else branch already take into account escaped utf8mb4 query + non escaped utf8mb4 query. 2) When estimating the size of the non escaped utf8mb4 query, it was wrongly multiplying its size by my_charset_utf8mb4_general_ci.mbmaxlen, however, this computation has already been performed before the if/else.
5.6 is a bit difficult to have a test case without memory instrumentation on PS. Where is the example of 5.7 https://github.com/altmannmarcelo/percona-server/commit/0db8c9eff5820c8f8a4950111f9e1f05fa7d9ef2 before and after the patch (audit code is exactly the same for 5.6 and 5.7): Before
After:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@altmannmarcelo Please, also double check with ASan and Valgrind. No need to run full set of suites. Just try to do the same measurements you did for regular Debug build under these checkers. |
@percona-ysorokin https://ps56.cd.percona.com/job/percona-server-5.6-asan-param/11/ - ASAN ok |
Good to go then |
… are executed
https://jira.percona.com/browse/PS-7346
Function audit_log_general_record can be called twice for each record.
1st time it is called with a 4096k buffer. It will estimate via
query_length what can be the maximum size of incoming query if every
character has to be converted to utf8mb4(basically 4 times its size).
If this size fits the buffer, it will convert event->general_query.str
to utf8mb4, use the current buffer to store the converted query and
advance the return pointer by query_length bytes. After we have the real
query, we escape it.
Otherwise (if the buffer is not big enough to convert the query), we
will fall into the else branch. Here we calculate the size of the
current query after escaping it and multiply by 4 (since we have not
converted the query to utf8mb4) + the maximum size of incoming query if
every character has to be converted to utf8mb4.
We continue by escaping user / host / external_user / ip and db storing
the required bytes into buflen_estimated.
After this point we do a check to validate if the current buffer is big
enough to hold buflen_estimated (the converted query + the full output
of audit log entry). In case we don't have a big enough buffer, we set
outlen to buflen_estimated and return NULL. Then audit_log_notify will
allocate outlen bytes into its buffer and call audit_log_general_record
for the 2nd time, this time with the resized buffer.
Issue:
Before 9fd57d3 we estimated buflen_estimated as twice of the output. If
it was not enough we would increase the size of buffer by 4 times before
calling audit_log_general_record the second time.
After 9fd57d3 the else branch has been enhance to make a better estimate
of the size of the output + size of the query, taking into consideration
utf8mb4, however it missed two points:
since the else branch already take into account escaped utf8mb4 query +
non escaped utf8mb4 query.
wrongly multiplying its size by my_charset_utf8mb4_general_ci.mbmaxlen,
however, this computation has already been performed before the if/else.