From 8ebd346c67cf89c1c5a5b4dbc2d9771ab6634388 Mon Sep 17 00:00:00 2001 From: Marcelo Altmann Date: Mon, 5 Oct 2020 10:35:26 -0300 Subject: [PATCH] Fixed PS-7346 - Huge memory usage for audit plugin when large queries are executed https://jira.percona.com/browse/PS-7346 Function audit_log_general_record can be called twice for each record. 1st time it is called with a 4096k buffer. It will estimate via query_length what can be the maximum size of incoming query if every character has to be converted to utf8mb4(basically 4 times its size). If this size fits the buffer, it will convert event->general_query.str to utf8mb4, use the current buffer to store the converted query and advance the return pointer by query_length bytes. After we have the real query, we escape it. Otherwise (if the buffer is not big enough to convert the query), we will fall into the else branch. Here we calculate the size of the current query after escaping it and multiply by 4 (since we have not converted the query to utf8mb4) + the maximum size of incoming query if every character has to be converted to utf8mb4. We continue by escaping user / host / external_user / ip and db storing the required bytes into buflen_estimated. After this point we do a check to validate if the current buffer is big enough to hold buflen_estimated (the converted query + the full output of audit log entry). In case we don't have a big enough buffer, we set outlen to buflen_estimated and return NULL. Then audit_log_notify will allocate outlen bytes into its buffer and call audit_log_general_record for the 2nd time, this time with the resized buffer. Issue: Before 9fd57d3 we estimated buflen_estimated as twice of the output. If it was not enough we would increase the size of buffer by 4 times before calling audit_log_general_record the second time. After 9fd57d3 the else branch has been enhance to make a better estimate of the size of the output + size of the query, taking into consideration utf8mb4, however it missed two points: 1) buflen_estimated was doubling the size. This is no longer required since the else branch already take into account escaped utf8mb4 query + non escaped utf8mb4 query. 2) When estimating the size of the non escaped utf8mb4 query, it was wrongly multiplying its size by my_charset_utf8mb4_general_ci.mbmaxlen, however, this computation has already been performed before the if/else. --- plugin/audit_log/audit_log.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/plugin/audit_log/audit_log.c b/plugin/audit_log/audit_log.c index 8a45a30ec1a3..b94e15f832b9 100644 --- a/plugin/audit_log/audit_log.c +++ b/plugin/audit_log/audit_log.c @@ -617,6 +617,14 @@ char *audit_log_general_record(char *buf, size_t buflen, query_length= my_charset_utf8mb4_general_ci.mbmaxlen * event->general_query_length; + /* Note: query_length is the maximun size using utf8m4(4 bytes) that + * event->general_query_length may use. In the if branch, we convert it to + * utf8mb4. We store the recalculated (real size) length to query variable + * and use the remaing of the buffer for the output that will be printed to + * audit log. Parameter char *buf must be big enough to store + * the query (using utf8mb4) + the full output of audit event, which will + * contain the query again. At the else branch we estime this size. + */ if (query_length < (size_t) (endbuf - endptr)) { uint errors; @@ -639,7 +647,7 @@ char *audit_log_general_record(char *buf, size_t buflen, query= escape_string(event->general_query, event->general_query_length, endptr, endbuf - endptr, &endptr, &full_outlen); full_outlen*= my_charset_utf8mb4_general_ci.mbmaxlen; - full_outlen+= query_length * my_charset_utf8mb4_general_ci.mbmaxlen; + full_outlen+= query_length; } user= escape_string(event->general_user, event->general_user_length, @@ -654,10 +662,8 @@ char *audit_log_general_record(char *buf, size_t buflen, db= escape_string(default_db, strlen(default_db), endptr, endbuf - endptr, &endptr, &full_outlen); - buflen_estimated= full_outlen * 2 + - strlen(format_string[audit_log_format]) + - strlen(name) + - event->general_sql_command.length + + buflen_estimated= full_outlen + strlen(format_string[audit_log_format]) + + strlen(name) + event->general_sql_command.length + 20 + /* general_thread_id */ 20 + /* status */ MAX_RECORD_ID_SIZE + MAX_TIMESTAMP_SIZE;