-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory leak when using TLS on Windows #374
Comments
I'm not sure whether it's reproduced also on GNU/Linux or not. |
It seems that some of diff --git a/socket.rb b/socket.rb
index afde072..7a72edc 100644
--- a/socket.rb
+++ b/socket.rb
@@ -107,36 +107,41 @@ module Fluent
log.trace "setting TLS verify_mode NONE"
context.verify_mode = OpenSSL::SSL::VERIFY_NONE
else
- cert_store = OpenSSL::X509::Store.new
- if allow_self_signed_cert && OpenSSL::X509.const_defined?('V_FLAG_CHECK_SS_SIGNATURE')
- cert_store.flags = OpenSSL::X509::V_FLAG_CHECK_SS_SIGNATURE
- end
- begin
- if enable_system_cert_store
- if Fluent.windows? && cert_logical_store_name
- log.trace "loading Windows system certificate store"
- loader = Certstore::OpenSSL::Loader.new(log, cert_store, cert_logical_store_name,
- enterprise: cert_use_enterprise_store)
- loader.load_cert_store
- cert_store = loader.cert_store
- context.cert = loader.get_certificate(cert_thumbprint) if cert_thumbprint
+ cert_store_key = "socket_cert_store_#{object_id}".to_sym
+ cert_store = ::Thread.current[cert_store_key]
+ unless cert_store
+ cert_store = OpenSSL::X509::Store.new
+ ::Thread.current[cert_store_key] = cert_store
+ if allow_self_signed_cert && OpenSSL::X509.const_defined?('V_FLAG_CHECK_SS_SIGNATURE')
+ cert_store.flags = OpenSSL::X509::V_FLAG_CHECK_SS_SIGNATURE
+ end
+ begin
+ if enable_system_cert_store
+ if Fluent.windows? && cert_logical_store_name
+ log.trace "loading Windows system certificate store"
+ loader = Certstore::OpenSSL::Loader.new(log, cert_store, cert_logical_store_name,
+ enterprise: cert_use_enterprise_store)
+ loader.load_cert_store
+ cert_store = loader.cert_store
+ context.cert = loader.get_certificate(cert_thumbprint) if cert_thumbprint
+ end
+ log.trace "loading system default certificate store"
+ cert_store.set_default_paths
end
- log.trace "loading system default certificate store"
- cert_store.set_default_paths
+ rescue OpenSSL::X509::StoreError
+ log.warn "failed to load system default certificate store", error: e
end
- rescue OpenSSL::X509::StoreError
- log.warn "failed to load system default certificate store", error: e
- end
- if cert_paths
- if cert_paths.respond_to?(:each)
- cert_paths.each do |cert_path|
+ if cert_paths
+ if cert_paths.respond_to?(:each)
+ cert_paths.each do |cert_path|
+ log.trace "adding CA cert", path: cert_path
+ cert_store.add_file(cert_path)
+ end
+ else
+ cert_path = cert_paths
log.trace "adding CA cert", path: cert_path
cert_store.add_file(cert_path)
end
- else
- cert_path = cert_paths
- log.trace "adding CA cert", path: cert_path
- cert_store.add_file(cert_path)
end
end |
I'm not sure the root cause yet but I'm now suspecting around this: |
I confirmed that require 'openssl'
loop do
cert_store = OpenSSL::X509::Store.new
cert_store.set_default_paths
cert_store = nil
GC.start
end It's reproduced only on Windows. |
openssl gem's implementation: 365 static VALUE
366 ossl_x509store_set_default_paths(VALUE self)
367 {
368 X509_STORE *store;
369
370 GetX509Store(self, store);
371 if (X509_STORE_set_default_paths(store) != 1){
372 ossl_raise(eX509StoreError, NULL);
373 }
374
375 return Qnil;
376 } It doesn't seem possible to cause the memory leak. |
I've wrote a minimum C code to reproduce the memory leak. #include <openssl/x509v3.h>
#include <stdio.h>
int
main(int argc, char *argv[])
{
X509_STORE *store = X509_STORE_new();
#ifdef USE_SET_DEFAULT_PATHS
int result = X509_STORE_set_default_paths(store);
if (result) {
printf("no error\r\n");
} else {
printf("error\r\n");
}
#endif
X509_STORE_free(store);
return 0;
} > ridk exec gcc openssl-leak.c -DUSE_SET_DEFAULT_PATHS -IC:/Ruby27-x64/msys64/mingw64/bin/../include -LC:/Ruby27-x64/msys64/mingw64/bin/../lib -lssl -lcrypto
> ridk exec "C:\Program Files (x86)\Dr. Memory\bin\drmemory.exe" -leaks_only -- a.exe
~~Dr.M~~ Dr. Memory version 2.5.0
~~Dr.M~~ Running "a.exe"
~~Dr.M~~ Using system call file C:\Users\aho\AppData\Roaming\Dr. Memory\symcache\syscalls_x64.txt
no error
~~Dr.M~~
~~Dr.M~~ Error #1: LEAK 121 direct bytes 0x000002ba3d420600-0x000002ba3d420679 + 0 indirect bytes
~~Dr.M~~ # 0 replace_malloc [d:\a\drmemory\drmemory\common\alloc_replace.c:2580]
~~Dr.M~~ # 1 KERNELBASE.dll!CompareStringW+0x164 (0x00007fff6a6b2875 <KERNELBASE.dll+0x22875>)
~~Dr.M~~ # 2 libcrypto-1_1-x64.dll!? +0x0 (0x00007fff0b4ea973 <libcrypto-1_1-x64.dll+0x13a973>)
~~Dr.M~~ # 3 libcrypto-1_1-x64.dll!? +0x0 (0x00007fff0b4eac18 <libcrypto-1_1-x64.dll+0x13ac18>)
~~Dr.M~~ # 4 libcrypto-1_1-x64.dll!? +0x0 (0x00007fff0b560b68 <libcrypto-1_1-x64.dll+0x1b0b68>)
~~Dr.M~~ # 5 libcrypto-1_1-x64.dll!? +0x0 (0x00007fff0b55d0b5 <libcrypto-1_1-x64.dll+0x1ad0b5>)
~~Dr.M~~ # 6 libcrypto-1_1-x64.dll!? +0x0 (0x00007fff0b5609ed <libcrypto-1_1-x64.dll+0x1b09ed>)
~~Dr.M~~ # 7 main
~~Dr.M~~
~~Dr.M~~ Error #2: LEAK 313 direct bytes 0x000002ba3d4206a0-0x000002ba3d4207d9 + 0 indirect bytes
~~Dr.M~~ # 0 replace_malloc [d:\a\drmemory\drmemory\common\alloc_replace.c:2580]
~~Dr.M~~ # 1 libcrypto-1_1-x64.dll!? +0x0 (0x00007fff0b4eac2b <libcrypto-1_1-x64.dll+0x13ac2b>)
~~Dr.M~~ # 2 libcrypto-1_1-x64.dll!? +0x0 (0x00007fff0b4ea973 <libcrypto-1_1-x64.dll+0x13a973>)
~~Dr.M~~ # 3 libcrypto-1_1-x64.dll!? +0x0 (0x00007fff0b4ea6bf <libcrypto-1_1-x64.dll+0x13a6bf>)
~~Dr.M~~ # 4 libcrypto-1_1-x64.dll!? +0x0 (0x00007fff0b560b83 <libcrypto-1_1-x64.dll+0x1b0b83>)
~~Dr.M~~ # 5 libcrypto-1_1-x64.dll!? +0x0 (0x00007fff0b55d0b5 <libcrypto-1_1-x64.dll+0x1ad0b5>)
~~Dr.M~~ # 6 libcrypto-1_1-x64.dll!? +0x0 (0x00007fff0b5609ed <libcrypto-1_1-x64.dll+0x1b09ed>) [4/786]
~~Dr.M~~ # 7 main
~~Dr.M~~
~~Dr.M~~ Error #3: LEAK 97 direct bytes 0x000002ba3d44ef20-0x000002ba3d44ef81 + 0 indirect bytes
~~Dr.M~~ # 0 replace_malloc [d:\a\drmemory\drmemory\common\alloc_replace.c:2580]
~~Dr.M~~ # 1 KERNELBASE.dll!CompareStringW+0x164 (0x00007fff6a6b2875 <KERNELBASE.dll+0x22875>)
~~Dr.M~~ # 2 libcrypto-1_1-x64.dll!? +0x0 (0x00007fff0b4ea973 <libcrypto-1_1-x64.dll+0x13a973>)
~~Dr.M~~ # 3 libcrypto-1_1-x64.dll!? +0x0 (0x00007fff0b4eac18 <libcrypto-1_1-x64.dll+0x13ac18>)
~~Dr.M~~ # 4 libcrypto-1_1-x64.dll!? +0x0 (0x00007fff0b560b68 <libcrypto-1_1-x64.dll+0x1b0b68>)
~~Dr.M~~ # 5 libcrypto-1_1-x64.dll!? +0x0 (0x00007fff0b55cca5 <libcrypto-1_1-x64.dll+0x1acca5>)
~~Dr.M~~ # 6 libcrypto-1_1-x64.dll!? +0x0 (0x00007fff0b560a21 <libcrypto-1_1-x64.dll+0x1b0a21>)
~~Dr.M~~ # 7 main
~~Dr.M~~
~~Dr.M~~ Error #4: LEAK 289 direct bytes 0x000002ba3d44efb0-0x000002ba3d44f0d1 + 0 indirect bytes
~~Dr.M~~ # 0 replace_malloc [d:\a\drmemory\drmemory\common\alloc_replace.c:2580]
~~Dr.M~~ # 1 libcrypto-1_1-x64.dll!? +0x0 (0x00007fff0b4eac2b <libcrypto-1_1-x64.dll+0x13ac2b>)
~~Dr.M~~ # 2 libcrypto-1_1-x64.dll!? +0x0 (0x00007fff0b4ea973 <libcrypto-1_1-x64.dll+0x13a973>)
~~Dr.M~~ # 3 libcrypto-1_1-x64.dll!? +0x0 (0x00007fff0b4ea6bf <libcrypto-1_1-x64.dll+0x13a6bf>)
~~Dr.M~~ # 4 libcrypto-1_1-x64.dll!? +0x0 (0x00007fff0b560b83 <libcrypto-1_1-x64.dll+0x1b0b83>)
~~Dr.M~~ # 5 libcrypto-1_1-x64.dll!? +0x0 (0x00007fff0b55cca5 <libcrypto-1_1-x64.dll+0x1acca5>)
~~Dr.M~~ # 6 libcrypto-1_1-x64.dll!? +0x0 (0x00007fff0b560a21 <libcrypto-1_1-x64.dll+0x1b0a21>)
~~Dr.M~~ # 7 main
~~Dr.M~~
~~Dr.M~~ ERRORS FOUND:
~~Dr.M~~ 0 unique, 0 total invalid heap argument(s)
~~Dr.M~~ 0 unique, 0 total warning(s)
~~Dr.M~~ 4 unique, 4 total, 820 byte(s) of leak(s)
~~Dr.M~~ 0 unique, 0 total, 0 byte(s) of possible leak(s)
~~Dr.M~~ ERRORS IGNORED:
~~Dr.M~~ 1 potential leak(s) (suspected false positives)
~~Dr.M~~ (details: C:\Users\aho\AppData\Roaming\Dr. Memory\DrMemory-a.exe.7696.000\potential_errors.txt)
~~Dr.M~~ 7 unique, 7 total, 550 byte(s) of still-reachable allocation(s)
~~Dr.M~~ (re-run with "-show_reachable" for details)
~~Dr.M~~ Details: C:\Users\aho\AppData\Roaming\Dr. Memory\DrMemory-a.exe.7696.000\results.txt |
The memory leak doesn't cause when I remove >ridk exec gcc openssl-leak.c -IC:/Ruby27-x64/msys64/mingw64/bin/../include -LC:/Ruby27-x64/msys64/mingw64/bin/../li
b -lssl -lcrypto
>ridk exec "C:\Program Files (x86)\Dr. Memory\bin\drmemory.exe" -leaks_only -- a.exe
~~Dr.M~~ Dr. Memory version 2.5.0
~~Dr.M~~ Running "a.exe"
~~Dr.M~~ Using system call file C:\Users\aho\AppData\Roaming\Dr. Memory\symcache\syscalls_x64.txt
~~Dr.M~~
~~Dr.M~~ NO ERRORS FOUND:
~~Dr.M~~ 0 unique, 0 total invalid heap argument(s)
~~Dr.M~~ 0 unique, 0 total warning(s)
~~Dr.M~~ 0 unique, 0 total, 0 byte(s) of leak(s)
~~Dr.M~~ 0 unique, 0 total, 0 byte(s) of possible leak(s)
~~Dr.M~~ ERRORS IGNORED:
~~Dr.M~~ 1 potential leak(s) (suspected false positives)
~~Dr.M~~ (details: C:\Users\aho\AppData\Roaming\Dr. Memory\DrMemory-a.exe.17940.000\potential_errors.txt)
~~Dr.M~~ 3 unique, 3 total, 38 byte(s) of still-reachable allocation(s)
~~Dr.M~~ (re-run with "-show_reachable" for details)
~~Dr.M~~ Details: C:\Users\aho\AppData\Roaming\Dr. Memory\DrMemory-a.exe.17940.000\results.txt |
Use OpenSSL 1.1.1n with debug symbol:
|
I got it. It's a bug of the patch in MINGW: |
How to build:
|
Fixed version of MinGW's OpenSSL package (1.1.1.o-2) has been released: We'll replace Ruby's OpenSSL with it (if we release td-agent before future version of Ruby which doesn't include it). |
Fix #374 Signed-off-by: Takuro Ashie <ashie@clear-code.com>
MinGW'S OpenSSL 1.1.1.o-1 or before has a memory leak bug in `X509_STORE_set_default_paths`. Because Fluentd's out_forward plugin calls it on every communication, memory leaks are accumulated so that it will causes out of memory. Fix #374 Signed-off-by: Takuro Ashie <ashie@clear-code.com>
MinGW'S OpenSSL 1.1.1.o-1 or before has a memory leak bug in `X509_STORE_set_default_paths`. Because Fluentd's out_forward plugin calls it on every communication, memory leaks are accumulated so that it will causes out of memory. Fix #374 Signed-off-by: Takuro Ashie <ashie@clear-code.com>
Recently we received a bug report from a customer that Windows version of td-agent causes memory leak.
It's reproducible with td-agent 4.3.0.
After I investigated it, I found out that just using TLS causes this issue.
In addition, Ruby doesn't recognize it. I confirmed it by fluent-plugin-sigdump.
fluent-plugin-windows-eventlog and certstore_c also don't have major memory leak, I confirmed it by Dr.Memory and reviewing the code.
While testing cerstore_c by Dr.Memory, OpenSSL related leaks are detected. So that the origin of this memory leaks might be Ruby (openssl gem) or OpenSSL. We don't yet understand the detail due to lack of debug symbols.
Although the memory leak by one communication is tiny, it's not ignorable. Because out_forward plugin of Fluentd discards a socket every time after finishing a communication and recreate it by default, so that memory leaks are accumulated.
It can be mitigate by
keepalive
option of out_forward, but we need to fix the root cause to allow long-life operation.This issue doesn't reproduce when I use Ruby 2.4.
The text was updated successfully, but these errors were encountered: