
Output forward connections are in CLOSE_WAIT state #4618

Open
karthiknagraj opened this issue Aug 28, 2024 · 11 comments
Labels
moreinfo (Missing version, need reproducible steps, need to investigate more), stale

Comments

@karthiknagraj

Describe the bug

I have Fluentd deployed as a DaemonSet in a Kubernetes cluster, sending logs with the forward output plugin (config shared below). In the netstat output of the Fluentd pod, I see a lot of connections stuck in CLOSE_WAIT. How can I resolve this?

tcp 1 0 fluentd-operator-:<port> 1.1.1.1:<port> CLOSE_WAIT

To Reproduce

This issue is seen when the remote side has closed the connection.

Expected behavior

Ideally, Fluentd should handle closing the connection and release the socket.

Your Environment

- Fluentd version: 1.13.3
- Package version:
- Operating system: VMware Photon OS: V3.0
- Kernel version: 4.18.0-372.36.1.el8_6.x86_64

Your Configuration

<match **>
  @type forward
  keepalive true
  recover_wait 20s
  require_ack_response false
  send_timeout 90s
  keepalive_timeout 30s
  <buffer>
    @type file
    path buffer1.buf
    flush_mode interval
    flush_thread_count 8
    flush_interval 2
    retry_type periodic
    retry_wait 10s
    retry_max_times 1
    chunk_limit_size 16m
    total_limit_size 32m
    overflow_action drop_oldest_chunk
    disable_chunk_backup true
  </buffer>
</match>

Your Error Log

[error]: #9 failed to flush the buffer, and hit limit for retries. dropping all chunks in the buffer queue. retry_times=1 records=8139 error_class=Errno::EPIPE error="Broken pipe - sendfile"
[error]: #9 suppressed same stacktrace
[error]: #7 failed to flush the buffer, and hit limit for retries. dropping all chunks in the buffer queue. retry_times=1 records=2457 error_class=Errno::ETIMEDOUT error="Connection timed out - sendfile"
[error]: #7 suppressed same stacktrace

Additional context

No response

@karthiknagraj
Author

Hi, can anyone please help me with the issue I am facing?

@daipom
Contributor

daipom commented Sep 9, 2024

Looks like we need more information to investigate this.

  • Is the server side Fluentd's in_forward plugin?
  • Could you please share more logs?
    • Are there any detailed stack traces?
    • Are there any server-side (in_forward) logs?
    • If you can set the log level to debug or trace, it would help us (for example, as in the snippet below).
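For reference, one way to raise the log level globally is the <system> section of the Fluentd configuration (the -v / -vv command-line flags work as well); this is only an illustrative snippet:

<system>
  # use "trace" instead for even more detail
  log_level debug
</system>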

@daipom daipom added waiting-for-user Similar to "moreinfo", but especially need feedback from user and removed waiting-for-triage labels Sep 9, 2024
@karthiknagraj
Author

karthiknagraj commented Sep 9, 2024

@daipom Thanks for responding.
The above Out_forward configuration is on the client side. Server side uses in_forward to receive it. When the server sends reset packet to forcefully close the connection, Sockets on client-side(fluentd) are going into CLOSE_WAIT states.

I see the below logs along with the logs that I have already shared.

2024-09-09 16:38:08 +0000 [debug]: #7 connect new socket #<struct Fluent::Plugin::ForwardOutput::SocketCache::TimedSocket timeout=2024-09-09 16:43:08.309568639 +0000, key=["11.25.252.222", <PORT>, "server.com"], sock=#<Fluent::PluginHelper::Socket::WrappedSocket::TCP:fd 91, AF_INET, 192.155.22.114, 46508>>
  2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/socket.rb:62:in `initialize'
  2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/socket.rb:62:in `new'
  2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/socket.rb:62:in `socket_create_tcp'
  2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward.rb:410:in `create_transfer_socket'
  2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward/connection_manager.rb:46:in `call'
  2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward/connection_manager.rb:46:in `connect'
  2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward.rb:807:in `connect'
  2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward.rb:676:in `send_data'
  2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward.rb:365:in `block in write'
  2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward/load_balancer.rb:46:in `block in select_healthy_node'
  2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward/load_balancer.rb:37:in `times'
  2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward/load_balancer.rb:37:in `select_healthy_node'
  2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/service_discovery/manager.rb:108:in `select_service'
  2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/service_discovery.rb:82:in `service_discovery_select_service'
  2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward.rb:365:in `write'
  2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:1179:in `try_flush'
  2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward.rb:353:in `try_flush'
  2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:1500:in `flush_thread_run'
  2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:499:in `block (2 levels) in start'
  2024-09-09 17:09:59 +0000 [warn]: #1 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2024-09-09 17:09:59 +0000 [debug]: #2 connect new socket
2024-09-09 17:09:59 +0000 [debug]: #2 failed to send heartbeat packet host="127.0.0.1" port=<PORT> heartbeat_type=:transport error_class=Errno::ECONNREFUSED error="Connection refused - connect(2) for \"127.0.0.1\" port <PORT>"
2024-09-09 17:09:59 +0000 [debug]: #2 connect new socket
2024-09-09 17:09:59 +0000 [debug]: #2 connect new socket
2024-09-09 17:09:59 +0000 [debug]: #2 connect new socket
2024-09-09 17:09:59 +0000 [debug]: #2 connect new socket
2024-09-09 17:09:59 +0000 [debug]: #2 connect new socket
2024-09-09 17:09:59 +0000 [debug]: #2 connect new socket
2024-09-09 17:09:59 +0000 [debug]: #2 connect new socket
2024-09-09 17:09:59 +0000 [debug]: #2 connect new socket
2024-09-09 17:09:59 +0000 [debug]: #2 connect new socket
2024-09-09 17:09:59 +0000 [debug]: #2 connect new socket
2024-09-09 17:09:59 +0000 [debug]: #2 connect new socket
2024-09-09 17:09:59 +0000 [debug]: #2 connect new socket
2024-09-09 17:09:59 +0000 [debug]: #2 taking back chunk for errors. chunk="621b2d3a10bfccc53dc08c7c6d4019ff"
2024-09-09 17:09:59 +0000 [warn]: #2 failed to flush the buffer. retry_times=0 next_retry_time=2024-09-09 17:10:01 +0000 chunk="621b2d3a10bfccc53dc08c7c6d4019ff" error_class=Errno::ECONNREFUSED error="Connection refused - connect(2) for \"127.0.0.1\" port <PORT>"
  2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/socket.rb:62:in `initialize'
2024-09-09 17:09:59 +0000 [debug]: #2 taking back chunk for errors. chunk="621b2d3a10c9e08c09e407b1216c81b2"
  2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/socket.rb:62:in `new'
  2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/socket.rb:62:in `socket_create_tcp'
  2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward.rb:410:in `create_transfer_socket'
  2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward/connection_manager.rb:46:in `call'
  2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward/connection_manager.rb:46:in `connect'
  2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward.rb:807:in `connect'
  2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward.rb:676:in `send_data'
  2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward.rb:365:in `block in write'
  2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward/load_balancer.rb:46:in `block in select_healthy_node'
  2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward/load_balancer.rb:37:in `times'
  2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward/load_balancer.rb:37:in `select_healthy_node'
  2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/service_discovery/manager.rb:108:in `select_service'
  2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/service_discovery.rb:82:in `service_discovery_select_service'
  2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward.rb:365:in `write'
  2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:1179:in `try_flush'
  2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/out_forward.rb:353:in `try_flush'
  2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:1500:in `flush_thread_run'
  2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin/output.rb:499:in `block (2 levels) in start'
  2024-09-09 17:09:59 +0000 [warn]: #2 /usr/local/lib/ruby/gems/2.7.0/gems/fluentd-1.14.6/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'

I am not sure whether these logs help with the analysis. Please let me know if you are looking for specific keywords in the logs, and I can try to collect them.

@daipom
Contributor

daipom commented Sep 11, 2024

Thanks!
These logs show how the client behaves.
We need to investigate whether this behavior is intended.

One more question.

When the server sends a reset packet to forcefully close the connection

Why does the server send a reset packet?
Could some issue be occurring on the server side as well?

Could you attach server-side (in_forward) logs?

@karthiknagraj
Author

@daipom, I do not have access to the server-side logs. I was checking the output plugin documentation: if we make the TCP connection long-lived/persistent, i.e. by setting keepalive true, does Fluentd then not handle closing the socket? Is my understanding correct?
https://docs.fluentd.org/plugin-helper-overview/api-plugin-helper-socket
Lines from the documentation: "The socket plugin helper does not manage the lifecycle of the socket. User must close the socket when it is no longer needed."

Could this be the reason why Fluentd is not able to clean up connections in the CLOSE_WAIT state?
I even tried adding the parameters below to the forward output plugin, but that didn't help either:

send_keepalive_packet true 
keepalive_timeout 300s 
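For context, these were placed inside the forward output block roughly as follows (the <server> values here are placeholders, not my actual endpoints):

<match **>
  @type forward
  keepalive true
  # parameters added in this attempt
  send_keepalive_packet true
  keepalive_timeout 300s
  <server>
    # placeholder endpoint
    host 1.1.1.1
    port 24224
  </server>
</match>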

@daipom
Contributor

daipom commented Sep 12, 2024

If we make the TCP connection long-lived/persistent, i.e. by setting keepalive true, does Fluentd then not handle closing the socket? Is my understanding correct?

No, the out_forward plugin handles closing the socket even when keepalive is used.

https://docs.fluentd.org/plugin-helper-overview/api-plugin-helper-socket
Lines from the documentation: "The socket plugin helper does not manage the lifecycle of the socket. User must close the socket when it is no longer needed."
Could this be the reason why Fluentd is not able to clean up connections in the CLOSE_WAIT state?
I even tried adding the parameters below to the forward output plugin, but that didn't help either.

The plugin helper is a feature for developers of Fluentd plugins, not for Fluentd users.
It is about implementing plugins, not about how to use them.

Here are the documents for plugin users.

@karthiknagraj
Author

@daipom, OK, sure. But I don't see the CLOSE_WAIT connections getting closed. Am I missing any configuration that should be added?

@daipom
Contributor

daipom commented Sep 12, 2024

The out_forward plugin has its own logic to handle the sockets, so we need to investigate the implementation.

However, the problem is that I can't reproduce this issue.
In my test, out_forward can close the socket and re-send data to in_forward correctly if in_forward revives.

I found one thing.

On non-Windows platforms, in_forward sends RST rather than FIN when closing, by default.
It appears to close the socket forcefully.
So, by default, I can't reproduce CLOSE_WAIT sockets that remain.

However, when linger_timeout is set on in_forward, it sends FIN when closing.
This generates some CLOSE_WAIT sockets.

<source>
  @type forward
  <transport tcp>
    linger_timeout 1
  </transport>
</source>

However, in my testing, this was not a problem.
After in_forward revives, out_forward reopens the socket and sends data correctly.
So I'm not sure what is causing this phenomenon...

@daipom
Contributor

daipom commented Sep 12, 2024

At least, the CLOSE_WAIT sockets should be closed after keepalive_timeout elapses.
In my testing, they are closed after keepalive_timeout.
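For illustration, a client-side setting along these lines (the server endpoint is a placeholder) should let idle keepalive sockets, including ones left in CLOSE_WAIT, be expired and closed once the timeout passes:

<match **>
  @type forward
  keepalive true
  # idle sockets are expired and closed after this period
  keepalive_timeout 30s
  <server>
    # placeholder endpoint
    host 127.0.0.1
    port 24224
  </server>
</match>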

I need more information to reproduce the phenomenon...

@daipom daipom added moreinfo Missing version, need reproducible steps, need to investigate more and removed waiting-for-user Similar to "moreinfo", but especially need feedback from user labels Sep 12, 2024
@daipom
Contributor

daipom commented Sep 12, 2024

If you can, please try a more recent version.


This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 7 days

@github-actions github-actions bot added the stale label Oct 13, 2024