Add Windows Support to Fluent Bit #960
Comments
#2208 [Windows] fluent-bit can not find log files grater then 2Gbytes |
Hello, has anyone checked whether fluent-bit works well in a Windows container? I'm trying, but it doesn't work. fluent-bit.conf
input-kubernetes.conf
Almost the same configuration works fine with fluent-bit in a Linux container.
It looks like #1436. I'm afraid this is not a good issue comment because it lacks information, but I would appreciate it if anyone has an idea. |
@tanaka-takayoshi yes, I think @include does not work on Windows. fluent-bit.conf: |
parsers.conf: | |
@tanaka-takayoshi @sachinmsft Hmm. I suppose
The exact failure path is: https://github.com/monkey/monkey/blob/master/mk_core/mk_rconf.c#L221 So it's a plain My current guess is that Fluent Bit was somehow looking at a different To investigate further, can you share 1) the directory layout of your I'm going to find some time next week and try to reproduce your issue. |
Hi @fujimotos ,
[2020/06/05 17:25:07] [ info] [storage] version=1.0.3, initializing...
but not triggering if it has connectivity issue
[2020/06/05 17:25:07] [ info] [engine] started (pid=6320)
The problem this causes is that if Docker tries to delete the pod in the meantime, it cannot do so, because fluent-bit holds an open file handle to the log file and Docker keeps waiting for that handle to be closed. |
@sachinmsft You need to look at this log line:
The reason Since Fluent Bit could not find any more place to store temporal data,
I guess this is another issue already fixed by #2141. Please use v1.4.4 |
@fujimotos I have already taken the fix #2141, and it solves the success scenario.
The reason tail_fs_check() didn't fire is that the send queue (memory Since Fluent Bit could not find any more place to store temporal data,
Consider the following scenario: Docker wants to delete the pod and its associated log file. Because fluent-bit holds an open handle to the log file, Docker cannot delete the file and keeps checking whether the handle has been closed. But because tail_fs_check() is not running, fluent-bit never checks the file status and never closes the handle. As a result, the pod is stuck in the terminating state forever. I am not saying that not running tail_fs_check() is a bug, but on Windows there should be a mechanism to check the file status even if the send queue is full. I have a repro of this scenario on my setup. |
The solution is to enable filesystem buffering, so if you hit a memory limit and cannot flush data, at least your collected data is stored in the file system and will be flushed once connectivity is up again. However, we don't support filesystem buffering on Windows yet. |
Actually, I am talking about a different issue. I have observed all of the above scenarios on Windows. I have not tested on Linux. |
So a new testing build for v1.5.0 is out.
We support running as a "Windows Service" starting with this version. This means
# Register "fluent-bit"
% sc.exe create fluent-bit binpath= "\flb\fluent-bit.exe -c \flb\fluent-bit.conf"
# Start and stop fluent-bit
% sc.exe start fluent-bit
% sc.exe stop fluent-bit
This feature is pretty new, so I'm awaiting your testing report and
Also we started to include a PDB file "fluent-bit.pdb" to each build
New features from v1.4
Test Builds |
@fujimotos I dropped the test build into a couple of production machines and they both quickly hung while spinning high cpu and are not generating output: Hang dump with Sysinternals ProcDump and busy thread in WinDbg: fluent-bit.exe_200630_005249.dmp.zip
Fluent Bit debug logs; still coming from WinSW since I haven't yet tried the new Windows Service feature: Each time I restart the Fluent Bit service it seems to have a burst of activity, uploading logs to date to AWS Elasticsearch, but then it hangs again. New to these logs compared to v1.4.6 is keepalive connection info, which seems to show it disconnecting immediately:
|
@gitfool Hmm. I could confirm this happens with I incorporated a fix #2309 into win32-next and released Now HTTP requests seem to be working reliably in my environment. I'd |
@fujimotos keepalive connections are now being recycled and cpu is back to normal (low) levels. Great turnaround! |
@fujimotos checking the logs from overnight, I'm seeing quite a few es output related warnings and errors:
The errors include the response from Elasticsearch, which doesn't look like an error to me. Maybe something else is afoot? |
@gitfool Evidently this is a bug in keepalive mode. I see
Now, I suspect this is occurring due to So when Fluent Bit attempts to re-use that socket, it first reads the If my guess is correct, these errors should be gone if you increase
[OUTPUT]
    Name es
    ...
    Buffer_Size 32kb
    ...
Can you confirm it? If it indeed solves the issue, I'll work on a |
@fujimotos I can confirm it affects the outcome. I only had 1 error in the last 3 hours, and I noticed that was after restarting the service for the config change, so I then did a test where I stopped the service and waited a couple of minutes before starting the service again. As expected, the logs had backed up enough that the first bulk send was large enough to cause the response from Elasticsearch to be larger than 32KB and I saw a couple of errors in quick succession and then none after that. So it looks like the buffer is the issue, but rather than relying on a bigger buffer which could still be insufficient, the solution needs to bleed any excess data before the socket can be safely re-used. |
@gitfool Thank you for the confirmation.
I posted a fix to #2323. I ended up fixing it by marking the socket as "not The reason for the choice is the uncertainty of how long it takes to read So the patch above chooses to close the socket, instead of the (small?) |
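The trade-off discussed in the last few comments (a response larger than the read buffer leaves stale bytes on a keepalive socket, so the socket is closed rather than recycled) can be sketched roughly as follows. This is a minimal Python model, not the code from #2323: the function names `read_response` and `release` and the list-based connection pool are hypothetical and exist only for illustration.

```python
import socket
from typing import List, Tuple


def read_response(sock: socket.socket, buffer_size: int) -> Tuple[bytes, bool]:
    """Read at most buffer_size bytes and report whether unread bytes remain."""
    data = sock.recv(buffer_size)
    previous_timeout = sock.gettimeout()
    sock.settimeout(0)  # switch to non-blocking mode for the peek
    try:
        # MSG_PEEK looks at pending data without consuming it.
        leftover = len(sock.recv(1, socket.MSG_PEEK)) > 0
    except BlockingIOError:
        leftover = False  # nothing pending: the buffer held everything sent so far
    finally:
        sock.settimeout(previous_timeout)
    return data, leftover


def release(sock: socket.socket, leftover: bool, pool: List[socket.socket]) -> None:
    """Recycle a clean connection; close one that still has stale bytes queued."""
    if leftover:
        # Hypothetical policy mirroring the thread: do not reuse, just close.
        sock.close()
    else:
        pool.append(sock)
```

With a 32-byte buffer and a 100-byte response, `read_response` reports leftover data and `release` closes the socket instead of returning it to the pool. Draining the excess would also work, but as noted above it is unclear how long that read could take, so closing is the simpler choice in this sketch.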
Here is the current tip of the Windows development (2020-07-03).
This release includes the improved support for Windows Event Log. Also two new output plugins are added to our Windows build:
One more thing: A fix for the connection-reuse bug is included in I'd very much appreciate it if anyone interested tries out the build and New features from v1.4
Test Builds |
@fujimotos FYI, just put up a few PRs to fix issues found by Coverity Scan, including the new AWS/CloudWatch code you mentioned. |
@PettitWesley Thank you. I'll integrate your fixes into my build |
Sorry… a few more fixes. I finally went through and tested every AWS use case today; I found a few things that I needed to fix:
There are also still a few Coverity issue fixes which Eduardo has not merged yet:
That should be it from me for 1.5. I have run through every AWS scenario now. Apologies for the late notice. |
This is the current tip of the Windows development (2020-07-08)
This is the final Windows candidate release for v1.5.0. No major change The official release of v1.5.0 is planned for next Monday (July 13). I'd like to express my thanks to everyone who has sent me suggestions |
Fluent Bit v1.5.0 is out. https://fluentbit.io/announcements/v1.5.0/ Thanks to everyone who helped the development on this cycle (especially, For v1.6 discussion, I decided to move to a new thread #2351, since this issue |
Overview
Right now, Fluent Bit is designed to run on Unix platforms (Linux, BSD and OSX).
This ticket aims to expand the supported platforms a bit more; Namely we want
to be able to run Fluent Bit on Windows, and make it possible to ship logs
efficiently from there.
Goals
setup.exe or fluent-bit.msi; We need to figure out how to do it.
Roadmap
ETA of this feature (Windows support) is 2019 1Q.
We don't expect all plugins to be migrated to Windows by the end of March 2019,
though. The high priority list of plugins is attached to this ticket.
Also we are planning to create a new input plugin for Windows event logs
(maybe in_windows_eventlog). This plugin should be included in the initial
release of Windows support.
List of plugins to be ported