Add Windows Support to Fluent Bit #960

fujimotos · 2018-12-14T05:29:36Z

Overview

Right now, Fluent Bit is designed to run on Unix platforms (Linux, BSD and OSX).

This ticket aims to expand the supported platforms. Namely, we want
to be able to run Fluent Bit on Windows, and make it possible to ship logs
efficiently from there.

Goals

Make the core engine of Fluent Bit runnable on Windows.
- Our primary build target is MSVC 2017.
- (Cygwin and MinGW are out of our scope for now)
Create an installation package so that users can install easily.
- Maybe setup.exe or fluent-bit.msi; we need to figure out how to do it.
Migrate as many plugins as possible (see below).

Roadmap

The ETA of this feature (Windows support) is Q1 2019.

We don't expect all plugins to be migrated to Windows by the end of March 2019,
though. The high-priority list of plugins is attached to this ticket.

Also we are planning to create a new input plugin for Windows event logs
(maybe in_windows_eventlog). This plugin should be included in the initial
release of Windows support.

List of plugins to be ported

Input Plugins
- in_tail
- in_lib
- in_syslog
Filter Plugins
- filter_grep
- filter_lua
- filter_modify
- filter_record_modifier
- filter_stdout
Output Plugins
- out_es
- out_kafka
- out_flowcounter
- out_counter
- out_http
- out_splunk

farcop · 2020-05-28T17:05:47Z

#2208 [Windows] fluent-bit can not find log files grater then 2Gbytes

fujimotos · 2020-06-02T05:55:52Z

https://github.com/fluent/fluent-bit/commits/win32-next

Here is the current tip of the Windows development (2020-06-02).

Improvements
- Enable out_influxdb for Windows build (out_influxdb: Enable the plugin on Windows #2207)
- Remove the dependency on vcruntime140.dll (build: fix "vcruntime140.dll is missing" error on Windows #2170)
Kubernetes Support
- Fix the "unremovable pods" issue (in_tail: Work around "undeletable file" issue on Windows #2141)
- Add dns_retries option to mitigate unstable network (filter_kubernetes: Poll DNS status on Windows pods #2186)
- More natural tag expansion on Windows file system (in_tail: add more path delimiters to the sanitization list #2188)
Bug fixes
- Fix infinite loop bug in in_tail (scheduler: fix infinite loop bug on Windows #2195)
- Fix segfault on net.keepalive=yes (upstream: fix segmentation fault on net.keepalive=yes #2192)
- Fix segfault on closing network connection (upstream: fix a segfault bug in flb_upstream_conn_release() #2206)
- Make the "Types" option usable again (parser: fix segmentation fault on "Types" option #2200)

Here are the latest experimental builds:

A major improvement is significantly better Kubernetes support.
Most reported bugs have been resolved, so Fluent Bit should work
fine on Windows pods. Just report back to me if you see anything
not working well.

tanaka-takayoshi · 2020-06-04T12:28:42Z

Hello,

Has anyone checked whether fluent-bit works well in a Windows container? I'm trying, but it doesn't work.
My fluent-bit.conf has an @INCLUDE section.

fluent-bit.conf

[SERVICE]
...
  @INCLUDE input-kubernetes.conf
  @INCLUDE ...

input-kubernetes.conf

[INPUT]
  NAME tail
  ...

Almost the same configuration works fine with fluent-bit in a Linux container.
fluent-bit outputs the following error message. The same error happens with both the latest stable 1.4.4 and 1.5.0.

[2020/05/31 17:25:24] [Warning] [config] I cannot open input-kubernetes.conf file
Fluent Bit v1.4.4
* Copyright (C) 2019-2020 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
Error: Configuration file contains errors. Aborting

It looks like #1436.
At least I confirmed the files are actually located in the correct folder.

I'm afraid this is not a good issue comment because of the lack of information, but I would appreciate it if anyone has an idea.

sachinmsft · 2020-06-05T01:49:57Z

@tanaka-takayoshi Yes, I think @INCLUDE does not work on Windows.
Just use the config in the format below and it should work:

fluent-bit.conf: |
[SERVICE]

[INPUT]

[FILTER]

[OUTPUT]

[OUTPUT]

[OUTPUT]

[OUTPUT]

[OUTPUT]

parsers.conf: |
[PARSER]

fujimotos · 2020-06-05T14:05:13Z

@tanaka-takayoshi @sachinmsft Hmm. I suppose @INCLUDE should
work on Windows.

I cannot open input-kubernetes.conf file Fluent Bit v1.4.4

The exact failure path is:

https://github.com/monkey/monkey/blob/master/mk_core/mk_rconf.c#L221

So it's a plain fopen(). According to your error message, it failed
with -1, unable to open "input-kubernetes.conf".

My current guess is that Fluent Bit was somehow looking at a different
directory than you expected.

To investigate further, can you share 1) the directory layout of your
configuration files and 2) how you invoked fluent-bit.exe?

I'm going to find some time next week and try to reproduce your issue.
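
The "different directory" guess above rests on a general property of plain open calls: a relative file name is resolved against the process's current working directory, not against the directory that holds the main configuration file. The following is a minimal Python sketch of that property only. It is not Fluent Bit code, and the helper names `can_open` and `demo` are hypothetical.

```python
import os
import tempfile


def can_open(name):
    """Return True if `name` can be opened for reading from the current directory."""
    try:
        with open(name, "r"):
            return True
    except OSError:
        return False


def demo():
    """Open the same relative name from two different working directories."""
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as conf_dir, \
            tempfile.TemporaryDirectory() as other_dir:
        # Create the included file next to the (imaginary) main config.
        with open(os.path.join(conf_dir, "input-kubernetes.conf"), "w") as f:
            f.write("[INPUT]\n    Name tail\n")
        try:
            os.chdir(conf_dir)
            from_conf_dir = can_open("input-kubernetes.conf")
            os.chdir(other_dir)
            from_other_dir = can_open("input-kubernetes.conf")
        finally:
            # Leave the temporary directories before they are removed.
            os.chdir(original_cwd)
    return from_conf_dir, from_other_dir
```

If the process is started from a directory other than the one containing the included file, the second call fails even though the file exists, which is why the directory layout and the exact invocation are worth sharing.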

sachinmsft · 2020-06-05T19:18:34Z

Hi @fujimotos ,
I am seeing a new bug in fluent-bit. If fluent-bit is not able to reach the Elasticsearch cluster, the tail_fs_check timer does not fire. I have added some log statements inside fluent-bit. As you can see, in one instance the tail_fs_check timer fires when there is no connectivity issue:

[2020/06/05 17:25:07] [ info] [storage] version=1.0.3, initializing...
[2020/06/05 17:25:07] [ info] [storage] in-memory
[2020/06/05 17:25:07] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2020/06/05 17:25:07] [ info] [engine] started (pid=10408)
[2020/06/05 17:25:07] [ info] [input:tail:tail.0] inside flb_tail_fs_init
[2020/06/05 17:25:07] [ info] [filter:kubernetes:kubernetes.0] https=1 host=kubernetes.default.svc.cluster.local port=443
[2020/06/05 17:25:07] [ info] [filter:kubernetes:kubernetes.0] local POD info OK
[2020/06/05 17:25:07] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with API server...
[2020/06/05 17:25:28] [error] [filter:kubernetes:kubernetes.0] upstream connection error
[2020/06/05 17:25:28] [ warn] [filter:kubernetes:kubernetes.0] could not get meta for POD loggingstack-fluent-bit-windows-7gp47
[2020/06/05 17:25:28] [ info] [sp] stream processor started
[2020/06/05 17:25:49] [error] [filter:kubernetes:kubernetes.0] upstream connection error
[2020/06/05 17:26:10] [error] [filter:kubernetes:kubernetes.0] upstream connection error
[2020/06/05 17:26:31] [error] [filter:kubernetes:kubernetes.0] upstream connection error
[2020/06/05 17:26:52] [error] [filter:kubernetes:kubernetes.0] upstream connection error
[2020/06/05 17:27:14] [error] [filter:kubernetes:kubernetes.0] upstream connection error
[2020/06/05 17:27:35] [error] [filter:kubernetes:kubernetes.0] upstream connection error
[2020/06/05 17:27:35] [ info] [input:tail:tail.0] inside tail_fs_check
[2020/06/05 17:27:37] [ info] [input:tail:tail.0] inside tail_fs_check
[2020/06/05 17:27:40] [ info] [input:tail:tail.0] inside tail_fs_check
[2020/06/05 17:27:42] [ info] [input:tail:tail.0] inside tail_fs_check
[2020/06/05 17:27:45] [ info] [input:tail:tail.0] inside tail_fs_check
[2020/06/05 17:27:47] [ info] [input:tail:tail.0] inside tail_fs_check
[2020/06/05 17:27:50] [ info] [input:tail:tail.0] inside tail_fs_check

But it does not fire when there is a connectivity issue:

[2020/06/05 17:25:07] [ info] [engine] started (pid=6320)
[2020/06/05 17:25:07] [ info] [input:tail:tail.0] inside flb_tail_fs_init
[2020/06/05 17:25:07] [ info] [filter:kubernetes:kubernetes.0] https=1 host=kubernetes.default.svc.cluster.local port=443
[2020/06/05 17:25:07] [ info] [filter:kubernetes:kubernetes.0] local POD info OK
[2020/06/05 17:25:07] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with API server...
[2020/06/05 17:25:07] [ info] [filter:kubernetes:kubernetes.0] API server connectivity OK
[2020/06/05 17:25:07] [ info] [sp] stream processor started
[2020/06/05 17:25:07] [ warn] [input] tail.0 paused (mem buf overlimit)
[2020/06/05 17:25:29] [error] [io] TCP connection failed: loggingstack-opendistro-es-client-service:8443 (Unknown error)
[2020/06/05 17:25:29] [ warn] [engine] failed to flush chunk '6320-1591377907.783287200.flb', retry in 11 seconds: task_id=1, input=tail.0 > output=es.0
[2020/06/05 17:25:29] [error] [io] TCP connection failed: loggingstack-opendistro-es-client-service:8443 (Unknown error)
[2020/06/05 17:25:29] [ warn] [engine] failed to flush chunk '6320-1591377907.800896400.flb', retry in 11 seconds: task_id=2, input=tail.0 > output=es.0
[2020/06/05 17:25:29] [error] [io] TCP connection failed: loggingstack-opendistro-es-client-service:8443 (Unknown error)
[2020/06/05 17:25:29] [ warn] [engine] failed to flush chunk '6320-1591377907.842661800.flb', retry in 11 seconds: task_id=4, input=tail.0 > output=es.0
[2020/06/05 17:25:29] [error] [io] TCP connection failed: loggingstack-opendistro-es-client-service:8443 (Unknown error)
[2020/06/05 17:25:29] [error] [io] TCP connection failed: loggingstack-opendistro-es-client-service:8443 (Unknown error)
[2020/06/05 17:25:29] [ warn] [engine] failed to flush chunk '6320-1591377907.765930100.flb', retry in 11 seconds: task_id=0, input=tail.0 > output=es.0
[2020/06/05 17:25:29] [ warn] [engine] failed to flush chunk '6320-1591377907.823109300.flb', retry in 11 seconds: task_id=3, input=tail.0 > output=es.0
[2020/06/05 17:25:29] [error] [io] TCP connection failed: loggingstack-opendistro-es-client-service:8443 (Unknown error)
[2020/06/05 17:25:29] [ warn] [engine] failed to flush chunk '6320-1591377907.861788600.flb', retry in 11 seconds: task_id=5, input=tail.0 > output=es.0
[2020/06/05 17:25:29] [error] [io] TCP connection failed: loggingstack-opendistro-es-client-service:8443 (Unknown error)
[2020/06/05 17:25:29] [ warn] [engine] failed to flush chunk '6320-1591377907.969191900.flb', retry in 11 seconds: task_id=6, input=tail.0 > output=es.0
[2020/06/05 17:25:29] [error] [io] TCP connection failed: loggingstack-opendistro-es-client-service:8443 (Unknown error)
[2020/06/05 17:25:29] [ warn] [engine] failed to flush chunk '6320-1591377907.970391200.flb', retry in 11 seconds: task_id=7, input=tail.0 > output=es.0
[2020/06/05 17:26:01] [error] [io] TCP connection failed: loggingstack-opendistro-es-client-service:8443 (Unknown error)
[2020/06/05 17:26:01] [ warn] [engine] failed to flush chunk '6320-1591377907.783287200.flb', retry in 88 seconds: task_id=1, input=tail.0 > output=es.0
[2020/06/05 17:26:01] [error] [io] TCP connection failed: loggingstack-opendistro-es-client-service:8443 (Unknown error)
[2020/06/05 17:26:01] [error] [io] TCP connection failed: loggingstack-opendistro-es-client-service:8443 (Unknown error)
[2020/06/05 17:26:01] [error] [io] TCP connection failed: loggingstack-opendistro-es-client-service:8443 (Unknown error)
[2020/06/05 17:26:01] [error] [io] TCP connection failed: loggingstack-opendistro-es-client-service:8443 (Unknown error)
[2020/06/05 17:26:01] [error] [io] TCP connection failed: loggingstack-opendistro-es-client-service:8443 (Unknown error)
[2020/06/05 17:26:01] [error] [io] TCP connection failed: loggingstack-opendistro-es-client-service:8443 (Unknown error)

The resulting problem is that if Docker tries to delete the pod in the meantime, it cannot do so, because fluent-bit holds an open file handle to the log file and Docker keeps waiting for that handle to be closed.

fujimotos · 2020-06-07T22:49:03Z

@sachinmsft You need to look at this log line:

[2020/06/05 17:25:07] [ warn] [input] tail.0 paused (mem buf overlimit)

The reason tail_fs_check() didn't fire is that the send queue (memory
buffer) was already full.

Since Fluent Bit could not find any more place to store temporary data,
it stopped reading from files. This is expected behaviour and not
exactly a bug.

The resulting problem is that if Docker tries to delete the pod in the meantime, it cannot do so, because fluent-bit holds an open file handle to the log file and Docker keeps waiting for that handle to be closed.

I guess this is another issue already fixed by #2141. Please use v1.4.4
and see if it resolves this issue.

sachinmsft · 2020-06-07T23:00:50Z

@fujimotos I have already taken the fix #2141 and it solves the success scenario.

The reason tail_fs_check() didn't fire is that the send queue (memory
buffer) was already full.

Since Fluent Bit could not find any more place to store temporary data,
it stopped reading from files. This is expected behaviour and not
exactly a bug.

Consider the scenario below:
fluent-bit starts and opens a file handle to the log file to read the logs.
It then checks connectivity with Elasticsearch, sees that it cannot reach ES, and stops reading the file, although it still holds the log file handle open.

Now Docker comes and wants to delete the pod and the associated log file. Since fluent-bit holds the log file handle open, Docker cannot delete the log file and keeps checking whether the handle has been closed. But since tail_fs_check() is not running, fluent-bit does not check the file status and does not close the file handle.

As a result, the pod is stuck in the terminating state forever.

I am not saying that not running tail_fs_check() is a bug. But on Windows there should be a mechanism to check the file status even if the send queue is full.

I have a repro of this scenario on my setup.

edsiper · 2020-06-11T16:17:48Z

The solution is to enable filesystem buffering: if you hit a memory limit and cannot flush data, at least your collected data is stored in the file system and will be flushed once connectivity is up again.

But we don't support filesystem buffering on Windows yet.

sachinmsft · 2020-06-11T18:29:31Z

Actually I am talking about a different issue.
When the tail input is paused for any reason, we destroy the timer that fires tail_fs_check(), and as a result we don't check whether any file has been deleted.
When Docker tries to delete the pod, it tries to delete the log file associated with the pod. But fluent-bit holds an open handle to that log file, and because the tail input is paused and its timer destroyed, tail_fs_check() never fires to close the log file FD. The pod is then stuck in the terminating state.

I have observed all of the above on Windows. I have not tested on Linux.
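
The scenario described above can be sketched as a toy model in Python. This is an illustration only, not Fluent Bit's implementation: the class `TailInput`, its fields, and the `check_while_paused` flag are hypothetical names, and the flag stands in for the kind of mechanism requested earlier in the thread (checking file status even when the input is paused).

```python
class TailInput:
    """Toy model of a tail input with a periodic file-status check."""

    def __init__(self, paths):
        self.open_handles = set(paths)   # files we hold a handle to
        self.delete_pending = set()      # files another process wants to delete
        self.paused = False

    def request_delete(self, path):
        """Another process (e.g. the container runtime) asks to delete a file."""
        self.delete_pending.add(path)

    def on_timer(self, check_while_paused=False):
        """One timer interval; returns True if the file-status check ran."""
        if self.paused and not check_while_paused:
            # Models the destroyed timer: nothing is checked while paused.
            return False
        # Close handles for files that are waiting to be deleted.
        self.open_handles -= self.delete_pending
        return True
```

In this model, a paused input keeps its handle open indefinitely after a delete request, while running the check regardless of the paused state releases it on the next interval.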

fujimotos · 2020-06-29T09:48:47Z

So a new testing build for v1.5.0 is out.

https://github.com/fluent/fluent-bit/releases/tag/v1.5.0-win32-rc4

Starting with this version, we support running as a "Windows Service". This means
that you can run Fluent Bit as a long-running background process
on Windows systems.

# Register "fluent-bit"
% sc.exe create fluent-bit binpath= "\flb\fluent-bit.exe -c \flb\fluent-bit.conf"

# Start and stop fluent-bit
% sc.exe start fluent-bit
% sc.exe stop fluent-bit

This feature is pretty new, so I'm awaiting your testing reports and
further suggestions.

Also we started to include a PDB file "fluent-bit.pdb" in each build
(thanks to @gitfool). You can use this file to get a detailed
stack trace etc. I hope it helps general debugging.

New features from v1.4

Improvements
- Enable out_influxdb for Windows build (out_influxdb: Enable the plugin on Windows #2207)
- Remove the dependency on vcruntime140.dll (build: fix "vcruntime140.dll is missing" error on Windows #2170)
- Add a PDB file to each release (Enable PDB generation on Windows build #2294)
- Add Windows Service support (win32: add 'Windows Service' support #2296)
Kubernetes Support
- Fix the "unremovable pods" issue (in_tail: Work around "undeletable file" issue on Windows #2141)
- Add dns_retries option to mitigate unstable network (filter_kubernetes: Poll DNS status on Windows pods #2186)
- More natural tag expansion on Windows file system (in_tail: add more path delimiters to the sanitization list #2188)
Bug fixes
- Fix infinite loop bug in in_tail (scheduler: fix infinite loop bug on Windows #2195)
- Fix segfault on net.keepalive=yes (upstream: fix segmentation fault on net.keepalive=yes #2192)
- Fix segfault on closing network connection (upstream: fix a segfault bug in flb_upstream_conn_release() #2206)
- Make the "Types" option usable again (parser: fix segmentation fault on "Types" option #2200)
- Fix segfault in logging worker (fix a segmentation fault issue in logging worker #2295)

Test Builds

gitfool · 2020-06-30T01:13:04Z

@fujimotos I dropped the test build into a couple of production machines and they both quickly hung while spinning high cpu and are not generating output:

Hang dump with Sysinternals ProcDump and busy thread in WinDbg:

fluent-bit.exe_200630_005249.dmp.zip

.  0  Id: 1c8.f7c Suspend: 0 Teb: 00000075`7dd0d000 Unfrozen
 # RetAddr           : Args to Child                                                           : Call Site
00 00007ffd`cedcea69 : 000001cb`00000000 00000000`00000000 000001cb`ae4dc570 00000000`00000000 : ntdll!NtDeviceIoControlFile+0x14
01 00007ffd`d0cbb3b3 : 00000000`00002736 00000000`00000000 00000000`00000000 00000000`00000000 : mswsock!WSPSelect+0x4c9
02 00007ff6`6885e4b5 : 000001cb`ae486270 00000075`7dbbf618 000001cb`ae493320 00007ff6`68a08a00 : ws2_32!select+0x1d3
03 00007ff6`6885879c : 000001cb`ae493320 00000075`7dbbf5d0 00000075`7dbbf618 000001cb`00000000 : fluent_bit!win32_dispatch+0x145 [c:\projects\fluent-bit-2e87g\lib\monkey\mk_core\deps\libevent\win32select.c @ 326] 
04 00007ff6`6884fd26 : 00000000`00000001 00000000`00000001 00000000`00000000 00000000`00000000 : fluent_bit!event_base_loop+0x24c [c:\projects\fluent-bit-2e87g\lib\monkey\mk_core\deps\libevent\event.c @ 1949] 
05 (Inline Function) : --------`-------- --------`-------- --------`-------- --------`-------- : fluent_bit!_mk_event_wait+0x1c [c:\projects\fluent-bit-2e87g\lib\monkey\mk_core\mk_event_libevent.c @ 349] 
06 00007ff6`6866590b : 00000000`00000000 00000000`00000000 00000000`000001fc 000001cb`ae474a40 : fluent_bit!mk_event_wait+0x26 [c:\projects\fluent-bit-2e87g\lib\monkey\mk_core\mk_event.c @ 163] 
07 00007ff6`68658746 : 00000000`00000001 00000000`00000003 00000000`00000003 000001cb`ae3f5960 : fluent_bit!flb_engine_start+0x37b [c:\projects\fluent-bit-2e87g\src\flb_engine.c @ 549] 
08 00007ff6`68864e08 : 00000000`00000000 00000000`00000000 000001cb`ae45f980 00007ff6`68a0e250 : fluent_bit!flb_main+0x6f6 [c:\projects\fluent-bit-2e87g\src\fluent-bit.c @ 1034] 
09 (Inline Function) : --------`-------- --------`-------- --------`-------- --------`-------- : fluent_bit!invoke_main+0x22 [d:\agent\_work\2\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 78] 
0a 00007ffd`d07a84d4 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : fluent_bit!__scrt_common_main_seh+0x10c [d:\agent\_work\2\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 288] 
0b 00007ffd`d310e8b1 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0x14
0c 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x21

Fluent Bit debug logs; still coming from WinSW since I haven't yet tried the new Windows Service feature:

WinSW.NET461.err.log.zip

Each time I restart the Fluent Bit service it seems to have a burst of activity, uploading logs to date to AWS Elasticsearch, but then it hangs again. New to these logs compared to v1.4.6 is keepalive connection info, which seems to show it disconnecting immediately:

[2020/06/30 00:44:10] [debug] [upstream] KA connection #608 to logs:80 is now available
[2020/06/30 00:44:10] [debug] [upstream] KA connection #608 to logs:80 has been disconnected by the remote service

fujimotos · 2020-06-30T05:56:19Z

Each time I restart the Fluent Bit service it seems to have a burst of activity, uploading logs to date to AWS Elasticsearch, but then it hangs again. New to these logs compared to v1.4.6 is keepalive connection info, which seems to show it disconnecting immediately:

@gitfool Hmm. I can confirm this happens with keepalive enabled.

I incorporated a fix #2309 into win32-next and released v1.5.0-win32-rc5.
New test builds are:

Now HTTP requests seem to be working reliably in my environment. I'd
appreciate it if you could confirm it.

gitfool · 2020-06-30T06:09:54Z

@fujimotos keepalive connections are now being recycled and cpu is back to normal (low) levels. Great turnaround!

WinSW.NET461.err.zip

gitfool · 2020-06-30T20:30:24Z

@fujimotos checking the logs from overnight, I'm seeing quite a few es output related warnings and errors:

WinSW.NET461.err.zip

[2020/06/30 18:52:28] [ warn] [engine] failed to flush chunk '8160-1593543147.508013200.flb', retry in 9 seconds: task_id=0, input=tail.0 > output=es.0
...
[2020/06/30 18:52:37] [ info] [engine] flush chunk '8160-1593543147.508013200.flb' succeeded at retry 1: task_id=1, input=tail.0 > output=es.0
...
[2020/06/30 18:52:30] [error] [output:es:es.0] HTTP status=0 URI=/_bulk, response:
{"took":7,"errors":false,"items":[{"index":{"_index":"logstash-2020.06.30","_type":"_doc","_id":"2QKSBnMBpWtYKTqnw2ci","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":234509,"_primary_term":1,"status":201}},{"index":{"_index":"logstash-2020.06.30","_type":"_doc","_id":"2gKSBnMBpWtYKTqnw2ci","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":234138,"_primary_term":1,"status":201}},{"index":{"_index":"logstash-2020.06.30","_type":"_doc","_id":"2wKSBnMBpWtYKTqnw2ci","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":234510,"_primary_term":1,"status":201}},{"index":{"_index":"logstash-2020.06.30","_type":"_doc","_id":"3AKSBnMBpWtYKTqnw2ci","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":234511,"_primary_term":1,"status":201}},{"index":{"_index":"logstash-2020.06.30","_type":"_doc","_id":"3QKSBnMBpWtYKTqnw2ci","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":234512,"_primary_term":1,"status":201}},{"index":{"_index":"logstash-2020.06.30","_type":"_doc","_id":"3gKSBnMBpWtYKTqnw2ci","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"_seq_no":234139,"_primary_term":1,"status":201}}]}

The errors include the response from Elasticsearch, which doesn't look like an error to me. Maybe something else is afoot?
(The logs I kept from previously running Fluent Bit 1.4.6 didn't have any of these warnings or errors.)

fujimotos · 2020-07-01T00:40:46Z

@gitfool Evidently this is a bug in keepalive mode. I see c->resp.status is broken here:

[2020/06/30 18:52:30] [error] [output:es:es.0] HTTP status=0 URI=/_bulk, response:

Now, I suspect this is occurring due to FLB_ES_DEFAULT_HTTP_MAX.
When there is more data than the payload limit, it could leave some
data in the socket.

So when Fluent Bit attempts to re-use that socket, it first reads the
leftover payload from the previous request, resulting in misdetection of the
status code.

If my guess is correct, these errors should be gone if you increase
buffer_size as follows (default is 4kb):

[OUTPUT]
    Name es
    ...
    Buffer_Size 32kb
    ...

Can you confirm it? If it indeed solves the issue, I'll work on a
fix for this issue later.
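
The suspected failure mode can be sketched with a simplified Python model. It assumes a client that reads at most `buffer_size` bytes per response and then re-uses the connection; `FakeSocket`, `make_response`, `read_status` and `two_requests` are hypothetical names, and this is not Fluent Bit's HTTP client.

```python
import re


class FakeSocket:
    """In-memory stand-in for a keepalive connection."""

    def __init__(self):
        self._pending = bytearray()

    def feed(self, data):
        """Simulate the server sending bytes."""
        self._pending += data

    def recv(self, n):
        """Return up to n unread bytes, like a single bounded read."""
        chunk = bytes(self._pending[:n])
        del self._pending[:n]
        return chunk


def make_response(body):
    """Build a minimal HTTP/1.1 200 response around `body` (bytes)."""
    return b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n" % len(body) + body


def read_status(sock, buffer_size):
    """Read one bounded chunk and parse the status code; 0 if none is found."""
    chunk = sock.recv(buffer_size)
    match = re.match(rb"HTTP/1\.[01] (\d{3})", chunk)
    return int(match.group(1)) if match else 0


def two_requests(buffer_size):
    """Send two requests over one connection and return both parsed statuses."""
    sock = FakeSocket()
    sock.feed(make_response(b"x" * 100))   # first response, larger than a small buffer
    first = read_status(sock, buffer_size)
    sock.feed(make_response(b"ok"))        # second response on the re-used socket
    second = read_status(sock, buffer_size)
    return first, second
```

With a buffer smaller than the first response, the second read starts in the middle of the first body, so no status line is found and the status is reported as 0. With a buffer large enough to hold the first response, both statuses parse as 200.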

gitfool · 2020-07-01T05:28:11Z

@fujimotos I can confirm it affects the outcome. I only had 1 error in the last 3 hours, and I noticed that was after restarting the service for the config change, so I then did a test where I stopped the service and waited a couple of minutes before starting the service again. As expected, the logs had backed up enough that the first bulk send was large enough to cause the response from Elasticsearch to be larger than 32KB and I saw a couple of errors in quick succession and then none after that.

So it looks like the buffer is the issue, but rather than relying on a bigger buffer which could still be insufficient, the solution needs to bleed any excess data before the socket can be safely re-used.

fujimotos · 2020-07-03T09:18:29Z

@gitfool Thank you for the confirmation.

So it looks like the buffer is the issue, but rather than relying on a bigger buffer which could still be insufficient, the solution needs to bleed any excess data before the socket can be safely re-used.

I posted a fix to #2323. I ended up fixing it by marking the socket as "not
recyclable", to advise the connection manager to open a new connection.

The reason for the choice is the uncertainty of how long it takes to read
the remaining payload; if the server sends a very large payload (say, 1GB),
I fear it can easily get fluent-bit to perform an expensive busy loop.

So the patch above chooses to close the socket, giving up the (small?)
benefit of connection reuse on that failure path.
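
The design choice described above can be illustrated with a minimal Python sketch of a connection pool that honors a "not recyclable" mark. The names `Connection`, `ConnectionPool` and `recyclable` are hypothetical and do not correspond to Fluent Bit's upstream API; see #2323 for the actual change.

```python
class Connection:
    """Toy connection with a flag the client can clear on failure."""

    def __init__(self, conn_id):
        self.conn_id = conn_id
        self.recyclable = True
        self.closed = False

    def close(self):
        self.closed = True


class ConnectionPool:
    """Hands out idle connections and takes them back after use."""

    def __init__(self):
        self._idle = []
        self._next_id = 0

    def acquire(self):
        """Re-use an idle connection if one exists, else open a new one."""
        if self._idle:
            return self._idle.pop()
        self._next_id += 1
        return Connection(self._next_id)

    def release(self, conn):
        """Keep the connection for re-use only if it is still recyclable."""
        if conn.recyclable:
            self._idle.append(conn)
        else:
            # Unread data may remain on the socket: close instead of draining it.
            conn.close()
```

Closing costs one extra connection setup on the failure path, but it bounds the work: the client never has to read an arbitrarily large leftover payload before the next request.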

fujimotos · 2020-07-03T09:56:55Z

Here is the current tip of the Windows development (2020-07-03).

https://github.com/fluent/fluent-bit/releases/tag/v1.5.0-win32-rc6

This release includes the improved support for Windows Event Log.
It is now possible to safely use in_winlog for multi-byte data. So the
"invalid UTF-8 bytes" error #1949 should not happen anymore.

Also two new output plugins are added to our Windows build:

- out_azure
- out_cloudwatch_logs (by @PettitWesley)

One more thing: A fix for the connection-reuse bug is included in
this release. I hope this resolves the issue reported by @gitfool.

I'd very much appreciate it if anyone interested tries out the build and
reports back (note: this will be the last rc build before v1.5).

New features from v1.4

Improvements
- Remove the dependency on vcruntime140.dll (build: fix "vcruntime140.dll is missing" error on Windows #2170)
- Add a PDB file to each release (Enable PDB generation on Windows build #2294)
- Add Windows Service support (win32: add 'Windows Service' support #2296)
- Improve Windows Event Log support (in_winlog: fix "invalid UTF-8 bytes, skipping" #2322)
New plugins
- out_influxdb (out_influxdb: Enable the plugin on Windows #2207)
- out_stackdriver (out_stackdriver: Port out_stackdriver to Windows #2041)
- out_azure (build: enable out_azure on Windows #2318)
- out_cloudwatch_logs (out_cloudwatch_logs: Rename 'event' to 'cw_event' #2319)
Kubernetes Support
- Fix the "unremovable pods" issue (in_tail: Work around "undeletable file" issue on Windows #2141)
- Add dns_retries option to mitigate unstable network (filter_kubernetes: Poll DNS status on Windows pods #2186)
- More natural tag expansion on Windows file system (in_tail: add more path delimiters to the sanitization list #2188)
Bug fixes
- Fix infinite loop bug in in_tail (scheduler: fix infinite loop bug on Windows #2195)
- Fix segfault on net.keepalive=yes (upstream: fix segmentation fault on net.keepalive=yes #2192)
- Fix segfault on closing network connection (upstream: fix a segfault bug in flb_upstream_conn_release() #2206)
- Make the "Types" option usable again (parser: fix segmentation fault on "Types" option #2200)
- Fix segfault in logging worker (fix a segmentation fault issue in logging worker #2295)
- Fix connection reuse bug in HTTP client (http_client: mark connection as non-reuseable on failure #2323)

Test Builds

PettitWesley · 2020-07-04T03:46:47Z

@fujimotos FYI, just put up a few PRs to fix issues found by Coverity Scan, including the new AWS/CloudWatch code you mentioned.

fujimotos · 2020-07-05T07:12:01Z

@PettitWesley Thank you. I'll integrate your fixes into my build
when I release a new RC.

PettitWesley · 2020-07-06T08:57:39Z

@fujimotos

Sorry… a few more fixes. I finally went through and tested every AWS use case today; I found a few things that I needed to fix:

There are also still a few Coverity issue fixes which Eduardo has not merged yet:

That should be it from me for 1.5. I have run through every AWS scenario now. Apologies for the late notice.

fujimotos · 2020-07-08T08:36:50Z

This is the current tip of the Windows development (2020-07-08)

https://github.com/fluent/fluent-bit/releases/tag/v1.5.0-win32-rc7

This is the final Windows candidate release for v1.5.0. No major change
has been made since rc6, but it incorporates some fixes in the mainline.

The official release of v1.5.0 is planned for next Monday (July 13).
I'm right now doing "last-minute" testing against the following build:

I'd like to express my thanks to everyone who has sent me suggestions
and bug reports. They were very helpful for the project!

fujimotos · 2020-07-14T01:17:30Z

Fluent Bit v1.5.0 is out.

https://fluentbit.io/announcements/v1.5.0/

Thanks to everyone who helped with development during this cycle (especially
@theggelund, @sachinmsft, @djsly, @titilambert, @heyaWorld, @gitfool
and @farcop).

For v1.6 discussion, I decided to move to a new thread #2351, since this issue
became so long that GH no longer shows every comment. So if you have
ideas or suggestions, please comment on the new thread.

cosmo0920 mentioned this issue Dec 17, 2018

Add CIO_BACKEND_FILESYSTEM=On/Off to enable/disable filesystem backend support. fluent/chunkio#4

Merged

This was referenced Dec 21, 2018

onigmo: windows: Use CMake when using MSVC #990

Closed

luajit: windows: Use CMake when using MSVC #991

Closed

This was referenced Dec 31, 2018

core: use _fullpath in place of realpath on Windows #999

Merged

compat: move the network-related compatibility stuff #998

Merged

engine: include <fluent-bit/flb_time.h> explicitly #1005

Merged

fujimotos mentioned this issue Jul 3, 2020

http_client: mark connection as non-reuseable on failure #2323

Merged

fujimotos mentioned this issue Jul 3, 2020

td-agent-bit is not capturing channels information properly in Windows #1949

Closed

fujimotos closed this as completed Jul 14, 2020

nickgerace mentioned this issue Oct 9, 2020

WIP: Ensure logging pods only run on Linux nodes rancher/charts#775

Closed

fujimotos mentioned this issue Oct 30, 2020

General Windows development #2738

Closed

edespong mentioned this issue Apr 20, 2023

syslog input does not work on Windowss #7235

Closed
