Support Java multiline parsing #1073
Comments
Investigate the possibility of enabling multiline parsing directly in the input, since we have one input per LogPipeline.
Step 1: Build a PoC and perform some performance tests.
Another observation:
Hello @chrkl @hisarbalik, I hope you're doing well! Can I understand it this way: adding a built-in multiline parser to Fluent Bit will affect its performance? If that's the case, are there any other solutions available for this case?
@benzhonghai008 We have not yet reached a conclusion regarding the acceptability of the performance impact and whether we will proceed with this feature. We will provide updates on this issue once a decision has been made.
The tests were very promising in the beginning but in the end revealed a serious problem: in the scenario where a pod constantly emits multi-line error messages, the successful delivery of any other pod's logs is endangered. As we are in any case quite careful with changes affecting Fluent Bit's buffer handling, based on past experiences, we don't want to add this as a supported feature to the current setup. Instead, we will make sure it is included natively in the planned setup based on OpenTelemetry, see #556, where we want to make major progress this year. For the time being, we could look into enabling multi-line parsing as an unsupported feature, to be used at your own risk, by allowing its usage in the custom filter section. We could do that with very little effort based on the available standard parsers, allowing at most one filter definition, always listed as the first filter, and having the buffer turned off by default.
Thanks everyone so much! I deeply appreciate your support for this issue. I have discussed this with my leader; we understand that if we choose to use multiline parsing as an unsupported feature, we assume the risk ourselves. However, our development colleagues are indeed facing the challenge of multi-line logs, and a significant number of these logs are generated in the default format provided by the framework. Unfortunately, developers are unable to ensure that all logs are output in the standardized JSON format. In this case, how should I proceed to enable multiline parsing within the custom filter section? Do you have any reference documents or examples available?
Hi @benzhonghai008, |
Adding the feature as a custom filter using built-in parsers will require these changes:

- Allow a multiline filter definition based on the available built-in parsers in the custom filter section, limited to at most one such filter.
- Always apply that filter as the first filter in the chain.
- Keep the buffer turned off by default.
Tested it with the following LogPipeline spec.filters definition, and it worked well:
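The exact definition was not preserved in this thread; a sketch along these lines should be close, assuming Fluent Bit's multiline filter with the built-in java parser and the LogPipeline custom filter field of telemetry-manager (the output backend here is hypothetical):

```yaml
apiVersion: telemetry.kyma-project.io/v1alpha1
kind: LogPipeline
metadata:
  name: java-multiline
spec:
  filters:
    # Unsupported feature: a single multiline filter, listed first,
    # using the built-in java parser with buffering disabled.
    - custom: |
        Name                  multiline
        multiline.key_content log
        multiline.parser      java
        buffer                off
  output:
    http:
      host:
        value: logging-backend.example.com   # hypothetical backend
```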
Please remember that the buffer setting enables buffering of logs across chunks. With it turned off, the buffer is avoided entirely, but only logs that are part of the same chunk can be combined; as a result, a stacktrace may for example end up divided into two log entries.
Description
In general, the logs of your application should be encoded in JSON so that logging backends can automatically detect log attributes and provide advanced capabilities for analyzing the data.
Using JSON will also cover the problem of recombining multiline logs.
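For illustration, a JSON-encoded log line keeps an entire stacktrace in a single record by escaping the newlines (a hypothetical example):

```json
{"timestamp":"2024-05-01T10:15:00Z","level":"ERROR","message":"Request failed","stack_trace":"java.lang.IllegalStateException: boom\n\tat com.example.Service.handle(Service.java:42)\n\tat com.example.Server.run(Server.java:17)"}
```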
However, there are edge cases where it is very hard to enable JSON logging in Java, for example during the startup phase of Spring Boot, when the logger is not yet initialized. Having the stacktraces recorded as single log entries would be great.
It would be great if the LogPipeline supported multiline parsing for Java out of the box. Letting customers provide their own parsers would be even better.
Goal
Add support to a LogPipeline input for Java multiline parsing.
Criteria
Implementation Considerations
Fluent Bit supports multiline parsing in the tail input plugin and in the multiline filter plugin. It turned out that the tail plugin applies only one matching parser out of the defined parser collection (the first one that matches). In the Kubernetes scenario, first the CRI parser must be applied and then the Java parser, so there is no way to solve the problem with the tail plugin alone.
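To make the limitation concrete, the following sketch (an assumed configuration, not taken from the actual setup) lists both parsers on the tail input; because only the first matching parser is applied per line, the cri parser decodes the container log format and the java parser never gets a chance to run:

```
[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    # Only the first matching parser is applied; every containerd line
    # matches "cri", so "java" is never used.
    multiline.parser  cri, java
```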
The filter has a strong limitation: it requires re-emitting the logs because it needs to do buffering, which was always a no-go. However, it seems that the additional buffering is not needed when the filter is used in combination with the tail input only.
A test using this setup worked well:
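The tested configuration itself was not preserved here; a sketch of such a combination might look as follows (assumed values, using the multiline filter's emitter_name option to set the alias mentioned below):

```
[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    multiline.parser  cri

[FILTER]
    Name                  multiline
    Match                 kube.*
    multiline.key_content log
    multiline.parser      java
    # alias under which the filter's internal emitter is registered
    emitter_name          multiline-emitter
```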
In this setup, a new emitter gets created using the new alias; however, the original emitter is inactive and does not seem to be actively used. The filesystem buffer metrics indicate that indeed only one buffer is in use.
The stacktraces were all recombined successfully over multiple tries.
API Proposal
When thinking about new API elements, it is crucial to keep the future OTel-based setup in mind. There, recombine parsers are available as part of the filelogreceiver. Similar input is required; see for example an OTel setup and a Fluent Bit setup.
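For reference, a recombine operator in the OTel Collector filelog receiver could look roughly like this (a sketch with assumed values; the operators and field names follow the filelogreceiver's stanza operators):

```yaml
receivers:
  filelog:
    include: [/var/log/containers/*.log]
    operators:
      # parse the containerd/CRI log format first
      - type: container
      # then recombine Java stacktraces: a new entry starts at a line
      # that does not begin with whitespace
      - type: recombine
        combine_field: body
        is_first_entry: body matches "^[^\\s]"
        source_identifier: attributes["log.file.path"]
```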
Release Notes