
Support java multiline parsing #1073

Closed

a-thaler opened this issue May 14, 2024 · 12 comments

@a-thaler
Collaborator

a-thaler commented May 14, 2024

Description
In general, the logs of your application should be encoded in JSON so that logging backends can automatically detect log attributes and provide advanced capabilities for analyzing the data.
Using JSON also covers the problem of recombining multiline logs.

However, there are edge cases where it is very hard to enable JSON logging in Java, for example during the startup phase of Spring Boot when the logger is not yet initialized. Having such stack traces recorded as a single log record would be great.
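
For illustration, this is roughly how such a stack trace ends up in the container log file: the container runtime wraps every line in its own CRI log entry, so without recombination each frame arrives at the backend as a separate record (timestamps, class names, and line numbers below are made up):

    2024-05-14T08:15:30.123456789Z stderr F java.lang.IllegalStateException: Failed to load ApplicationContext
    2024-05-14T08:15:30.123456789Z stderr F     at org.springframework.boot.SpringApplication.run(SpringApplication.java:310)
    2024-05-14T08:15:30.123456789Z stderr F     at com.example.DemoApplication.main(DemoApplication.java:10)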

It would be great if the LogPipeline could support multiline parsing for Java out of the box. Allowing customers to provide their own parsers would be even better.

Goal
Add support to a LogPipeline input for Java multiline parsing.

Criteria

  • Enhance the API to optionally enable multiline support for a predefined Java parser
  • The API is designed to also support custom parsers in the future
  • If the Java parser is enabled, Java stack traces are automatically recorded as one line

Implementation Considerations
Fluent Bit supports multiline parsing in the tail input plugin and in the multiline filter plugin. It turned out that the tail plugin applies only one matching parser out of the defined parser collection (the first matching one). In the Kubernetes scenario, the CRI parser must be applied first and the Java parser afterwards, so there is no way to solve the problem with the tail plugin alone.
The filter has the strong limitation of requiring a re-emitting of the logs because it needs to do buffering, which was always a no-go. However, it seems that the additional buffering is not needed when using it in combination with the tail input only.

A test using this setup worked well:

[INPUT]
    name             tail
    alias            cls
    db               /data/flb_cls.db
    exclude_path     /var/log/containers/telemetry-fluent-bit-*_kyma-system_fluent-bit-*.log
    mem_buf_limit    5MB
    multiline.parser cri
    path             /var/log/containers/*_*_istio-proxy-*.log
    read_from_head   true
    skip_long_lines  on
    storage.type     filesystem
    tag              cls.*

[FILTER]
    name         multiline
    match         cls.*
    multiline.key_content log
    multiline.parser   java
    emitter_name multi_cls
    emitter_storage.type filesystem

In this setup, a new emitter gets created using the new alias; however, the original emitter is inactive and does not seem to be used. The filesystem buffer metrics indicate that indeed only one buffer is in use.
The stack traces were all successfully recombined over multiple tries.

API Proposal

spec:
  input:
    application:
      multilineParsers: # the first matching parser will be applied
        - preset: java
        - custom: # not part of this story, just inspiration for how a custom parser could look
             firstEntryRegex: ^[^\s].*. 
             combineWith: ""

When thinking about new API elements, it is crucial to keep the future OTel-based setup in mind. There, recombine operators are available as part of the filelogreceiver. Similar input is required; see for example an OTel setup and a Fluent Bit setup.
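
For reference, a rough sketch of how this could look with the OpenTelemetry Collector filelog receiver and its recombine operator, based on the upstream operator documentation (operator availability depends on the collector version; the paths and the regex are only an illustration, not the planned API):

receivers:
  filelog:
    include:
      - /var/log/containers/*_*_*.log
    operators:
      # parse the container runtime (CRI) log format first
      - type: container
      # then stitch Java stack traces back into a single entry
      - type: recombine
        combine_field: body
        is_first_entry: body matches "^[^\s]"
        combine_with: "\n"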

Release Notes


@a-thaler a-thaler added area/logs LogPipeline kind/feature Categorizes issue or PR as related to a new feature. labels May 14, 2024
@skhalash
Collaborator

skhalash commented May 27, 2024

Investigate the possibility of enabling multiline parsing directly in the input since we have one input per LogPipeline.

@skhalash
Collaborator

Step 1: Build a PoC and perform some performance tests.

@hisarbalik
Contributor

hisarbalik commented Jun 6, 2024

PoC results: multiline parser with LogPipeline

The Fluent Bit tail plugin supports built-in multiline parsers; however, this does not work in the Kubernetes scenario, because the first parser must be the CRI parser, and after the first parser matches, the tail plugin ignores any further parsers.
To achieve the functionality, we used a multiline parser in a multiline filter within the pipeline, and performed several tests to understand the impact of the additional filter on the tail plugin, especially on the tail plugin's buffering behavior under high-load or back-pressure scenarios.
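
For illustration, a minimal sketch of a tail-only configuration that runs into this limitation (path and tag are illustrative):

# The parser list is accepted, but only the first matching parser (cri) out of the
# list is applied, so the java parser never takes effect.
[INPUT]
    name             tail
    path             /var/log/containers/*_*_*.log
    tag              cls.*
    multiline.parser cri, java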

The following tests were performed:

  1. Test throughput with and without the multiline filter for comparison

  2. Force the multiline filter to create back-pressure within the pipeline (not from the output) and observe the impact on the tail plugin and its buffering behavior

Throughput test

This test used a constant log load with a basic Fluentd deployment configured with a null output as the backend, to ensure the backend does not act as a bottleneck. A Prometheus instance was used to measure the test results.

  • With a multiline filter

Input bytes processing rate measured ~4.6MB/second on average

Multiline Filter input bytes processing rate

Multiline Filter input record processing rate

Buffer usage during the test measured ~20MB on average, with no impact observed on the tail plugin buffer
Multiline Filter buffer usage

Output bytes processing rate measured ~15MB/second
Multiline Filter output bytes processing rate

Container memory usage for both pods measured ~160MB on average
Multiline Filter container memory usage

CPU usage measured for both pods was ~0.9 on average
Multiline Filter container CPU usage

  • Without a multiline filter
    Input bytes processing rate measured ~4.8MB/second on average
    Tail plugin input bytes processing rate

Tail plugin input record processing rate

Buffer usage during the test measured ~15MB on average

Tail plugin buffer usage

Tail plugin output bytes processing rate

Container memory usage for both pods measured ~120MB on average
Tail plugin container memory usage

CPU usage measured for both pods was ~0.9 on average
Tail plugin container CPU usage

The bottom line is that:

  1. The multiline filter has no impact on the tail plugin under high load
  2. The input throughput of the multiline filter is slightly lower than that of the tail plugin alone (4.6MB/second vs. 4.8MB/second)
  3. Output performance is the same in both scenarios
  4. CPU usage was roughly the same in both scenarios
  5. Memory usage was slightly higher in the multiline filter scenario than in the tail-plugin-only scenario

The same test was repeated with a Java test application that generates Java stack traces; the built-in multiline Java parser, which is based on regular expressions, was used to parse the multiline log lines.

The tail plugin reaches an average input throughput of 1.7MB/second; the file buffer usage averages 7MB with a 34MB peak.

The Java multiline parser reaches an average input throughput of 30.7KB/second, and the file buffer usage averages 400KB with a 795KB peak.

In the end-to-end log delivery test, a multi-line Java log took ~2 minutes from being read by the input to being exported to the backend (tested with the built-in Java parser and with a custom parser). Increasing the multiline filter buffer size improves performance, but not enough to match the tail plugin's performance. No errors or log losses were observed during any test execution.
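
The exact tuning is not spelled out in this issue; for reference, these multiline filter parameters influence the buffering behavior described above (values are purely illustrative):

# buffer enables concatenation across chunks, flush_ms controls how long pending
# multiline records wait before being flushed, and emitter_mem_buf_limit bounds the
# memory of the internal record emitter
[FILTER]
    name                  multiline
    match                 cls.*
    multiline.key_content log
    multiline.parser      java
    buffer                on
    flush_ms              2000
    emitter_mem_buf_limit 20M
    emitter_storage.type  filesystem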

Multiline Filter back-pressure on the tail plugin

In this test scenario, we add a filter after the multiline filter that creates back-pressure within the input data processing stage, and observe the impact on the tail plugin and its buffer. The test setup from the throughput test is reused here.

The delay within the input stage was tested with three values: 1 second, 2 seconds, and 5 seconds.

Input bytes processing rate falls when the delay is increased
Multiline filter input bytes processing back-pressure

No tail plugin buffer usage was observed during the back-pressure test, only multiline filter buffer fill-ups
Multiline filter back-pressure buffer usage

The bottom line is that:

  1. When back-pressure increases, the input throughput slows down
  2. The tail plugin buffers nothing; only the multiline filter buffers in this case
  3. Sometimes the tail plugin buffer metric shows values around 4KB; this seems to be filesystem-related, as the Fluent Bit storage metrics show no buffering in these cases

@chrkl
Contributor

chrkl commented Jun 7, 2024

Another observation:
On a node that processes single-line logs at a regular throughput (multiple MB/s), adding a single pod that produces multi-line Java exceptions is enough to slow down the entire Fluent Bit instance. The impact correlates with the frequency of multi-line log messages.

@benzhonghai008

Hello @chrkl, @hisarbalik, I hope you're doing well! Can I understand it this way: adding a built-in multiline parser to Fluent Bit will affect its performance? If that's the case, are there any other solutions available for this use case?

@chrkl
Contributor

chrkl commented Jun 19, 2024

@benzhonghai008 We have not yet reached a conclusion regarding the acceptability of the performance impact and whether we will proceed with this feature. We will provide updates on this issue once a decision has been made.

@a-thaler
Collaborator Author

The tests were very promising in the beginning but in the end revealed a serious problem: if a pod constantly emits multi-line error messages, the successful delivery of the logs of any other pod is endangered. Because, based on past experience, we are in any case quite careful with changes in Fluent Bit that affect the buffer handling, we don't want to add this as a supported feature to the current setup.

Instead, we will make sure it is included natively in the planned setup based on OpenTelemetry (see #556), where we want to make major progress this year.

For the time being, we could look at enabling multi-line parsing as an unsupported feature, to be used at your own risk, by allowing its usage in the custom filter section. We could do that with very little effort based on the available standard parsers, allowing at most one such filter definition, always placing it as the first filter, and having the buffer turned off by default.
Support for a custom parser would require the configuration of an additional multiline parser, which is not covered by the current design and which we would like to avoid.

@benzhonghai008

Thanks everyone so much! I deeply appreciate your support on this issue. I have discussed this with my lead; we understand that if we choose to use multiline parsing as an unsupported feature, we assume the risk ourselves.

However, our development colleagues are indeed facing the challenge of multi-line logs, and a significant number of these logs are generated in the default format provided by the framework. Unfortunately, the developers are unable to make all logs be output in the standardized JSON format.

In this case, how should I proceed to enable the usage of multiline parsing within the custom filter section? Do you have any reference documents or examples available?

@a-thaler
Collaborator Author

a-thaler commented Jul 5, 2024

Hi @benzhonghai008,
if the outlined solution is acceptable for you, we will start working on enabling this custom filter. Currently, the filter is blocked because of the set of documented limitations.
We would allow only one such filter, and we would always place it at the first position of the filter chain.
Is it acceptable for you to go with the built-in parsers only for now?

@a-thaler
Collaborator Author

a-thaler commented Jul 8, 2024

Adding the feature as a custom filter using built-in parsers will require the following changes (a sketch of the resulting Fluent Bit pipeline follows the list):

  • Remove the filter from the exclusion list
  • Ensure that such a filter is always the first element in the filter chain
  • Keep only the CRI parser at the tail input, as the additional parsers there had no effect anyway
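
A minimal sketch of how the rendered Fluent Bit pipeline could look after these changes, based on the test setup shown earlier in this issue (alias, path, and tag are illustrative):

# Only the CRI multiline parser remains at the tail input
[INPUT]
    name             tail
    alias            cls
    path             /var/log/containers/*_*_*.log
    multiline.parser cri
    tag              cls.*

# The customer-provided multiline filter is rendered as the first filter in the chain
[FILTER]
    name                  multiline
    match                 cls.*
    multiline.key_content log
    multiline.parser      java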

@hisarbalik
Contributor

  • Multiline filter removed from the denied plugin list
  • Multiline filter can be configured via the custom filter option of the LogPipeline
  • Multiline filter will be the first filter in the filter chain

@a-thaler a-thaler added this to the 1.20.0 milestone Jul 11, 2024
@a-thaler
Collaborator Author

Tested it with the following LogPipeline spec.filters definition, and it worked well:

  filters:
    - custom: |
         name multiline
         multiline.key_content log
         multiline.parser   java
         emitter_storage.type filesystem
         buffer on

Please remember that the buffer setting enables the buffering of logs across chunks. Having it off avoids the buffer entirely, but then only logs that are part of the same chunk can be combined, resulting, for example, in stack traces being divided into two log entries.
Setting the storage type enables a persistent buffer that survives restarts, which is also what is used for the non-multiline scenario.
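
For comparison, a minimal sketch of the same filter with buffering explicitly turned off, accepting that stack traces crossing a chunk boundary stay split into separate entries:

  filters:
    - custom: |
         name multiline
         multiline.key_content log
         multiline.parser   java
         buffer off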
