Fix/avoid oom accumulation in bufftok #17293

Draft: wants to merge 16 commits into base: main

@andsel andsel (Contributor) commented Mar 10, 2025

TODO: this PR has to be applied on top of #17229.

Release notes

What does this PR do?

Why is it important/What is the impact to the user?

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • [ ]

How to test this PR locally

1 Test on stdin

Run the same configuration as in logstash-plugins/logstash-codec-json_lines#43 against this branch; it should not run out of memory (OOM).
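The later steps reuse a large single-line file. The exact data set is defined in the linked issue; as a rough stand-in, the sketch below writes one JSON object on a single line until a requested size is reached. The class name `BigSingleLineJson`, the `field_N` naming and the repeated element are assumptions made for illustration, not the generator used for this PR.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BigSingleLineJson {

    // Hypothetical array value repeated for every field (ASCII only).
    private static final String ELEMENT =
        "[{\"name\":\"Jannik\",\"surname\":\"Sinner\"},{\"name\":\"Novak\",\"surname\":\"Djokovic\"}]";

    /**
     * Writes a single-line JSON object of at least minBytes bytes,
     * terminated by exactly one newline. Returns the number of bytes written.
     */
    static long write(Path target, long minBytes) throws IOException {
        long written = 0;
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write("{");
            written += 1;
            for (long i = 0; written < minBytes; i++) {
                String field = (i == 0 ? "" : ",") + "\"field_" + i + "\": " + ELEMENT;
                out.write(field);
                written += field.length(); // ASCII only, so chars == bytes
            }
            out.write("}\n");
            written += 2;
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path target = Path.of(args.length > 0 ? args[0] : "big_single_line.json");
        long size = args.length > 1 ? Long.parseLong(args[1]) : 1L << 30; // default: 1 GiB
        System.out.println("wrote " + write(target, size) + " bytes to " + target);
    }
}
```

Passing a smaller size as the second argument gives a quick smoke test before generating the full-size file.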

2 Test on TCP

Use the 1 GB single-line file generated in step 1, with the same heap limit. Run Logstash with the following pipeline:

input {
  tcp {
    port => 1234

    codec => json_lines {
      decode_size_limit_bytes => 32768
    }
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

and feed the data in with netcat, for example:

cat /path/to/big_single_line.json | netcat localhost 1234

3 Test with file input

Use the 1 GB single-line file generated in step 1, with the same heap limit. Run Logstash with the following pipeline:

input {
  file {
    path => "/path/to/big_single_line.json"
    sincedb_path => "/tmp/sincedb"
    mode => "read"
    file_completed_action => "log"
    file_completed_log_path => "/tmp/processed.log"

    codec => json_lines {
      decode_size_limit_bytes => 32768
    }
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

Related issues

Logs

Before this PR, but with the branch from #17229:

[2025-03-11T14:57:28,792][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}




java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid18156.hprof ...
Heap dump file created [450806605 bytes in 0.253 secs]
[2025-03-11T14:59:11,919][FATAL][org.logstash.Logstash    ][main][59381166257a19a15b5377e6f323f1a71da7efdffecf2017efc6813660a5656e] uncaught error (in thread [main]<stdin)
java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Arrays.java:3541) ~[?:?]
	at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:242) ~[?:?]
	at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:587) ~[?:?]
	at java.lang.StringBuilder.append(StringBuilder.java:179) ~[?:?]
	at org.logstash.common.BufferedTokenizer$DataSplitter.append(BufferedTokenizer.java:102) ~[logstash-core.jar:?]
	at org.logstash.common.BufferedTokenizer.extract(BufferedTokenizer.java:135) ~[logstash-core.jar:?]
	at org.logstash.common.BufferedTokenizerExt.extract(BufferedTokenizerExt.java:83) ~[logstash-core.jar:?]
	at java.lang.invoke.LambdaForm$DMH/0x00000008007dd400.invokeVirtual(LambdaForm$DMH) ~[?:?]
	at java.lang.invoke.LambdaForm$MH/0x00000008007f5000.invoke(LambdaForm$MH) ~[?:?]
	at java.lang.invoke.DelegatingMethodHandle$Holder.delegate(DelegatingMethodHandle$Holder) ~[?:?]
	at java.lang.invoke.LambdaForm$MH/0x00000008007e4800.guard(LambdaForm$MH) ~[?:?]
	at java.lang.invoke.DelegatingMethodHandle$Holder.delegate(DelegatingMethodHandle$Holder) ~[?:?]
	at java.lang.invoke.LambdaForm$MH/0x00000008007e4800.guard(LambdaForm$MH) ~[?:?]
	at java.lang.invoke.Invokers$Holder.linkToCallSite(Invokers$Holder) ~[?:?]
	at Users.andrea.workspace.logstash_plugins.logstash_minus_codec_minus_json_lines.lib.logstash.codecs.json_lines.RUBY$method$decode$0(/Users/andrea/workspace/logstash_plugins/logstash-codec-json_lines/lib/logstash/codecs/json_lines.rb:69) ~[?:?]
	at java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(DirectMethodHandle$Holder) ~[?:?]
	at java.lang.invoke.LambdaForm$MH/0x0000000800bda800.invoke(LambdaForm$MH) ~[?:?]
	at java.lang.invoke.LambdaForm$MH/0x000000080084b400.invoke(LambdaForm$MH) ~[?:?]
	at java.lang.invoke.DelegatingMethodHandle$Holder.delegate(DelegatingMethodHandle$Holder) ~[?:?]
	at java.lang.invoke.LambdaForm$MH/0x00000008007e8000.guard(LambdaForm$MH) ~[?:?]
	at java.lang.invoke.DelegatingMethodHandle$Holder.delegate(DelegatingMethodHandle$Holder) ~[?:?]
	at java.lang.invoke.LambdaForm$MH/0x00000008007e8000.guard(LambdaForm$MH) ~[?:?]
	at java.lang.invoke.Invokers$Holder.linkToCallSite(Invokers$Holder) ~[?:?]
	at Users.andrea.workspace.logstash_andsel.vendor.bundle.jruby.$3_dot_1_dot_0.gems.logstash_minus_input_minus_stdin_minus_3_dot_4_dot_0.lib.logstash.inputs.stdin.RUBY$method$process$0(/Users/andrea/workspace/logstash_andsel/vendor/bundle/jruby/3.1.0/gems/logstash-input-stdin-3.4.0/lib/logstash/inputs/stdin.rb:61) ~[?:?]
	at java.lang.invoke.LambdaForm$DMH/0x0000000800fc1000.invokeStatic(LambdaForm$DMH) ~[?:?]
	at java.lang.invoke.LambdaForm$MH/0x00000008008dd000.invokeExact_MT(LambdaForm$MH) ~[?:?]
	at org.jruby.internal.runtime.methods.CompiledIRMethod.call(CompiledIRMethod.java:178) ~[jruby.jar:?]
	at org.jruby.internal.runtime.methods.MixedModeIRMethod.call(MixedModeIRMethod.java:222) ~[jruby.jar:?]
	at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:228) ~[jruby.jar:?]
	at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:291) ~[jruby.jar:?]
	at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:324) ~[jruby.jar:?]
	at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:66) ~[jruby.jar:?]

With this fix:

[2025-03-11T15:05:22,081][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
{
       "message" => "Payload bigger than 32768 bytes",
    "@timestamp" => 2025-03-11T14:05:22.944113Z,
          "host" => {
        "hostname" => "andreas-MBP-2.station"
    },
         "event" => {
        "original" => "orn\",\"surname\":\"Borg\"},{\"name\":\"John\",\"surname\":\"McEnroe\"},{\"name\":\"Jimmy\",\"surname\":\"Connors\"}],\"field_2621593\": [{\"name\":\"Jannik\",\"surname\":\"Sinner\"},{\"name\":\"Novak\",\"surname\":\"Djokovic\"},{\"name\":\"Rafa\",\"surname\":\"Nadal\"},{\"name\":\"Roger\",\"surname\":\"Federer\"},{\"name\":\"Pete\",\"surname\":\"Sampras\"},{\"name\":\"Andr\xC3\xA9\",\"surname\":\"Agassi\"},{\"name\":\"Rod\",\"surname\":\"Laver\"},{\"name\":\"Ivan\",\"surname\":\"Lendl\"},{\"name\":\"Bjorn\",\"surname\":\"Borg\"},{\"name\":\"John\",\"surname\":\"McEnroe\"},{\"name\":\"Jimmy\",\"surname\":\"Connors\"}]}\r\n"
    },
          "tags" => [
        [0] "_jsonparsetoobigfailure"
    ],
      "@version" => "1"
}
[2025-03-11T15:05:23,115][INFO ][logstash.javapipeline    ][main] Pipeline terminated {"pipeline.id"=>"main"}
[2025-03-11T15:05:23,599][INFO ][logstash.pipelinesregistry] Removed pipeline from registry successfully {:pipeline_id=>:main}
[2025-03-11T15:05:23,609][INFO ][logstash.runner          ] Logstash shut down.
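The stack trace above shows the heap being exhausted inside `StringBuilder.append` while the tokenizer keeps accumulating a line that never reaches a delimiter. The sketch below illustrates one way to bound that accumulation: once the pending token would exceed a limit, the buffer is cleared and input is skipped until the next delimiter. This is not the code from this PR; `BoundedLineTokenizer` and its methods are hypothetical names, the limit is counted in chars rather than bytes for simplicity, and oversized tokens are only counted here, whereas the output above shows Logstash emitting a tagged event for them.

```java
import java.util.ArrayList;
import java.util.List;

public class BoundedLineTokenizer {
    private static final char DELIMITER = '\n';

    private final int sizeLimit;              // max chars buffered for one token
    private final StringBuilder buffer = new StringBuilder();
    private boolean dropping = false;         // true while skipping an oversized token
    private int droppedTokens = 0;

    public BoundedLineTokenizer(int sizeLimit) {
        this.sizeLimit = sizeLimit;
    }

    /** Feeds one fragment and returns the tokens completed by it. */
    public List<String> extract(String data) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        while (start < data.length()) {
            int idx = data.indexOf(DELIMITER, start);
            int end = idx < 0 ? data.length() : idx;
            if (!dropping) {
                if (buffer.length() + (end - start) > sizeLimit) {
                    // Token too large: release what was buffered and skip the rest of it.
                    buffer.setLength(0);
                    dropping = true;
                    droppedTokens++;
                } else {
                    buffer.append(data, start, end);
                }
            }
            if (idx < 0) {
                break; // no delimiter yet, wait for the next fragment
            }
            if (dropping) {
                dropping = false; // the oversized token ends at this delimiter
            } else {
                tokens.add(buffer.toString());
                buffer.setLength(0);
            }
            start = idx + 1;
        }
        return tokens;
    }

    public int droppedTokens() { return droppedTokens; }

    public int bufferedChars() { return buffer.length(); }
}
```

With this shape the buffer never holds more than `sizeLimit` chars, however long a single line is, because the size check runs before every append.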

mergify bot commented Mar 10, 2025

This pull request does not have a backport label. Could you fix it @andsel? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
  • backport-8.x is the label to automatically backport to the 8.x branch.
  • If no backport is necessary, please add the backport-skip label

@andsel andsel self-assigned this Mar 10, 2025
@andsel andsel force-pushed the fix/avoid_oom_accumulation_in_bufftok branch from eea9f37 to fc3f3e4 Compare March 10, 2025 13:17
@andsel andsel force-pushed the fix/avoid_oom_accumulation_in_bufftok branch from e23c263 to 71daf70 Compare March 12, 2025 13:43

@elasticmachine (Collaborator) commented

💛 Build succeeded, but was flaky

cc @andsel

Development

Successfully merging this pull request may close these issues.

Protect new implementation of BufferedTokenizer against OOM