
Size limit bytes #43

Merged
merged 4 commits into from
Aug 29, 2024
Conversation

@andsel (Contributor) commented on Aug 27, 2024

Release notes

Add decode_size_limit_bytes option to limit the maximum size of JSON document that can be parsed

What does this PR do?

Leverages the second parameter of BufferedTokenizerExt (https://github.com/elastic/logstash/blob/6e93b30c7fd809e148c1c1472954c1c56fbcd994/logstash-core/src/main/java/org/logstash/common/BufferedTokenizerExt.java#L58) to throw an IllegalStateException when the size of the line to parse is bigger than decode_size_limit_bytes.
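The limit itself is enforced by the Java class linked above. As a rough illustration of the behavior the codec relies on, here is a minimal pure-Ruby sketch (names and internals are illustrative only, not the actual BufferedTokenizerExt code):

```ruby
# Minimal pure-Ruby sketch of size-limited tokenization (illustrative
# stand-in for the Java BufferedTokenizerExt class linked above).
class SizeLimitedTokenizer
  def initialize(delimiter = "\n", size_limit = nil)
    @delimiter = delimiter
    @size_limit = size_limit # max bytes allowed for an unterminated line
    @input = +""
  end

  # Appends data, returns the complete lines accumulated so far, and
  # raises once the pending (delimiter-less) fragment exceeds the limit.
  def extract(data)
    @input << data
    tokens = @input.split(@delimiter, -1)
    @input = tokens.pop || +"" # keep the trailing partial line buffered
    if @size_limit && @input.bytesize > @size_limit
      # mirrors the "input buffer full" IllegalStateException message
      raise "input buffer full"
    end
    tokens
  end
end
```

With decode_size_limit_bytes wired through as the tokenizer's size limit, an unterminated multi-gigabyte line fails fast instead of accumulating in the heap.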

Why is it important/What is the impact to the user?

As a user of the json_lines codec, I don't want a single big line to generate an OOM error and kill the Logstash process.

Checklist

  • [x] My code follows the style guidelines of this project
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [x] I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files (and/or docker env variables)
  • [ ] I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • test in real context.

How to test this PR locally

Use a single-line big JSON file (~1 GB), limit the Java heap to 512 MB, and process it; without this PR Logstash goes OOM.

Generate one big json file

Use the script to generate it:

require "json"

# Repeated JSON fragment used to pad the document.
part = [
    {:name => "Jannik", :surname => "Sinner"},
    {:name => "Novak", :surname => "Djokovic"},
    {:name => "Rafa", :surname => "Nadal"},
    {:name => "Roger", :surname => "Federer"},
    {:name => "Pete", :surname => "Sampras"},
    {:name => "André", :surname => "Agassi"},
    {:name => "Rod", :surname => "Laver"},
    {:name => "Ivan", :surname => "Lendl"},
    {:name => "Bjorn", :surname => "Borg"},
    {:name => "John", :surname => "McEnroe"},
    {:name => "Jimmy", :surname => "Connors"}
]

json_part = JSON.generate(part)
# Open in write mode so re-running the script starts from a fresh file
# (append mode would corrupt the JSON on a second run).
out_file = File.open("big_single_line.json", "w")
out_file.write "{"

# Emit "field_N": [...] pairs until roughly 1 GiB has been written,
# all on a single line.
counter = 1
desired_size = 1024 * 1024 * 1024
actual_size = 0
while actual_size < desired_size do
  json_fragment = "\"field_#{counter}\": #{json_part}"
  actual_size += json_fragment.size
  if actual_size < desired_size
    json_fragment += ","
  end
  counter += 1
  out_file.write json_fragment
end
out_file.write "}\r\n"
out_file.flush

puts "Done! output file is #{out_file.size} bytes"
out_file.close

Configure Logstash

In config/jvm.options set

-Xms512m
-Xmx512m

and execute the pipeline

input {
  stdin {
    codec => json_lines {
      decode_size_limit_bytes => 32768
    }
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

Install this PR's code: in the Logstash Gemfile, replace

"logstash-codec-json_lines"

with

"logstash-codec-json_lines", :path => "/Users/andrea/workspace/logstash_plugins/logstash-codec-json_lines"

(adjust the path to your local checkout)
and execute

bin/logstash-plugin install --no-verify

Run with:

cat /path/to/big_single_line.json | bin/logstash -f /path/to/test_oom_pipeline.conf

Note: testing with logstash-input-file hits a similar problem, but it occurs before the codec is invoked; see logstash-plugins/logstash-input-file#210.

Related issues

Logs

The console output with this PR

[2024-08-28T15:47:34,704][ERROR][logstash.javapipeline    ][main][1ecda24c09fdc5ba076096bc6e7499b710cb91e796741106f9e28599ed6a58a0] A plugin had an unrecoverable error. Will restart this plugin.
  Pipeline_id:main
  Plugin: <LogStash::Inputs::Stdin codec=><LogStash::Codecs::JSONLines decode_size_limit_bytes=>32768, id=>"fcb301d1-ea33-47a2-8bcc-c0373640fc5b", enable_metric=>true, charset=>"UTF-8", delimiter=>"\n">, id=>"1ecda24c09fdc5ba076096bc6e7499b710cb91e796741106f9e28599ed6a58a0", enable_metric=>true>
  Error: input buffer full
  Exception: Java::JavaLang::IllegalStateException
  Stack: org.logstash.common.BufferedTokenizerExt.extract(BufferedTokenizerExt.java:83)
org.logstash.common.BufferedTokenizerExt$INVOKER$i$1$0$extract.call(BufferedTokenizerExt$INVOKER$i$1$0$extract.gen)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:242)
...

The error repeats but doesn't kill the pipeline.
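Since the error repeats without killing the pipeline, oversized payloads are effectively skipped while subsequent lines keep flowing. A self-contained Ruby sketch of that recovery loop (StubTokenizer and the chunk handling are illustrative stand-ins, not the plugin's actual code):

```ruby
# Illustrative stand-in for a size-limited buffered tokenizer: raises on an
# oversized delimiter-less fragment and drops it so processing can resume.
class StubTokenizer
  def initialize(limit)
    @limit = limit
    @pending = +""
  end

  def extract(data)
    @pending << data
    if @pending.bytesize > @limit && !@pending.include?("\n")
      @pending.clear # discard the oversized fragment
      raise "input buffer full"
    end
    lines = @pending.split("\n", -1)
    @pending = lines.pop || +"" # keep the trailing partial line
    lines
  end
end

# Consume chunks, skipping oversized payloads instead of aborting.
events = []
tok = StubTokenizer.new(16)
["ok line\n", "x" * 64, "next ok\n"].each do |chunk|
  begin
    events.concat(tok.extract(chunk))
  rescue => e
    warn "skipped oversized payload: #{e.message}"
  end
end
# events now holds the parseable lines around the oversized one
```

This mirrors the observed behavior: each oversized line surfaces as an error, but the lines before and after it are still decoded.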

Without this PR, the Logstash process is killed by an OOM error, as in:

[2024-08-28T16:08:31,156][INFO ][logstash.javapipeline    ][main] Pipeline started {"pipeline.id"=>"main"}
[2024-08-28T16:08:31,170][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid13170.hprof ...
Heap dump file created [1138986286 bytes in 1.088 secs]
[2024-08-28T16:08:35,325][FATAL][org.logstash.Logstash    ][main] uncaught error (in thread [main]>worker2)
java.lang.OutOfMemoryError: Java heap space
[2024-08-28T16:08:34,869][FATAL][org.logstash.Logstash    ][main][1421edd140d308215cd30e8c9b1d2188836aa21a1698375f4eb6906ffc6e302b] uncaught error (in thread [main]<stdin)
java.lang.OutOfMemoryError: Java heap space
2024-08-28 16:08:35,328 [main]>worker10 ERROR An exception occurred processing Appender plain_console org.apache.logging.log4j.core.appender.AppenderLoggingException: java.lang.OutOfMemoryError: Java heap space
	at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:165)
	at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:134)
	at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:125)
	at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:89)
	at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:675)
	at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:633)
	at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:616)
	at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:552)
	at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:82)
	at org.apache.logging.log4j.core.Logger.log(Logger.java:161)
	at org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2205)
	at org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2159)
	at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2142)
	at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2017)
	at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1983)
	at org.apache.logging.log4j.spi.AbstractLogger.fatal(AbstractLogger.java:1063)
	at org.logstash.Logstash.handleFatalError(Logstash.java:109)
	at org.logstash.Logstash.lambda$installGlobalUncaughtExceptionHandler$0(Logstash.java:101)
	at java.base/java.lang.ThreadGroup.uncaughtException(ThreadGroup.java:1082)
	at java.base/java.lang.ThreadGroup.uncaughtException(ThreadGroup.java:1077)
	at java.base/java.lang.Thread.dispatchUncaughtException(Thread.java:2017)
Caused by: java.lang.OutOfMemoryError: Java heap space

@andsel andsel self-assigned this Aug 27, 2024
@andsel andsel linked an issue Aug 28, 2024 that may be closed by this pull request
@andsel andsel marked this pull request as ready for review August 28, 2024 14:46
Co-authored-by: Mashhur <99575341+mashhurs@users.noreply.github.com>
@andsel andsel requested a review from mashhurs August 28, 2024 15:14
@mashhurs (Contributor) left a comment:

lgtm.
Successfully merging this pull request may close these issues.

Add max_line_size option