Description
BufferedTokenizerExt throws an exception when a discovered token is bigger than the sizeLimit parameter. However, in the existing implementation the check is executed only on the first token present in the input fragment; this means that if the second token is the one that exceeds the limit, no error is raised:
logstash/logstash-core/src/main/java/org/logstash/common/BufferedTokenizerExt.java, lines 85 to 88 at 32cc85b:

```java
final int entitiesSize = ((RubyString) entities.first()).size();
if (inputSize + entitiesSize > sizeLimit) {
    throw new IllegalStateException("input buffer full");
}
```
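To see concretely how a later token slips through, here is a minimal, hypothetical plain-Java mirror of the logic above (NaiveTokenizer, the hard-coded newline delimiter, and the main demo are all illustrative assumptions; the real class operates on JRuby types):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical mirror of the buggy logic: only the first entity of
// each fragment is measured against sizeLimit.
final class NaiveTokenizer {
    private final int sizeLimit;
    private StringBuilder input = new StringBuilder();

    NaiveTokenizer(int sizeLimit) { this.sizeLimit = sizeLimit; }

    List<String> extract(String data) {
        List<String> entities =
            new ArrayList<>(Arrays.asList(data.split("\n", -1)));
        // The size check only considers entities.get(0); later entities
        // in the same fragment are never measured.
        if (input.length() + entities.get(0).length() > sizeLimit) {
            throw new IllegalStateException("input buffer full");
        }
        input.append(entities.remove(0));
        if (entities.isEmpty()) {
            return List.of();
        }
        entities.add(0, input.toString());
        input = new StringBuilder(entities.remove(entities.size() - 1));
        return entities;
    }

    public static void main(String[] args) {
        NaiveTokenizer tokenizer = new NaiveTokenizer(10);
        // The first entity "a" is within the limit, so no exception is
        // thrown, yet the second token far exceeds sizeLimit.
        List<String> tokens = tokenizer.extract("a\n" + "x".repeat(50) + "\nb");
        tokens.forEach(t -> System.out.println(t.length())); // prints 1, then 50
    }
}
```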
While the implementation could be considered buggy in this respect, the problem can be avoided by selecting a sizeLimit bigger than the length of the input fragment (a token that is not the first in a fragment can be at most as long as the fragment itself, so it cannot exceed such a limit). Whether this is feasible depends on the context where the tokenizer is used; in the actual code base it is used with sizeLimit only in the json_lines codec.
This means the problem appears depending on which input the codec is used with.
If used with the TCP input (https://github.com/logstash-plugins/logstash-input-tcp/blob/e5ef98f781ab921b6a1ef3bb1095d597e409ea86/lib/logstash/inputs/tcp.rb#L215), decode_buffer passes the buffer read from the socket to the codec, which for TCP could be a fragment of 64KB.
For a more practical view of this issue, see #16968 (comment).
Ideal solution
To solve this problem, the BufferedTokenizer's extract method should return an iterator rather than an array (or list). The iterator should apply the boundary check on each next invocation.
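A possible shape for such an iterator (a sketch under the assumptions above, not the actual implementation; the class name, constructor, and the source of the entity iterator are all illustrative):

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Sketch: extract() would return this lazy iterator instead of a list.
// The size check runs on every next() call, so any oversized token
// raises the exception, not just the first one in a fragment.
final class TokenIterator implements Iterator<String> {
    private final Iterator<String> entities;
    private final int sizeLimit;

    TokenIterator(Iterator<String> entities, int sizeLimit) {
        this.entities = entities;
        this.sizeLimit = sizeLimit;
    }

    @Override
    public boolean hasNext() {
        return entities.hasNext();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String token = entities.next();
        // Boundary check applied per token, on each next() invocation.
        if (token.length() > sizeLimit) {
            throw new IllegalStateException("input buffer full");
        }
        return token;
    }
}
```

With this approach, a caller iterating over the result of extract would hit the IllegalStateException as soon as it reaches the oversized token, even if earlier tokens in the same fragment were within the limit.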