-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
handle EOF on single line content #33568
Conversation
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like some more unit tests. I'm specifically worried about the case where a partial event is written to the file, the reader reads that, then the rest of event is written to the file. The current code will give us one event, because it waits for the separator. I'd like verification that the proposed code does that as well.
hello @leehinman, totally agree with that! I thought that the bug could be handled only in the aws s3 input part (see 26eb38f) then I saw the test failing and realised that it's more complicated, I've moved the PR to draft basically we have to find how to handle this and this the problem is that in the second err check, when we have so we receive an empty message in the input anyway I'm thinking about changing the approach of the bugfix and do a check in the input if the content contains newline separator at all, if not, let's not attach the linereader, what do you think about it? it's safer but going to be more expensive in terms of performance. a trade off could be to fallback to a non linereader only if zero events were emitted |
I think the root of the problem is the current linereader is expecting data to be appended, so EOF doesn't mean the last data you could ever receive, just last for now. But in the case of s3 input, we actually have all the data, so EOF means the last data you could ever receive. Could we maybe just have a flag where the line reader will treat EOF like a line separator? That way it would work for data that has no line separators and ones where the first n-1 events have line separators but the nth one doesn't. I'm having a hard time coming up with a good heuristic for determining append vs non-append case, which is why I'm leaning towards a flag. |
This pull request is now in conflicts. Could you fix it? 🙏
|
@leehinman please, look a the current solution if you are not satisfied I will add the flag for EOF my only concern in the last commit I've pushed is that I have to use a TeeReader, and that can be very expensive |
hi @leehinman , did you have the chance to review my changes? thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* handle EOF on single line content * changelog * fallback to encode_eof if no events in aws-s3 input * lint * lint * collect on EOF in line reader * remove encode eof * remove iterN * fix test * increase test coverage * linting * more linting * increase coverage (cherry picked from commit 7b45320)
* handle EOF on single line content * changelog * fallback to encode_eof if no events in aws-s3 input * lint * lint * collect on EOF in line reader * remove encode eof * remove iterN * fix test * increase test coverage * linting * more linting * increase coverage (cherry picked from commit 7b45320)
* handle EOF on single line content * changelog * fallback to encode_eof if no events in aws-s3 input * lint * lint * collect on EOF in line reader * remove encode eof * remove iterN * fix test * increase test coverage * linting * more linting * increase coverage (cherry picked from commit 7b45320) Co-authored-by: Andrea Spacca <andrea.spacca@elastic.co>
* handle EOF on single line content (#33568) * handle EOF on single line content * changelog * fallback to encode_eof if no events in aws-s3 input * lint * lint * collect on EOF in line reader * remove encode eof * remove iterN * fix test * increase test coverage * linting * more linting * increase coverage (cherry picked from commit 7b45320) * changelog Co-authored-by: Andrea Spacca <andrea.spacca@elastic.co>
* handle EOF on single line content * changelog * fallback to encode_eof if no events in aws-s3 input * lint * lint * collect on EOF in line reader * remove encode eof * remove iterN * fix test * increase test coverage * linting * more linting * increase coverage (cherry picked from commit 7b45320) # Conflicts: # libbeat/reader/readfile/line.go # libbeat/reader/readfile/line_test.go # x-pack/filebeat/input/awss3/s3_objects.go
* handle EOF on single line content * changelog * fallback to encode_eof if no events in aws-s3 input * lint * lint * collect on EOF in line reader * remove encode eof * remove iterN * fix test * increase test coverage * linting * more linting * increase coverage
* handle EOF on single line content (#33568) * handle EOF on single line content * changelog * fallback to encode_eof if no events in aws-s3 input * lint * lint * collect on EOF in line reader * remove encode eof * remove iterN * fix test * increase test coverage * linting * more linting * increase coverage (cherry picked from commit 7b45320) # Conflicts: # libbeat/reader/readfile/line.go # libbeat/reader/readfile/line_test.go # x-pack/filebeat/input/awss3/s3_objects.go * Fix conflicts * Fix failing test - TestMaxBytesLimit * Fix #2 failing test - TestMaxBytesLimit * Fix failing test checks * Fix linter errors * Fix typo * Fix linter errors #2 * Fix linter errors #3 * Fix linter errors #4 * Fix linter errors #5 * Changelog clean up * Change order of publish event --------- Co-authored-by: Andrea Spacca <andrea.spacca@elastic.co> Co-authored-by: Tamara Dancheva <tamara.dancheva@elastic.co>
Bug
What does this PR do?
Closes #30436
Why is it important?
Checklist
- [ ] I have commented my code, particularly in hard-to-understand areas- [ ] I have made corresponding changes to the documentation- [ ] I have made corresponding change to the default configuration filesCHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.Author's Checklist
How to test this PR locally
Related issues
Use cases
Screenshots
Logs