[pkg/stanza/fileconsumer] Fix long line parsing #32100
Thanks @OverOrion and @ChrsMark for finding this. Unfortunately, I do not believe this implementation works.
> The underlying array may point to data that will be overwritten by a subsequent call to Scan.
This certainly seems to be a problem. To articulate the issue a bit more: we do not call Scan again until we emit the token. However, because the token is emitted as a slice that directly references the scanner's buffer, its contents may change later.
Possible solutions then would seem to be:
- Copy the token into a new slice to ensure the underlying contents will not change.
- Clearly advise emit funcs that they may need to do this, depending on their need for correctness vs performance. (And then handle the copy in our emit funcs)
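To make the first option concrete, here is a minimal, self-contained sketch of copying each token before retaining it. It is not the fileconsumer code: `collectTokens` and `cloneToken` are hypothetical helpers, and the deliberately tiny scanner buffer is only an assumption chosen to make buffer reuse likely in a toy program.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// cloneToken returns a copy of tok backed by its own array, so later
// calls to Scan cannot change it. (Hypothetical helper for illustration.)
func cloneToken(tok []byte) []byte {
	out := make([]byte, len(tok))
	copy(out, tok)
	return out
}

// collectTokens scans input line by line and retains every token.
// When copyToken is false, the slices returned by Scanner.Bytes are kept
// as-is and may alias the scanner's internal buffer.
func collectTokens(input string, bufSize int, copyToken bool) [][]byte {
	scanner := bufio.NewScanner(strings.NewReader(input))
	// A small initial buffer makes it more likely that the scanner
	// shifts data around between calls to Scan.
	scanner.Buffer(make([]byte, bufSize), 1024)

	var tokens [][]byte
	for scanner.Scan() {
		tok := scanner.Bytes()
		if copyToken {
			tok = cloneToken(tok)
		}
		tokens = append(tokens, tok)
	}
	return tokens
}

func main() {
	input := "aaaa\nbbbbbb\n"
	for _, copyToken := range []bool{false, true} {
		fmt.Printf("copy=%v:", copyToken)
		for _, tok := range collectTokens(input, 8, copyToken) {
			fmt.Printf(" %q", tok)
		}
		fmt.Println()
	}
}
```

With copying enabled the retained tokens always match the input lines. Without it, whether a retained token changes depends on how the scanner manages its buffer, which is exactly why the result cannot be relied on.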
I think it's fair for us to prioritize correctness, but it will be interesting to see the performance impact of copying every token. To that end, we will certainly need unit tests for this, as well as benchmark results, to understand the tradeoff we are introducing.
Longer term, we may want to look at replacing the scanner altogether, as it's not necessarily the most performant solution. It does however provide a high degree of robustness that will be difficult to reproduce.
Sorry for the delay, finally starting to catch up.
Thanks @djaglowski @ChrsMark, will address this today!
Flush should only happen if the scanner reached EOF
Signed-off-by: Szilard Parrag <szilard.parrag@axoflow.com>
LGTM, just correcting the description a bit, since we update the flush timeout whenever new data is found, even if we do not emit a token.
Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
Description:
Flush could send partial input before EOF was reached; this PR fixes that.
Link to tracking Issue: #31512, #32170
Testing: Added unit test `TestFlushPeriodEOF`
Documentation: Added a note to the `force_flush_period` option