[libbeat] Improve syslog parser/processor error handling #31798
- Shifted most of the validation logic to the host language instead of the Ragel parser. This allows the message to be fully parsed even if there are validation errors with one or more fields.
- Errors are accumulated as they occur; any fields that have already been parsed and validated will be returned, alongside error.message.
- The priority field for RFC 3164 messages is now optional, as some syslog providers will omit this part of the header (especially when writing to files).
- Structured data fields for RFC 5424 messages are processed in a second pass. If the structured data format doesn't fit the RFC, it will be prepended to the message, separated by a space.
- Structured data now uses map[string]interface{}
- Update testify dependency to v1.7.1 for assert.ErrorContains support
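The accumulate-and-continue behaviour described in the list above could look roughly like the sketch below. This is not the code from this PR: `message`, `validate`, `joinErrors`, `errHostname`, and the raw-field map are hypothetical names used only for illustration. Only the priority range (0..191), the optional priority, and the idea of returning validated fields alongside the collected errors are taken from the description.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// message holds whichever fields validated successfully.
// All names here are hypothetical, not the libbeat types.
type message struct {
	priority int // -1 when absent or invalid
	hostname string
	msg      string
}

var (
	errPriority = errors.New("priority value out of range (expected 0..191)")
	errHostname = errors.New("hostname is empty") // assumption, for illustration
)

// validate checks every raw field, keeps the ones that pass, and collects
// the failures instead of stopping at the first one.
func validate(raw map[string]string) (message, []error) {
	m := message{priority: -1, msg: raw["message"]}
	var errs []error

	// Priority is optional: only validate it when present.
	if p, ok := raw["priority"]; ok {
		n, err := strconv.Atoi(p)
		switch {
		case err != nil:
			errs = append(errs, fmt.Errorf("invalid priority %q: %w", p, err))
		case n < 0 || n > 191:
			errs = append(errs, fmt.Errorf("invalid priority %d: %w", n, errPriority))
		default:
			m.priority = n
		}
	}

	if h := raw["hostname"]; h == "" {
		errs = append(errs, errHostname)
	} else {
		m.hostname = h
	}
	return m, errs
}

// joinErrors renders the accumulated errors as one string,
// e.g. for an error.message field.
func joinErrors(errs []error) string {
	parts := make([]string, 0, len(errs))
	for _, e := range errs {
		parts = append(parts, e.Error())
	}
	return strings.Join(parts, "; ")
}

func main() {
	m, errs := validate(map[string]string{
		"priority": "999",
		"hostname": "test-host",
		"message":  "this is the message",
	})
	fmt.Printf("%+v\n", m)
	fmt.Println(joinErrors(errs))
}
```

In this sketch the hostname and message survive even though the priority is rejected, which is the point of moving validation out of the state machine and into ordinary Go code.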
6854691 to dfbb75c
This pull request is now in conflicts. Could you fix it? 🙏
Pinging @elastic/security-external-integrations (Team:Security-External Integrations)
All changes make sense to me.
// ErrPriority indicates a priority value is outside the acceptable range.
ErrPriority = errors.New("priority value out of range (expected 0..191)")
// ErrEOF indicates the message is truncated and cannot be parsed.
ErrEOF = errors.New("message is truncated (unexpected EOF)")
Is there a reason not to use io.ErrUnexpectedEOF? It's commonly used for things like this.
Maybe EOF is the wrong terminology here. I don't want to confuse an improperly formatted/shaped message with not receiving enough data, but then again, how do you accurately tell the difference? I'd be okay with using the one from the standard library as long as it doesn't lose its meaning here (does it really matter saying "message is truncated"?).
This error doesn't make a lot of sense either:
input:
<13> test-host this is the message
error: "parsing error at position 5: unexpected EOF"
The actual error here is that the input doesn't match the expected shape. There should be a timestamp directly after the <13>, but in this test it's missing. The parser is supposed to pull out the individual fields (timestamp, hostname, etc.), and validation of those values happens in the host language, after the parser has run. If, however, the parser can't pull out all the fields, it fails completely. Aside from erroring out, the only other option I can think of is to put whatever can't be parsed into the message field and move on. This already happens, but I believe the original, whole message gets used. It seems a bit pointless to do that, though, and I would argue for using something like grok instead.
Yeah. Depending on how far you want to take it, one way to do this might be to nest the errors as (parse error([new error type] missing field(unexpected EOF))), which would end up looking like "parsing error at position 5: missing timestamp: unexpected EOF" when rendered. The logic would be to look for the first missing field when there is an io.ErrUnexpectedEOF added by the parser and interpose the missing-field error, or similar.
In terms of use of the io.ErrUnexpectedEOF sentinel, it's used generally in decoders for this kind of thing. It's best to put yourself in the mindset of a Plan 9 developer, where everything is a file, to see why.
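For reference, the nesting suggested in this thread can be sketched with ordinary Go error wrapping. This is only an illustration under assumptions: `ParseError`, `MissingFieldError`, `annotate`, `isTruncated`, and `example` are hypothetical names, not types from libbeat. Only io.ErrUnexpectedEOF and the rendered message come from the comment above.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// ParseError (hypothetical) records where parsing stopped.
type ParseError struct {
	Pos int
	Err error
}

func (e *ParseError) Error() string {
	return fmt.Sprintf("parsing error at position %d: %v", e.Pos, e.Err)
}

func (e *ParseError) Unwrap() error { return e.Err }

// MissingFieldError (hypothetical) names the first field the parser
// could not extract.
type MissingFieldError struct {
	Field string
	Err   error
}

func (e *MissingFieldError) Error() string {
	return fmt.Sprintf("missing %s: %v", e.Field, e.Err)
}

func (e *MissingFieldError) Unwrap() error { return e.Err }

// annotate interposes a MissingFieldError when the parser reported
// io.ErrUnexpectedEOF and the first missing field is known.
func annotate(pos int, parserErr error, firstMissing string) error {
	if firstMissing != "" && errors.Is(parserErr, io.ErrUnexpectedEOF) {
		parserErr = &MissingFieldError{Field: firstMissing, Err: parserErr}
	}
	return &ParseError{Pos: pos, Err: parserErr}
}

// isTruncated reports whether the chain still carries the standard sentinel.
func isTruncated(err error) bool {
	return errors.Is(err, io.ErrUnexpectedEOF)
}

// example builds the error for the "<13> test-host ..." case discussed above.
func example() error {
	return annotate(5, io.ErrUnexpectedEOF, "timestamp")
}

func main() {
	err := example()
	fmt.Println(err)              // parsing error at position 5: missing timestamp: unexpected EOF
	fmt.Println(isTruncated(err)) // true

	var mf *MissingFieldError
	if errors.As(err, &mf) {
		fmt.Println(mf.Field) // timestamp
	}
}
```

Because each layer implements Unwrap, callers can still match the standard sentinel with errors.Is while the rendered message names the field that was missing.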
- Add explicit wants to test cases
- Un-export test input fields
- Add checks for errors in timestamp tests
I'm going to let this sit for a good week or two. Beats CI is just too unstable right now to get a clean build out of it. As far as I can tell, all of the failures recently have been completely unrelated to any changes introduced in this PR.
(cherry picked from commit cabc8ba)
# Conflicts:
# libbeat/processors/syslog/syslog_test.go
# libbeat/reader/syslog/message.go
# libbeat/reader/syslog/message_test.go
# libbeat/reader/syslog/syslog.go
# libbeat/reader/syslog/syslog_test.go
# libbeat/reader/syslog/util_test.go
…2118) (cherry picked from commit cabc8ba) Co-authored-by: Taylor Swanson <90622908+taylor-swanson@users.noreply.github.com>
…2117) (cherry picked from commit cabc8ba) Co-authored-by: Taylor Swanson <90622908+taylor-swanson@users.noreply.github.com>
…) (elastic#32117) (cherry picked from commit 7ba9a7b) Co-authored-by: Taylor Swanson <90622908+taylor-swanson@users.noreply.github.com>
What does this PR do?
Why is it important?
These changes make the parser/processor more tolerant of syslog messages that are not RFC-compliant. Many products and services emit messages that deviate slightly from the RFCs.
Checklist
[ ] I have made corresponding change to the default configuration files
CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.
How to test this PR locally
Run unit tests in these packages:
Related issues