Skip to content
This repository has been archived by the owner on May 21, 2024. It is now read-only.

Allow non UTF8 characters in message #35

Merged
merged 16 commits into from
May 10, 2020

Conversation

bastjan
Copy link
Contributor

@bastjan bastjan commented May 1, 2020

MSG             = MSG-ANY / MSG-UTF8
MSG-ANY         = *OCTET ; not starting with BOM
MSG-UTF8        = BOM UTF-8-STRING

Usage

message, err := rfc5424.NewMachine(rfc5424.WithCompliantMsg()).Parse(subject)

Performance

Initialization of machine moved out of benchmark for all comparisons.

develop...allow-non-utf8

benchstat

name                                                        old time/op  new time/op  delta
Parse/[no]_empty_input__________________________________-8   409ns ± 1%   408ns ± 1%     ~     (p=0.416 n=10+10)
Parse/[no]_multiple_syslog_messages_on_multiple_lines___-8   494ns ± 0%   509ns ± 1%   +2.97%  (p=0.000 n=10+10)
Parse/[no]_impossible_timestamp_________________________-8  1.47µs ± 0%  1.48µs ± 1%   +0.82%  (p=0.000 n=10+9)
Parse/[no]_malformed_structured_data____________________-8   681ns ± 0%   707ns ± 3%   +3.90%  (p=0.000 n=9+10)
Parse/[no]_with_duplicated_structured_data_id___________-8  1.92µs ± 1%  1.99µs ± 1%   +3.38%  (p=0.000 n=10+9)
Parse/[ok]_minimal______________________________________-8   254ns ± 0%   263ns ± 1%   +3.43%  (p=0.000 n=7+10)
Parse/[ok]_average_message______________________________-8  2.67µs ± 1%  2.64µs ± 2%     ~     (p=0.271 n=10+10)
Parse/[ok]_complicated_message__________________________-8  2.12µs ± 0%  2.09µs ± 0%   -1.60%  (p=0.000 n=10+8)
Parse/[ok]_very_long_message____________________________-8  5.16µs ± 1%  4.60µs ± 0%  -10.89%  (p=0.000 n=10+8)
Parse/[ok]_all_max_length_and_complete__________________-8  5.14µs ± 1%  5.16µs ± 2%     ~     (p=0.645 n=10+10)
Parse/[ok]_all_max_length_except_structured_data_and_mes-8  3.73µs ± 0%  3.72µs ± 1%   -0.33%  (p=0.023 n=10+10)
Parse/[ok]_minimal_with_message_containing_newline______-8   287ns ± 1%   283ns ± 0%   -1.43%  (p=0.000 n=10+7)
Parse/[ok]_w/o_procid,_w/o_structured_data,_with_message-8  1.06µs ± 1%  1.02µs ± 1%   -3.66%  (p=0.000 n=8+10)
Parse/[ok]_minimal_with_UTF-8_message___________________-8   524ns ± 1%   523ns ± 0%     ~     (p=0.850 n=10+9)
Parse/[ok]_minimal_with_UTF-8_message_starting_with_BOM_-8   532ns ± 1%   526ns ± 0%   -1.27%  (p=0.000 n=10+9)
Parse/[ok]_with_structured_data_id,_w/o_structured_data_-8  1.08µs ± 0%  1.08µs ± 1%   +0.88%  (p=0.000 n=9+10)
Parse/[ok]_with_multiple_structured_data________________-8  1.77µs ± 0%  1.71µs ± 1%   -3.63%  (p=0.000 n=10+10)
Parse/[ok]_with_escaped_backslash_within_structured_data-8  1.61µs ± 1%  1.60µs ± 0%   -1.03%  (p=0.000 n=10+9)
Parse/[ok]_with_UTF-8_structured_data_param_value,_with_-8  1.86µs ± 0%  1.79µs ± 0%   -3.39%  (p=0.000 n=10+10)

develop...allow-non-utf8.WithCompliantMsg

benchstat

name                                                        old time/op  new time/op  delta
Parse/[no]_empty_input__________________________________-8   409ns ± 1%   411ns ± 1%     ~     (p=0.124 n=10+10)
Parse/[no]_multiple_syslog_messages_on_multiple_lines___-8   494ns ± 0%   511ns ± 0%   +3.38%  (p=0.000 n=10+10)
Parse/[no]_impossible_timestamp_________________________-8  1.47µs ± 0%  1.49µs ± 1%   +1.59%  (p=0.000 n=10+10)
Parse/[no]_malformed_structured_data____________________-8   681ns ± 0%   689ns ± 0%   +1.19%  (p=0.000 n=9+9)
Parse/[no]_with_duplicated_structured_data_id___________-8  1.92µs ± 1%  1.91µs ± 0%   -0.43%  (p=0.005 n=10+10)
Parse/[ok]_minimal______________________________________-8   254ns ± 0%   254ns ± 0%     ~     (p=0.103 n=7+10)
Parse/[ok]_average_message______________________________-8  2.67µs ± 1%  2.62µs ± 0%   -1.72%  (p=0.000 n=10+9)
Parse/[ok]_complicated_message__________________________-8  2.12µs ± 0%  2.08µs ± 0%   -1.76%  (p=0.000 n=10+10)
Parse/[ok]_very_long_message____________________________-8  5.16µs ± 1%  4.56µs ± 0%  -11.66%  (p=0.000 n=10+10)
Parse/[ok]_all_max_length_and_complete__________________-8  5.14µs ± 1%  5.11µs ± 0%   -0.42%  (p=0.009 n=10+10)
Parse/[ok]_all_max_length_except_structured_data_and_mes-8  3.73µs ± 0%  3.74µs ± 1%     ~     (p=0.492 n=10+10)
Parse/[ok]_minimal_with_message_containing_newline______-8   287ns ± 1%   283ns ± 0%   -1.32%  (p=0.000 n=10+10)
Parse/[ok]_w/o_procid,_w/o_structured_data,_with_message-8  1.06µs ± 1%  1.07µs ± 1%   +0.94%  (p=0.001 n=8+9)
Parse/[ok]_minimal_with_UTF-8_message___________________-8   524ns ± 1%   523ns ± 1%     ~     (p=0.597 n=10+10)
Parse/[ok]_minimal_with_UTF-8_message_starting_with_BOM_-8   532ns ± 1%   531ns ± 0%     ~     (p=0.693 n=10+8)
Parse/[ok]_with_structured_data_id,_w/o_structured_data_-8  1.08µs ± 0%  1.08µs ± 0%   +0.81%  (p=0.000 n=9+10)
Parse/[ok]_with_multiple_structured_data________________-8  1.77µs ± 0%  1.71µs ± 0%   -3.44%  (p=0.000 n=10+10)
Parse/[ok]_with_escaped_backslash_within_structured_data-8  1.61µs ± 1%  1.61µs ± 1%     ~     (p=0.926 n=10+10)
Parse/[ok]_with_UTF-8_structured_data_param_value,_with_-8  1.86µs ± 0%  1.80µs ± 1%   -2.95%  (p=0.000 n=10+9)

allow-non-utf8...allow-non-utf8.WithCompliantMsg

benchstat

name                                                        old time/op  new time/op  delta
Parse/[no]_empty_input__________________________________-8   408ns ± 1%   411ns ± 1%  +0.66%  (p=0.022 n=10+10)
Parse/[no]_multiple_syslog_messages_on_multiple_lines___-8   509ns ± 1%   511ns ± 0%  +0.39%  (p=0.017 n=10+10)
Parse/[no]_impossible_timestamp_________________________-8  1.48µs ± 1%  1.49µs ± 1%  +0.77%  (p=0.006 n=9+10)
Parse/[no]_malformed_structured_data____________________-8   707ns ± 3%   689ns ± 0%  -2.61%  (p=0.002 n=10+9)
Parse/[no]_with_duplicated_structured_data_id___________-8  1.99µs ± 1%  1.91µs ± 0%  -3.69%  (p=0.000 n=9+10)
Parse/[ok]_minimal______________________________________-8   263ns ± 1%   254ns ± 0%  -3.43%  (p=0.000 n=10+10)
Parse/[ok]_average_message______________________________-8  2.64µs ± 2%  2.62µs ± 0%    ~     (p=0.483 n=10+9)
Parse/[ok]_complicated_message__________________________-8  2.09µs ± 0%  2.08µs ± 0%    ~     (p=0.163 n=8+10)
Parse/[ok]_very_long_message____________________________-8  4.60µs ± 0%  4.56µs ± 0%  -0.87%  (p=0.000 n=8+10)
Parse/[ok]_all_max_length_and_complete__________________-8  5.16µs ± 2%  5.11µs ± 0%    ~     (p=0.190 n=10+10)
Parse/[ok]_all_max_length_except_structured_data_and_mes-8  3.72µs ± 1%  3.74µs ± 1%  +0.50%  (p=0.006 n=10+10)
Parse/[ok]_minimal_with_message_containing_newline______-8   283ns ± 0%   283ns ± 0%    ~     (p=0.353 n=7+10)
Parse/[ok]_w/o_procid,_w/o_structured_data,_with_message-8  1.02µs ± 1%  1.07µs ± 1%  +4.78%  (p=0.000 n=10+9)
Parse/[ok]_minimal_with_UTF-8_message___________________-8   523ns ± 0%   523ns ± 1%    ~     (p=0.525 n=9+10)
Parse/[ok]_minimal_with_UTF-8_message_starting_with_BOM_-8   526ns ± 0%   531ns ± 0%  +1.11%  (p=0.000 n=9+8)
Parse/[ok]_with_structured_data_id,_w/o_structured_data_-8  1.08µs ± 1%  1.08µs ± 0%    ~     (p=0.926 n=10+10)
Parse/[ok]_with_multiple_structured_data________________-8  1.71µs ± 1%  1.71µs ± 0%    ~     (p=0.181 n=10+10)
Parse/[ok]_with_escaped_backslash_within_structured_data-8  1.60µs ± 0%  1.61µs ± 1%  +1.08%  (p=0.000 n=9+10)
Parse/[ok]_with_UTF-8_structured_data_param_value,_with_-8  1.79µs ± 0%  1.80µs ± 1%  +0.45%  (p=0.000 n=10+9)

Fixes #21

bastjan and others added 5 commits May 1, 2020 13:48
Signed-off-by: Leonardo Di Donato <leodidonato@gmail.com>
Signed-off-by: Leonardo Di Donato <leodidonato@gmail.com>
Signed-off-by: Leonardo Di Donato <leodidonato@gmail.com>
@leodido leodido self-requested a review May 1, 2020 18:43
Copy link
Collaborator

@leodido leodido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @bastjan! Great work :)

I've pushed some refinements on top of your work.

Before to merge this in, to avoid any confusion I'd call the option you are introducing WithCompliantMsg because (it seems to me) it basically implements (almost) all that RFC5424 paragraph 6.4 says about the MSG part of RFC5424 Syslog messages.

Which is: if this option is enabled, parse MSG as valid if they are free-form (but not starting with BOM) or a UTF-8 octet sequence starting with a BOM.

Since the default behavior (no option passed in) stays as is, if we want, we could also relax it more, which should lead to even better performances. In this case a lot of test cases should be updated, tho. :)

@bastjan
Copy link
Contributor Author

bastjan commented May 1, 2020

Thanks for the review and the refinements!

I think the name WithCompliantMsg reflects the intended option better and I'm all for it.

Since the default behavior (no option passed in) stays as is, if we want, we could also relax it more, which should lead to even better performances. In this case a lot of test cases should be updated, tho. :)

So the default (no option) would be relaxed to accept any message; WithCompliantMsg would make the parser more strict?

@leodido
Copy link
Collaborator

leodido commented May 1, 2020

Yes, exactly @bastjan :)

@bastjan
Copy link
Contributor Author

bastjan commented May 1, 2020

Sounds good to me! I could tackle updating the test cases over the next two days if you need any help.

@leodido
Copy link
Collaborator

leodido commented May 1, 2020

I can proceed right now with the option rename.

But, I'd love if you could help me with the second and last step of this PR (relaxing the default behaviour and refactoring the test cases)

Signed-off-by: Leonardo Di Donato <leodidonato@gmail.com>
Signed-off-by: Leonardo Di Donato <leodidonato@gmail.com>
Signed-off-by: Leonardo Di Donato <leodidonato@gmail.com>
@bastjan
Copy link
Contributor Author

bastjan commented May 1, 2020

But, I'd love if you could help me with the second and last step of this PR (relaxing the default behaviour and refactoring the test cases)

Sure, glad to help :)

- Add tests with latin-1 encoding
- Add UTF-8 BOM to the start of UTF-8 related tests
- Test if non compliant message become compliant if option 'WithCompliantMsg' is off.
@bastjan
Copy link
Contributor Author

bastjan commented May 3, 2020

Hey @leodido,

I relaxed the default parsing, updated the tests (there are many 😅), and rerun the benchmarks in the PR description. Mind to take a look? :)

I also relaxed the builder to not create a situation where a parser->builder roundtrip would remove the MSG. Is this alright with you?

@bastjan bastjan marked this pull request as ready for review May 3, 2020 15:13
@leodido
Copy link
Collaborator

leodido commented May 10, 2020

Hey @bastjan yes, relaxing the builder was needed.

Ideally, in a second PR, we should approach the same topic for builder.

Ie., making the builder configurable (compliant message or not) and its SetMessage to work accordingly. There are already (empty) builder test cases for such scenarios.

Anyways, you did a great job, thanks for approaching issue #21 .. I'm about to push some little final refinements and then merge this 🎉

Signed-off-by: Leonardo Di Donato <leodidonato@gmail.com>
Signed-off-by: Leonardo Di Donato <leodidonato@gmail.com>
Signed-off-by: Leonardo Di Donato <leodidonato@gmail.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RFC5424: error on non UTF-8 free-form message
2 participants