Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parse_syslog follows RFC 5425 even with empty structured data, lacking key-value pairs. #1043

Open
itkovian opened this issue Sep 19, 2024 · 6 comments
Labels
vrl: stdlib Changes to the standard library

Comments

@itkovian
Copy link
Contributor

We're getting a bit of junk when processing syslog lines from several applications that implement the old syslog format (RFC 3164) but are actually putting something in [] without key-value pairs.

I see in the tests that

        empty_sd_element {
            args: func_args![value: r#"<13>1 2019-02-13T19:48:34+00:00 74794bfb6795 root 8449 - [empty] qwerty"#],
            want: Ok(btreemap!{
                "message" => "qwerty",
                "appname" => "root",
                "facility" => "user",
                "hostname" => "74794bfb6795",
                "message" => "qwerty",
                "procid" => 8449,
                "severity" => "notice",
                "timestamp" => chrono::Utc.ymd(2019, 2, 13).and_hms_milli(19, 48, 34, 0),
                "version" => 1,
                "empty" => btreemap! {},
            }),
            tdef: TypeDef::object(inner_kind()).fallible(),
        }

I am wondering if this is actually the expected behaviour. What would be the rationale behind having empty structured data? Is it not more logical that these kinds of messages actually follow RFC 3164 instead of RFC 5424?

An empty structured message with information that varies wildly (e.g., a job ID (from slurm), some GC time (from Kafka)) means that the data that is shipped off to e.g. Elasticsearch will pollute the fields, as the object key may be interpreted as a fieldname with dynamic mapping turned on.

Is there a way to control this behaviour?

@jszwedko jszwedko added the vrl: stdlib Changes to the standard library label Sep 23, 2024
@jszwedko
Copy link
Member

jszwedko commented Sep 23, 2024

Hi @itkovian ! Thanks for opening this. I might be missing something, but it looks like your input is actually RFC5424 and not RFC3164:

<13>1 2019-02-13T19:48:34+00:00 74794bfb6795 root 8449 - [empty] qwerty

You can tell because it has the version specifier (1) after the severity/priority (also the timestamp is RFC5424 format and not RFC3164 format, and it has procid, msgid, etc.). In this case [empty] is interpreted as structured data per RFC5424. If it isn't structured data, and instead part of the msg, there should be an extra - preceding it to indicate that the sd-data is nil.

@itkovian
Copy link
Contributor Author

Yes, the example comes from the tests in your code :)

The problem I am facing is that several applications are logging lines like

2024-09-19T15:39:45.469435+02:00 node3521 slurmstepd[548422]: [65684352.batch] done with job

And these lack a -.

This gets parsed like so: https://playground.vrl.dev/?state=eyJwcm9ncmFtIjoiLiB8PSBwYXJzZV9zeXNsb2chKC5tZXNzYWdlKSIsImV2ZW50Ijp7Im1lc3NhZ2UiOiIyMDI0LTA5LTE5VDE1OjM5OjQ1LjQ2OTQzNSswMjowMCBub2RlMzUyMSBzbHVybXN0ZXBkWzU0ODQyMl06IFs2NTY4NDM1Mi5iYXRjaF0gZG9uZSB3aXRoIGpvYlxuIn0sImlzX2pzb25sIjpmYWxzZSwiZXJyb3IiOm51bGx9

{
	"65684352.batch": {},
	"appname": "slurmstepd",
	"hostname": "node3521",
	"message": "done with job",
	"procid": 548422,
	"timestamp": "2024-09-19T13:39:45.469435Z"
}

This is correct according to the RFC 5424 as I understand, but I'd like to be able to force it to RFC 3164 instead of 5424.

@StephenWakely
Copy link
Contributor

Yes, the example comes from the tests in your code :)

The problem I am facing is that several applications are logging lines like

2024-09-19T15:39:45.469435+02:00 node3521 slurmstepd[548422]: [65684352.batch] done with job

And these lack a -.

This gets parsed like so: https://playground.vrl.dev/?state=eyJwcm9ncmFtIjoiLiB8PSBwYXJzZV9zeXNsb2chKC5tZXNzYWdlKSIsImV2ZW50Ijp7Im1lc3NhZ2UiOiIyMDI0LTA5LTE5VDE1OjM5OjQ1LjQ2OTQzNSswMjowMCBub2RlMzUyMSBzbHVybXN0ZXBkWzU0ODQyMl06IFs2NTY4NDM1Mi5iYXRjaF0gZG9uZSB3aXRoIGpvYlxuIn0sImlzX2pzb25sIjpmYWxzZSwiZXJyb3IiOm51bGx9

{
	"65684352.batch": {},
	"appname": "slurmstepd",
	"hostname": "node3521",
	"message": "done with job",
	"procid": 548422,
	"timestamp": "2024-09-19T13:39:45.469435Z"
}

This is correct according to the RFC 5424 as I understand, but I'd like to be able to force it to RFC 3164 instead of 5424.

Can you post exactly what you would like the resulting message to look like. I think it should be fairly easy to add an extra parameter, but need to make sure it would do what you want.

@itkovian
Copy link
Contributor Author

Sure. I'd like it to match the RFC 3164, resulting in

{
	"appname": "slurmstepd",
	"hostname": "node3521",
	"message": "[65684352.batch] done with job",
	"procid": 548422,
	"timestamp": "2024-09-19T13:39:45.469435Z"
}

If you can tell me where in the code I should start looking, I can have a try myself.

@StephenWakely
Copy link
Contributor

StephenWakely commented Sep 25, 2024

Cool. The relevant call is here. Currently it passes Variant::Either. You will need to update it so it can also pass Variant::RFC3164 or Variant::RFC5424.

You will need to add an optional parameter here. Make it default to either.

Have a look at the level parameter for the log function to get an idea of how to setup an enum parameter that can only accept a limited set of values.

Any questions give me a shout.

@itkovian
Copy link
Contributor Author

itkovian commented Sep 25, 2024

Hmm, I see that syslog_loose also tries to get some structured data for RFC3164, even though this should not be supported (calling structured_data_optional(false) here)

I opened an issue for this: StephenWakely/syslog-loose#37

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
vrl: stdlib Changes to the standard library
Projects
None yet
Development

No branches or pull requests

3 participants