Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[receiver/journald]: add support for matches configuration #20852

Merged
merged 7 commits into from
Apr 24, 2023

Conversation

sumo-drosiek
Copy link
Member

Description:

Adds support for matches configuration, which allows to add more filtering options

Link to tracking Issue: #20295

Testing:

Unit tests, manual tests

Documentation:

Updating pkg/stanza/docs/operators/journald_input.md

@sumo-drosiek sumo-drosiek requested a review from a team April 11, 2023 08:01
@sumo-drosiek sumo-drosiek force-pushed the drosiek-journald-matches branch from a87f1f9 to 359eb81 Compare April 11, 2023 08:02
@runforesight
Copy link

runforesight bot commented Apr 11, 2023

Foresight Summary

    
Major Impacts

build-and-test-windows duration(4 seconds) has decreased 30 minutes 26 seconds compared to main branch avg(30 minutes 30 seconds).
View More Details

⭕  build-and-test-windows workflow has finished in 4 seconds (30 minutes 26 seconds less than main branch avg.) and finished at 14th Apr, 2023.


Job Failed Steps Tests
windows-unittest-matrix -     🔗  N/A See Details
windows-unittest -     🔗  N/A See Details

✅  telemetrygen workflow has finished in 1 minute 6 seconds and finished at 14th Apr, 2023.


Job Failed Steps Tests
build-dev -     🔗  N/A See Details
publish-stable -     🔗  N/A See Details
publish-latest -     🔗  N/A See Details

✅  check-links workflow has finished in 1 minute 35 seconds (⚠️ 39 seconds more than main branch avg.) and finished at 14th Apr, 2023.


Job Failed Steps Tests
changed files -     🔗  N/A See Details
check-links -     🔗  N/A See Details

✅  changelog workflow has finished in 2 minutes 21 seconds and finished at 14th Apr, 2023.


Job Failed Steps Tests
changelog -     🔗  N/A See Details

✅  prometheus-compliance-tests workflow has finished in 3 minutes 4 seconds (3 minutes 15 seconds less than main branch avg.) and finished at 14th Apr, 2023.


Job Failed Steps Tests
prometheus-compliance-tests -     🔗  N/A See Details

✅  e2e-tests workflow has finished in 12 minutes 3 seconds (1 minute 59 seconds less than main branch avg.) and finished at 14th Apr, 2023.


Job Failed Steps Tests
kubernetes-test (v1.26.0) -     🔗  N/A See Details
kubernetes-test (v1.25.3) -     🔗  N/A See Details
kubernetes-test (v1.24.7) -     🔗  N/A See Details
kubernetes-test (v1.23.13) -     🔗  N/A See Details

✅  load-tests workflow has finished in 15 minutes 43 seconds (⚠️ 5 minutes 14 seconds more than main branch avg.) and finished at 14th Apr, 2023.


Job Failed Steps Tests
setup-environment -     🔗  N/A See Details
loadtest (TestIdleMode) -     🔗  N/A See Details
loadtest (TestBallastMemory|TestLog10kDPS) -     🔗  N/A See Details
loadtest (TestMetric10kDPS|TestMetricsFromFile) -     🔗  N/A See Details
loadtest (TestMetricResourceProcessor|TestTrace10kSPS) -     🔗  N/A See Details
loadtest (TestTraceNoBackend10kSPS|TestTrace1kSPSWithAttrs) -     🔗  N/A See Details
loadtest (TestTraceBallast1kSPSWithAttrs|TestTraceBallast1kSPSAddAttrs) -     🔗  N/A See Details
loadtest (TestTraceAttributesProcessor) -     🔗  N/A See Details

✅  build-and-test workflow has finished in 37 minutes 13 seconds (9 minutes 22 seconds less than main branch avg.) and finished at 14th Apr, 2023.


Job Failed Steps Tests
govulncheck -     🔗  N/A See Details
setup-environment -     🔗  N/A See Details
checks -     🔗  N/A See Details
check-collector-module-version -     🔗  N/A See Details
build-examples -     🔗  N/A See Details
correctness-metrics -     🔗  N/A See Details
check-codeowners -     🔗  N/A See Details
integration-tests -     🔗  N/A See Details
correctness-traces -     🔗  N/A See Details
unittest-matrix (1.20, receiver-0) -     🔗  N/A See Details
unittest-matrix (1.20, receiver-1) -     🔗  N/A See Details
unittest-matrix (1.20, processor) -     🔗  N/A See Details
unittest-matrix (1.20, exporter) -     🔗  N/A See Details
unittest-matrix (1.20, extension) -     🔗  N/A See Details
unittest-matrix (1.20, connector) -     🔗  N/A See Details
unittest-matrix (1.20, internal) -     🔗  N/A See Details
unittest-matrix (1.20, other) -     🔗  N/A See Details
unittest-matrix (1.19, receiver-0) -     🔗  N/A See Details
unittest-matrix (1.19, receiver-1) -     🔗  N/A See Details
unittest-matrix (1.19, processor) -     🔗  N/A See Details
unittest-matrix (1.19, exporter) -     🔗  N/A See Details
unittest-matrix (1.19, extension) -     🔗  N/A See Details
unittest-matrix (1.19, connector) -     🔗  N/A See Details
unittest-matrix (1.19, internal) -     🔗  N/A See Details
unittest-matrix (1.19, other) -     🔗  N/A See Details
lint-matrix (receiver-0) -     🔗  N/A See Details
lint-matrix (receiver-1) -     🔗  N/A See Details
lint-matrix (processor) -     🔗  N/A See Details
lint-matrix (exporter) -     🔗  N/A See Details
lint-matrix (extension) -     🔗  N/A See Details
lint-matrix (connector) -     🔗  N/A See Details
lint-matrix (internal) -     🔗  N/A See Details
lint-matrix (other) -     🔗  N/A See Details
lint -     🔗  N/A See Details
unittest (1.20) -     🔗  N/A See Details
unittest (1.19) -     🔗  N/A See Details
cross-compile (darwin, amd64) -     🔗  N/A See Details
cross-compile (darwin, arm64) -     🔗  N/A See Details
cross-compile (linux, 386) -     🔗  N/A See Details
cross-compile (linux, amd64) -     🔗  N/A See Details
cross-compile (linux, arm) -     🔗  N/A See Details
cross-compile (linux, arm64) -     🔗  N/A See Details
cross-compile (linux, ppc64le) -     🔗  N/A See Details
cross-compile (windows, 386) -     🔗  N/A See Details
cross-compile (windows, amd64) -     🔗  N/A See Details
build-package (deb) -     🔗  N/A See Details
build-package (rpm) -     🔗  N/A See Details
windows-msi -     🔗  N/A See Details
publish-check -     🔗  N/A See Details
publish-dev -     🔗  N/A See Details
publish-stable -     🔗  N/A See Details
rotate-milestone -     🔗  N/A See Details

🔎 See details on Foresight

*You can configure Foresight comments in your organization settings page.

Signed-off-by: Dominik Rosiek <drosiek@sumologic.com>
@sumo-drosiek sumo-drosiek force-pushed the drosiek-journald-matches branch from 359eb81 to e9d0953 Compare April 11, 2023 08:19
@@ -33,7 +34,23 @@ The `journald_input` operator will use the `__REALTIME_TIMESTAMP` field of the j
- type: journald_input
priority: emerg..err
```
#### Simple journald input

#### Matches

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a mechanism to prevent specifying both units: and matches: configuration for a single input?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really sure about it, as you can do it in journalctl as well

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My testing with journalctl trying to use both units and matches syntax gave unexpected results - specifically NO LOG DATA. At a minimum, there should be a warning in the docs that mixing units and matches could lead to unexpected results.

  • One Unit:
    • journalctl --utc --output=short --follow --priority info --unit ssh.service
    • Only SSH unit logs ... as expected
  • One Match Rule (with 2 terms):
    • journalctl --utc --output=short --follow --priority info _TRANSPORT=kernel PRIORITY=5
    • Only kernel logs ... as expected
  • Two Match Rules:
    • journalctl --utc --output=short --follow --priority info _SYSTEMD_UNIT=ssh.service + _TRANSPORT=kernel PRIORITY=5
    • Both log streams present ... as expected
  • Unit and Match:
    • journalctl --utc --output=short --follow --priority info --unit ssh.service _TRANSPORT=kernel PRIORITY=5
    • NO LOGS
  • Unit plus Match:
    • journalctl --utc --output=short --follow --priority info --unit ssh.service + _TRANSPORT=kernel PRIORITY=5
    • Invalid syntax: "+" can only be used between terms
  • Match and Unit:
    • journalctl --utc --output=short --follow --priority info _TRANSPORT=kernel PRIORITY=5 --unit ssh.service
    • NO LOGS
  • Match plus Unit:
    • journalctl --utc --output=short --follow --priority info _TRANSPORT=kernel PRIORITY=5 + --unit ssh.service
    • Invalid syntax: "+" can only be used between terms

Tested on Ubuntu 20.04 host with a noisy iptables LOG rule (kernel log stream) and inbound SSH probes.

Copy link
Member Author

@sumo-drosiek sumo-drosiek Apr 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't say the the behavior fully unexpected. Look at the following example:

➜  ~ journalctl --utc --output=json --no-tail --priority info --unit=user@1000.service --unit=init.scope _COMM=wireplumber + _COMM=systemd | jq '{unit: ._SYSTEMD_UNIT, comm: ._COMM} | tostring' | sort | uniq
"{\"unit\":\"init.scope\",\"comm\":\"systemd\"}"
"{\"unit\":\"user@1000.service\",\"comm\":\"systemd\"}"
"{\"unit\":\"user@1000.service\",\"comm\":\"wireplumber\"}

How I understand the behavior is:

  1. take all entries which matches the units configuration
  2. apply matches configuration to the entries which has been selected in 1

Having both matches and units can be actually helpful, if you want to select multiple units with the same additional parameter, so you can do it in the following way:

units:
- a
- b
- c
- d
matches:
- _COMM=systemd

instead of:

matches:
- _SYSTEMD_UNIT: a
  _COMM: systemd
- _SYSTEMD_UNIT: b
  _COMM: systemd
- _SYSTEMD_UNIT: c
  _COMM: systemd
- _SYSTEMD_UNIT: d
  _COMM: systemd

Anyway, I would allow either matches or units to avoid confusion and non-well documented behavior. It will be also easier to deal with it if we decide to go with jorunald gateway path

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My testing with journalctl trying to use both units and matches syntax gave unexpected results - specifically NO LOG DATA.

I don't agree that the results are unexpected - in fact, they are exactly what I would expect. The units and matches conditions are AND-ed, same as priority and units are AND-ed in the current version of the receiver.

Anyway, I would allow either matches or units to avoid confusion and non-well documented behavior. It will be also easier to deal with it if we decide to go with jorunald gateway path

I don't agree with this. Both matches and units should be possible to set together, as this makes perfect sense - as you have described in your comment @sumo-drosiek, and as it is possible in journalctl. The argument about journald gateway doesn't make much sense, as the matches configuration as implemented is already incompatible with the journald gateway (which apparently only accepts a flat list of AND-ed conditions). We are already not compatible with the journald gateway with the matches and I think it makes most sense to allow the user to do the same thing as they can do with journalctl.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The argument about journald gateway doesn't make much sense, as the matches configuration as implemented is already incompatible with the journald gateway (which apparently only accepts a flat list of AND-ed conditions).

We are compatible with journald-gateway and will be anyway, as we are able to get all entries based on the configuration (there is no requirement to do it in one query). Only thing is that allowing to have both matches and units introduces some logic complexity which could be avoided.

priority: info
```

will be passed to `journald` as the following arguments: `journald ... _SYSTEMD_UNIT=dbus.service + _SYSTEMD_UNIT=user@1000.service _UID=1000`
Copy link
Member

@andrzej-stencel andrzej-stencel Apr 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does a good job at explaining how the matches configuration works for someone who knows how the journalctl binary works. I think we should provide an explanation that does not depend on user's knowledge of the underlying journalctl binary.

@andrzej-stencel
Copy link
Member

json: jsoniter.ConfigFastest,
}, nil
if len(c.Matches) > 0 {
ms, err := c.buildMatchesConfig()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"ms" - is it "metrics"? is it "milliseconds"? 🤔 No, it's "matches"! 🤯

Why not just call the variable "matches"?

Dominik Rosiek added 2 commits April 13, 2023 07:57
Signed-off-by: Dominik Rosiek <drosiek@sumologic.com>
Signed-off-by: Dominik Rosiek <drosiek@sumologic.com>
Copy link
Member

@djaglowski djaglowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a couple typos

pkg/stanza/docs/operators/journald_input.md Outdated Show resolved Hide resolved
receiver/journaldreceiver/README.md Outdated Show resolved Hide resolved
sumo-drosiek and others added 2 commits April 13, 2023 17:20
Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
@sumo-drosiek sumo-drosiek requested a review from ballensans April 13, 2023 15:21
Copy link

@ballensans ballensans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks solid to me. Thank you very much for adding this!


func (c Config) buildArgs() ([]string, error) {
// validate arguments
if len(c.Units) > 0 && len(c.Matches) > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be removed. I don't see a reason to impose such a restriction on users.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, we should update the documentation to ensure it's clear to users how the logic chain is handled:

( priority )
AND
( units[0] OR units[1] OR units[2] OR ... units[U] )
AND
( matches[0] OR matches[1] OR matches[2] OR ... matches[M] )

Also, it is worth documenting that using units prevents matching any journal entries from _TRANSPORT s which are not associated with a unit - eg. _TRANSPORT=kernel, _TRANSPORT=audit or _TRANSPORT=journal.

Signed-off-by: Dominik Rosiek <drosiek@sumologic.com>
@sumo-drosiek
Copy link
Member Author

I removed restriction of using only matches or units and added explanation how filtering works with multiple configuration options

- `_SYSTEMD_UNIT` is `ssh`
- `_SYSTEMD_UNIT` is `kubelet` and `_UID` is `1000`

Note, that if you use some fields which aren't associated with an entry, the entry will always be filtered out.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth noting that this configuration will not emit ssh or systemd logs because of the AND between matches and units?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added additional note to explicitly say that the example configuration won't get any entries for ssh and systemd

Copy link

@ballensans ballensans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation updates do a good job of emphasizing the need for care when combining units and matches.

Copy link
Contributor

@MovieStoreGuy MovieStoreGuy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me so far, thank you @astencel-sumo for taking the time to look over it.
Feel free to mark it ready to merge once you're happy

Signed-off-by: Dominik Rosiek <drosiek@sumologic.com>
@sumo-drosiek sumo-drosiek force-pushed the drosiek-journald-matches branch from d500e63 to 5d3a40b Compare April 17, 2023 06:31
Copy link
Member

@andrzej-stencel andrzej-stencel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @sumo-drosiek for this contribution 🙏

@djaglowski
Copy link
Member

Thanks @sumo-drosiek!

@djaglowski djaglowski merged commit 95778f2 into open-telemetry:main Apr 24, 2023
@github-actions github-actions bot added this to the next release milestone Apr 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants