Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[receiver/journald]: add support for matches configuration #20852

Merged
merged 7 commits into from
Apr 24, 2023
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 16 additions & 0 deletions .chloggen/drosiek-journald-matches.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: journaldreceiver

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: add support for `matches` configuration

# One or more tracking issues related to the change
issues: [20295]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:
19 changes: 18 additions & 1 deletion pkg/stanza/docs/operators/journald_input.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ The `journald_input` operator will use the `__REALTIME_TIMESTAMP` field of the j
| `directory` | | A directory containing journal files to read entries from. |
| `files` | | A list of journal files to read entries from. |
| `units` | | A list of units to read entries from. |
| `matches` | | A list of matches to read entries from. See [Matches](#matches) example. |
| `priority` | `info` | Filter output by message priorities or priority ranges. |
| `start_at` | `end` | At startup, where to start reading logs from the file. Options are `beginning` or `end`. |
| `attributes` | {} | A map of `key: value` pairs to add to the entry's attributes. |
Expand All @@ -33,7 +34,23 @@ The `journald_input` operator will use the `__REALTIME_TIMESTAMP` field of the j
- type: journald_input
priority: emerg..err
```
#### Simple journald input

#### Matches

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a mechanism to prevent specifying both units: and matches: configuration for a single input?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really sure about it, as you can do it in journalctl as well

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My testing with journalctl trying to use both units and matches syntax gave unexpected results - specifically NO LOG DATA. At a minimum, there should be a warning in the docs that mixing units and matches could lead to unexpected results.

  • One Unit:
    • journalctl --utc --output=short --follow --priority info --unit ssh.service
    • Only SSH unit logs ... as expected
  • One Match Rule (with 2 terms):
    • journalctl --utc --output=short --follow --priority info _TRANSPORT=kernel PRIORITY=5
    • Only kernel logs ... as expected
  • Two Match Rules:
    • journalctl --utc --output=short --follow --priority info _SYSTEMD_UNIT=ssh.service + _TRANSPORT=kernel PRIORITY=5
    • Both log streams present ... as expected
  • Unit and Match:
    • journalctl --utc --output=short --follow --priority info --unit ssh.service _TRANSPORT=kernel PRIORITY=5
    • NO LOGS
  • Unit plus Match:
    • journalctl --utc --output=short --follow --priority info --unit ssh.service + _TRANSPORT=kernel PRIORITY=5
    • Invalid syntax: "+" can only be used between terms
  • Match and Unit:
    • journalctl --utc --output=short --follow --priority info _TRANSPORT=kernel PRIORITY=5 --unit ssh.service
    • NO LOGS
  • Match plus Unit:
    • journalctl --utc --output=short --follow --priority info _TRANSPORT=kernel PRIORITY=5 + --unit ssh.service
    • Invalid syntax: "+" can only be used between terms

Tested on Ubuntu 20.04 host with a noisy iptables LOG rule (kernel log stream) and inbound SSH probes.

Copy link
Member Author

@sumo-drosiek sumo-drosiek Apr 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't say the the behavior fully unexpected. Look at the following example:

➜  ~ journalctl --utc --output=json --no-tail --priority info --unit=user@1000.service --unit=init.scope _COMM=wireplumber + _COMM=systemd | jq '{unit: ._SYSTEMD_UNIT, comm: ._COMM} | tostring' | sort | uniq
"{\"unit\":\"init.scope\",\"comm\":\"systemd\"}"
"{\"unit\":\"user@1000.service\",\"comm\":\"systemd\"}"
"{\"unit\":\"user@1000.service\",\"comm\":\"wireplumber\"}

How I understand the behavior is:

  1. take all entries which matches the units configuration
  2. apply matches configuration to the entries which has been selected in 1

Having both matches and units can be actually helpful, if you want to select multiple units with the same additional parameter, so you can do it in the following way:

units:
- a
- b
- c
- d
matches:
- _COMM=systemd

instead of:

matches:
- _SYSTEMD_UNIT: a
  _COMM: systemd
- _SYSTEMD_UNIT: b
  _COMM: systemd
- _SYSTEMD_UNIT: c
  _COMM: systemd
- _SYSTEMD_UNIT: d
  _COMM: systemd

Anyway, I would allow either matches or units to avoid confusion and non-well documented behavior. It will be also easier to deal with it if we decide to go with jorunald gateway path

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My testing with journalctl trying to use both units and matches syntax gave unexpected results - specifically NO LOG DATA.

I don't agree that the results are unexpected - in fact, they are exactly what I would expect. The units and matches conditions are AND-ed, same as priority and units are AND-ed in the current version of the receiver.

Anyway, I would allow either matches or units to avoid confusion and non-well documented behavior. It will be also easier to deal with it if we decide to go with jorunald gateway path

I don't agree with this. Both matches and units should be possible to set together, as this makes perfect sense - as you have described in your comment @sumo-drosiek, and as it is possible in journalctl. The argument about journald gateway doesn't make much sense, as the matches configuration as implemented is already incompatible with the journald gateway (which apparently only accepts a flat list of AND-ed conditions). We are already not compatible with the journald gateway with the matches and I think it makes most sense to allow the user to do the same thing as they can do with journalctl.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The argument about journald gateway doesn't make much sense, as the matches configuration as implemented is already incompatible with the journald gateway (which apparently only accepts a flat list of AND-ed conditions).

We are compatible with journald-gateway and will be anyway, as we are able to get all entries based on the configuration (there is no requirement to do it in one query). Only thing is that allowing to have both matches and units introduces some logic complexity which could be avoided.


The following configuration:

```yaml
- type: journald_input
matches:
- _SYSTEMD_UNIT: ssh
- _SYSTEMD_UNIT: kubelet
_UID: "1000"
sumo-drosiek marked this conversation as resolved.
Show resolved Hide resolved
priority: info
```

will be passed to `journald` as the following arguments: `journald ... _SYSTEMD_UNIT=dbus.service + _SYSTEMD_UNIT=user@1000.service _UID=1000`
sumo-drosiek marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Member

@andrzej-stencel andrzej-stencel Apr 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does a good job at explaining how the matches configuration works for someone who knows how the journalctl binary works. I think we should provide an explanation that does not depend on user's knowledge of the underlying journalctl binary.


### Simple journald input

Configuration:
```yaml
Expand Down
93 changes: 77 additions & 16 deletions pkg/stanza/operator/input/journald/journald.go
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,8 @@ import (
"fmt"
"io"
"os/exec"
"regexp"
"sort"
"strconv"
"sync"
"time"
Expand Down Expand Up @@ -60,20 +62,42 @@ func NewConfigWithID(operatorID string) *Config {
type Config struct {
helper.InputConfig `mapstructure:",squash"`

Directory *string `mapstructure:"directory,omitempty"`
Files []string `mapstructure:"files,omitempty"`
StartAt string `mapstructure:"start_at,omitempty"`
Units []string `mapstructure:"units,omitempty"`
Priority string `mapstructure:"priority,omitempty"`
Directory *string `mapstructure:"directory,omitempty"`
Files []string `mapstructure:"files,omitempty"`
StartAt string `mapstructure:"start_at,omitempty"`
Units []string `mapstructure:"units,omitempty"`
Priority string `mapstructure:"priority,omitempty"`
Matches []MatchConfig `mapstructure:"matches,omitempty"`
}

type MatchConfig map[string]string

// Build will build a journald input operator from the supplied configuration
func (c Config) Build(logger *zap.SugaredLogger) (operator.Operator, error) {
inputOperator, err := c.InputConfig.Build(logger)
if err != nil {
return nil, err
}

args, err := c.buildArgs()
if err != nil {
return nil, err
}

return &Input{
InputOperator: inputOperator,
newCmd: func(ctx context.Context, cursor []byte) cmd {
if cursor != nil {
args = append(args, "--after-cursor", string(cursor))
}
return exec.CommandContext(ctx, "journalctl", args...) // #nosec - ...
// journalctl is an executable that is required for this operator to function
},
json: jsoniter.ConfigFastest,
}, nil
}

func (c Config) buildArgs() ([]string, error) {
args := make([]string, 0, 10)

// Export logs in UTC time
Expand Down Expand Up @@ -108,17 +132,54 @@ func (c Config) Build(logger *zap.SugaredLogger) (operator.Operator, error) {
}
}

return &Input{
InputOperator: inputOperator,
newCmd: func(ctx context.Context, cursor []byte) cmd {
if cursor != nil {
args = append(args, "--after-cursor", string(cursor))
}
return exec.CommandContext(ctx, "journalctl", args...) // #nosec - ...
// journalctl is an executable that is required for this operator to function
},
json: jsoniter.ConfigFastest,
}, nil
if len(c.Matches) > 0 {
ms, err := c.buildMatchesConfig()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"ms" - is it "metrics"? is it "milliseconds"? 🤔 No, it's "matches"! 🤯

Why not just call the variable "matches"?

if err != nil {
return nil, err
}
args = append(args, ms...)
}

return args, nil
}

func buildMatchConfig(mc MatchConfig) ([]string, error) {
re := regexp.MustCompile("^[_A-Z]+$")

// Sort keys to be consistent with every run and to be predictable for tests
sortedKeys := make([]string, 0, len(mc))
for key := range mc {
if !re.MatchString(key) {
return []string{}, fmt.Errorf("'%s' is not a valid Systemd field name", key)
}
sortedKeys = append(sortedKeys, key)
}
sort.Strings(sortedKeys)

configs := []string{}
for _, key := range sortedKeys {
configs = append(configs, fmt.Sprintf("%s=%s", key, mc[key]))
}

return configs, nil
}

func (c Config) buildMatchesConfig() ([]string, error) {
matches := []string{}

for i, mc := range c.Matches {
if i > 0 {
matches = append(matches, "+")
}
ms, err := buildMatchConfig(mc)
if err != nil {
return []string{}, err
}

matches = append(matches, ms...)
}

return matches, nil
}

// Input is an operator that process logs using journald
Expand Down
68 changes: 68 additions & 0 deletions pkg/stanza/operator/input/journald/journald_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ import (
"testing"
"time"

"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/mock"
"github.com/stretchr/testify/require"

Expand Down Expand Up @@ -115,3 +116,70 @@ func TestInputJournald(t *testing.T) {
require.FailNow(t, "Timed out waiting for entry to be read")
}
}

func TestBuildConfig(t *testing.T) {
testCases := []struct {
Name string
Config func(cfg *Config)
Expected []string
ExpectedError string
}{
{
Name: "empty config",
Config: func(cfg *Config) {},
Expected: []string{"--utc", "--output=json", "--follow", "--priority", "info"},
},
{
Name: "units",
Config: func(cfg *Config) {
cfg.Units = []string{
"dbus.service",
"user@1000.service",
}
},
Expected: []string{"--utc", "--output=json", "--follow", "--unit", "dbus.service", "--unit", "user@1000.service", "--priority", "info"},
},
{
Name: "matches",
Config: func(cfg *Config) {
cfg.Matches = []MatchConfig{
{
"_SYSTEMD_UNIT": "dbus.service",
},
{
"_UID": "1000",
"_SYSTEMD_UNIT": "user@1000.service",
},
}
},
Expected: []string{"--utc", "--output=json", "--follow", "--priority", "info", "_SYSTEMD_UNIT=dbus.service", "+", "_SYSTEMD_UNIT=user@1000.service", "_UID=1000"},
},
{
Name: "invalid match",
Config: func(cfg *Config) {
cfg.Matches = []MatchConfig{
{
"-SYSTEMD_UNIT": "dbus.service",
},
}
},
ExpectedError: "'-SYSTEMD_UNIT' is not a valid Systemd field name",
},
}

for _, tt := range testCases {
t.Run(tt.Name, func(t *testing.T) {
cfg := NewConfigWithID("my_journald_input")
tt.Config(cfg)
args, err := cfg.buildArgs()

if tt.ExpectedError != "" {
require.Error(t, err)
require.ErrorContains(t, err, tt.ExpectedError)
return
}
require.NoError(t, err)
assert.Equal(t, tt.Expected, args)
})
}
}