Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[receiver/syslog] Syslog Receiver fails to parse long messages, even with a max_log_size set #33182

Closed
sinkingpoint opened this issue May 22, 2024 · 11 comments
Assignees
Labels
bug Something isn't working receiver/syslog Stale

Comments

@sinkingpoint
Copy link
Contributor

sinkingpoint commented May 22, 2024

Component(s)

receiver/syslog

What happened?

Description

When using the syslog receiver, we can only parse messages up to the default maximum length (8192 octets), even with a max_log_size set much higher.

Steps to Reproduce

  1. Create a receiver with the provided config (note the max_log_size of 100MiB)
  2. Send a message in longer than 8192 characters
  3. Observe an error: message too long to parse. was size 40366, max length 8192

Expected Result

The message should parse properly

Actual Result

The message fails to parse

Collector version

v0.100.0

Environment information

Environment

OS: Debian Bookworm

OpenTelemetry Collector configuration

receivers:
  syslog:
    protocol: rfc5424
    enable_octet_counting: true
    tcp:
      listen_address: :4278
      max_log_size: 100000000 # 100MiB
exporters:
  debug:
service:
  pipelines:
    logs:
      receivers: [syslog]
      exporters: [debug]

Log output

{"level":"error","ts":1716356887.9432147,"caller":"helper/transformer.go:101","msg":"Failed to process entry","kind":"receiver","name":"syslog/db","data_type":"logs","operator_id":"syslog_input_internal_parser","operator_type ":"syslog_parser","error":"message too long to parse. was size 40366, max length 8192","action":"send","stacktrace":"github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*TransformerOperator).HandleEntryError\\n\\tgithub.co m/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.99.0/operator/helper/transformer.go:101\\ngithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*ParserOperator).ParseWith\\n\\tgithub.com/open-telemetry/opentelemetry-collect or-contrib/pkg/stanza@v0.99.0/operator/helper/parser.go:140\\ngithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*ParserOperator).ProcessWithCallback\\n\\tgithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.99.0/ope rator/helper/parser.go:112\\ngithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/parser/syslog.(*Parser).Process\\n\\tgithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.99.0/operator/parser/syslog/parser.go:54\\ngithub.com/ open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*WriterOperator).Write\\n\\tgithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.99.0/operator/helper/writer.go:53\\ngithub.com/open-telemetry/opentelemetry-collector-contrib/p kg/stanza/operator/input/tcp.(*Input).handleMessage\\n\\tgithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.99.0/operator/input/tcp/input.go:191\\ngithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/input/tcp.(*Input).goHan dleMessages.func1\\n\\tgithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.99.0/operator/input/tcp/input.go:152"}

Additional context

This seems to be because we aren't parsing a value into here: https://github.com/influxdata/go-syslog/blob/66067a10754ae90b9540d5312989ae685413c4fe/octetcounting/parser.go#L46 so we get stuck with the default limit

@sinkingpoint sinkingpoint added bug Something isn't working needs triage New item requiring triage labels May 22, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@sinkingpoint sinkingpoint changed the title [receiver/syslog] [receiver/syslog] Syslog Receiver fails to parse long messages, even with a max_log_length set May 22, 2024
@sinkingpoint sinkingpoint changed the title [receiver/syslog] Syslog Receiver fails to parse long messages, even with a max_log_length set [receiver/syslog] Syslog Receiver fails to parse long messages, even with a max_log_size set May 22, 2024
@frzifus frzifus removed the needs triage New item requiring triage label May 22, 2024
@frzifus
Copy link
Member

frzifus commented May 22, 2024

As far as I understand that part, the parser has no option the pass the maxSize information to to any parser?

type Parser struct {
helper.ParserOperator
protocol string
location *time.Location
enableOctetCounting bool
allowSkipPriHeader bool
nonTransparentFramingTrailer *string
}

Looking at this construction part non of those takes a maxsize into account.

case RFC3164:
return func(input []byte) (sl.Message, error) {
if p.allowSkipPriHeader && !priRegex.Match(input) {
input = append([]byte("<0>"), input...)
}
return rfc3164.NewMachine(rfc3164.WithLocaleTimezone(p.location)).Parse(input)
}, nil
case RFC5424:
switch {
// Octet Counting Parsing RFC6587
case p.enableOctetCounting:
return newOctetCountingParseFunc(), nil
// Non-Transparent-Framing Parsing RFC6587
case p.nonTransparentFramingTrailer != nil && *p.nonTransparentFramingTrailer == LFTrailer:
return newNonTransparentFramingParseFunc(nontransparent.LF), nil
case p.nonTransparentFramingTrailer != nil && *p.nonTransparentFramingTrailer == NULTrailer:
return newNonTransparentFramingParseFunc(nontransparent.NUL), nil
// Raw RFC5424 parsing
default:
return func(input []byte) (sl.Message, error) {
if p.allowSkipPriHeader && !priRegex.Match(input) {
input = append([]byte("<0>"), input...)
}
return rfc5424.NewMachine().Parse(input)
}, nil
}

The used version of github.com/influxdata/go-syslog/v3/rfc5424 doesnt even offer an option that can be set.

@djaglowski
Copy link
Member

max_log_size is a feature of the TCP input component, but it doesn't apply to syslog.

The used version of github.com/influxdata/go-syslog/v3/rfc5424 doesnt even offer an option that can be set.

I looked into this further and found that go-syslog justifies the hard limit based on
RFC 5425 Section 4.3.1. My reading of that section is that it is the minimum which the library should support but it is not prescriptive about it being a maximum.

@sinkingpoint
Copy link
Contributor Author

@djaglowski considering that that repo has been archived, would it make sense to fork it here?

@djaglowski
Copy link
Member

Actually I'm happy to see that the original author has recently created a fork and is making updates again! We should definitely switch in my opinion. https://github.com/leodido/go-syslog.

@andrzej-stencel
Copy link
Member

If I'm reading this correctly, the v4 release from leodido/go-syslog allows us to fix this issue, as it contains the WithMaxMessageLength function introduced in influxdata/go-syslog#39 that we can call when instatiating the parser. Is my thinking correct?

@andrzej-stencel
Copy link
Member

PR switching the dependency to the fork: #33205.

@bacherfl
Copy link
Contributor

Hi! I would like to pick this issue up if still available

@bacherfl
Copy link
Contributor

@djaglowski I went ahead and created a draft PR making use of the new option the updated library. I do have some open questions which I have added to the PR description - appreciate any feedback there

andrzej-stencel pushed a commit that referenced this issue Jul 8, 2024
**Description:** add `MaxLogSize` parameter to `syslog` parser. Note
that for this option to be available, `enable_octet_counting` needs to
be set to `true`, as this is an option that is exclusive to the
`octetcounting` parser in the [syslog
library](github.com/leodido/go-syslog).

One aspect where I'm not sure about yet is regarding the placement of
the `max_log_size` option: Right now, this option is also set within the
`TCP` input configuration, whereas this new option would be one layer
above, i.e. in the syslog base config. This would mean that this option
could potentially be set to different values in the parser and tcp input
config like for example:

```
receivers:
  syslog:
    protocol: rfc5424
    enable_octet_counting: true
    max_log_size: 200000000 # 200MiB
    tcp:
      listen_address: :4278
      max_log_size: 100000000 # 100MiB
exporters:
  debug:
service:
  pipelines:
    logs:
      receivers: [syslog]
      exporters: [debug]
```

For now I have implemented this in a way where if nothing is set if the
tcp input config, the max_log_size value from the syslog base config
will be used. If set in the tcp config, the tcp input will use that more
specific value. To me this makes the most sense right now, but I
appreciate any feedback on this.

**Link to tracking Issue:** #33182

**Testing:** so far added unit test for the syslog parser, will also add
some tests for the syslog input config to test the behavior described
above.

**Documentation:** TODO, will add once we have figured out all open
questions

---------

Signed-off-by: Florian Bacher <florian.bacher@dynatrace.com>
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Aug 28, 2024
@djaglowski
Copy link
Member

Resolved by #33777.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working receiver/syslog Stale
Projects
None yet
Development

No branches or pull requests

5 participants