Filelog receiver loses characters #31512
Comments
Thanks for reporting @eloo-abi.
Can you clarify whether you are tailing the log files and viewing them independently of the collector, vs comparing to output of the
In general, it would be helpful if we can reduce the complexity of the problem space. Are you able to capture a log file which is not being parsed correctly and then read it through a simpler configuration? This would demonstrate whether the problem is indeed due to the filelog receiver, vs some parsing or other problem. For example, something like below:
exporters:
  debug: {}
receivers:
  filelog:
    include:
      - local/copy.log
    include_file_path: true
    # no operators
service:
  pipelines:
    logs:
      receivers:
        - filelog
      exporters:
        - debug
|
I believe I am running into this issue too; my pipeline is very similar and extracts containerd logs from a k8s platform. I think it is related to the size of the message being sent to the regex_parser. With some testing, here is what I observed: when a log event is greater than 16385 characters, it seems the characters beyond that point are trimmed off and sent to the regex_parser as a separate log event.
Example Log Event with 16385 characters
Example Logs events with 16431 characters
Error in Log File
|
The cutoff may be related to |
Hey @djaglowski, sure thing, I'll take a look at it 👀 |
I ran some tests and it seems that there is indeed a problem with the
So with the following config there is no error:
receivers:
  filelog:
    start_at: beginning
    include:
      - /var/log/busybox/long_files/xl_long_file.log
exporters:
  debug:
    verbosity: detailed
service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [debug]
      processors: []
While when I add a parser it fails:
receivers:
  filelog:
    start_at: beginning
    include:
      - /var/log/busybox/long_files/xl_long_file.log
    operators:
      - id: parser-containerd
        regex: ^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
        timestamp:
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
          parse_from: attributes.time
        type: regex_parser
exporters:
  debug:
    verbosity: detailed
service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [debug]
      processors: []
The error I see is the same as in #31512 (comment):
Note that the error is not fatal, and the record is logged in the console, but apparently it's not properly parsed. However, in my case it seems that it even fails for logs with fewer than 16385 characters like the
wc -c /var/log/busybox/long_files/*.log
16385 /var/log/busybox/long_files/long_file.log
14698 /var/log/busybox/long_files/s_long_file.log
16431 /var/log/busybox/long_files/xl_long_file.log
|
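For anyone trying to reproduce the size dependence, test files with an exact byte count can be generated rather than edited by hand. A minimal sketch, assuming the containerd line layout quoted in this thread (`<timestamp> <stream> <logtag> <message>`); the function name `make_containerd_line`, the fixed timestamp, the filler payload and the temporary output directory are illustrative choices, not taken from the issue:

```python
import os
import tempfile


def make_containerd_line(total_bytes: int,
                         timestamp: str = "2021-06-22T10:27:25.813799277Z",
                         stream: str = "stdout",
                         logtag: str = "F") -> bytes:
    """Build one containerd-style log line of exactly total_bytes bytes,
    counting the trailing newline (the same count `wc -c` reports)."""
    prefix = f"{timestamp} {stream} {logtag} ".encode()
    payload_len = total_bytes - len(prefix) - 1  # reserve 1 byte for "\n"
    if payload_len < 0:
        raise ValueError("total_bytes is smaller than the line prefix")
    return prefix + b"x" * payload_len + b"\n"


if __name__ == "__main__":
    # Sizes mirror the wc -c listing above; the directory is a throwaway one.
    out_dir = tempfile.mkdtemp()
    for name, size in [("s_long_file.log", 14698),
                       ("long_file.log", 16385),
                       ("xl_long_file.log", 16431)]:
        path = os.path.join(out_dir, name)
        with open(path, "wb") as fh:
            fh.write(make_containerd_line(size))
        print(os.path.getsize(path), path)
```

Writing the file in one call from a script also helps separate size effects from effects of how the line is appended to the file.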
It seems that a log line like
A proper containerd log looks like the following:
2021-06-22T10:27:25.813799277Z stdout F some log
It could be the timestamp
If I extend the proper containerd log to the size of
@JDMooreMN could you double-check this please, so that we can determine whether the problem you spotted is a valid regexp mismatch or is related to the size of the log? |
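The timestamp hypothesis can be checked outside the collector. The sketch below runs the `parser-containerd` regex from the config above against the proper containerd line and against a line whose timestamp carries a numeric UTC offset instead of `Z`. The second line is a hypothetical example made up for illustration, and Python's `re` is used only as a stand-in for the collector's Go regexp engine, which accepts the same `(?P<name>...)` syntax for this pattern:

```python
import re

# Regex copied from the parser-containerd operator quoted in this thread.
CONTAINERD_RE = re.compile(
    r"^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$"
)

# Line taken from the comment above.
utc_line = "2021-06-22T10:27:25.813799277Z stdout F some log"
# Hypothetical line: same shape, but with a numeric offset instead of "Z".
offset_line = "2021-06-22T10:27:25.813799277-05:00 stdout F some log"

utc_match = CONTAINERD_RE.match(utc_line)
offset_match = CONTAINERD_RE.match(offset_line)

# The time group requires a literal "Z" before the first space,
# so only the first line is expected to match.
print(utc_match.groupdict() if utc_match else None)
print(offset_match)
```

If a line fails here regardless of its length, the mismatch is a regexp problem rather than a size problem.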
As @ChrsMark has already pointed out @JDMooreMN your problem is likely with the regexp: Nevertheless I believe we need a simpler reproducer for the original issue. |
@ChrsMark @OverOrion - I am using the containerd-parser below, please use this instead. I am not getting a regex-related error with the entire message, but as I stated in my initial comment, it seems to be chunked, and the second chunk doesn't match because it is missing the expected format. I apologize for the confusion.
|
Thanks @JDMooreMN! I'm still not able to reproduce the issue with a minimal setup. In my case the collector manages to parse the whole line successfully. Here is the config I use:
receivers:
  filelog:
    start_at: beginning
    include:
      - /var/log/busybox/long_files/xl_GH_long_file.log
    operators:
      - id: parser-containerd
        regex: ^(?P<time>.+) (?P<stream>stdout|stderr) (?P<logtag>\w) ?(?P<message>.*)
        timestamp:
          layout: '%Y-%m-%dT%H:%M:%S.%s%j'
          parse_from: attributes.time
        type: regex_parser
exporters:
  debug:
    verbosity: detailed
service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [debug]
      processors: []
And the target file
2024-03-19T11:21:00.839338492-05:00 stdout P 2024-03-13 11:51:00,838 [scheduler-2] INFO
Where the log length is:
> wc -c /var/log/busybox/long_files/xl_GH_long_file.log
16436 /var/log/busybox/long_files/xl_GH_long_file.log
@JDMooreMN am I missing anything in the above scenario? |
@ChrsMark - The only difference I am seeing is as follows. Can you try
echo '<log_event>' >> xl_GH_long_file.log
|
Hey @JDMooreMN I was able to reproduce it, the key seems to be the It works fine with |
Thanks @JDMooreMN, +1, I can reproduce it as well with
The first part of the long line is properly parsed, and then the leftover fails to get parsed. The leftover looks like:
LogRecord #1
ObservedTimestamp: 2024-03-28 12:45:20.288996015 +0000 UTC
Timestamp: 1970-01-01 00:00:00 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str(yfGWG8aXlbNNKW0iw2e5XVDb6RqBg7LLUAbDH5x8WM3OT424242)
Attributes:
     -> log.file.name: Str(xl_GH_long_file.log)
Trace ID:
Span ID:
Flags: 0
	{"kind": "exporter", "data_type": "logs", "name": "debug"}
@OverOrion feel free to claim this one and let me know if I could help in any way :). |
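Why the leftover cannot be parsed is easy to see once the line is cut in two: only the first piece still carries the `<time> <stream> <logtag>` prefix. A rough illustration, using the regex from the config earlier in this thread; the split point of 16384 characters, the helper name `split_like_partial_flush` and the filler payload are assumptions made for the example, not values read from the collector:

```python
import re

# Regex copied from the regex_parser operator in the config above.
PARSER_RE = re.compile(
    r"^(?P<time>.+) (?P<stream>stdout|stderr) (?P<logtag>\w) ?(?P<message>.*)"
)

CHUNK = 16384  # assumed split point, for illustration only


def split_like_partial_flush(line, chunk=CHUNK):
    """Cut one line into fixed-size pieces, mimicking a reader
    that emits part of a line before it has seen the newline."""
    return [line[i:i + chunk] for i in range(0, len(line), chunk)]


prefix = "2024-03-19T11:21:00.839338492-05:00 stdout P "
line = prefix + "a" * (16435 - len(prefix))  # 16435 characters, newline excluded

pieces = split_like_partial_flush(line)
for n, piece in enumerate(pieces):
    # Print the piece index, its length, and whether the parser regex matches.
    print(n, len(piece), bool(PARSER_RE.match(piece)))
```

The second piece is plain payload with no stream or logtag, which matches the shape of the `Body` in the leftover record shown above.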
Hey @JDMooreMN! Just opened a PR that should solve this issue. Could you take a moment to test it out on your end as well? Thanks! |
**Description:** Flush could have sent partial input before EOF was reached; this PR fixes it.
**Link to tracking Issue:** #31512, #32170
**Testing:** Added unit test `TestFlushPeriodEOF`
**Documentation:** Added a note to `force_flush_period` option
---------
Signed-off-by: Szilard Parrag <szilard.parrag@axoflow.com>
Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
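The failure mode described in the PR can be pictured with a toy model. This is not the collector's code: it is a small Python sketch in the style of a Go `bufio.SplitFunc`, where `with_flush`, `scan`, the `require_eof` switch, and the 16-byte buffer are all hypothetical names and values chosen for illustration. It assumes the flush timer has already expired and compares flushing on any read with flushing only once EOF has been reached:

```python
from typing import Callable, List, Optional, Tuple

SplitResult = Tuple[int, Optional[bytes]]
SplitFunc = Callable[[bytes, bool], SplitResult]


def newline_split(data: bytes, at_eof: bool) -> SplitResult:
    """Return (bytes consumed, token); token is None when no full line is buffered."""
    i = data.find(b"\n")
    if i >= 0:
        return i + 1, data[:i]
    return 0, None


def with_flush(split: SplitFunc, timed_out: Callable[[], bool],
               require_eof: bool) -> SplitFunc:
    """Wrap split so buffered data is emitted once the flush timer expires."""
    def wrapped(data: bytes, at_eof: bool) -> SplitResult:
        advance, token = split(data, at_eof)
        if token is not None:
            return advance, token
        if data and timed_out() and (at_eof or not require_eof):
            return len(data), data  # forced flush of whatever is buffered
        return 0, None
    return wrapped


def scan(content: bytes, split: SplitFunc, buf_size: int = 16) -> List[bytes]:
    """Feed content to split through a growing buffer and collect the tokens."""
    tokens, pos, size = [], 0, buf_size
    while pos < len(content):
        window = content[pos:pos + size]
        at_eof = pos + size >= len(content)
        advance, token = split(window, at_eof)
        if token is None:
            if at_eof:
                break
            size *= 2  # no token yet: grow the buffer and retry
            continue
        tokens.append(token)
        pos += advance
        size = buf_size
    return tokens


content = b"A" * 40 + b"\n"    # one line, longer than the initial buffer
always_expired = lambda: True  # pretend the flush period has already elapsed

before = scan(content, with_flush(newline_split, always_expired, require_eof=False))
after = scan(content, with_flush(newline_split, always_expired, require_eof=True))
print([len(t) for t in before])  # the line is emitted in several pieces
print([len(t) for t in after])   # the line is emitted whole
```

In the model, flushing before EOF turns one long line into several tokens as soon as the line exceeds the initial buffer, which is the same symptom as the leftover records reported in this thread.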
Since #32100 was merged it's likely that we have fixed that, right? |
Thanks for following up, @ChrsMark, and for confirming this has been resolved, @OverOrion! Closing as resolved by #32100. |
I am using otel in a Jitsi Meet Docker environment and I am getting the following error log in otel. I believe the pattern does not match my Docker logs, as it reports that the regex does not match. Here is the error log and here is my /otel-collector-config.yaml file `receivers: exporters: service: |
Component(s)
receiver/filelog
What happened?
Description
Hi,
we have been using the filelog receiver for a few weeks to collect our Kubernetes pod logs (JSON) and push them into Elasticsearch.
But recently we found an issue where some characters are lost for unknown reasons.
So far we have only observed this with huge log entries (~800 KB)
Steps to Reproduce
Expected Result
Actual Result
Here is a snippet of our log which is outputted by the otel-collector
And as you can see, the "level" has no closing quote, so the JSON here got corrupted.
We are also not sure whether more characters are lost here.
We have seen a different example where around 20 characters were lost (manually compared from the console log to the output of the otel collector).
We are not sure what could cause this issue so far, as the JSON in the console looks good.
Our log entries generally have the following structure, and the issue occurs (at least, it has only been observed there) in the "short_message" field:
Log output
No response
Additional context
Maybe we should note that we have increased the max_log_size of the crio-recombine operator to 0.
We did this because otherwise our logs would be split up into multiple log entries (the default in the Helm chart was around 100k).
With this setting the logs have improved in general, but now we run into the missing-characters issue.