
telegraf logparser sometimes doesn't realize new logfile creation or starts parsing in the middle of the file #9664

Closed
bolek2000 opened this issue Mar 30, 2018 · 1 comment


@bolek2000

Bug report

InfluxDB 1.5.1, telegraf 1.5.3, Ubuntu 16.04

  1. I have big nginx access.log files (GBs) which are partly unchronological because of buffering (see: Logparser plugin process half of lines when nginx access log buffer is enabled telegraf#3613).
     All log entries are in the combined log format with some custom additions.
  2. telegraf can't handle unchronological data properly, so to get all data correctly into InfluxDB I pre-sort the files with a script, e.g. with sort --stable -k 4,4 $UNSORTED > $SORTED. This command overwrites the old logfile I created the day before.
  3. telegraf logparser is configured to listen on the logfile $SORTED ( files = /var/log/... )
  4. I use from_beginning = true in the logparser config (see the config sketch below).
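
For reference, the relevant logparser section looks roughly like this (the file path, measurement name, and exact grok pattern below are placeholders; my real pattern extends the combined log format with the custom fields):

[[inputs.logparser]]
  ## Placeholder path for the pre-sorted nginx access log
  files = ["/var/log/nginx/access_sorted.log"]
  ## Read the file from the beginning when it is (re)discovered
  from_beginning = true
  [inputs.logparser.grok]
    ## Placeholder pattern; the actual one adds the custom fields
    patterns = ["%{COMBINED_LOG_FORMAT}"]
    measurement = "nginx_access_log"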

Before I upgraded to telegraf 1.5.0, this workflow worked pretty well: the logparser realized that a new logfile had been written and started ingesting the data. It might have happened once or twice in several months that it did not notice the new data and did nothing, but I cannot say for sure.

Expected behavior:
When I overwrite the logfile, logparser/telegraf realizes it and starts parsing the file.
Actual behavior:
Multiple times in the last 2 weeks the parsing seems to start at some point in the middle of the file (not the beginning), and strange entries in telegraf.log indicate that something went wrong.
When I restart the telegraf daemon, it ingests the file correctly from the beginning.

Additional info:

2018-03-29T04:45:01Z D! Output [socket_writer] buffer fullness: 55 / 10000 metrics. 
2018-03-29T04:45:01Z D! Output [socket_writer] wrote batch of 55 metrics in 1.076344ms
2018-03-29T04:45:11Z D! Output [socket_writer] buffer fullness: 55 / 10000 metrics. 
2018-03-29T04:45:11Z D! Output [socket_writer] wrote batch of 55 metrics in 233.234µs

Logfile creation (the log file starts with [28/Mar/2018:00:00:00 +0200]); here the Grok "no match" message only includes part of a log line:
2018-03-29T04:45:18Z D! Grok no match found for: "5.1.1; SM-J200M Build/LMY47X) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.109 Mobile Safari/537.36\""
Then it reads 1000+ metrics, followed by another incomplete "no match" line:

2018-03-29T04:45:19Z D! Output [socket_writer] wrote batch of 1000 metrics in 3.688691ms
2018-03-29T04:45:20Z D! Output [socket_writer] buffer fullness: 368 / 10000 metrics. 
2018-03-29T04:45:20Z D! Output [socket_writer] wrote batch of 368 metrics in 1.828269ms
2018-03-29T04:45:20Z D! Grok no match found for: ",623,631,700,837,837,842,2270,2270,2271,266788&nc=undefined HTTP/2.0\" 204 184 \"https://www......

Then I can see that, shortly after parsing starts, logparser is already near the end of the file / day. Usually it takes at least an hour to parse the whole file:
2018-03-29T04:45:20Z D! Grok no match found for: "11.22.33.44 - - [28/Mar/2018:23:10:45 +0200]....

@dgnorton
Contributor

@bolek2000 This is a telegraf issue and appears to be a duplicate of influxdata/telegraf#3906. Closing this issue in favor of that one, but I'll make sure that issue links to this one. Thanks for reporting it.
