
Sync up logger used in Fleet server with Agent/ Beats #25391

Closed
EricDavisX opened this issue Apr 26, 2021 · 15 comments · Fixed by #26016

@EricDavisX
Contributor

The Fleet Server component uses a different logging setup than the rest of Agent/Beats. In practice this makes debugging Fleet Server / Agent harder, because the logs show timestamps in different formats.

here is an example from the fleet server log:
"@timestamp":"2021-04-26T17:59:49.824Z"
/opt/Elastic/Agent/data/elastic-agent-9c3fe9/logs/default/fleet-server-json.log

and here is one from the Agent log:
"@timestamp":"2021-04-26T15:21:38.184-0400"
/opt/Elastic/Agent/data/elastic-agent-9c3fe9/logs/elastic-agent-json.log

Note that the formatting is different. Beyond that, events that seemed to happen around the same time were cited as being minutes apart, which makes the non-interleaved files much harder to match up and has caused false alarms in some cases, even with reasonable assumptions made from the logging information.

@EricDavisX EricDavisX changed the title sync up the loggers used in Fleet server with the rest of Beats/Agent Sync up logger used in Fleet server with Agent/ Beats Apr 26, 2021
@scunningham

Both timestamps are RFC3339. The agent timestamp uses the local timezone; the fleet-server timestamp is in Zulu with millisecond precision. They should align if you take the timezone into account when parsing. @EricDavisX Can you provide an example of two things happening at the same time but with stamps that are minutes apart?

It would be preferable if the agent logged in Zulu.
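
A minimal Go sketch of that point, with illustrative timestamps rather than ones taken from the logs above: once both layouts are parsed, the zone representation no longer matters and the instants compare as equal.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Illustrative values: the same instant written once in Zulu
	// (fleet-server style) and once with a local offset (Agent style).
	// The Agent layout omits the colon in the offset, so a custom
	// layout string is used instead of time.RFC3339.
	zulu, _ := time.Parse(time.RFC3339, "2021-04-26T17:59:49.824Z")
	local, _ := time.Parse("2006-01-02T15:04:05.000Z0700", "2021-04-26T13:59:49.824-0400")

	// Once parsed, comparison is zone-independent.
	fmt.Println(zulu.Equal(local)) // true
	fmt.Println(local.UTC())       // 2021-04-26 17:59:49.824 +0000 UTC
}
```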

@EricDavisX
Contributor Author

I don't have the example anymore; it was the best assessment PH and I had at the time that things were not synced up. Maybe we were reading the logs wrong or made a bad inference (from a repeated log line or something) - it is possible everything is fine apart from the minor format and timezone differences.

@ph
Contributor

ph commented Apr 28, 2021

Looking at this again, Elastic Agent is the problem; let me transfer this to the beats repository.

@ph ph transferred this issue from elastic/fleet-server Apr 28, 2021
@ph ph added bug Team:Elastic-Agent Label for the Agent team v7.13.0 labels Apr 28, 2021
@elasticmachine
Collaborator

Pinging @elastic/agent (Team:Agent)

@ph
Contributor

ph commented Apr 28, 2021

@michalpristas This looks like a good issue to pair on with @michel-laterman?

@michalpristas
Contributor

michalpristas commented Apr 30, 2021

Now I wonder whether we should align fleet-server with the rest (meaning Agent and Beats) or the rest with fleet-server. I would prefer changing how fleet-server logs time (https://github.com/elastic/fleet-server/blob/22d2a7651fd631c5caa145043c42c59d42931ee3/internal/pkg/logger/logger.go#L84).
What I see here is:

root@Agent $ cat elastic-agent.log | head -n1
2021-04-30T11:18:01.314+0200    ERROR   cmd/watch.go:61 failed to load markeropen /Library/Elastic/Agent/data/.update-marker: no such file or directory

root@Agent $ cat data/elastic-agent-e8c73c/logs/elastic-agent-json.log | head -n1
{"log.level":"info","@timestamp":"2021-04-30T11:18:01.244+0200","log.origin":{"file.name":"warn/warn.go","file.line":18},"message":"The Elastic Agent is currently in BETA and should not be used in production","ecs.version":"1.6.0"}

root@Agent $ cat data/elastic-agent-e8c73c/logs/default/filebeat_monitor-json.log | head -n1
{"log.level":"info","@timestamp":"2021-04-30T10:01:12.314+0200","log.origin":{"file.name":"instance/beat.go","file.line":659},"message":"Beat ID: 3c95f8e5-2499-4926-8fe8-5b4610d0553d","ecs.version":"1.6.0"}

root@Agent $ cat data/elastic-agent-e8c73c/logs/default/fleet-server-json.log | head -n1
{"log.level":"info","@timestamp":"2021-04-30T09:21:41.659Z","pid":99489,"ppid":99314,"exe":"/Library/Elastic/Agent/data/elastic-agent-e8c73c/install/fleet-server-8.0.0-SNAPSHOT-darwin-x86_64/fleet-server","args":["--agent-mode","-E","logging.level=info","-E","http.enabled=true","-E","http.host=unix:///Library/Elastic/Agent/data/tmp/default/fleet-server/fleet-server.sock","-E","logging.json=true","-E","logging.ecs=true","-E","logging.files.path=/Library/Elastic/Agent/data/elastic-agent-e8c73c/logs/default","-E","logging.files.name=fleet-server-json.log","-E","logging.files.keepfiles=7","-E","logging.files.permission=0640","-E","logging.files.interval=1h","-E","path.data=/Library/Elastic/Agent/data/elastic-agent-e8c73c/run/default/fleet-server--8.0.0-SNAPSHOT"],"message":"boot"}

@scunningham

The Beats should log in RFC 3339 Zulu. It is a smaller, more concise format.

@michalpristas
Contributor

@urso can this change at the Beats level be considered breaking?

@ruflin
Contributor

ruflin commented May 3, 2021

For Beats itself I would consider this a breaking change. If the log format changes, from a log collector perspective things break.

But if Beats logs differently when run by Elastic Agent, it is a breaking change, but one we could still do while we are in beta. So maybe we should adjust it when run by Elastic Agent?

@urso

urso commented May 3, 2021

Most important is that Filebeat can pick up JSON logs and parse the timestamps correctly. In the past, RFC-3339 Zulu was the only supported format.

@michalpristas Updated the timestamp parser. It should be able to parse both formats, right?

> But if Beats logs differently when run by Elastic Agent, it is a breaking change, but one we could still do while we are in beta. So maybe we should adjust it when run by Elastic Agent?

Not sure how straightforward this is. +1 if it's easy to change the timestamp format in our logger.
When Filebeat collects JSON logs it parses the timestamp into a time.Time. When shipping those logs to ES, all logs will use the same format.

@michalpristas Is the updated timestamp parser aware of timezones?

@ph ph added v7.14.0 and removed v7.13.0 labels May 3, 2021
@ph
Contributor

ph commented May 3, 2021

I've reassigned it to 7.14; it's a bug, but I think it's not required for 7.13.

@ph ph assigned michalpristas and unassigned michalpristas May 4, 2021
@michalpristas
Contributor

The updated timestamp parser recognizes two formats:

  • 2006-01-02T15:04:05Z07:00
  • 2006-01-02T15:04:05.000Z0700

When a log timestamp contains timezone information, that information is not lost.
Timezone info is lost with the format fleet-server uses, but they have a workaround in the form of transforming the time to UTC before printing it.
https://play.golang.org/p/LY7R9rG2Y2V
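
A rough sketch along the same lines as the playground link (the timestamps are illustrative): the first layout covers fleet-server's Zulu timestamps, the second matches the Agent/Beats output, and converting to UTC after parsing makes the two directly comparable.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// The two layouts the updated parser recognizes.
	layouts := []string{
		"2006-01-02T15:04:05Z07:00",    // RFC3339 (matches fleet-server's Zulu timestamps)
		"2006-01-02T15:04:05.000Z0700", // matches Agent/Beats timestamps
	}

	// Illustrative timestamps in the two styles seen in the logs above.
	for _, ts := range []string{"2021-04-30T09:21:41Z", "2021-04-30T11:18:01.244+0200"} {
		for _, layout := range layouts {
			if t, err := time.Parse(layout, ts); err == nil {
				// Any offset carried by the timestamp is preserved by Parse;
				// converting to UTC makes the instants directly comparable.
				fmt.Printf("%-30s -> %s\n", ts, t.UTC().Format(time.RFC3339Nano))
				break
			}
		}
	}
}
```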

@urso

urso commented May 5, 2021

> When a log timestamp contains timezone information, that information is not lost.
> Timezone info is lost with the format fleet-server uses, but they have a workaround in the form of transforming the time to UTC before printing it.

In other words, logs indexed in Elasticsearch will all be correctly indexed with the same timestamp format and timezone (Beats also converts timestamps to UTC before indexing), and there will be no offset between log events from Agent, Beats, and Fleet Server in Elasticsearch.

@EricDavisX As Agent is supposed to ship logs to ES, we might rather want to check the logs in Elasticsearch and not on disk for now. Minor formatting differences on disk should be no problem (and we can fix those later). But when shipping logs, they get parsed and processed by Filebeat before being sent to ES (which might introduce other errors). In ES (Logs UI):

  • Are logs with corresponding times close to each other (no time offsets)?
  • Is the log format consistent (e.g. msg vs message)?
  • Are ECS fields added?

> events that seemed to happen around the same time were cited as being minutes apart

I guess we got too focused on formatting here. After timestamps have been normalized, are there still big gaps between corresponding events in Agent and Fleet Server? If so, and you think the offsets are too big, let's file a separate issue.

@EricDavisX
Contributor Author

Will do, thanks for the good discussion here.

@ph
Contributor

ph commented May 13, 2021

We shouldn't change Beats, but we could change the Fleet Server format.
