Type field fixes in central management #33921
I think this will fix things. Have you confirmed that starting Filebeat under agent with an input other than filestream or logs works as expected?
The winlog input reproduced this problem reliably: elastic/elastic-agent#1800
Winlog should only run on Windows though, but I think this problem would occur with any input type other than log or filestream, such as httpjson, which is quite widely used: https://github.com/elastic/elastic-agent/blob/8aae0d629e3765fb2fb42e4e111df17a895a6745/specs/filebeat.spec.yml#L94
x-pack/heartbeat/cmd/root.go
Outdated
@@ -23,13 +24,22 @@ var RootCmd *cmd.BeatsRootCmd
 // heartbeatCfg is a callback registered via SetTransform that returns a Elastic Agent client.Unit
 // configuration generated from a raw Elastic Agent config
 func heartbeatCfg(rawIn *proto.UnitExpectedConfig, agentInfo *client.AgentInfo) ([]*reload.ConfigWithMeta, error) {
 	//grab and properly format the input streams
-	inputStreams, err := management.CreateInputsFromStreams(rawIn, "metrics", agentInfo)
+	modules, err := management.CreateInputsFromStreams(rawIn, "metrics", agentInfo)
Suggested change:
-	modules, err := management.CreateInputsFromStreams(rawIn, "metrics", agentInfo)
+	modules, err := management.CreateInputsFromStreams(rawIn, "synthetics", agentInfo)
I think synthetics is the right default here most of the time, although defaulting to metrics is more obviously wrong so it makes mistakes more obvious.
There are a few other Beats where the default probably isn't right:
- Auditbeat shouldn't be metrics, it should be logs: beats/x-pack/auditbeat/cmd/root.go, line 32 in f1d073f:
  modules, err := management.CreateInputsFromStreams(rawIn, "metrics", agentInfo)
- Packetbeat shouldn't be metrics, it should be logs: beats/x-pack/packetbeat/cmd/root.go, line 34 in f1d073f:
  inputStreams, err := management.CreateInputsFromStreams(rawIn, "metrics", agentInfo)
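To make the proposed defaults easy to scan, here is a minimal sketch of the fallback types discussed in this thread. The map and the fallbackType helper are hypothetical and are not part of the Beats codebase; the only real call involved is management.CreateInputsFromStreams, whose second argument is the value being debated.

```go
package main

import "fmt"

// defaultStreamType is a hypothetical lookup table (not in Beats). It lists the
// fallback data stream type this review proposes for each Beat.
var defaultStreamType = map[string]string{
	"heartbeat":  "synthetics",
	"auditbeat":  "logs",
	"packetbeat": "logs",
}

// fallbackType returns the proposed fallback for a Beat. The second result is
// false for Beats that this thread does not cover.
func fallbackType(beat string) (string, bool) {
	streamType, ok := defaultStreamType[beat]
	return streamType, ok
}

func main() {
	for _, beat := range []string{"heartbeat", "auditbeat", "packetbeat"} {
		streamType, _ := fallbackType(beat)
		fmt.Printf("%s -> %s\n", beat, streamType)
	}
}
```

In the real code each Beat passes its own literal in its cmd/root.go, so a shared table like this is only a way to review the values side by side.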
Hmm, alright, good catch.
Re synthetics: I thought the index fields had to be one of logs or metrics?
Maybe it should be, but the type part of the datastream is definitely being set to synthetics in some cases:
beats/heartbeat/monitors/factory.go, line 296 in fd82372:
ds.Type = "synthetics"
We have to keep the existing behaviour here to avoid breaking things.
The packetbeat and auditbeat default types still need to be updated from "metrics" to "logs" from what I can see.
Synthetics has its own type and always has, directly supported by ES's list of golden types (or whatever it's properly called).
Let us know if you have any other Qs about how synthetics handles stuff here.
x-pack/heartbeat/cmd/root.go
Outdated
}

configList, err := management.CreateReloadConfigFromInputs(inputStreams)
// Extract the stream-level type from the input
typeField := strings.Split(rawIn.Type, "/")[1]
The spec files have an exhaustive list of the possible input types, perhaps we should have tests ensuring they are correctly translated into either an input type or metricset for each Beat?
For synthetics the list is here: https://github.com/elastic/elastic-agent/blob/d3496acb1fcd0da7e4f6a837c339c9b5cd174c82/specs/heartbeat.spec.yml#L3-L46
We can at least link to the spec file in a comment here to explain what we are parsing.
Ah, good idea.
This seems right to me. We may add a "scripted" type in the future, but for now it's just browser, http, tcp, and icmp.
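On the suggestion to test the translation of spec input types: the strings.Split(rawIn.Type, "/")[1] line in the diff panics when the type has no "/". A minimal sketch of a more defensive version follows. streamTypeFromInput is a hypothetical helper, and the "synthetics/" prefix on the four types named above is an assumption based on the synthetics/icmp example in this thread, not a copy of the spec file.

```go
package main

import (
	"fmt"
	"strings"
)

// streamTypeFromInput is a hypothetical helper (not in Beats). It returns the
// segment after the first "/" of an agent input type such as "synthetics/icmp",
// and reports an error instead of panicking when that segment is missing.
func streamTypeFromInput(inputType string) (string, error) {
	parts := strings.Split(inputType, "/")
	if len(parts) < 2 || parts[1] == "" {
		return "", fmt.Errorf("input type %q has no stream-level type after \"/\"", inputType)
	}
	return parts[1], nil
}

func main() {
	// Assumed input types: "synthetics/" plus the four monitor types named above.
	for _, want := range []string{"browser", "http", "tcp", "icmp"} {
		got, err := streamTypeFromInput("synthetics/" + want)
		fmt.Printf("synthetics/%s -> %q (err: %v)\n", want, got, err)
	}

	// An input type without a "/" yields an error rather than an index-out-of-range panic.
	_, err := streamTypeFromInput("log")
	fmt.Println("log ->", err)
}
```

A table-driven test over the real spec file entries would follow the same loop shape, with the expected values taken from the spec rather than hard-coded.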
I should mention, however, that browser monitors output 3 different stream types: browser, browser_screenshot, and browser_network. Does that impact your work here? So, that one input creates three separate outputs.
Sorry, I should mention that those are data sets, not types. Everything goes under synthetics for us. The full list of data set types is listed here: https://www.elastic.co/guide/en/observability/current/synthetics-manage-retention.html
I don't think so, heartbeat/synthetics is able to parse all of its configuration itself so at this level we just need to pass it through and let it be dealt with in Heartbeat. We actually removed looking at the type field here in Heartbeat, as it isn't the same as Filebeat for example where the type field at the input level is mandatory to start the right Filebeat input.
Going through all the different input configs I've seen while debugging this, it's making me a tad paranoid and I want to think about a more defensive way to do this, since the Agreed that we should have a more formalized spec for some of this too. |
The top level input type field controls which binary the agent starts, so it is predictable in that it always has to be there and be listed in the spec file. The thing to watch out for is that the agent policy supports aliases for several input types, for example there are three ways to specify that the Filebeat
- name: log
  aliases:
    - logfile
    - event/file
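To illustrate the alias problem, here is a minimal sketch of resolving an input type to its canonical spec name. The inputSpec type, the exampleSpecs value, and canonicalName are all hypothetical; only the log, logfile, and event/file names come from the spec entry quoted above. The sketch takes no position on whether this translation belongs in the agent or in Beats.

```go
package main

import "fmt"

// inputSpec mirrors the name/aliases pair of the spec entry quoted above.
type inputSpec struct {
	Name    string
	Aliases []string
}

// exampleSpecs holds only the single entry quoted in this thread.
var exampleSpecs = []inputSpec{
	{Name: "log", Aliases: []string{"logfile", "event/file"}},
}

// canonicalName is a hypothetical helper. It maps an input type from the agent
// policy to the spec name it stands for; ok is false for unknown types.
func canonicalName(specs []inputSpec, inputType string) (name string, ok bool) {
	for _, spec := range specs {
		if spec.Name == inputType {
			return spec.Name, true
		}
		for _, alias := range spec.Aliases {
			if alias == inputType {
				return spec.Name, true
			}
		}
	}
	return "", false
}

func main() {
	for _, inputType := range []string{"log", "logfile", "event/file", "httpjson"} {
		name, ok := canonicalName(exampleSpecs, inputType)
		fmt.Printf("%q -> %q (known: %v)\n", inputType, name, ok)
	}
}
```

Note that event/file contains a "/", so any code that splits the input type on "/" has to run after alias resolution, or it will mistake file for a stream-level type.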
@cmacknz yah, I'm referring to the stream-level |
My understanding is that we should only care about two If I look at the sample synthetics configuration we were provided, it includes a
streams:
  - id: ccfdfd04-7c1c-4c17-93ef-f0d22ab0a1d5
    name: 8.8.8.8
    type: icmp # <--- I don't think we care about this in the agent
    enabled: true
    data_stream:
      dataset: icmp
      type: synthetics # <--- We do care about this type, since it controls the target index
Does that clarify things?
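Since the data_stream type controls the target index, a minimal sketch of how the index string could be derived may help. It assumes the standard type-dataset-namespace data stream naming scheme, a top-level namespace of default, and that stream-level fields override top-level ones. The dataStream type and both functions are hypothetical and are not the actual Beats implementation.

```go
package main

import "fmt"

// dataStream holds the three fields that make up an index name.
type dataStream struct {
	Type      string
	Dataset   string
	Namespace string
}

// resolve overlays the stream-level data_stream on the top-level one: any
// non-empty stream-level field wins. fallbackType is used only when neither
// level sets a type. All names here are illustrative.
func resolve(top, stream dataStream, fallbackType string) dataStream {
	out := top
	if stream.Type != "" {
		out.Type = stream.Type
	}
	if stream.Dataset != "" {
		out.Dataset = stream.Dataset
	}
	if stream.Namespace != "" {
		out.Namespace = stream.Namespace
	}
	if out.Type == "" {
		out.Type = fallbackType
	}
	return out
}

// indexName follows the <type>-<dataset>-<namespace> naming scheme.
func indexName(ds dataStream) string {
	return fmt.Sprintf("%s-%s-%s", ds.Type, ds.Dataset, ds.Namespace)
}

func main() {
	top := dataStream{Namespace: "default"} // assumed top-level data_stream
	stream := dataStream{Type: "synthetics", Dataset: "icmp"}
	fmt.Println(indexName(resolve(top, stream, "metrics")))
	// prints: synthetics-icmp-default
}
```

With the stream-level type present, the fallback is never consulted, which is why a wrong default only shows up for integrations that omit the stream-level field.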
@cmacknz yah, I'm referring to the
Looking at the integrations, it appears that sometimes the synthetic integrations have the stream-level type field, and sometimes they don't, which is where I assume the errors are coming from in the linked issue. Considering how this kind of inconsistency is apparently possible, I just want to make sure we're not breaking anything. |
Right, I am thinking this from the perspective of parsing the agent policy. There is the second problem of where to put the Given we have:
- id: id1
  type: synthetics/icmp # <--- This is mandatory to run under agent
  use_output: default
  data_stream:
    namespace: default
  streams:
    - id: ccfdfd04-7c1c-4c17-93ef-f0d22ab0a1d5
      name: 8.8.8.8
      type: icmp # <--- This is specific to the synthetics integration configuration, likely has the same meaning as the input.type above.
      enabled: true
      data_stream:
        dataset: icmp
        type: synthetic
Looking at the reference heartbeat configuration at https://www.elastic.co/guide/en/beats/heartbeat/current/heartbeat-reference-yml.html I believe both the The Heartbeat configuration format is completely different than the one for Filebeat, I don't think that just inserting the I'm not even sure that adding logic to extract type makes sense in the The primary bug for heartbeat is us parsing the
Alright, a few last-minute changes before the weekend; I feel like we're going to be tinkering with the logic for extracting from |
Can we always log the configurations that are created at info level? Or at least the input type (when applicable) and data streams that are extracted from the Unit configurations? Without this, problems in this code are undebuggable from agent diagnostics bundles.
@cmacknz there's a PR here to aid with the debuggability here: #33940 I have another upcoming PR that adds some more debug-level statements about unit config at the level of the elastic-agent-client, which might also help. Do you think it's worth it to add more debug statements about
I think #33940 will get us 99% of what we need. |
Looks good, thanks!
So, now realizing that this only works with some inputs but not others, as the syslog integrations has a type
Not sure if |
Hmm, so the agent defines the
- name: log
  aliases:
    - logfile
    - event/file
We have two options:
@blakerouse do you have any opinion here? |
Yah, using type |
@cmacknz It is the agent's job to translate this, not Beats'. This is a bug in agent. I will look into getting this fixed properly.
@fearful-symmetry let's get this merged then and backported into the 8.6 branch to make the build candidate today. |
I'm a bit confused about the discussion here, it'd help to make sure we are on the same page wrt For synthetics it should always be:
The For browsers the input yml type of |
@andrewvc thanks for following up. The only thing we explicitly set here is the default At a fundamental level most of this PR is addressing bugs in how we parsed the agent input type and data stream fields out of the agent policy. For example see the resolved comment #33921 (comment) We don't actually look at or modify the data stream fields, we are mostly concerned with extracting them properly to configure processors and set the target index name. What is in the integration configuration should be what gets configured at the Beat level. We can try to set up a quick zoom chat to clarify further if needed, there is already a lot of discussion in this PR making it hard to follow. |
Alright; I fixed the test to use |
All tests have passed, we are just waiting for the packaging jobs to complete. Going to merge this to unblock the next 8.6 BC. |
* fix use of types in filebeat, index generation * add map checks, heartbeat * fix up per-beat processors, add new logic for data_streams * try to make linter happy * still making linter happy * change fallback types for auditbeat and packetbeat * fix test (cherry picked from commit 7bfae23)
* fix use of types in filebeat, index generation * add map checks, heartbeat * fix up per-beat processors, add new logic for data_streams * try to make linter happy * still making linter happy * change fallback types for auditbeat and packetbeat * fix test (cherry picked from commit 7bfae23) Co-authored-by: Alex K <8418476+fearful-symmetry@users.noreply.github.com>
What does this PR do?
This is a fix for elastic/elastic-agent#1807. It fixes two different but related issues:
- index field was generated with the top-level data_stream field, and not the stream-level field, which produced the wrong index string.
- type.
Checklist
- CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.