Stack monitoring isn't capturing APM Server metrics #7139
Comments
cc @jasonrhodes and @ruflin
Could it be the same problem as elastic/beats#29880? There is a PR open for that: elastic/beats#30018.
@droberts195 To me, this seems like a different issue. The documents are in the index, but for some reason the visualizations are not working. If I understood correctly, the Elasticsearch module issue is that the documents are not indexed at all.
My understanding is that this is the data coming from Metricbeat inside the Elastic Agent container, but NOT run by Elastic Agent. This data is shipped to the .monitoring-* indices to be consumed by Stack Monitoring. I'm stating this to make sure we are all on the same page. Whether data is in the traces data stream only matters to confirm that there was some load on the server, so the metrics should not be zero. The issues linked above are about logs, not metrics, so they should not be related.

@marclop Have you looked at the data in the .monitoring-* indices? Could you share some example docs? Are the values all just 0? Let's figure out whether the queries are wrong or the data is empty; that should quickly bring us to the next step.

@ph @cmacknz We need to resolve this quickly. I'm not aware of any recent change in Cloud or Beats that would cause this.
I've reproduced this on an ECE environment and manually checked what the APM Server reports:

```
$ curl --unix-socket /app/elastic-agent/data/tmp/es-containerhost/apm-server/apm-server.sock http://localhost/stats
"libbeat": {
  "output": {
    "events": {
      "acked": 0,
      "active": 0,
      "batches": 0,
      "dropped": 0,
      "duplicates": 0,
      "failed": 0,
      "toomany": 0,
      "total": 0
    },
    "read": {
      "bytes": 0,
      "errors": 0
    },
    "type": "elasticsearch",
    "write": {
      "bytes": 0,
      "errors": 0
    }
  },
  ...
```

This seems to indicate that the issue is not on the collection side, but rather within the APM Server. I tried to reproduce this locally by running the APM Server in standalone mode, and it seemed to work just fine: after indexing a few thousand events, I was able to get the metrics from the stats endpoint:

```
$ curl -s http://localhost:5066/stats | jq '.libbeat.output'
{
  "events": {
    "acked": 3326,
    "active": 0,
    "batches": 3,
    "failed": 1,
    "toomany": 0,
    "total": 3327
  },
  "read": {
    "bytes": 0,
    "errors": 0
  },
  "type": "elasticsearch",
  "write": {
    "bytes": 0,
    "errors": 0
  }
}
```

I then tried to reproduce it running under the Elastic Agent, and it is easily reproducible when the APM Server runs in managed mode. The relevant code is at lines 686 to 704 in 535f709.
I'm not intimately familiar with how the registries work, or how reloading the APM Server the way we do may affect the reference we use to register the modelindexer stats, but could it be that we're losing the reference at some point?
EDIT: Oddly enough, I've injected a locally built binary from the latest
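The "are the values all just 0?" check can also be done directly against the stats endpoint. Below is a minimal Go sketch, not part of APM Server, that decodes libbeat.output.events from a /stats response and flags the all-zero case seen in managed mode. The struct covers only the counters visible in the outputs earlier in this comment, and fetchStatsOverUnixSocket is a hypothetical helper that mirrors the curl --unix-socket call using only the Go standard library.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net"
	"net/http"
)

// outputEvents mirrors libbeat.output.events as returned by /stats
// (only the counters shown in this thread; unknown fields are ignored).
type outputEvents struct {
	Acked   int64 `json:"acked"`
	Active  int64 `json:"active"`
	Batches int64 `json:"batches"`
	Failed  int64 `json:"failed"`
	TooMany int64 `json:"toomany"`
	Total   int64 `json:"total"`
}

type statsDoc struct {
	Libbeat struct {
		Output struct {
			Events outputEvents `json:"events"`
		} `json:"output"`
	} `json:"libbeat"`
}

// parseOutputEvents extracts libbeat.output.events from a /stats response body.
func parseOutputEvents(body []byte) (outputEvents, error) {
	var doc statsDoc
	err := json.Unmarshal(body, &doc)
	return doc.Libbeat.Output.Events, err
}

// allZero reports whether the output has never published anything.
func allZero(e outputEvents) bool {
	return e.Acked == 0 && e.Active == 0 && e.Batches == 0 &&
		e.Failed == 0 && e.TooMany == 0 && e.Total == 0
}

// fetchStatsOverUnixSocket (hypothetical helper) GETs /stats over a Unix socket.
// The host in the URL is ignored: every connection is dialed to socketPath.
func fetchStatsOverUnixSocket(socketPath string) ([]byte, error) {
	client := &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", socketPath)
			},
		},
	}
	resp, err := client.Get("http://localhost/stats")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}

func main() {
	// Counters copied from the standalone-mode output above.
	sample := []byte(`{"libbeat":{"output":{"events":{"acked":3326,"active":0,"batches":3,"failed":1,"toomany":0,"total":3327}}}}`)
	events, err := parseOutputEvents(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("acked=%d total=%d allZero=%v\n", events.Acked, events.Total, allZero(events))
}
```

Running it prints acked=3326 total=3327 allZero=false; feeding it the managed-mode response instead would report allZero=true.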
What @marclop references in the previous message concerns only the "Output Events Rate" graph; I believe the reason the other graphs always render
The requests made from Kibana to get Stack Monitoring data are available here, under
For example,
The problem is that the alias to the real data is missing the
Kibana makes a request for
"Output Events Rate" renders properly when it has data because it's actually nested under
I think if the whole
@klacabane @probakowski, can you confirm whether I'm reading and understanding the problem correctly?
@stuartnelson3 Good catch!
I'll give this a try.
Updated the mappings to have the
You should be able to unexport or provide a bogus
I believe the issues regarding libbeat metrics noted in #7139 (comment) are due to a race condition between apm-server replacing the libbeat metrics (see the code in the linked comment) and libbeat clearing and recreating the metrics registry when the output configuration is reloaded (https://github.com/elastic/beats/blob/72a43be9ed23efc3dc2b371e2cadf5a7c575e429/libbeat/publisher/pipeline/module.go#L138-L146).
I gave this a test today and was able to get everything going except the intake response errors. I triggered the output errors by marking the indices as read-only.
Unfortunately, I get a 403 when attempting to remove the block 😆, but I can just wipe the data instead.
Opened elastic/elasticsearch#83305 for the index blocking issue I ran into.
The test script makes the "Active" rate go up, but not the "Acked" rate in the instance graphs. The overview graph is a little odd: "Total" seems to trend higher than "Active", even though all I did was run the test app, and "Acked" is just a constant of roughly 50/s.
I'll open a new issue just for the above oddities.
APM Server version (apm-server version): 8.0.0-rc2
Description of the problem including expected versus actual behavior:
When Stack Monitoring is enabled, most of the APM Server metrics aren't displayed in the UI; all the metrics appear to be 0.
For reference, 10000 traces have been ingested.
Steps to reproduce:
8.0.0-rc2
ELASTIC_APM_SERVER_URL and ELASTIC_APM_SECRET_TOKEN environment variables):
Provide logs (if relevant):