Fix libbeat metrics replacement #7143

Merged (2 commits) on Jan 27, 2022

axw (Member) commented Jan 27, 2022

Motivation/summary

Fix a race between apm-server and libbeat code over the libbeat.output.events metrics registries. apm-server replaces these registries, but when the output is reloaded, libbeat clears the libbeat registry, undoing that replacement. Because libbeat caches the libbeat metrics registry at init time, we can instead remove that registry and replace it with a new libbeat metrics registry under our own control. libbeat's later clears then affect only the cached registry, which is no longer reachable.
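A minimal Go sketch of the replacement idea, using a hypothetical in-memory Registry type. This is not libbeat's monitoring package API; the names Registry, NewChild, Child, Remove, Clear, and simulate are invented for illustration. It shows why detaching the cached registry helps: libbeat's clear hits the pointer it cached at init, while anyone reading "libbeat" from the root sees the new registry.

```go
package main

import (
	"fmt"
	"sync"
)

// Registry is a hypothetical stand-in for a metrics registry. It is NOT
// libbeat's monitoring.Registry; it models only the behaviour needed here.
type Registry struct {
	mu       sync.Mutex
	values   map[string]int64
	children map[string]*Registry
}

func newRegistry() *Registry {
	return &Registry{values: map[string]int64{}, children: map[string]*Registry{}}
}

// NewChild creates a child registry and attaches it under name.
func (r *Registry) NewChild(name string) *Registry {
	child := newRegistry()
	r.mu.Lock()
	defer r.mu.Unlock()
	r.children[name] = child
	return child
}

// Child returns the registry currently attached under name, or nil.
func (r *Registry) Child(name string) *Registry {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.children[name]
}

// Remove detaches the child under name. Holders of the old pointer can still
// use it, but it is no longer reachable from r.
func (r *Registry) Remove(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.children, name)
}

// Set records a metric value.
func (r *Registry) Set(key string, v int64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.values[key] = v
}

// Get returns a metric value and whether it exists.
func (r *Registry) Get(key string) (int64, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	v, ok := r.values[key]
	return v, ok
}

// Clear drops all metric values, modelling libbeat clearing its registry on
// output reload.
func (r *Registry) Clear() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.values = map[string]int64{}
}

// simulate runs the sequence described above and returns what a reader of
// the root's "libbeat" registry sees afterwards.
func simulate() (int64, bool) {
	root := newRegistry()

	// At init time, libbeat creates the "libbeat" registry and caches the pointer.
	libbeatCached := root.NewChild("libbeat")

	// apm-server detaches that registry and installs one under its own control.
	root.Remove("libbeat")
	ours := root.NewChild("libbeat")
	ours.Set("output.events.acked", 42)

	// Output reload: libbeat clears the registry it cached, which is now detached.
	libbeatCached.Clear()

	return root.Child("libbeat").Get("output.events.acked")
}

func main() {
	v, ok := simulate()
	fmt.Println(v, ok) // prints "42 true"
}
```

Had apm-server only replaced values inside the cached registry, the Clear call would have wiped them; detaching and re-creating the registry is what makes the reload harmless.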

Checklist

- [ ] Update CHANGELOG.asciidoc (not yet released)
- [ ] Update package changelog.yml (only if changes to apmpackage have been made)
- [ ] Documentation has been updated

How to test these changes

  1. Run the APM integration
  2. Ingest some events
  3. Check the libbeat metrics in the /stats output, or in Stack Monitoring
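Step 3 can be scripted. The Go sketch below decodes the libbeat.output.events counters from a /stats response. The JSON layout (libbeat.output.events.total and .acked) and the URL, which reuses the Elastic Agent monitoring address and basepath from the test plan later in this conversation, are assumptions to adjust for your setup.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// statsDoc models only the part of the /stats response this check needs.
// The field layout is an assumption based on libbeat metric names.
type statsDoc struct {
	Libbeat struct {
		Output struct {
			Events struct {
				Total int64 `json:"total"`
				Acked int64 `json:"acked"`
			} `json:"events"`
		} `json:"output"`
	} `json:"libbeat"`
}

// outputEvents extracts libbeat.output.events.total and .acked from a /stats body.
func outputEvents(body []byte) (total, acked int64, err error) {
	var doc statsDoc
	if err := json.Unmarshal(body, &doc); err != nil {
		return 0, 0, err
	}
	e := doc.Libbeat.Output.Events
	return e.Total, e.Acked, nil
}

func main() {
	// Assumed address: apm-server's stats as proxied by Elastic Agent's
	// monitoring HTTP endpoint.
	const statsURL = "http://localhost:6791/processes/apm-server-default/stats"

	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(statsURL)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	total, acked, err := outputEvents(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	// Non-zero counters after ingesting events indicate the metrics survived.
	fmt.Printf("libbeat.output.events: total=%d acked=%d\n", total, acked)
}
```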

Related issues

Closes #7139

axw added the v8.0.0, v8.1.0, and backport-8.0 labels on Jan 27, 2022
axw force-pushed the modelindexer-libbeat-metrics branch from 2673eec to 4a17630 on January 27, 2022 10:20
axw requested a review from a team on January 27, 2022 10:20
axw marked this pull request as ready for review on January 27, 2022 10:21
axw changed the title from "Modelindexer libbeat metrics" to "Fix libbeat metrics replacement" on Jan 27, 2022
apmmachine (Contributor) commented Jan 27, 2022

💚 Build Succeeded

Build stats

  • Reason: null

  • Start Time: 2022-01-27T12:09:01.917+0000

  • Duration: 63 min 17 sec

  • Commit: 4a17630

Test stats 🧪

  • Failed: 0

  • Passed: 5625

  • Skipped: 20

  • Total: 5645

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /hey-apm : Run the hey-apm benchmark.

  • /package : Generate and publish the docker images.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

axw enabled auto-merge (squash) on January 27, 2022 11:18

axw (Member, Author) commented Jan 27, 2022

/test

axw (Member, Author) commented Jan 27, 2022

/test

axw merged commit c01f7b4 into elastic:master on Jan 27, 2022
mergify bot pushed a commit that referenced this pull request Jan 27, 2022
* systemtest: test elasticsearch metrics

* beater: fix race with libbeat metrics replacement

(cherry picked from commit c01f7b4)

# Conflicts:
#	beater/beater.go
axw added a commit that referenced this pull request Jan 27, 2022
* Fix libbeat metrics replacement (#7143)

* systemtest: test elasticsearch metrics

* beater: fix race with libbeat metrics replacement

(cherry picked from commit c01f7b4)

# Conflicts:
#	beater/beater.go

* Fix merge conflict

Co-authored-by: Andrew Wilkins <axw@elastic.co>
axw added the test-plan label on Feb 8, 2022

mergify bot (Contributor) commented Feb 8, 2022

This pull request is now in conflict. Could you fix it, @axw? 🙏
To fix up this pull request, you can check it out locally. See the documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b modelindexer-libbeat-metrics upstream/modelindexer-libbeat-metrics
git merge upstream/master
git push upstream modelindexer-libbeat-metrics

axw deleted the modelindexer-libbeat-metrics branch on February 8, 2022 04:43
marclop added the backport-7.17 label on Feb 9, 2022
marclop (Contributor) commented Feb 9, 2022

@Mergifyio refresh

mergify bot (Contributor) commented Feb 9, 2022

refresh

✅ Pull request refreshed

marclop (Contributor) commented Feb 9, 2022

It seems the current diff cannot be backported to 7.17 as is.

stuartnelson3 (Contributor) commented:

Confirmed with RC2.

Start apm-integration-testing:

./scripts/compose.py start 8.1.0 --bc 024d23aa  --remove-orphans

In the Fleet UI:

  • Create an agent policy
  • Copy the service token and the Fleet Server policy ID for enrollment

Create an elastic-agent.yml that enables HTTP monitoring:

agent:
  monitoring:
    enabled: true
    use_output: default
    namespace: default
    logs: true
    metrics: true
    http:
      enabled: true
      host: localhost
      port: 6791

Start Elastic Agent using the service token, policy ID, and elastic-agent.yml:

docker run --name elastic-agent-local -it \
    --env KIBANA_FLEET_SETUP=1  \
    --env FLEET_SERVER_ENABLE=1 \
    --env ELASTICSEARCH_HOST="http://localhost:9200" \
    --env FLEET_SERVER_SERVICE_TOKEN=$TOKEN \
    --env FLEET_SERVER_POLICY_ID=$POLICY_ID \
    --env FLEET_INSECURE=1 \
    --env FLEET_ENROLL=1  \
    --env KIBANA_FLEET_HOST="http://localhost:5601" \
    --env KIBANA_FLEET_PASSWORD="changeme" \
    --env KIBANA_FLEET_USERNAME="admin"  \
    --env ELASTIC_APM_LOG_LEVEL=debug \
    --env ELASTIC_APM_LOG_FILE=stderr \
    --network host \
    -v $(pwd)/elastic-agent.yml:/usr/share/elastic-agent/elastic-agent.yml \
    --rm docker.elastic.co/beats/elastic-agent:8.1.0

Add the APM integration to the agent policy.

Confirm that apm-server has started:

curl http://localhost:6791/processes | jq .

Start Metricbeat with the following modules.d/beat-xpack.yml:

# modules.d/beat-xpack.yml
- module: beat
  xpack.enabled: true
  period: 10s
  hosts: ["http://localhost:6791"]
  basepath: "/processes/apm-server-default"

Navigate to the Stack Monitoring UI.

Ingest events with the following program:

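package main

import (
	"context"
	"fmt"
	"os"
	"time"

	"go.elastic.co/apm"
)

// Sends 1000 transactions, each with one child span, to the APM Server
// configured through the standard ELASTIC_APM_* environment variables.
func main() {
	// An optional first argument tags the transaction and span names.
	version := "undefined"
	if len(os.Args) > 1 {
		version = os.Args[1]
	}
	name := fmt.Sprintf("apm-server-%s", version)

	for i := 0; i < 1000; i++ {
		tx := apm.DefaultTracer.StartTransaction(name, "type")
		ctx := apm.ContextWithTransaction(context.Background(), tx)
		span, _ := apm.StartSpan(ctx, name, "type")
		// Explicit durations override the measured ones.
		span.Duration = time.Second
		span.End()
		tx.Duration = 2 * time.Second
		tx.End()
		time.Sleep(time.Millisecond)
	}
	// Wait briefly, then flush any buffered events before exiting.
	time.Sleep(time.Second)
	apm.DefaultTracer.Flush(nil)
	fmt.Printf("%s: %+v\n", name, apm.DefaultTracer.Stats())
}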

Verify that events appear in the output events rate graph.

Labels: backport-7.17, backport-8.0, test-plan, test-plan-ok, v8.0.0, v8.1.0

Linked issues: Stack monitoring isn't capturing APM Server metrics

5 participants