feat: start hiding beats monitoring behind otel abstraction #15360

Merged: 7 commits, Jan 24, 2025

Conversation

kruskall
Member

Motivation/summary

Split from #15094

part 1

Checklist

For functional changes, consider:

  • Is it observable through the addition of either logging or metrics?
  • Is its use being published in telemetry to enable product improvement?
  • Have system tests been added to avoid regression?

How to test these changes

Related issues

@kruskall kruskall requested a review from a team as a code owner January 24, 2025 01:08
Contributor

mergify bot commented Jan 24, 2025

This pull request does not have a backport label. Could you fix it @kruskall? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-7.17 is the label to automatically backport to the 7.17 branch.
  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
  • backport-8.x is the label to automatically backport to the 8.x branch.

Contributor

mergify bot commented Jan 24, 2025

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Jan 24, 2025
Member

@axw axw left a comment

Thanks for splitting this out, it's a lot more consumable!

Mostly looks good, just a few comments

internal/beater/server_test.go (resolved)
internal/beater/beater.go (outdated, resolved)
internal/beatcmd/reloader.go (outdated, resolved)
internal/beatcmd/beat.go (outdated, resolved)
axw added 2 commits January 24, 2025 10:09
- Only adapt go-docappender metrics when using the Elasticsearch output
- Increment processed events properly.
- Create object hierarchy rather than dotted metric names so _source
  remains backwards compatible. Revert system test changes.
- Report elasticsearch.indexers.active; use correct names for go-docappender
  indexer creation/destruction OTel metrics.

... and add a unit test for all of that
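
The "object hierarchy rather than dotted metric names" point can be illustrated with a minimal sketch. This is not the code from this PR: `setNested` is a hypothetical helper, and `elasticsearch.indexers.example_total` is a made-up metric name used only for illustration (`elasticsearch.indexers.active` is the one named above). It shows how a dotted name could be expanded into nested objects so that the resulting document keeps a nested `_source` shape:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// setNested (hypothetical helper) stores value under a dotted name such as
// "a.b.c" as nested maps: {"a": {"b": {"c": value}}}.
// If an intermediate key already holds a non-map value, it is replaced.
func setNested(root map[string]any, dotted string, value any) {
	parts := strings.Split(dotted, ".")
	m := root
	for _, p := range parts[:len(parts)-1] {
		child, ok := m[p].(map[string]any)
		if !ok {
			child = map[string]any{}
			m[p] = child
		}
		m = child
	}
	m[parts[len(parts)-1]] = value
}

func main() {
	doc := map[string]any{}
	setNested(doc, "elasticsearch.indexers.active", int64(1))
	// Made-up metric name, for illustration only.
	setNested(doc, "elasticsearch.indexers.example_total", int64(2))

	out, err := json.Marshal(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	// prints {"elasticsearch":{"indexers":{"active":1,"example_total":2}}}
}
```

With flat dotted keys the document would instead contain a single top-level key such as `"elasticsearch.indexers.active"`, which is the `_source` shape change the commit message says it avoids.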
axw previously approved these changes Jan 24, 2025
Member

@axw axw left a comment

LGTM, we'll need to make sure stack monitoring & self-instrumentation are manually tested I think.

@kruskall kruskall requested a review from a team January 24, 2025 07:39
Contributor

@simitt simitt left a comment

Code looks good to me; I left only one minor comment.
I reviewed the metrics mapping in detail and nothing stood out.

@kruskall before merging, please leave a note on the PR describing how you tested this manually (see @axw's comment above).

for _, dp := range data.DataPoints {
	status, ok := dp.Attributes.Value(attribute.Key("status"))
	if !ok {
		continue
Contributor

Should we log a warning here? Afaics this wouldn't be expected.

Member Author

yup, this shouldn't be happening and should be covered by tests in go-docappender since they own the metrics
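
For reference, a rough sketch of what "log a warning and skip" could look like for the loop under review. This is not the PR's code and does not use the OTel SDK types: `dataPoint` is a hypothetical stand-in with plain string attributes, and `sumByStatus` and the `warnf` callback are illustrative names. It only shows the control flow: a data point without a `status` attribute is reported and skipped rather than silently dropped.

```go
package main

import (
	"fmt"
	"log"
)

// dataPoint is a hypothetical stand-in for a metric data point; the real
// code iterates OTel SDK data points and looks up an attribute by key.
type dataPoint struct {
	Attributes map[string]string
	Value      int64
}

// sumByStatus adds up values per "status" attribute. Data points without a
// status are not expected, so they are reported through warnf and skipped.
func sumByStatus(points []dataPoint, warnf func(format string, args ...any)) map[string]int64 {
	totals := make(map[string]int64)
	for _, dp := range points {
		status, ok := dp.Attributes["status"]
		if !ok {
			warnf("data point without status attribute (value=%d), skipping", dp.Value)
			continue
		}
		totals[status] += dp.Value
	}
	return totals
}

func main() {
	points := []dataPoint{
		{Attributes: map[string]string{"status": "Success"}, Value: 3},
		{Attributes: map[string]string{}, Value: 1}, // unexpected: no status
		{Attributes: map[string]string{"status": "Success"}, Value: 2},
	}
	fmt.Println(sumByStatus(points, log.Printf))
}
```

Passing the logger as a function keeps the sketch testable; whether a warning is worth emitting at all depends on how strongly the upstream tests in go-docappender guarantee the attribute.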

@kruskall
Member Author

> LGTM, we'll need to make sure stack monitoring & self-instrumentation are manually tested I think.

To validate:

  • enable instrumentation and monitoring in apm-server config
  • start apm-server and send data with apmsoak
  • verify stack monitoring UI is working
  • verify the monitoring index and the instrumentation index contain data

I'm merging this

@kruskall kruskall enabled auto-merge (squash) January 24, 2025 16:14
@kruskall kruskall merged commit 3048b8f into elastic:main Jan 24, 2025
12 checks passed
@kruskall kruskall deleted the feat/hide-beats-monitoring-pt1 branch January 24, 2025 16:22
mergify bot pushed a commit that referenced this pull request Jan 24, 2025
* feat: start hiding beats monitoring behind otel abstraction

* lint: remove unused methods

* Remove MetricReader from RunnerParams

* Various fixes

- Only adapt go-docappender metrics when using the Elasticsearch output
- Increment processed events properly.
- Create object hierarchy rather than dotted metric names so _source
  remains backwards compatible. Revert system test changes.
- Report elasticsearch.indexers.active; use correct names for go-docappender
  indexer creation/destruction OTel metrics.

... and add a unit test for all of that

* Fix flaky test

---------

Co-authored-by: Andrew Wilkins <axw@elastic.co>
(cherry picked from commit 3048b8f)
carsonip added a commit that referenced this pull request Jan 27, 2025
@carsonip
Member

for the record this introduced a regression where apmbench events/s is 0 (via expvar). @kruskall is working on a fix.

@kruskall
Member Author

> for the record this introduced a regression where apmbench events/s is 0 (via expvar). @kruskall is working on a fix.

this doesn't happen locally and I'm not able to reproduce it

@carsonip
Member

As explained in #13738 (comment), I'm confident that the regression is caused by this PR. @kruskall can you study how this PR interacts with EA and instrumentation config on ESS? They are suspect.

mergify bot added a commit that referenced this pull request Jan 31, 2025
…#15372)

Co-authored-by: kruskall <99559985+kruskall@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
@endorama endorama mentioned this pull request Feb 4, 2025
1 task
@endorama endorama self-assigned this Feb 11, 2025
@endorama endorama added v8.19.0 and removed v8.18.0 labels Feb 11, 2025
@inge4pres inge4pres mentioned this pull request Feb 11, 2025
7 tasks
Labels
backport-8.x Automated backport to the 8.x branch with mergify test-plan test-plan-ok v8.19.0
5 participants