Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cmd/mdatagen] Introduce metric groups #19440

Closed
dmitryax opened this issue Mar 10, 2023 · 6 comments
Closed

[cmd/mdatagen] Introduce metric groups #19440

dmitryax opened this issue Mar 10, 2023 · 6 comments

Comments

@dmitryax
Copy link
Member

dmitryax commented Mar 10, 2023

Problem

Currently, the hostmetrics receiver has several scrapers, each of them having its metadata.yaml file. There are a couple of problems with this approach:

  • It prevents the adoption of other receiver-specific metadata fields that require only one metadata.yaml file per receiver Move README status table to separate machine-readable file #19175

  • The current hostmetrics receiver scrapers configuration interface doesn't allow overriding values, e.g., "disable a scraper that is enabled in a config that I want to override". It's a pretty important feature in the helm chart where the host metrics collection is enabled by one option, but the default set of scrapers cannot be reduced, so users have to disable all metrics in the detectors that they don't need one by one.

Proposed solution

Because of these issues and to solve other user needs to be described below, I'm suggesting another interface for scraping receivers with metrics that can be split into meaningful groups. Each group can be disabled/enabled like it's currently done for metrics.

metadata.yaml interface

metadata.yaml interface will get an extra field called metric_groups and all the metrics definitions will be moved under particular groups.

  • Each metric group has a set of metrics assigned to the group under metrics field.

  • Additional field to be added in the metrics group enabled:

    • if enabled: true, all the default metrics within the group (defined with enabled: true) will be emitted by default.
    • if enabled: false, none of the metrics within the group will be emitted by default.
  • Additionally, we may have an optional warnings field similar to what each metric field currently can use. Using that field, we can proactively warn users about upcoming changes in the group, such as turning the group from optional to default, deprecation, replacement, etc.

The updated metadata.yaml schema will look the following:

metric_groups:
  metric_group_1:
    enabled: <true|false>
    metrics:
      metric_1:
        enabled: <true|false>
      metric_2:
        enabled: <true|false>
    warnings:
      if_enabled: "a warning that will be displayed if the metric group is enabled in user config."
      if_enabled_not_set: "a warning that will be displayed if `enabled_metrics` field is not set explicitly in user config."
      if_configured: "a warning that will be displayed if a user configures the metrics group in any way."

If a receiver doesn't have logical groups to split the metrics, all the metrics should be defined in a group with default name.

User config interface

Splitting metrics into groups gives users more flexibility in controlling what metrics they want to collect. Using groups, we can allow new patterns of disabling/enabling metrics, such as enabling all metrics from a particular group, enabling default metrics from a particular group, or disabling all group's metrics. For that, we need more than just a boolean enabled field in the user config for metrics groups. The following interface can be provided to user:

metric_groups:
  metric_group_1:

    # none: none of the group's metrics are emitted
    # default: only the group's default metrics are emitted
    # all: all group's default metrics are emitted
    enabled_metrics: <none|default|all>

    metrics:
      metric_1
        # this field always has the highest precedence to allow use cases like "enable all metrics from group_1 except for metric_1"
        enabled: <true|false>

Developer interface

Developer API should not be changed in any way other than introducing additional configuration structures for metric groups. Most existing receivers don't need to change anything because of how the metadata.MetricsBuilderConfig struct is added to the receivers config. But, if a receiver author wants to customize the user configuration interface, they can still use the embedded structures directly instead of metadata.MetricsBuilderConfig. That customization would be required for hostmetrics receiver to add additional configuration options for the metric groups that are currently defined for each scraper, e.g., IncludeVirtualFS in the filesystem scraper.

Migration

The main problem of the migration is moving all the existing user settings for metrics from the receivers config layer to a metrics group layer, probably metric_groups.default for most cases. cmd/mdatagen tool can help with that migration by generating temporary support for the metrics field on both layers and generating depreciation messages for users to update their configs.

@atoulme
Copy link
Contributor

atoulme commented Mar 10, 2023

Can you also define warnings at the level of a single metric in a group?

@dmitryax
Copy link
Member Author

dmitryax commented Mar 10, 2023

Yes, that's already supported

@github-actions
Copy link
Contributor

github-actions bot commented May 9, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label May 9, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Jul 8, 2023

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jul 8, 2023
@dmitryax dmitryax reopened this Aug 10, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Oct 9, 2023

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Oct 9, 2023
Copy link
Contributor

github-actions bot commented Dec 8, 2023

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Dec 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants