Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ottl: Efficiently test if a value is in a list or map #30420

Open
ringerc opened this issue Jan 11, 2024 · 18 comments
Open

ottl: Efficiently test if a value is in a list or map #30420

ringerc opened this issue Jan 11, 2024 · 18 comments
Labels
enhancement New feature or request help wanted Extra attention is needed pkg/ottl priority:p2 Medium

Comments

@ringerc
Copy link

ringerc commented Jan 11, 2024

Component(s)

pkg/ottl

Describe the issue you're reporting

Currently OTTL doesn't offer an efficient way to test if a value is in a list of possible values. It has no in or contains operator, nor (AFAICS) does it have map literals.

Sometimes it's necessary to have filters that use fairly long lists of values to match, e.g. metric names to allow-list. These should be efficient to match, preferably without re-parsing and sorting the whole list every time.

Possible implementations could be:

  • Map literals with lookup: {"metric1": true, "metric2": true, "metric3": true}[name] . A means of coalescing null to false would be required, because it looks like OTTL gets upset when passed a null when it expects a boolean.
  • in or contains operator on arrays: name in ["metric1", "metric2"]

Current workarounds

  • Use a long list of ... where name == "metric1" , ... where name == "metric2" etc statements or a compound conditional with or
  • Use a statement with a long list of where (name == "metric1" or name == "metric2") chained conditions
  • Use IsMatch with a regex like IsMatch(name,"^(metric1|metric2)$")
@ringerc ringerc added the needs triage New item requiring triage label Jan 11, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@TylerHelmuth TylerHelmuth added priority:p2 Medium enhancement New feature or request and removed needs triage New item requiring triage labels Jan 11, 2024
@TylerHelmuth
Copy link
Member

@ringerc I agree that OTTL needs to support this functionality. My initial thought is a function like Contains(target, item) that returns true if the item is in the target. Target could be a slice or map.

@ringerc
Copy link
Author

ringerc commented Jan 11, 2024

@TylerHelmuth That sounds sensible, straightforward to implement and would be fairly discoverable.

I haven't been able to work out if OTTL has a map literal syntax, but array/list literals would do the job. For maps it's possible to test the looked up key for nil, so array is more important anyway.

It would be interesting to be able to pre-parse, sort and cache long array literals for this purpose, but I don't know if that's even remotely feasible or worthwhile with how the OTTL runtime works.

@TylerHelmuth
Copy link
Member

TylerHelmuth commented Jan 11, 2024

It would be interesting to be able to pre-parse, sort and cache long array literals for this purpose, but I don't know if that's even remotely feasible or worthwhile with how the OTTL runtime works.

Literals in OTTL are created during collector startup and then reused. If you end up doing something like Contains([1, 2, 3, 4], attributes["item"]), the [1, 2, 3, 4] array is created on startup and reused.

For maps it's possible to test the looked up key for nil, so array is more important anyway.

I agree, maybe this function should only allow target to be a slice.

@ringerc
Copy link
Author

ringerc commented Jan 11, 2024

So an optional "presorted" bool flag could be provided, to attest that the input is a sorted literal slice. If false (default) the list will be scanned sequentially or will be sorted before execution.

There's no map literal, right? A pre-parsed, pre-hashed map would probably by quicker to use for an exists/contains test, with the minor inconvenience of having to provide dummy values for each key.

@TylerHelmuth
Copy link
Member

We don't have a map literal. Can you describe more about your use case and how a map solves it over a slice? Is it a lot of values?

@ringerc
Copy link
Author

ringerc commented Jan 11, 2024

Sure, and thanks for your time looking at this.

TL;DR: I need to use an allow-list / keep-list for metric names. A bit like keep_keys(...) but for the metric itself, not attributes. Existing processors do not seem to support this. OTTL has keep_keys(...) for metrics attributes but the transform processor can't directly drop metrics, nor set a metric attribute like "keep" or "drop" based on whether the metric name matches entries in a list of names.

So far it's no more than 30 values per set of metrics matched by common prefix. Maybe 200 metrics in total but likely to grow.

Currently I'm trying to use the transform processor to e.g.

set(attributes["keep"], false)
where IsMatch(metric.name, "^otelcol_*")
   and not IsMatch(metric.name, "^(otelcol_exporter_queue_capacity|otelcol_exporter_queue_size|otelcol_exporter_sent_metric_points|otelcol_process_cpu_seconds|otelcol_process_memory_rss|otelcol_process_runtime_heap_alloc_bytes|otelcol_process_runtime_total_alloc_bytes|otelcol_process_runtime_total_sys_memory_bytes|otelcol_process_uptime|otelcol_processor_batch_batch_size_trigger_send|otelcol_processor_batch_timeout_trigger_send|otelcol_receiver_accepted_metric_points|otelcol_receiver_refused_metric_points|otelcol_scraper_errored_metric_points|otelcol_scraper_scraped_metric_points|otelcol_processor_batch_batch_send_size|otelcol_processor_dropped_metric_points|otelcol_processor_refused_metric_points)$")

... then a filterprocessor to drop metrics with attributes["keep"] != true and another transformprocessor to drop the keep attribute before sending. It's clunky and verbose, and I suspect rather inefficient.

Having a contains operation wouldn't help with the need to tag for keep/drop, filter, and drop attribute. But it'd mean the allow -list wouldn't have to be a regex, it could be a hopefully more efficient sorted slice like:

# this doesn't work right now and is imaginary syntax. The boolean param indicates it's a pre-sorted list.
set(attributes["keep"], false)
where IsMatch(metric.name, "^otelcol_*")
   and not Contains(metric.name, [
     "otelcol_exporter_queue_capacity", "otelcol_exporter_queue_size", "otelcol_exporter_sent_metric_points",
     "otelcol_process_cpu_seconds", "otelcol_process_memory_rss", "otelcol_processor_batch_batch_send_size",
     "otelcol_processor_batch_batch_size_trigger_send", "otelcol_processor_batch_timeout_trigger_send",
     "otelcol_processor_dropped_metric_points", "otelcol_processor_refused_metric_points",
     "otelcol_process_runtime_heap_alloc_bytes", "otelcol_process_runtime_total_alloc_bytes",
     "otelcol_process_runtime_total_sys_memory_bytes", "otelcol_process_uptime", "otelcol_receiver_accepted_metric_points",
     "otelcol_receiver_refused_metric_points", "otelcol_scraper_errored_metric_points", 
     "otelcol_scraper_scraped_metric_points"
], true)

The reason it's required is hosted provider billing.

One of the metrics destinations I'm using is Datadog, which has an "interesting" billing and usage model. They bill metrics very steeply by the total cardinality across a 1 hour window, not data volume or unique datapoint count. But a bunch of metrics are special cased as claimed by specific "datadog integrations"; these are not billed at all irrespective of cardinality. Those metrics are matched by metric name. Billing is also done by unique monitored host, so these aren't free, they just don't get double-charged.

Datadog's list of metrics in their OTLP integration is incomplete and stale. They use custom names specific to their own agent for most other metrics, but some of them overlap with OTLP hostmetrics and kubeletstats names.

To prevent unpleasant billing surprises I need to "allow-list" only a specific set of named metrics to be delivered. These allow-lists are extracted from DD's metadata files, like https://github.com/DataDog/integrations-core/blob/master/otel/metadata.csv and https://github.com/DataDog/integrations-core/blob/master/container/metadata.csv

(On a side note I've been meaning to have a look at the metric transform processor to see if it can be patched to support a drop operation for metric datapoints, avoiding the need for the separate filterprocessor)

@TylerHelmuth
Copy link
Member

TylerHelmuth commented Jan 12, 2024

I think a Contains is need, but I wonder if you could solve your use case with only a filterprocessor config:

processors:
  filter/ottl:
    error_mode: ignore
    metrics:
      metric:
          # each statement is ORed together - if any statement returns true the metric is dropped.
          - name = "otelcol_exporter_queue_capacity"
          - name = "otelcol_exporter_queue_size"
          - name = "otelcol_exporter_sent_metric_points"
          - name = "otelcol_process_cpu_seconds"
          - name = "otelcol_process_memory_rss"
          - name = "otelcol_processor_batch_batch_send_size"
          - name = "otelcol_processor_batch_batch_size_trigger_send"
          - name = "otelcol_processor_batch_timeout_trigger_send"
          - name = "otelcol_processor_dropped_metric_points"
          - name = "otelcol_processor_refused_metric_points"
          - name = "otelcol_process_runtime_heap_alloc_bytes"
          - name = "otelcol_process_runtime_total_alloc_bytes"
          - name = "otelcol_process_runtime_total_sys_memory_bytes"
          - name = "otelcol_process_uptime"
          - name = "otelcol_receiver_accepted_metric_points"
          - name = "otelcol_receiver_refused_metric_points"
          - name = "otelcol_scraper_errored_metric_points"
          - name = "otelcol_scraper_scraped_metric_points"

Or with IsMatch:

processors:
  filter/ottl:
    error_mode: ignore
    metrics:
      metric:
          - IsMatch(name, "^otelcol_")

These options would definitely be more efficient that calling a Contains function.

@ringerc
Copy link
Author

ringerc commented Jan 15, 2024

@TylerHelmuth The filterprocessor won't help unless I've grossly misunderstood something, because it can only express "drop if ..." conditions. I'm trying to express an allow-list not a drop-list - a "retain this explicit subset of container.* metrics but drop all other container.* metrics" rule.

I don't think the filterprocessor has a sensible way to express that? It doesn't AFAICS have a way to say "mark this set of datapoints to be kept, then drop all matching this expression unless previously marked to be kept".

The issue is writing "keep a_b and a_c but drop all other a_*" - where the list of keeps may be long.

so I could do a filterprocessor like

processors:
  filter/ottl:
    error_mode: ignore
    metrics:
      metric:
        - IsMatch(name, "^otelcol_*") and not IsMatch(name, "^(otelcol_exporter_queue_capacity|otelcol_exporter_queue_size|otelcol_exporter_sent_metric_points|otelcol_process_cpu_seconds|otelcol_process_memory_rss|otelcol_process_runtime_heap_alloc_bytes|otelcol_process_runtime_total_alloc_bytes|otelcol_process_runtime_total_sys_memory_bytes|otelcol_process_uptime|otelcol_processor_batch_batch_size_trigger_send|otelcol_processor_batch_timeout_trigger_send|otelcol_receiver_accepted_metric_points|otelcol_receiver_refused_metric_points|otelcol_scraper_errored_metric_points|otelcol_scraper_scraped_metric_points|otelcol_processor_batch_batch_send_size|otelcol_processor_dropped_metric_points|otelcol_processor_refused_metric_points)$")

but it's not a lot better really. For this specific use it's probably better than setting a keep attribute in the transform processor then dropping in the filter processor but using a regex just seems like the wrong tool for the job here.

Good to know that a long list of individual statements may be more efficient than testing for list-containment though. I'd assumed that testing for membership, at least in a pre-sorted list, should be significantly faster but it was just that - an assumption. I've not had a chance to do much performance testing.

@evan-bradley
Copy link
Contributor

@TylerHelmuth This may be a good use-case for a filter processor configuration option to keep if there's a match instead of dropping on a match (essentially wrapping all statements in a not) or an option to use AND logic instead of OR logic. These would be logically equivalent in terms of the end result, but could both be good options.

I'd assumed that testing for membership, at least in a pre-sorted list, should be significantly faster

@ringerc OTTL currently doesn't optimize string lists for fast lookup, I don't believe a pre-sorted list will help in most circumstances since an O(n) search would still be required. Usually the datasets OTTL is concerned with aren't large enough to justify any optimizations, but this may change in the future.

@ringerc
Copy link
Author

ringerc commented Jan 16, 2024

@evan-bradley Right. Checking in a list would only make sense if it could be marked as pre-sorted in the argument list so a faster search could be used. But then Bad Confusing Things would happen if the user's sort order for the locale they used for sorting didn't match the comparator that ottl uses, so it's probably not worth supporting anyway.

Re filter processor what I've been wondering about is the feasibility of marking a metric/datapoint to be dropped instead of eagerly dropping it, then having a way to unmark subsets. So it'd be possible to say "mark all x to be dropped" then "clear the to-drop flag on this subset".

I emulate this at the moment by using a transform processor to set and clear a "to-drop" attribute, then a filterprocessor to actually apply the final decisions.

This also means I can disable the filter processor to see what would be done by sending the dataset to a debug exporter, file exporter, etc. When dropping directly in the filter processor it's difficult to debug rules because there's no sort of "no-op" mode that (e.g.) sets an attribute instead of actually dropping.

The filter processor actually overlaps a lot with the transform processor, so it might make sense to give the transform processor a "drop" operation instead.

IDK.

Anyway, this has gone off the rails of the original suggestion a bit. It doesn't sound like it's simple to do. Shall I just close it?

@TylerHelmuth
Copy link
Member

TylerHelmuth commented Jan 16, 2024

One more thought on the tangent, you can currently do:

processors:
  filter/ottl:
    error_mode: ignore
    metrics:
      metric:
          - |
             not ( 
               name = "otelcol_exporter_queue_capacity" or 
               name = "otelcol_exporter_queue_size" orname = "otelcol_exporter_sent_metric_points" or 
               ... or 
               name = "otelcol_scraper_scraped_metric_points"
            )

@evan-bradley I agree that an option to set a top-level not or use AND could improve these ergonomics.

@ringerc we got sidetracked, but the original goal is still needed: we should have a Contains function for checking if an item is in a slice.

Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@bezmax
Copy link

bezmax commented Apr 25, 2024

I can add another use-case into the bunch.

We have some high-dimensionality infra-level attributes that we do not want to ship by default (cost optimization), but in few rare cases we want to keep the ability to do so. The problem is, the "few rare cases" are difficult to statically define, sometimes they are a function of things like environment, execution context, specific call place, etc.

Therefore, instead of trying to encapsulate that in otel-config, we built this special attribute that the client can send us - "_infra_labels". Client can specify extra labels they want included for this specific data-point in this attribute in a form of an array, for example _infra_labels = [hostname, ip_address, ...].

Then, on the otel-collector side we check each of the labels for presence in that list, like set(attributes['hostname'], cache['hostname']) where attributes['_infra_labels'] contain 'hostname'.

In absence of "contains" operator, we are currently forced to do String(array) and then match each candidate with IsMatch, which is not ideal.

@JasmineCA
Copy link

Is this feature planned to be implemented soon? The issue #32691 would be resolved by implementing a Contains operator

@TylerHelmuth
Copy link
Member

I don't have time right now to implement it myself, but open to PRs

Copy link
Contributor

github-actions bot commented Jul 3, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Jul 3, 2024
@evan-bradley evan-bradley added help wanted Extra attention is needed and removed Stale labels Jul 3, 2024
Copy link
Contributor

github-actions bot commented Sep 2, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed pkg/ottl priority:p2 Medium
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants