Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OTel-Arrow receiver admission limits: redesign #36033

Closed
wants to merge 12 commits into from

Conversation

jmacd
Copy link
Contributor

@jmacd jmacd commented Oct 28, 2024

Description

Redesigns the internal/otelarrow/admission package. There was a defect & regression. This follows the solution outlined in #36074.

OTel-Arrow receiver:

  • Stop using otlp-pdata-size header.
  • Acquire resources once per request, not in two steps.
  • admission::waiter_limit configuration (measured in requests) was ineffective, replace with admission::waiting_limit_mib measured in bytes.

Admission controller:

  • Fix resource leak in context cancellation
  • Avoid use of fallible APIs
  • Use LIFO order

Link to tracking issue

Fixes #36074.

Testing

New end-to-end test added. Explicitly test for proper span production when OTLP admission control actions fail.

Documentation

Updated.

Copy link

codecov bot commented Oct 29, 2024

Codecov Report

Attention: Patch coverage is 99.32432% with 1 line in your changes missing coverage. Please review.

Project coverage is 80.37%. Comparing base (c5b6746) to head (e9b115a).
Report is 57 commits behind head on main.

Files with missing lines Patch % Lines
receiver/otelarrowreceiver/otelarrow.go 85.71% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #36033      +/-   ##
==========================================
+ Coverage   80.34%   80.37%   +0.02%     
==========================================
  Files        2138     2142       +4     
  Lines      201222   201669     +447     
==========================================
+ Hits       161673   162090     +417     
- Misses      34274    34304      +30     
  Partials     5275     5275              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines +117 to +120
// We were also admitted, which can happen
// concurrently with cancellation. Make sure
// to release since no one else will do it.
bq.releaseLocked(pending)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review notes: this is a new code path, fixes the race condition.

Comment on lines +108 to +111
case <-waiter.notify.Chan():
parentSpan.AddEvent("admission accepted (slow path)", pendingAttr)
return bq.releaseFunc(pending), nil

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review notes: This is the same logic as before, but we return a release function to avoid fallible interfaces.

bq.lock.Unlock()
ctx, span := bq.tracer.Start(ctx, "admission_blocked",
trace.WithAttributes(attribute.Int64("pending", pendingBytes)))
ctx, span := bq.tracer.Start(ctx, "admission_blocked", pendingAttr)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: the use of list.List avoids the need to generate a new key. This simplification is possible because we always remove in LIFO order and don't require random access.

sizer *plog.ProtoMarshaler
logger *zap.Logger
}

// New creates a new Receiver reference.
func New(logger *zap.Logger, nextConsumer consumer.Logs, obsrecv *receiverhelper.ObsReport, bq *admission.BoundedQueue) *Receiver {
func New(logger *zap.Logger, nextConsumer consumer.Logs, obsrecv *receiverhelper.ObsReport, bq admission.Queue) *Receiver {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note there are identical changes in each of the otlp.go and otlp_test.go files below, once each for traces, logs, and metrics.

@@ -5,209 +5,330 @@ package admission

import (
"context"
"fmt"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this file has been rewritten from scratch because of the FIFO->LIFO change and the fallibility change. This file brings us to 100% test coverage.

@jmacd jmacd closed this Oct 29, 2024
andrzej-stencel pushed a commit that referenced this pull request Oct 31, 2024
#### Description

Adds a no-op implementation of the BoundedQueue used by the OTel-Arrow
receiver for admission control.

#### Link to tracking issue

Part of #36074.

#### Testing

Adds a new end-to-end test to verify admission control with/without
waiters and unbounded. The test was taken from #36033.

#### Documentation

Added: "0" request_limit_mib indicates no admission control.
djaglowski pushed a commit that referenced this pull request Nov 1, 2024
#### Description

Simplifies the admission control logic for OTAP payloads. We call
Acquire() once after uncompressing the data, instead of once with
compressed size and once with the difference.

#### Link to tracking issue

Part of #36074.

#### Testing

One test is replaced with logic to verify certain BoundedQueue actions.

~Note: the OTel-Arrow test suite will not pass with this PR until it
merges with #36078.~ Originally developed in #36033.

#### Documentation

Not user-visible.
ArthurSens pushed a commit to ArthurSens/opentelemetry-collector-contrib that referenced this pull request Nov 4, 2024
…try#36081)

#### Description

Adds a no-op implementation of the BoundedQueue used by the OTel-Arrow
receiver for admission control.

#### Link to tracking issue

Part of open-telemetry#36074.

#### Testing

Adds a new end-to-end test to verify admission control with/without
waiters and unbounded. The test was taken from open-telemetry#36033.

#### Documentation

Added: "0" request_limit_mib indicates no admission control.
ArthurSens pushed a commit to ArthurSens/opentelemetry-collector-contrib that referenced this pull request Nov 4, 2024
…telemetry#36082)

#### Description

Simplifies the admission control logic for OTAP payloads. We call
Acquire() once after uncompressing the data, instead of once with
compressed size and once with the difference.

#### Link to tracking issue

Part of open-telemetry#36074.

#### Testing

One test is replaced with logic to verify certain BoundedQueue actions.

~Note: the OTel-Arrow test suite will not pass with this PR until it
merges with open-telemetry#36078.~ Originally developed in open-telemetry#36033.

#### Documentation

Not user-visible.
sbylica-splunk pushed a commit to sbylica-splunk/opentelemetry-collector-contrib that referenced this pull request Dec 17, 2024
…try#36081)

#### Description

Adds a no-op implementation of the BoundedQueue used by the OTel-Arrow
receiver for admission control.

#### Link to tracking issue

Part of open-telemetry#36074.

#### Testing

Adds a new end-to-end test to verify admission control with/without
waiters and unbounded. The test was taken from open-telemetry#36033.

#### Documentation

Added: "0" request_limit_mib indicates no admission control.
sbylica-splunk pushed a commit to sbylica-splunk/opentelemetry-collector-contrib that referenced this pull request Dec 17, 2024
…telemetry#36082)

#### Description

Simplifies the admission control logic for OTAP payloads. We call
Acquire() once after uncompressing the data, instead of once with
compressed size and once with the difference.

#### Link to tracking issue

Part of open-telemetry#36074.

#### Testing

One test is replaced with logic to verify certain BoundedQueue actions.

~Note: the OTel-Arrow test suite will not pass with this PR until it
merges with open-telemetry#36078.~ Originally developed in open-telemetry#36033.

#### Documentation

Not user-visible.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OTel-Arrow receiver admission control broken in several ways
1 participant