
[exporter] Flip on queue batcher #11637

Merged: 6 commits merged into open-telemetry:main on Dec 2, 2024

Conversation

@sfc-gh-sili (Contributor) commented Nov 9, 2024

Description

This PR solves #10368.

Previously we used a push model between the queue and the batcher, which constrained the batch size by `sending_queue.num_consumers`: a batch could not accumulate more than what the `sending_queue.num_consumers` blocked goroutines provided.

This PR changes it to a pull model: we read from the queue until a size threshold is met or a timeout fires, then allocate a worker to send out the request asynchronously.
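
For orientation, here is a minimal, self-contained sketch of the pull model described above. It is not the actual exporterhelper code: the channel-based queue, the `pullBatcher` function, and the thresholds are illustrative assumptions only.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// pullBatcher drains the queue until either minSize items have accumulated or
// flushTimeout expires, then hands the batch to a worker goroutine and keeps
// reading. This mirrors the pull model only conceptually.
func pullBatcher(queue <-chan string, minSize int, flushTimeout time.Duration, wg *sync.WaitGroup) {
	var batch []string
	timer := time.NewTimer(flushTimeout)
	defer timer.Stop()

	flush := func() {
		if len(batch) == 0 {
			return
		}
		toSend := batch
		batch = nil
		wg.Add(1)
		go func() { // asynchronous send; the reader keeps pulling meanwhile
			defer wg.Done()
			fmt.Printf("exporting batch of %d items\n", len(toSend))
		}()
	}

	for {
		select {
		case item, ok := <-queue:
			if !ok { // queue closed: flush the remainder and stop
				flush()
				return
			}
			batch = append(batch, item)
			if len(batch) >= minSize {
				flush()
			}
		case <-timer.C:
			flush()
			timer.Reset(flushTimeout)
		}
	}
}

func main() {
	queue := make(chan string, 100)
	for i := 0; i < 10; i++ {
		queue <- fmt.Sprintf("item-%d", i)
	}
	close(queue)

	var wg sync.WaitGroup
	pullBatcher(queue, 4, 200*time.Millisecond, &wg)
	wg.Wait()
}
```

Because the send happens in a separate goroutine, the reader can keep pulling from the queue while previous batches are in flight, so batch size is no longer tied to the number of queue consumers.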

Link to tracking issue

Fixes #10368
#8122

Testing

This PR swaps out batch_sender directly and still passes all the existing tests.

Documentation

@sfc-gh-sili sfc-gh-sili force-pushed the sili-flip-on branch 7 times, most recently from 2900101 to 55aae5c Compare November 11, 2024 08:30
codecov bot commented Nov 11, 2024

Codecov Report

Attention: Patch coverage is 70.00000% with 9 lines in your changes missing coverage. Please review.

Project coverage is 91.43%. Comparing base (32abecb) to head (2982282).
Report is 1 commit behind head on main.

Files with missing lines | Patch % | Lines
exporter/exporterhelper/internal/queue_sender.go | 68.42% | 4 missing, 2 partials
exporter/internal/queue/default_batcher.go | 25.00% | 2 missing, 1 partial
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #11637      +/-   ##
==========================================
- Coverage   91.45%   91.43%   -0.02%     
==========================================
  Files         447      447              
  Lines       23721    23743      +22     
==========================================
+ Hits        21694    21710      +16     
- Misses       1653     1657       +4     
- Partials      374      376       +2     


@sfc-gh-sili sfc-gh-sili marked this pull request as ready for review November 11, 2024 08:58
@sfc-gh-sili sfc-gh-sili requested a review from a team as a code owner November 11, 2024 08:58
@sfc-gh-sili sfc-gh-sili requested a review from songy23 November 11, 2024 08:58
@songy23 songy23 requested review from bogdandrutu and dmitryax and removed request for songy23 November 11, 2024 15:08
Comment on lines 98 to 102
// Shutdown ensures that the queue and all Batcher goroutines are stopped.
func (qb *BaseBatcher) Shutdown(_ context.Context) error {
qb.stopWG.Wait()
return nil
}
Member:
Why this change?

@sfc-gh-sili (Contributor, author):
See the other comment.

Comment on lines 127 to 138
qb.currentBatchMu.Lock()
if qb.currentBatch == nil || qb.currentBatch.req == nil {
qb.currentBatchMu.Unlock()
continue
}
batchToFlush := *qb.currentBatch
qb.currentBatch = nil
qb.currentBatchMu.Unlock()

// flushAsync() blocks until a goroutine for flushing has been started successfully.
qb.flushAsync(batchToFlush)
qb.resetTimer()
Member:
Not sure I understand this, can we do this in a separate PR?

@sfc-gh-sili (Contributor, author):
Thanks! Here it is: #11666
`batch_sender_test` helped me detect that the original implementation was missing a flush on shutdown.
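
As a side note on the flush-on-shutdown behavior mentioned above, here is a minimal, self-contained sketch of the idea. The `miniBatcher` type and its fields are made up for illustration and are not the actual default_batcher code.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// batch is a simplified stand-in for the batcher's accumulated request.
type batch struct{ req []string }

// miniBatcher sketches shutdown-with-flush; all names are assumptions.
type miniBatcher struct {
	mu           sync.Mutex
	currentBatch *batch
	stopWG       sync.WaitGroup
}

// flushAsync sends the batch in a goroutine tracked by stopWG.
func (b *miniBatcher) flushAsync(toFlush batch) {
	b.stopWG.Add(1)
	go func() {
		defer b.stopWG.Done()
		fmt.Println("flushed", len(toFlush.req), "items")
	}()
}

// Shutdown flushes any pending batch first, then waits for in-flight flushes,
// so accumulated data is not dropped when the collector stops.
func (b *miniBatcher) Shutdown(_ context.Context) error {
	b.mu.Lock()
	if b.currentBatch != nil && len(b.currentBatch.req) > 0 {
		toFlush := *b.currentBatch
		b.currentBatch = nil
		b.mu.Unlock()
		b.flushAsync(toFlush)
	} else {
		b.mu.Unlock()
	}
	b.stopWG.Wait()
	return nil
}

func main() {
	b := &miniBatcher{currentBatch: &batch{req: []string{"a", "b"}}}
	_ = b.Shutdown(context.Background())
}
```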

@sfc-gh-sili sfc-gh-sili changed the title [exporter] Flip on queue batcher [PAUSED] [exporter] Flip on queue batcher Nov 12, 2024
@dmitryax (Member):

Given the impact of this change (every collector user with sending_queue enabled, which is the default), I suggest we introduce it with a feature gate, e.g. `exporter.batchingQueue`.

@sfc-gh-sili sfc-gh-sili force-pushed the sili-flip-on branch 8 times, most recently from 555baa2 to 6bb9b7f Compare November 15, 2024 01:29
@sfc-gh-sili sfc-gh-sili changed the title [PAUSED] [exporter] Flip on queue batcher [exporter] Flip on queue batcher Nov 15, 2024
@sfc-gh-sili (Contributor, author):

@dmitryax Hi Dimitrii, I wonder if you know a better way to make sure the existing tests pass with the feature gate both on and off. Manually enabling and then disabling it in every single exporter test could work, but I wonder if there is another option.
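
For illustration, one possible approach (a sketch only; the `runWithQueueBatcherGate` helper and the package name are hypothetical, and the follow-up PR may have chosen a different mechanism) is a shared test helper that runs a test body once with the gate enabled and once with it disabled via the feature gate registry:

```go
package exportertest // hypothetical test package name

import (
	"testing"

	"go.opentelemetry.io/collector/featuregate"
)

// runWithQueueBatcherGate runs the given test body once with the gate
// disabled and once with it enabled, restoring the original value afterwards.
func runWithQueueBatcherGate(t *testing.T, gate *featuregate.Gate, test func(t *testing.T)) {
	original := gate.IsEnabled()
	for _, enabled := range []bool{false, true} {
		enabled := enabled
		name := "gate_off"
		if enabled {
			name = "gate_on"
		}
		t.Run(name, func(t *testing.T) {
			if err := featuregate.GlobalRegistry().Set(gate.ID(), enabled); err != nil {
				t.Fatal(err)
			}
			defer func() {
				_ = featuregate.GlobalRegistry().Set(gate.ID(), original)
			}()
			test(t)
		})
	}
}
```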

@sfc-gh-sili sfc-gh-sili force-pushed the sili-flip-on branch 3 times, most recently from 844cfc6 to d158ac8 Compare November 20, 2024 01:56
"go.opentelemetry.io/collector/pipeline"
)

var usePullingBasedExporterQueueBatcher = featuregate.GlobalRegistry().MustRegister(
"telemetry.UsePullingBasedExporterQueueBatcher",
featuregate.StageBeta,
Member:

Why starting with Beta? That sounds too aggressive. Let's start with Alpha
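
For reference, a minimal sketch of what Alpha-stage registration could look like; the package name and the description text are illustrative assumptions, not the merged code:

```go
package internal // hypothetical package, for illustration only

import "go.opentelemetry.io/collector/featuregate"

// At Alpha the gate is disabled by default, so users opt in explicitly, e.g.
// with --feature-gates=telemetry.UsePullingBasedExporterQueueBatcher.
var usePullingBasedExporterQueueBatcher = featuregate.GlobalRegistry().MustRegister(
	"telemetry.UsePullingBasedExporterQueueBatcher",
	featuregate.StageAlpha,
	featuregate.WithRegisterDescription("Use a pull-based batcher between the exporter queue and the export workers"),
)
```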

bogdandrutu pushed a commit that referenced this pull request Nov 22, 2024
#### Description

This PR precedes #11637. It
* Introduces a noop feature gate that will be used for the queue batcher.
* Updates exporter tests to run with both the feature gate on and off.

#### Link to tracking issue
#10368
#8122

@sfc-gh-sili sfc-gh-sili force-pushed the sili-flip-on branch 3 times, most recently from 97bd281 to 415dcb8 Compare November 22, 2024 21:29
Review comments on .chloggen/11637-exporter-queue-batcher.yaml and exporter/exporterhelper/internal/base_exporter.go were marked as outdated/resolved.
@dmitryax (Member) left a review comment:
One question about tests. Otherwise LGTM

sfc-gh-sili and others added 6 commits December 2, 2024 12:44
Co-authored-by: Dmitrii Anoshin <anoshindx@gmail.com>
@dmitryax dmitryax merged commit 4782ad0 into open-telemetry:main Dec 2, 2024
49 of 50 checks passed
@github-actions github-actions bot added this to the next release milestone Dec 2, 2024
@sfc-gh-sili sfc-gh-sili deleted the sili-flip-on branch December 4, 2024 00:10
HongChenTW pushed a commit to HongChenTW/opentelemetry-collector that referenced this pull request Dec 19, 2024
HongChenTW pushed a commit to HongChenTW/opentelemetry-collector that referenced this pull request Dec 19, 2024
Successfully merging this pull request may close these issues.

[exporterhelper] Awkwardness due to API between queue sender and batch sender
3 participants