
[DRAFT] New exporter helper #8762

Closed
wants to merge 2 commits into from

Conversation

dmitryax
Member

Draft PR combining #8275 and #8685

@dmitryax changed the title from New exporter end state to [DRAFT] New exporter helper on Oct 26, 2023
@codecov

codecov bot commented Oct 26, 2023

Codecov Report

Attention: 13 lines in your changes are missing coverage. Please review.

Comparison is base (b0f618e) 91.49% compared to head (9b8f74c) 91.56%.
Report is 72 commits behind head on main.

Files Patch % Lines
exporter/exporterhelper/common.go 90.00% 3 Missing and 1 partial ⚠️
exporter/exporterhelper/batch_sender.go 98.08% 2 Missing and 1 partial ⚠️
...porter/exporterhelper/internal/queue_controller.go 94.23% 2 Missing and 1 partial ⚠️
exporter/exporterhelper/queue.go 89.28% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8762      +/-   ##
==========================================
+ Coverage   91.49%   91.56%   +0.06%     
==========================================
  Files         316      319       +3     
  Lines       17181    17441     +260     
==========================================
+ Hits        15720    15969     +249     
- Misses       1165     1173       +8     
- Partials      296      299       +3     


@jmacd
Contributor

jmacd commented Oct 27, 2023

@dmitryax Before reviewing this code, I would love to see a high-level description of the design after your changes. My mental model of the existing code is that there are three parts, Timeout, QueueSender, and RetrySettings. I understand the arguments in favor of moving batch processor into the exporter helper, but I'm afraid this code has a serious "accidental" complexity problem and I think it strongly deserves some refactoring attention, to split apart these separate behaviors.

The layering of these behaviors is significant because it currently affects how we instrument the exporters. Separately, I will make an argument that system-wide observability signals should not be computed in the *helpers but in the core libraries, so let's presume that counting of success/failure and generally monitoring pipeline health eventually moves out of exporterhelper. This alone will improve the code's clarity.

My concern is that I've been waiting for this PR to land, though not because of a dire need for batching: we already have the batch processor, but it has several defects. The high-level description I'm asking for will help me know whether the three defects I'm aware of are entirely solved.

The defects we want fixed relative to the legacy batch processor:

  1. Backpressure is not correct. When the first few requests are received by the channel, they immediately return success. I want an option to block until the entire batch succeeds or fails. When requests arrive and the queues are full, they should block, not fail. I must NOT be required to use the existing queue_sender functionality to benefit from batching, because existing queue-sender behavior drops data when the queue is full. However, I'm concerned that we are limited to a single export thread unless the queue sender is also enabled, so we need to be sure that the facility for num_consumers is available while using batching, with functional backpressure.
  2. Error returns are not correct. When the batch fails, I want all the requests that entered it to fail. This should be an option so that callers can see how their requests are failing, in order to respond correctly. We want the option not to return success when the request fails, which relies on blocking (as described above).
  3. Export concurrency is limited, unless the queue_sender is configured. Without the queue sender, there is no export concurrency and the batch processor can export only one batch at a time. I want the ability to export multiple batches at once, with functional back pressure, and with error returns.

Please update the PR description, ideally with diagrams explaining how the code here works. It'll greatly help reviewers, thank you!

@dmitryax
Member Author

dmitryax commented Oct 28, 2023

@jmacd, thank you for the feedback!

My mental model of the existing code is that there are three parts, Timeout, QueueSender, and RetrySettings. I understand the arguments in favor of moving batch processor into the exporter helper, but I'm afraid this code has a serious "accidental" complexity problem and I think it strongly deserves some refactoring attention, to split apart these separate behaviors.

If I understand correctly, you are referring to the initial state of the exporter helper, in which all the code was glued together. I think I addressed that problem to some extent in my previous PRs by making those parts separate internal components (senders). Now, the batching capability is being added as another sender in front of the chain of existing senders. Each sender's behavior is controlled by the exporter helper configuration and can be enabled or disabled. The code is still a bit clunky because the queue implementation has not been migrated to the new exporter helper yet (I need #8275 to be merged).

Replying to (1) and (2)

As the first step, in this PR, the batching capabilities are moved mostly as is with some generalization of the batch sharding capability. We can introduce blocking capability as well. It can probably be controlled by enabling/disabling the queue: if the queue is disabled, the batching is blocking. In that case, if the batch fails to be exported, all the blocked calls get an error.

When requests arrive and the queues are full, they should block, not fail.

This ask is a bit unclear to me. Why should we block if the queue is full? Wouldn't replying with an error that is translated into backpressure on the receiver side (e.g., 429 for HTTP receivers) be a better option?

Replying to (3)

I think I understand the ask. You want to start building a new batch as soon as the previous one is ready to be exported without waiting for the request to complete, right?

I was mainly focusing on the goals outlined in #8122, primarily relying on the queue sender. I see your goals and will try to incorporate them or at least make sure the new design doesn't restrict achieving them. I agree that it's better to have a design defining the end state, I'll work on it.


This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Nov 12, 2023
@dmitryax dmitryax removed the Stale label Nov 12, 2023

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Nov 27, 2023
@dmitryax dmitryax removed the Stale label Nov 27, 2023
@dmitryax force-pushed the new-exporter-end-state branch 2 times, most recently from bf4f496 to 46e6316 on December 1, 2023
Configuration of the queue capacity in the new exporter helper interface is based on items (spans, metric data points, or log records) instead of batches/requests as in the current exporter helper. It helps users to have better control over the queue size. It also allows putting the optional batcher sender after the queue sender, so the batcher sender doesn't need an additional queue.
@dmitryax force-pushed the new-exporter-end-state branch 2 times, most recently from 7b49d5c to 3167d18 on December 10, 2023
This change introduces new experimental batching functionality to the exporter helper

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Dec 26, 2023

github-actions bot commented Jan 9, 2024

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Jan 9, 2024