feat(outputs): Implement partial write errors #16146

Merged 3 commits on Dec 4, 2024

Member

@srebhan srebhan commented Nov 5, 2024

Summary

This PR implements specification TSD-008 for partial write errors.

Checklist

  • No AI generated code was used in this PR

Related issues

related to #11942
related to #14802
related to #15908

@telegraf-tiger telegraf-tiger bot added the feat label (Improvement on an existing feature such as adding a new setting/mode to an existing plugin) Nov 5, 2024
@srebhan srebhan force-pushed the partial_write_error branch from 5608625 to 0dbf506 November 5, 2024 18:08
@srebhan srebhan added the plugin/output label (1. Request for new output plugins 2. Issues/PRs that are related to out plugins) and the ready for final review label (This pull request has been reviewed and/or tested by multiple users and is ready for a final review.) Nov 7, 2024
Member

@DStrand1 DStrand1 left a comment

Great work! I tested this thoroughly, especially with the disk buffer, and didn't find any issues. This included some manual tests, running other plugins' test cases with the disk buffer, and coverage tests. I added a couple of comments as well.

models/buffer_disk.go (resolved)
models/buffer_disk.go (resolved)
internal/errors.go (resolved)
@srebhan srebhan force-pushed the partial_write_error branch from 0dbf506 to f2090a4 November 25, 2024 12:22
@srebhan srebhan requested a review from DStrand1 November 25, 2024 12:22
models/buffer_disk.go (outdated, resolved)
Comment on lines +20 to +35
type Transaction struct {
	// Batch of metrics to write
	Batch []telegraf.Metric

	// Accept denotes the indices of metrics that were successfully written
	Accept []int
	// Reject denotes the indices of metrics that were not written but should
	// not be requeued
	Reject []int

	// Marks this transaction as valid
	valid bool

	// Internal state that can be used by the buffer implementation
	state interface{}
}
Contributor

I think "transaction" is the wrong word to use here. In computing, a transaction is a sequence of steps that is executed atomically. Either it runs fully, or not at all. The structure Transaction here records partially completed writes. It is explicitly providing support for a sequence of actions that is allowed to partially complete. Maybe I would call it PartialBatch, or something else to indicate it need not all succeed.

Member Author

Hmmm IMO it is a transaction on the buffer. The buffer is not modified until EndTransaction is called on the buffer. Let's keep this for now.

Comment on lines +48 to +49
MetricsAccept []int
MetricsReject []int
Contributor

I think using bit arrays here would lead to memory savings and performance improvements. Switching to them would be a significant change to the PR and I think the code is sound as it is, so not recommending that, but maybe as a future performance enhancement.

I would continue to use separate vars for "accept" and "reject", just switch them to bit arrays. They would be fixed in size at len(metrics)/8+1 bytes, versus len(accepted)*8 bytes for an []int on 64-bit platforms (64 bits per index), which I think in many cases would approach len(metrics)*8 bytes. Not sure what the typical use case is here; it would take some measurement of real-world scenarios, but I think chances are good it would be a decent savings.

The other advantage is that it can support simplified operations. The metrics to drop from the WAL after a single write can be found with the bitwise OR of the two bit arrays for accept and reject. You then OR that result into the long-running mask for the WAL. If all the bits are 1, the whole WAL is done. If not, you find the first zero bit in the mask; everything before it is the prefix that can be removed from the WAL.

Here's an example of a bit-array library you could use as a reference. I would not use a library, but instead write a simplified one that supplies just the operations needed here. When bit-array data structures are not made general they can be surprisingly small in code.

https://github.com/yourbasic/bit

Member Author

While I do agree that using a bitmask is better, it seems like a premature optimization at this point. We really do need to optimize this further, but IMO that must also include optimizing disk I/O, as we are currently hitting the disk hard. As part of that step, the removal masking should move into a dedicated WAL implementation and be converted to a bitmask as you suggest. What do you think?

telegraf-tiger bot commented Dec 2, 2024

@srebhan srebhan merged commit 0ea4c14 into influxdata:master Dec 4, 2024
27 checks passed
@srebhan srebhan deleted the partial_write_error branch December 4, 2024 20:55
@github-actions github-actions bot added this to the v1.33.0 milestone Dec 4, 2024
justinwwhuang pushed a commit to justinwwhuang/telegraf_fork that referenced this pull request Dec 19, 2024