Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(spooler): Add capacity check for the envelope buffer #3925

Merged
merged 24 commits into from
Aug 28, 2024

Conversation

iambriccardo
Copy link
Member

@iambriccardo iambriccardo commented Aug 13, 2024

This PR adds the capability for checking the capacity of the envelope buffer. Such capacity is computed differently based on the strategy in use:

  • Disk: the disk usage is computed asynchronously and periodically updated in the background.
  • Memory: the memory usage is checked via the MemoryChecker.

Closes: https://github.com/getsentry/team-ingest/issues/517, https://github.com/getsentry/team-ingest/issues/521

}

/// A provider of [`EnvelopeStack`] instances that is responsible for creating them.
pub trait StacksManager: std::fmt::Debug {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to StacksManager to make it more meaningful w.r.t. the mansions of the trait implementations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Manager in a type name is usually a red flag for unclear separation of concerns. In this case it creates new stacks parametrized with the underlying storage backend, and queries the storage backend for information. Maybe we should call this StorageBackend, and / or unite it with the other EnvelopeStore?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is that for me they are two separate entities, it's a coincidence that some of them forward the methods.

The dependency is:

flowchart TD
	StacksManager --> EnvelopeStack --> EnvelopeStore
Loading

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StacksManager holds a reference to EnvelopeStore, right? So there is already a coupling between the manager and the store:

classDiagram
StacksManager "1" o-- "1" EnvelopeStore
StacksManager --> "creates" EnvelopeStack
EnvelopeStack "n" o-- "1" EnvelopeStore
Loading

IMO this can be simplified to

classDiagram
EnvelopeStack o-- "uses" EnvelopeStore
EnvelopeStore --> "creates" EnvelopeStack
Loading

(store creates stack, stack uses store)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like having the store interact with the stack is not ideal, for me the store is just offering data to some consumers, the stack is dealing with the data in a domain-specific way and the stack manager deals with creating stacks.

@iambriccardo iambriccardo marked this pull request as ready for review August 14, 2024 11:50
@iambriccardo iambriccardo requested a review from a team as a code owner August 14, 2024 11:50
@iambriccardo iambriccardo requested a review from jjbayer August 14, 2024 12:05
Comment on lines 6 to 10
/// Enum representing the current capacity of the [`StacksManager`] to accept new [`Envelope`]s.
pub enum Capacity {
Free,
Full,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't expect capacity to ever have more than two unit variants, I would remove this and give the trait a has_capacity method instead. That makes the intent more clear IMO.

}

/// A provider of [`EnvelopeStack`] instances that is responsible for creating them.
pub trait StacksManager: std::fmt::Debug {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Manager in a type name is usually a red flag for unclear separation of concerns. In this case it creates new stacks parametrized with the underlying storage backend, and queries the storage backend for information. Maybe we should call this StorageBackend, and / or unite it with the other EnvelopeStore?

@iambriccardo iambriccardo requested a review from jjbayer August 27, 2024 08:25
Copy link
Member

@jjbayer jjbayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment about await, rest are nits.

@@ -307,12 +307,14 @@ fn queue_envelope(

match state.envelope_buffer() {
Some(buffer) => {
if !buffer.has_capacity().await {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second review, I think we should make this method sync, because we should avoid blocking the request handler on disk read / writes under all circumstances. One option to make it sync would be to change GuardedEnvelopeBuffer::has_capacity to a try_lock, and cache the resulting bool:

        match self.inner.try_lock() {
            Ok(guard) => {
                let c = guard.backend.has_capacity();
                self.cached_capacity.store(c);
                c
            },
            Err(_) => self.cached_capacity.load()
        }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this could be an idea, given the intrinsic delay of memory and disk usage calculation, adding one by caching it should not be a big deal. My main worry with this is that we can't guarantee updated values in any way if lock contention is arbitrarily high.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds acceptable to me. We can emit a counter metric to track how often we emit cached vs fresh capacity. Once we transform this to a service, the cached capacity will be updated differently, anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I added the metric because I am interested.

pub mod sqlite;

/// Trait that models a store of [`Envelope`]s.
pub trait EnvelopeStore {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: As discussed, this does not need to be a trait. It might be beneficial to remove it, to avoid reader's expectations like "surely there is also a MemoryEnvelopeStore".

@iambriccardo iambriccardo requested a review from jjbayer August 27, 2024 09:12
@iambriccardo iambriccardo changed the title feat(spooler): Add EnvelopeStore trait and capacity check feat(spooler): Add capacity check for the envelope buffer Aug 27, 2024
Co-authored-by: Joris Bayer <joris.bayer@sentry.io>
@iambriccardo iambriccardo merged commit 6198771 into master Aug 28, 2024
24 checks passed
@iambriccardo iambriccardo deleted the riccardo/feat/envelope_store branch August 28, 2024 07:15
jan-auer added a commit that referenced this pull request Aug 28, 2024
Re-applies #3951, which was overwritten by accident in #3925
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants