-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(s2n-quic-core): add aggregate metrics support #2364
Conversation
9a46404
to
2b346d1
Compare
14fe512
to
628a2ab
Compare
628a2ab
to
7a3dfc0
Compare
dc/s2n-quic-dc/events/connection.rs
Outdated
total_len: usize, | ||
|
||
/// The amount that was written | ||
#[measure("bytes", "b")] | ||
#[counter("bytes.total", "b")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: total being less than provided feels like a bit of a weird "total" measurement? Maybe "committed" bytes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah i like committed
better
impl<'a> Str<'a> { | ||
/// # Safety | ||
/// | ||
/// The provided value must end in a `\0` character |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: safety condition includes no interior null bytes as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes good call
|
||
/// A str that is also a [`CStr`] | ||
#[derive(Clone, Copy)] | ||
pub struct Str<'a>(usize, &'a CStr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: at least today the length is already stored in CStr as a separate field, so we're duplicating that storage here. Technically there's desire to change that in some future Rust release, but it seems unlikely to happen all that soon. Maybe we should store as &'a str with the knowledge that there's a null terminator at the end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I didn't want to rely on the CStr::count_bytes
being constant time: https://doc.rust-lang.org/std/ffi/struct.CStr.html#method.count_bytes.
Maybe we should store as &'a str with the knowledge that there's a null terminator at the end?
I wouldn't be able to impl AsRef<CStr>
at that point, though, right?
In either case this is always a 'static
so I'm not super worries about the extra storage size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can still impl AsRef with this same code? The lifetimes should still match up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up making it a Str(str)
which seems to work
#[inline(always)] | ||
fn count(&self, info: usize, id: usize, value: u64) { | ||
let info = unsafe { INFO.get_unchecked(info) }; | ||
let counter = unsafe { self.counters.get_unchecked(id) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to either have unsafe fn count or a wrapper around the indices that guarantees they've been checked already. AFAICT we always have constant arguments, right? So maybe we can even make this safe and instead just rely on the compiler to drop useless comparisons if we store counters, measures, etc. as Box<[...; length]>
-- it seems like we know the needed capacity statically already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The INFO
len should be a known value. But the counters aren't currently, though I could make them a Box<[...; length]>
, it's just a bit trickier to initialize. Let me try to refactor it to an array and then i can get rid of the unsafe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up just using TryInto<[T; LEN]> for Vec<T>
and it worked! Thanks for the suggestion :)
@@ -85,6 +87,8 @@ struct FrameReceived<'a> { | |||
struct PacketLost<'a> { | |||
packet_header: PacketHeader, | |||
path: Path<'a>, | |||
#[measure("bytes_lost", "b")] | |||
#[counter("bytes_lost.total", "b")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have to incorporate lost in the name here? It seems like this ought to be already behind some path in any metric etc. (e.g., packet_lost.bytes.total or equivalent).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that's a good point. i'll just do bytes
and bytes.total
)); | ||
|
||
record.extend(quote!( | ||
self.recorder.increment_counter(#snake, self.#counter #counter_load as _); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll want counter_swap(0) or something like that here -- I guess it depends on the metric system, some want each counter emission to have distinct counts (e.g., if you have +1, +1, +0, +1 and publishing every 60 seconds you'd emit =1, =1, =0, =1, not =1, =2, =2, =3). Others want the running sum (e.g., Prometheus largely expects a running sum).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These metrics are connection level metrics that we have today, which are recorded once at the end of the connection. I just refactored the code to be in a single location rather than spread across everything. See
s2n-quic/quic/s2n-quic-events/src/parser.rs
Line 358 in 65d55a4
self.recorder.increment_counter(#snake, self.#counter #counter_load as _); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, so swap or load is the same since we expect this to be terminal (i.e. no more increments afterwards).
Description of changes:
This change adds an initial generated metric aggregator for all events that are emitted. This makes it easy to automatically pick up new metrics once they are added without doing any initial work in the subscriber implementation.
Events are emitted as counters automatically. Additionally, they can specify how to aggregate individual fields into the various metric types.
I've also added an example probe metric registry that uses
s2n_quic_core::probe::define
to emit metrics, which provides USDTs in the compiled application. There are 2 different types of registries. One isdynamic
in that it emitsregister
calls with the names and units, and then later callsrecord
for the actual data. The other one is a static mode where all of the probes are defined for each event and known at build time.Call-outs:
The
s2n-quic-events
script and its generated output was growing a bit beyond maintainable so I took a minute to refactor it a bit to make things a bit more sustainable. I'll do a follow-up to refactor it the rest of the way.Testing:
I wired up the new subscriber to the map events tests and verified it printed out, as expected.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.