-
Notifications
You must be signed in to change notification settings - Fork 98
Worker heartbeat: New in-memory metrics mechism, plumb rest of heartbeat data #1023
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: worker-heartbeat
Are you sure you want to change the base?
Worker heartbeat: New in-memory metrics mechism, plumb rest of heartbeat data #1023
Conversation
core-api/src/telemetry/metrics.rs
Outdated
} | ||
pub type Counter = LazyBoundMetric< | ||
|
||
pub type CounterType = LazyBoundMetric< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding Type
as the suffix here to these reads confusingly to me. Maybe Impl
works better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call, CounterType was lazy me literally seeing the word type
and just appending it
kinda like https://www.youtube.com/watch?v=6haBMbtXSLg
metric: inner, | ||
attributes: MetricAttributes::Empty, | ||
bound_cache: OnceLock::new(), | ||
primary: LazyBoundMetric { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably easier to just unify these constructors and take in_memory
as optional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This helps completely mask the concept of in memory metrics from the CoreMeter
trait. Moving this into new()
would mean callers would have to pass in None
for the in_memory
field, whereas how things currently are, there's no exposure of the concept of in_mem
core/src/pollers/poll_buffer.rs
Outdated
shutdown: CancellationToken, | ||
num_pollers_handler: Option<impl Fn(usize) + Send + Sync + 'static>, | ||
options: WorkflowTaskOptions, | ||
last_successful_poll_time: Arc<parking_lot::Mutex<Option<SystemTime>>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would invert this. Rather than passing these in, add a get_last_successful_poll_handle() -> Arc<AtomicCell<Option<SystemTime>
(btw you can use https://docs.rs/crossbeam/latest/crossbeam/atomic/struct.AtomicCell.html since SystemTime
is Copy
here which will be better than a lock)
Which you can call after these are built in worker setup, and then pass them into the heartbeat manager. One fewer thing to add to these mega arg lists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sold that this is better, style wise. To plumb the metric out to the caller new_with_pollers
, we'd need to create a new struct return type to house
last_successful_poll_time: Arc<AtomicCell<Option<SystemTime>>>
sticky_last_successful_poll_time: Option<Arc<AtomicCell<Option<SystemTime>>>>
and
impl Stream<
Item = Result<
(
ValidPollWFTQResponse,
OwnedMeteredSemPermit<WorkflowSlotKind>,
),
tonic::Status,
>,
> + Sized
+ 'static,
Seems easier to pass in the params?
core/src/worker/mod.rs
Outdated
); | ||
} | ||
self.shutdown_token.cancel(); | ||
*self.status.lock() = WorkerStatus::ShuttingDown; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might hold the lock for the rest of this scope. Maybe doesn't, but, better to be safe and just put this line inside it's own { }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can do!
Shouldn't the lock only last for this line, since we aren't assigning the lock() to any local var, once that line passes the MutexGuard drops and the lock unlocks?
…g, counter_with_in_mem and and_then()
…, remove stale metric previously added
fn new() -> (Arc<Self>, Arc<AtomicU64>) { | ||
let used = Arc::new(AtomicU64::new(0)); | ||
(Self { used: used.clone() }, used) | ||
(Arc::new(Self { used: used.clone() }), used) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Test Helper Signature Change Causes Compilation Errors
The FakeMIS::new()
test helper function's signature changed to return (Arc<Self>, Arc<AtomicU64>)
. This creates a type mismatch with existing call sites (likely in tests) that still expect (Self, Arc<AtomicU64>)
, leading to a compilation error.
What was changed
NOTE: targeting
worker-heartbeat
feature branch.New in memory mechanism to keep track of certain metrics for worker heartbeating.
Why?
Checklist
Closes
How was this tested:
Note
Adds in-memory metrics backing and wires full worker heartbeat reporting (pollers, slots, cache, status, timing), plus a client API to list workers, with supporting poller/tuner and API changes.
HeartbeatMetricType
,WorkerHeartbeatMetrics
) and dual-recording instruments (counter_with_in_memory
,gauge_with_in_memory
,histogram_duration_with_in_memory
).MetricAttributes
/NoOp meter to support label value access and in-memory attrs; add per-metric label extraction.Arc
, simplify shared-namespace worker callback handling.worker_instance_key()
toWorker
; importuuid
.RealSysInfo
to a background refresh thread; expose sys info via tuner.list_workers
RPC wrapper; enhance worker client heartbeat/shutdown methods.Written by Cursor Bugbot for commit d956784. This will update automatically on new commits. Configure here.