Context
In CorrelationEngine::cleanup_expired(), when a pending cluster expires and is erased from pending_clusters_, the fault codes that belonged to that cluster are not removed from fault_to_cluster_. This leaves stale map entries.
Current behavior
for (const auto & rule_id : expired_pending) {
  pending_clusters_.erase(rule_id);
  // fault_to_cluster_ entries for faults in this cluster are not cleaned up
}
Expected behavior
Before erasing the pending cluster, iterate over its data.fault_codes and erase each code from fault_to_cluster_:
for (const auto & rule_id : expired_pending) {
  auto it = pending_clusters_.find(rule_id);
  if (it != pending_clusters_.end()) {
    for (const auto & code : it->second.data.fault_codes) {
      fault_to_cluster_.erase(code);
    }
    pending_clusters_.erase(it);
  }
}
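The loop above can be exercised in isolation with a minimal sketch like the one below. It is not the real CorrelationEngine: EngineSketch, PendingCluster, ClusterData, erase_expired() and demo_expire_rule_a() are hypothetical stand-ins, and the map key and value types (std::string rule IDs and fault codes) are assumptions made only for illustration.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the real types; only the fields used here are modelled.
struct ClusterData {
  std::vector<std::string> fault_codes;
};

struct PendingCluster {
  ClusterData data;
};

struct EngineSketch {
  // Assumed layout: rule_id -> pending cluster, fault code -> rule_id.
  std::unordered_map<std::string, PendingCluster> pending_clusters_;
  std::unordered_map<std::string, std::string> fault_to_cluster_;

  // Erase each expired pending cluster together with its fault_to_cluster_ entries.
  void erase_expired(const std::vector<std::string> & expired_pending) {
    for (const auto & rule_id : expired_pending) {
      auto it = pending_clusters_.find(rule_id);
      if (it == pending_clusters_.end()) {
        continue;  // already gone, nothing to clean up
      }
      for (const auto & code : it->second.data.fault_codes) {
        fault_to_cluster_.erase(code);
      }
      pending_clusters_.erase(it);
    }
  }
};

// Builds an engine with two pending clusters, expires "rule_a", and returns the result.
EngineSketch demo_expire_rule_a() {
  EngineSketch engine;

  PendingCluster a;
  a.data.fault_codes = {"F1", "F2"};
  engine.pending_clusters_.emplace("rule_a", a);
  engine.fault_to_cluster_["F1"] = "rule_a";
  engine.fault_to_cluster_["F2"] = "rule_a";

  PendingCluster b;
  b.data.fault_codes = {"F3"};
  engine.pending_clusters_.emplace("rule_b", b);
  engine.fault_to_cluster_["F3"] = "rule_b";

  engine.erase_expired({"rule_a"});
  return engine;
}
```

After demo_expire_rule_a() runs, the sketch holds no entries for F1 or F2 in fault_to_cluster_, while rule_b and F3 are untouched. Looking the cluster up with find() first keeps the iterator available for the final erase, so the map is searched only once per rule ID.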
Impact
Low. Stale entries are harmless because process_clear() removes them individually. The two maps are still inconsistent in the meantime: a fault code looked up in fault_to_cluster_ after its cluster has expired can resolve to a cluster that no longer exists in pending_clusters_.
Notes
Identified during review of #211 by @mfaferek93.