Feature/ruler (take 2) #2458
Conversation
- ruler memhistory
- encapsulates metricsHistory
- removes mutex from memhistory
- ruler module
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #2458      +/-   ##
==========================================
- Coverage   63.43%   63.03%   -0.41%
==========================================
  Files         166      168       +2
  Lines       14584    14879     +295
==========================================
+ Hits         9252     9379     +127
- Misses       4595     4759     +164
- Partials      737      741       +4
```
```go
@@ -28,6 +28,11 @@ type QueryParams interface {
	GetShards() []string
}

// implicit holds default implementations
type implicit struct{}
```
note: these changes are unrelated, but I snuck them in 😈
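For context on what the `implicit` struct enables, here is a minimal sketch of the embedding pattern, with `QueryParams` trimmed down and the other names (`GetStart`, `rangeParams`) purely hypothetical: a concrete type embeds `implicit` to pick up the default implementations and overrides only the methods it cares about.

```go
package main

import "fmt"

// QueryParams is a cut-down stand-in for the interface in the diff above;
// GetStart is a hypothetical extra method added for illustration.
type QueryParams interface {
	GetShards() []string
	GetStart() int64
}

// implicit holds default implementations, as in the diff.
type implicit struct{}

func (implicit) GetShards() []string { return nil }
func (implicit) GetStart() int64     { return 0 }

// rangeParams is a hypothetical concrete type: by embedding implicit it
// inherits the defaults and only overrides what it needs.
type rangeParams struct {
	implicit
	start int64
}

func (r rangeParams) GetStart() int64 { return r.start }

func main() {
	var p QueryParams = rangeParams{start: 1000}
	fmt.Println(p.GetShards(), p.GetStart()) // [] 1000
}
```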
```go
// Calling Start will set the RuleIter, unblock the MemStore, and start the run() function in a separate goroutine.
func (m *MemStore) Start(iter RuleIter) {
	m.mgr = iter
	close(m.initiated)
```
I'm torn on whether a panic is OK if someone calls Start twice, or whether it would be better to track whether it's already running and just do nothing on the second call.
I'm not sure. It's definitely a usage error if this is called twice, and the caller ultimately comes from Cortex, which isn't expecting an error return type:
```go
type ManagerFactory = func(
	ctx context.Context,
	userID string,
	notifier *notifier.Manager,
	logger log.Logger,
	reg prometheus.Registerer,
) *rules.Manager
```
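For what it's worth, here is a minimal sketch of the "track if running and just do nothing" alternative, guarded with `sync.Once`. The `MemStore` below is a trimmed stand-in rather than the PR's actual type, and `startOnce` is a hypothetical field added only for this sketch:

```go
package main

import (
	"fmt"
	"sync"
)

// RuleIter stands in for the real interface from the PR; its methods are omitted.
type RuleIter interface{}

// MemStore is a trimmed-down stand-in for the store in the PR; startOnce is a
// hypothetical field added just for this sketch.
type MemStore struct {
	mgr       RuleIter
	initiated chan struct{}
	startOnce sync.Once
}

func NewMemStore() *MemStore {
	return &MemStore{initiated: make(chan struct{})}
}

// Start sets the RuleIter and unblocks the store. Guarded by sync.Once, a
// second call becomes a silent no-op instead of panicking on a double close
// of m.initiated.
func (m *MemStore) Start(iter RuleIter) {
	m.startOnce.Do(func() {
		m.mgr = iter
		close(m.initiated)
	})
}

func main() {
	s := NewMemStore()
	s.Start(nil)
	s.Start(nil) // no panic: the second call does nothing
	<-s.initiated
	fmt.Println("started once, second call ignored")
}
```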
```go
// This could be more efficient (logarithmic instead of linear)
for ts, tsMap := range c.data {
	if ts < ns {
		delete(c.data, ts)
```
I think this is dangerous? Deleting from a map while iterating over it? The pattern I usually see is to build a list of keys to remove and then iterate over that list to remove them.
This should be fine (and the pattern is shown in Effective Go).
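A small self-contained check of that claim (the dummy data and cutoff below are made up): deleting entries from the map being ranged over behaves as expected.

```go
package main

import "fmt"

func main() {
	// Deleting map entries while ranging over the same map is safe in Go: the
	// spec guarantees that a removed entry which hasn't been reached yet is
	// simply not produced. (Slices are a different story, hence the confusion.)
	data := map[int64]string{1: "a", 2: "b", 3: "c", 4: "d"}
	var cutoff int64 = 3
	for ts := range data {
		if ts < cutoff {
			delete(data, ts)
		}
	}
	fmt.Println(data) // map[3:c 4:d]
}
```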
```go
cache.Set(m.ts, forStateVec)

// Finally return the series if it exists
smpl, ok = cache.Get(m.ts, ls)
```
This seems redundant, didn't we just put this timestamp in the cache 2 lines ago?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We previously stored the entire vector resulting from evaluating the rule. When we request it from the cache we're looking for one possible element from that vector. We could iterate through the vector and label-match without using the cache, but it'd result in a lot of duplicate logic that's better contained in the cache IMO.
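To make that distinction concrete, here is a rough sketch of the shape of the interaction. The types and method signatures below are illustrative, not the PR's actual cache API: Set stores the whole evaluation vector under a timestamp, and Get then selects the single sample whose labels match.

```go
package main

import "fmt"

// sample and the string-keyed labels below are simplified stand-ins; the real
// cache in the PR works with promql.Vector and labels.Labels.
type sample struct {
	ls  string
	val float64
}

type cache struct {
	data map[int64][]sample // timestamp -> full rule evaluation result (a vector)
}

// Set stores the entire vector produced by evaluating the rule at a timestamp.
func (c *cache) Set(ts int64, vec []sample) { c.data[ts] = vec }

// Get picks out the single sample in that vector whose labels match, if any.
func (c *cache) Get(ts int64, ls string) (sample, bool) {
	for _, s := range c.data[ts] {
		if s.ls == ls {
			return s, true
		}
	}
	return sample{}, false
}

func main() {
	c := &cache{data: map[int64][]sample{}}
	c.Set(1000, []sample{{`alert{instance="a"}`, 1}, {`alert{instance="b"}`, 2}})

	// Not redundant: Set stored the whole vector, Get selects one series from it.
	smpl, ok := c.Get(1000, `alert{instance="b"}`)
	fmt.Println(smpl, ok) // {alert{instance="b"} 2} true
}
```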
LGTM 🚀
Second take on the ruler after getting some upstream changes merged into Prometheus. This initial version hacks together an in-memory series store which evaluates alerts upon request and temporarily caches the result locally, instead of using a Prometheus metrics store for the `ALERTS_FOR_STATE` metric.

Followups include: