-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pkg/rule: Fix bugs where rules were out of sync. #1439
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I think that makes sense but have one suggestion
Thanks for helping 👍
On top of the suggestion below I think we really need more unit tests to:
- Detect those bug early
- Do not repeat same bug anymore
- Confirm fix works (:
Do you think we can add some tests?
pkg/rule/rule.go
Outdated
@@ -113,7 +113,10 @@ func (r RuleGroup) MarshalYAML() (interface{}, error) { | |||
func (m *Managers) Update(dataDir string, evalInterval time.Duration, files []string) error { | |||
var ( | |||
errs = tsdberrors.MultiError{} | |||
filesMap = map[storepb.PartialResponseStrategy][]string{} | |||
filesMap = map[storepb.PartialResponseStrategy][]string{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm I think that will work, but to make it verbose, clearer and bit more efficient I would suggest add one more loop at the end:
for s, updater := range *m {
if _, ok := filesMap[s]; ok {
// We already updated those above.
continue
}
// Rules removed update updater if needed.
if len(updater.RuleGroups()) == 0 {
continue
}
if err := updater.Update(evalInterval, []string{}, nil); err != nil {
errs = append(errs, err)
}
}
What do you think? (:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it 's more clear. how about merge the last loop with your loop?
for s, updater := range *m {
fs := filesMap[s]
if err := updater.Update(evalInterval, fs, nil); err != nil {
errs = append(errs, err)
continue
}
}
if ok, i will update the pr.
I will add a testcase for this.
Yes, it is necessary to add test cases. |
@johncming do you have any free time to finish up this PR? :p |
@johncming do you have time to rebase? |
fixed. |
b75d521
to
5eae069
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
pkg/rule/rule_test.go
Outdated
err = m.Update(1*time.Second, []string{ | ||
filepath.Join(dir, "no_strategy.yaml"), | ||
}) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) remove blank line
pkg/rule/rule_test.go
Outdated
dir, err := ioutil.TempDir("", "test_rule_rule_groups") | ||
testutil.Ok(t, err) | ||
defer func() { testutil.Ok(t, os.RemoveAll(dir)) }() | ||
err = os.MkdirAll(filepath.Join(dir, "subdir"), 0775) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line and the next one aren't needed AFAICT.
I think you'll need to update the CHANGELOG and reference #1739 too. |
Signed-off-by: johncming <johncming@yahoo.com>
@johncming can you rebase? |
rebased. |
@@ -44,6 +44,7 @@ Compactor now properly handles partial block uploads for all operation like rete | |||
- [#1882](https://github.com/thanos-io/thanos/pull/1882) Receive: upload to object storage as 'receive' rather than 'sidecar'. | |||
- [#1907](https://github.com/thanos-io/thanos/pull/1907) Store: Fixed the duration unit for the metric `thanos_bucket_store_series_gate_duration_seconds`. | |||
- [#1931](https://github.com/thanos-io/thanos/pull/1931) Compact: Fixed the compactor successfully exiting when actually an error occurred while compacting a blocks group. | |||
- [#1439](https://github.com/thanos-io/thanos/pull/1439) Rule: Fix bugs where rules were out of sync. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should go in the Unreleased
section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -213,6 +213,21 @@ func (m *Manager) Update(evalInterval time.Duration, files []string) error { | |||
continue | |||
} | |||
} | |||
|
|||
for s, mgr := range m.mgrs { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might be worth adding a comment explaining that this section removes the rules from a manager when a strategy has no more rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Sorry for lag in review. Let's fix @simonpasquier comments and merge 👍
LGTM after fixes.
@johncming thanks for your persistence! Care to fix the code according to the last comments so that we could merge it and fix it? |
This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions. |
Any reason why this PR was never merged, was this issue fixed by another PR? Thanks! @johncming @bwplotka |
@lilic it has been closed by the bot because it has seen no activity in a while. @johncming would you be able to reopen it and address the comments? |
@johncming if you don't have time to address the comments for any reason, I can also takeover and pick up this PR from where you leftover with your commits of course? Thanks! |
Thanos rule loading alerts about out-of-sync rule files
Specifically, the following scenario:
Rule file rule. Yaml, which contains some rules, through the reload command, thanos rule load rules.
If you clear the contents of rule. Yaml, but leave the file in, the old rules in thanos rule will not be cleared.
This bug leads to a common situation (all rules in one file)
If thanos rule has 100 rules, it can only clear 99 rules at most.
Closes #1739