returns error messages when trigger reload with http #1848

@@ -472,49 +472,26 @@ func runRule(
}

// Handle reload and termination interrupts.
reload := make(chan struct{}, 1)
{
cancel := make(chan struct{})
reload <- struct{}{} // Initial reload.

g.Add(func() error {
//initialize rules.
if err := reloadRules(logger, ruleFiles, ruleMgr, evalInterval, configSuccess, configSuccessTime, rulesLoaded); err != nil {
level.Error(logger).Log("msg", "initialize rules failed", "err", err)
//returns when initialize with invalid pattern error.
if _, ok := err.(*errInvalidPattern); ok {

Consider using the recently introduced errors API: https://blog.golang.org/go1.13-errors

// Similar to:
//   if e, ok := err.(*QueryError); ok { … }
var e *QueryError
if errors.As(err, &e) {
	// err is a *QueryError, and e is set to the error's value
}

Sure. Will look into it.

@kakkoyun According to the conversation, I should not change the behavior of the rule reloading. This code will be removed.

return err

This changes behaviour compared to the previous implementation. Previously, it was logging and proceeding to the other rule files:

if err != nil {
	// The only error can be a bad pattern.
	level.Error(logger).Log("msg", "retrieving rule files failed. Ignoring file.", "pattern", pat, "err", err)
	continue
}

If this is what we want, you should update the CHANGELOG accordingly.

@kakkoyun This comes from the original logic. I wrote it to keep the same logic as the original one.

No, actually. That's what I'm trying to say: if I'm not mistaken, it doesn't keep the same logic. If you check your

var files []string
for _, pat := range ruleFiles {
	fs, err := filepath.Glob(pat)
	if err != nil {
		// Check errInvalidPattern when initialize.
		return &errInvalidPattern{err, pat}
	}
	files = append(files, fs...)
}

The old code just prints an error, continues with the execution, and checks the other files.

@kakkoyun Sorry for my misunderstanding. I can't remember why I changed it like this, since it was a long time ago.

@kakkoyun Hi, the code comes from Prometheus. Please have a look at the link

The change is definitely reasonable; all I'm saying is that we should document the behaviour change, and of course only if the maintainers also agree.

From my side, I think @kakkoyun is right. Yes, if we are changing behavior... I think we should not. My opinion is that we should fail and proceed to another rule. There is no point in failing everything and stopping, as this does not (and should not!) crash the ruler. We should also be consistent with Prometheus behavior. (: Can we change the logic here back to what it was previously? @00arthur00

@bwplotka Thanks for your reply. I will revert the logic, i.e., keep logging and do not terminate.

@kakkoyun Committed. The changes include the logic revert and a CHANGELOG update. Please help review. Thanks.

@bwplotka I think we should error out and exit if a rule file includes faulty rules when we start the component for the first time. For any subsequent update, it should just log the error and continue running without crashing. What do you think?

}
}
for {
select {
case <-cancel:
return errors.New("canceled")
case <-reload:
case <-reloadSignal:
}

level.Debug(logger).Log("msg", "configured rule files", "files", strings.Join(ruleFiles, ","))
var files []string
for _, pat := range ruleFiles {
fs, err := filepath.Glob(pat)
if err != nil {
// The only error can be a bad pattern.
level.Error(logger).Log("msg", "retrieving rule files failed. Ignoring file.", "pattern", pat, "err", err)
continue
if err := reloadRules(logger, ruleFiles, ruleMgr, evalInterval, configSuccess, configSuccessTime, rulesLoaded); err != nil {
level.Error(logger).Log("msg", "reload rules by sighup failed", "err", err)

Not only SIGHUP TBH, it can be HTTP too.

@bwplotka Yes. This code handles SIGHUP only.

Would be nice to treat signal and HTTP reload exactly the same way (as it was before). Why not? We could reuse

@bwplotka If so, we need another channel to receive the error message for the web handler, so a new struct should wrap reloadSignal and errMsg. Maybe that is redundant for the current implementation.

Add a new struct to handle the web handler. Since we have reloadSignal as an input parameter, we always need to select on reloadSignal.

LGTM!

}

files = append(files, fs...)
}

level.Info(logger).Log("msg", "reload rule files", "numFiles", len(files))

if err := ruleMgr.Update(evalInterval, files); err != nil {
configSuccess.Set(0)
level.Error(logger).Log("msg", "reloading rules failed", "err", err)
continue
}

configSuccess.Set(1)
configSuccessTime.SetToCurrentTime()

rulesLoaded.Reset()
for _, group := range ruleMgr.RuleGroups() {
rulesLoaded.WithLabelValues(group.PartialResponseStrategy.String(), group.File(), group.Name()).Set(float64(len(group.Rules())))
case <-cancel:
return errors.New("canceled")

I know it's not yours, but can we move to

}

}
}, func(error) {
close(cancel)

@@ -564,7 +541,10 @@ func runRule(
}

router.WithPrefix(webRoutePrefix).Post("/-/reload", func(w http.ResponseWriter, r *http.Request) {
reload <- struct{}{}
if err := reloadRules(logger, ruleFiles, ruleMgr, evalInterval, configSuccess, configSuccessTime, rulesLoaded); err != nil {
level.Error(logger).Log("msg", "reload rules by webhandler failed", "err", err)
http.Error(w, err.Error(), http.StatusInternalServerError)
}
})

flagsMap := map[string]string{

@@ -758,3 +738,47 @@ func addDiscoveryGroups(g *run.Group, c *http_util.Client, interval time.Duratio
		cancel()
	})
}

type errInvalidPattern struct {
	err error
	pat string
}

func (e *errInvalidPattern) Error() string {
	return errors.Wrapf(e.err, "retrieving rule files failed. Ignoring file. pattern %s", e.pat).Error()
}
00arthur00 marked this conversation as resolved.

func reloadRules(logger log.Logger,
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This function has too many parameters, it makes it harder to read. And most of the parameters are metrics, consider using a struct to collect metrics as it had done in thanos/cmd/thanos/downsample.go Lines 52 to 55 in 021f623
and pass it around. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @kakkoyun Committed. Please help to review. Thanks. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looks good now 👍 |

	ruleFiles []string,
	ruleMgr *thanosrule.Manager,
	evalInterval time.Duration,
	configSuccess prometheus.Gauge,
	configSuccessTime prometheus.Gauge,
	rulesLoaded *prometheus.GaugeVec) error {
	level.Debug(logger).Log("msg", "configured rule files", "files", strings.Join(ruleFiles, ","))
	var files []string
	for _, pat := range ruleFiles {
		fs, err := filepath.Glob(pat)
		if err != nil {
			//check errInvalidPattern when initialize.
			return &errInvalidPattern{err, pat}
		}

		files = append(files, fs...)
	}

	level.Info(logger).Log("msg", "reload rule files", "numFiles", len(files))

	if err := ruleMgr.Update(evalInterval, files); err != nil {
		configSuccess.Set(0)
		return errors.Wrap(err, "reloading rules failed")
	}

	configSuccess.Set(1)
	configSuccessTime.Set(float64(time.Now().UnixNano()) / 1e9)

	rulesLoaded.Reset()
	for _, group := range ruleMgr.RuleGroups() {
		rulesLoaded.WithLabelValues(group.PartialResponseStrategy.String(), group.File(), group.Name()).Set(float64(len(group.Rules())))
	}
	return nil
}

For the initial load, we can actually fail and return the combined multierror.

@kakkoyun The maintainer doesn't agree with stopping everything. Please check the comment here: #1848 (comment)

I don't have strong opinions about it; both work for me. In any case, I've pinged @bwplotka on the thread for the final decision.

Yes, the initial load can fail indeed (:

@bwplotka @kakkoyun Let's focus on the reload process first in this PR, since failing on initialization is a breaking change. After the reload-related code is merged, I will start a new PR for the initialization failure only. Any suggestions?

Cool, happy with that.