Thanos Rule fails to load all rules files after reload - numFiles is dropping #4445
sevagh changed the title from "Thanos Rule (0.22.0-rc0) fails to load all rules files after reload - numFiles is dropping" to "Thanos Rule fails to load all rules files after reload - numFiles is dropping" on Jul 14, 2021
I think this is a duplicate of #4432.
As was suggested in #4432, using
Thanks for testing and the detailed report! This bug appeared somewhere around version 0.19, so you hit it after upgrading. Let's try to fix it with #4442.
Fixed by #4442. It's now covered by tests, so it's working 100% 💪
Thanos version: 0.22.0-rc.0 downloaded from the GitHub releases: https://github.com/thanos-io/thanos/releases/tag/v0.22.0-rc.0
OS: Debian 10 Buster, AMD64 architecture
The problem: I upgraded Thanos from v0.16.0 to v0.22.0. Since the upgrade, I noticed the Thanos Rule instances are failing to load all of the rules files.
If you do a systemctl restart, all the rules files are loaded. Then, after the next systemctl reload signal, the number of loaded rules files drops.
The correct total number of rules is the higher value, a little over 3000 rules. It only got that high immediately after a fresh daemon restart. After the next reload, the number of rules files dropped. (Every hour, we sync new rules files from the self-service git repo onto the Thanos Rule machines and reload the daemon.)
Initially I assumed it was a file descriptor problem and raised the limit through systemd, but I don't think that's related. The logs don't show any errors.
From the logs, we can see numFiles dropping ever since upgrading to 0.22.0-rc.0 and sending a reload. The correct numFiles is 527.
Those are historical logs from the old version. Since the upgrade to 0.22.0, we can see numFiles dropping below 527 after reloads.
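For anyone who wants to check their own logs for the same symptom, here is a minimal Python sketch that scans log lines for a numFiles value and reports every line where it falls below the expected count. It assumes the reload lines are logfmt-style and carry a numFiles=<integer> field; that format, the function name find_num_files_drops, and the log file passed on the command line are assumptions made for illustration and are not taken from the Thanos source.

```python
import re
import sys

# Assumption: each reload log line contains a logfmt field "numFiles=<integer>".
NUM_FILES_RE = re.compile(r"\bnumFiles=(\d+)\b")


def find_num_files_drops(lines, expected):
    """Return (line_number, num_files) for every line reporting fewer than `expected` files.

    `lines` is any iterable of strings; lines without a numFiles field are skipped.
    """
    drops = []
    for lineno, line in enumerate(lines, start=1):
        match = NUM_FILES_RE.search(line)
        if match is None:
            continue  # not a reload line (or the field is missing)
        num_files = int(match.group(1))
        if num_files < expected:
            drops.append((lineno, num_files))
    return drops


# Only runs when a log file path is given, e.g. a file exported from the journal.
if __name__ == "__main__" and len(sys.argv) > 1:
    expected = 527  # the correct numFiles reported in this issue
    with open(sys.argv[1]) as log_file:
        for lineno, num_files in find_num_files_drops(log_file, expected):
            print(f"line {lineno}: numFiles={num_files} (expected {expected})")
```

Run against a log file saved after a few hourly reloads, any output line means a reload picked up fewer files than expected; no output means every reported numFiles was at least 527.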