Evenly spread queriers across available nodes #6415
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki
Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.
+ ingester 0%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0.6%
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki
Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.
+ ingester 0%
- distributor -0.3%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0%
As suggested by @dannykopping in the design doc, I added an entry to the changelog and the upgrade guide.
This looks good @salvacorts, but I think we should make it configurable.
> If you want to keep running up to one querier per node, you will need to revert the changes for `production/ksonnet/loki/querier.libsonnet` made at 6415.
How about we offer both the node affinity & the topology spread via a config value (defaulted to topology spread), and allow configuring the max skew?
This could later be expanded for all components (defaulted to node affinity for now)
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki
Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.
+ ingester 0%
+ distributor 0.3%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0%
@dannykopping That's a good idea. I just added two new config parameters:
LGTM! Two minor nits
docs/sources/upgrading/_index.md
@@ -36,8 +36,7 @@ The output is incredibly verbose as it shows the entire internal config struct u
 #### Evenly spread queriers across kubernetes nodes

 We now evenly spread queriers across the available kubernetes nodes, but allowing more than one querier to be scheduled into the same node.
-If you want to keep running up to one querier per node, you will need to revert the changes for `production/ksonnet/loki/querier.libsonnet`
-made at [6415](https://github.com/grafana/loki/pull/6415).
+If you want to keep running up to one querier per node, set `$._config.querier.use_topology_spread` to false.
-If you want to keep running up to one querier per node, set `$._config.querier.use_topology_spread` to false.
+If you want to run at most a single querier per node, set `$._config.querier.use_topology_spread` to false.
Done
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki
Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.
+ ingester 0%
+ distributor 0%
+ querier 0%
- querier/queryrange -0.1%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0.4%
+ loki 0%
* Evenly spread queriers across available nodes
* Fix lint issue
* Add entry to the CHANGELOG and the Upgrade Guide
* Make topology spread configurable
* Apply CR feedback
This reverts commit 5abd9d2.
What this PR does / why we need it:
Currently, queriers run on different nodes. We do this by using the following anti-affinity rule:
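For readers unfamiliar with it, a hard pod anti-affinity rule of this kind usually has the following shape in the pod spec. This is only a sketch written as a Jsonnet object: the field names are standard Kubernetes ones, but the `name: 'querier'` label selector is an assumption for illustration, not a value taken from this PR.

```jsonnet
// Sketch of a hard pod anti-affinity rule (pod spec fragment).
// Assumption: querier pods carry the label `name: querier`.
{
  affinity: {
    podAntiAffinity: {
      // "required..." makes this a hard rule: a pod stays Pending
      // if no node satisfies it.
      requiredDuringSchedulingIgnoredDuringExecution: [
        {
          // Do not co-locate with pods matching this selector...
          labelSelector: { matchLabels: { name: 'querier' } },
          // ...within the same topology domain, here a single node.
          topologyKey: 'kubernetes.io/hostname',
        },
      ],
    },
  },
}
```

With a rule like this, the number of schedulable queriers is capped by the number of nodes.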
The original reasoning behind this is to make our systems more resilient to node failures. For example, if we have 5 queriers, all scheduled on the same node, and the node fails, we would not be able to serve queries until the pods are rescheduled and running on another node.
When new queriers are created, they need to be scheduled onto different nodes. If there are not enough nodes in the cluster, new nodes need to be provisioned. This has two implications:
This PR uses Kubernetes's `TopologySpreadConstraints` to evenly spread queriers across the available nodes, but allows more than one querier to be scheduled into the same node.
Checklist
* CHANGELOG.md
* docs/sources/upgrading/_index.md
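
The topology spread approach named in the PR description can be sketched as a pod spec fragment like the one below, again written as a Jsonnet object. The values shown (`maxSkew: 1`, `whenUnsatisfiable: 'ScheduleAnyway'`, and the `name: 'querier'` label) are assumptions chosen for illustration and may not match what the PR actually sets in `production/ksonnet/loki/querier.libsonnet`.

```jsonnet
// Sketch of a topology spread constraint (pod spec fragment).
// Assumptions: label `name: querier`, maxSkew of 1, soft enforcement.
{
  topologySpreadConstraints: [
    {
      // Maximum allowed difference in matching pod count between
      // any two topology domains (here: nodes).
      maxSkew: 1,
      topologyKey: 'kubernetes.io/hostname',
      // 'ScheduleAnyway' treats the constraint as a preference;
      // 'DoNotSchedule' would make it a hard requirement.
      whenUnsatisfiable: 'ScheduleAnyway',
      labelSelector: { matchLabels: { name: 'querier' } },
    },
  ],
}
```

Unlike the anti-affinity rule, a constraint like this lets several queriers share a node while still keeping them roughly balanced. Per the upgrade guide entry in this PR, setting `$._config.querier.use_topology_spread` to false keeps at most one querier per node.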