-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add /ready endpoint to table-manager #1736
Add /ready endpoint to table-manager #1736
Conversation
pkg/loki/modules.go
Outdated
@@ -287,6 +288,21 @@ func (t *Loki) initTableManager() error { | |||
} | |||
|
|||
t.tableManager.Start() | |||
|
|||
t.server.HTTP.Path("/ready").Handler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { | |||
// time.Sleep(60 * time.Seconds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we wait here for some time to allow table-manager
to sync completely first?
pkg/loki/modules.go
Outdated
|
||
t.server.HTTP.Path("/ready").Handler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { | ||
// time.Sleep(60 * time.Seconds) | ||
err := t.tableManager.SyncTables(context.Background()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a safe way to do. The loop here https://github.com/cortexproject/cortex/blob/master/pkg/chunk/table_manager.go#L190 always does syncTables
, we might overlap the function call.
Other way would be we can call ListTables
and check if it gives some number without error. However, cortex doesn't give us a way to check if the returned number is same as expected number.
Any other better ways?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ready should only verify the process is still running, so you should use a ready we already have somewhere else returning only 200 and doing nothing else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for my clarity. SyncTables
returns errors at different stage. So, if SyncTables
hasn't gone through correctly then the table manager won't be able to do any action. We might say table-manager is ready since process is running but it might not be actually ready to do any action. Is this behaviour OK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should only check if the sync manager is running properly. Unfortunately, cortex doesn't expose if the manager is still running although that should be easy to add reading their code:
func (m *TableManager) IsRunning() bool {
select {
case <-m.done:
return false
default:
return true
}
Then you can use that in your ready handler.
I'd suggest for now an empty handler like I've linked you.
Then you can create an issue to improve:
- first send a PR to cortex
- come back here and add the check, if it's not running then it should fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. Thanks
7bd9f74
to
c65ae3f
Compare
pkg/loki/modules.go
Outdated
|
||
// Once the execution reaches this point, all initialization has been | ||
// done and the query-frontend is ready to serve, so we're just returning a 200. | ||
t.server.HTTP.Path("/ready").Handler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is interesting to have this for the frontend too but not sure we want to do it in this PR. Mainly because I think it might be different.
Currently it returns
curl localhost:9090/ready
no org id
So we might just need to change that. but for now let's remove it so we can close the issue for the table manager.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Will remove frontend
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks you can create an issue though I think it's a good thing to fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
c65ae3f
to
ffc943e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
ffc943e
to
b6c4023
Compare
Just a small conflict @adityacs if you have time. |
ddf14d6
to
5f3a190
Compare
5f3a190
to
9b14726
Compare
@cyriltovena resolved the conflicts |
Codecov Report
@@ Coverage Diff @@
## master #1736 +/- ##
==========================================
- Coverage 64.05% 64.01% -0.05%
==========================================
Files 121 121
Lines 9123 9129 +6
==========================================
Hits 5844 5844
- Misses 2867 2873 +6
Partials 412 412
|
Just a heads up... Cortex now uses different readiness handler (that works no matter the target) that I plan to bring over to Loki as well. |
What do you suggest ? I don't think we were aware of that. |
If you want to include this in 1.4, go for it. For release after that, I should be finished with my changes, I will then consolidate these individual handlers into one. (This is part of bringing services model to Loki. Upgrading Cortex was step 1, next step is converting Loki code to services) |
One more note ... these individual /ready handlers will fight each other in single-binary mode. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
ho ! |
I think you're right Peter, but they won't really fight, the last one will take over. Looking forward to your PR anyways ! |
Right, only one of them will be active. As long as they all have the same logic, it’s not a problem, but that’s not the case in Cortex (haven’t verified Loki yet). Still, overwriting handlers like that is a design smell. |
What this PR does / why we need it:
Add
/ready
endpoint to table-manager and query-frontendWhich issue(s) this PR fixes:
Fixes #1722