Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add /ready endpoint to table-manager #1736

Merged
merged 1 commit into from
Mar 11, 2020

Conversation

adityacs
Copy link
Contributor

@adityacs adityacs commented Feb 25, 2020

What this PR does / why we need it:
Add /ready endpoint to table-manager and query-frontend

Which issue(s) this PR fixes:
Fixes #1722

@@ -287,6 +288,21 @@ func (t *Loki) initTableManager() error {
}

t.tableManager.Start()

t.server.HTTP.Path("/ready").Handler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
// time.Sleep(60 * time.Seconds)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we wait here for some time to allow table-manager to sync completely first?


t.server.HTTP.Path("/ready").Handler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
// time.Sleep(60 * time.Seconds)
err := t.tableManager.SyncTables(context.Background())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a safe way to do. The loop here https://github.com/cortexproject/cortex/blob/master/pkg/chunk/table_manager.go#L190 always does syncTables, we might overlap the function call.

Other way would be we can call ListTables and check if it gives some number without error. However, cortex doesn't give us a way to check if the returned number is same as expected number.

Any other better ways?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready should only verify the process is still running, so you should use a ready we already have somewhere else returning only 200 and doing nothing else.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my clarity. SyncTables returns errors at different stage. So, if SyncTables hasn't gone through correctly then the table manager won't be able to do any action. We might say table-manager is ready since process is running but it might not be actually ready to do any action. Is this behaviour OK?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should only check if the sync manager is running properly. Unfortunately, cortex doesn't expose if the manager is still running although that should be easy to add reading their code:

func (m *TableManager) IsRunning() bool {
     select {
      case <-m.done:
        return false
      default:
        return true
}

Then you can use that in your ready handler.

I'd suggest for now an empty handler like I've linked you.

Then you can create an issue to improve:

  • first send a PR to cortex
  • come back here and add the check, if it's not running then it should fail.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Thanks


// Once the execution reaches this point, all initialization has been
// done and the query-frontend is ready to serve, so we're just returning a 200.
t.server.HTTP.Path("/ready").Handler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is interesting to have this for the frontend too but not sure we want to do it in this PR. Mainly because I think it might be different.

Currently it returns

curl localhost:9090/ready
no org id

So we might just need to change that. but for now let's remove it so we can close the issue for the table manager.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Will remove frontend

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks you can create an issue though I think it's a good thing to fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

@adityacs adityacs force-pushed the 1722_table_manager_endpoint branch from c65ae3f to ffc943e Compare March 9, 2020 15:58
Copy link
Contributor

@cyriltovena cyriltovena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@adityacs adityacs force-pushed the 1722_table_manager_endpoint branch from ffc943e to b6c4023 Compare March 9, 2020 15:59
@cyriltovena
Copy link
Contributor

Just a small conflict @adityacs if you have time.

@adityacs adityacs force-pushed the 1722_table_manager_endpoint branch 2 times, most recently from ddf14d6 to 5f3a190 Compare March 10, 2020 13:57
@adityacs adityacs force-pushed the 1722_table_manager_endpoint branch from 5f3a190 to 9b14726 Compare March 10, 2020 14:05
@adityacs
Copy link
Contributor Author

@cyriltovena resolved the conflicts

@codecov-io
Copy link

Codecov Report

Merging #1736 into master will decrease coverage by 0.04%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1736      +/-   ##
==========================================
- Coverage   64.05%   64.01%   -0.05%     
==========================================
  Files         121      121              
  Lines        9123     9129       +6     
==========================================
  Hits         5844     5844              
- Misses       2867     2873       +6     
  Partials      412      412              
Impacted Files Coverage Δ
pkg/loki/modules.go 10.29% <0.00%> (-0.24%) ⬇️

@pstibrany
Copy link
Member

Just a heads up... Cortex now uses different readiness handler (that works no matter the target) that I plan to bring over to Loki as well.

@cyriltovena
Copy link
Contributor

Just a heads up... Cortex now uses different readiness handler (that works no matter the target) that I plan to bring over to Loki as well.

What do you suggest ? I don't think we were aware of that.

@pstibrany
Copy link
Member

pstibrany commented Mar 10, 2020

What do you suggest ? I don't think we were aware of that.

If you want to include this in 1.4, go for it. For release after that, I should be finished with my changes, I will then consolidate these individual handlers into one.

(This is part of bringing services model to Loki. Upgrading Cortex was step 1, next step is converting Loki code to services)

@pstibrany
Copy link
Member

pstibrany commented Mar 11, 2020

One more note ... these individual /ready handlers will fight each other in single-binary mode.

Copy link
Contributor

@cyriltovena cyriltovena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cyriltovena cyriltovena merged commit 89263b0 into grafana:master Mar 11, 2020
@cyriltovena
Copy link
Contributor

One more note ... these individual /ready handlers will fight each other in single-binary mode.

ho !

@cyriltovena
Copy link
Contributor

cyriltovena commented Mar 11, 2020

I think you're right Peter, but they won't really fight, the last one will take over. Looking forward to your PR anyways !

@pstibrany
Copy link
Member

I think you're right Peter, but they won't really fight, the last one will take over. Looking forward to your PR anyways !

Right, only one of them will be active. As long as they all have the same logic, it’s not a problem, but that’s not the case in Cortex (haven’t verified Loki yet). Still, overwriting handlers like that is a design smell.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Table-manager endpoint /ready gives 404
4 participants