Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.*: Apply webrouteprefix on debug and metric endpoints #4471

Closed
wants to merge 12 commits into from

Conversation

wbh1
Copy link
Contributor

@wbh1 wbh1 commented Jul 22, 2021

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Fixes #2727

Changes

This is a continuation of the work started by @ranjithkumar007 in #2524.

I've completed the rebase, fixed prefix handling in Thanos Store, added prefix handling logic to Thanos Compact (the flags were there but ignored, previously), and added additional testing around prefix behaviors in e2e tests.

Related issues:

Essentially, it prefixes the debug and metrics endpoints with the value of --web.route-prefix if available.

For those like me who have Thanos components behind a reverse proxy (e.g. https://example.com/thanos/query) then this will break their access to some endpoints (e.g. /metrics, /-/ready and /-/healthy) if they hard coded around it like we did (see below).

Bad:

   # This will break
    location = /thanos/query/metrics {
       proxy_pass http://127.0.0.1:10902/metrics;
    }

Good:

   # This will work
    location /thanos/query/ {
        proxy_pass http://127.0.0.1:10902/thanos/query/;
    }

Verification

Verified with local tests and deployment of a build to our own servers to verify that the endpoints are served as expected both with and without --web.external-prefix (and thereby --web.route-prefix) being set.

Additionally, added more e2e tests to verify this works as expected on the components that support setting a prefix.

@wbh1 wbh1 force-pushed the webroute branch 2 times, most recently from e1990ff to 41f9805 Compare July 22, 2021 20:17
@wbh1
Copy link
Contributor Author

wbh1 commented Jul 22, 2021

I've rerun the failed e2e test TestCompactWithStoreGateway without issue locally without any changes, so I think that might just be flaky. I've been unable to make TestRule succeed, but I don't see anything that I changed that would've affected it.

@ranjithkumar007
Copy link
Contributor

Thank you @wbh1 for fixing this.

wiardvanrij
wiardvanrij previously approved these changes Jul 28, 2021
Copy link
Member

@wiardvanrij wiardvanrij left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It LGTM, but I also have that feeling about "can't we do this smarter?". Still, looks good and thank you!

CHANGELOG.md Outdated
@@ -14,6 +14,8 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re

### Fixed

[#4471](https://github.com/thanos-io/thanos/pull/4471) **breaking:** Query/Rule/Compact/Store: Serve `/debug/*`, `/-/ready`, `/-/healthy`, and `/metrics` routes under the prefix from `--web.route-prefix`, if set.
Copy link
Member

@wiardvanrij wiardvanrij Jul 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify it, let's start with **breaking** if --web.route-prefix is set: serves endpoints x,y,z now actually under the route-prefix. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me. I'll update it!

Copy link
Contributor Author

@wbh1 wbh1 Jul 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wiardvanrij - updated ✅ . Thanks!

@wbh1
Copy link
Contributor Author

wbh1 commented Jul 28, 2021

It LGTM, but I also have that feeling about "can't we do this smarter?". Still, looks good and thank you!

I agree that there's likely a better way to do it that involves more refactoring -- maybe with including a webConfig for all components? Something to be addressed in a future PR, perhaps 😛

thanos/cmd/thanos/config.go

Lines 156 to 161 in 83419bc

type webConfig struct {
routePrefix string
externalPrefix string
prefixHeaderName string
disableCORS bool
}

@wbh1 wbh1 requested a review from wiardvanrij July 28, 2021 13:44
@wbh1 wbh1 force-pushed the webroute branch 2 times, most recently from 2a4788b to a2fed5b Compare July 29, 2021 17:00
@wbh1
Copy link
Contributor Author

wbh1 commented Aug 4, 2021

Hey @wiardvanrij - anything else that you need for this?

@wiardvanrij
Copy link
Member

Hey @wiardvanrij - anything else that you need for this?

No, still LGTM. I will ping others to get this reviewed/merged :) Thanks for picking this up!

@wiardvanrij
Copy link
Member

It LGTM, but I also have that feeling about "can't we do this smarter?". Still, looks good and thank you!

I agree that there's likely a better way to do it that involves more refactoring -- maybe with including a webConfig for all components? Something to be addressed in a future PR, perhaps 😛

thanos/cmd/thanos/config.go

Lines 156 to 161 in 83419bc

type webConfig struct {
routePrefix string
externalPrefix string
prefixHeaderName string
disableCORS bool
}

I think this makes sense. Would you be interested to write an issue for it and then perhaps implement it? I'm asking for an issue so that it can be (re)viewed before you work a lot on it.

@yeya24
Copy link
Contributor

yeya24 commented Aug 5, 2021

The code change looks good. But this is really a breaking change and might impact our users. cc @thanos-io/thanos-maintainers for review.

@wbh1
Copy link
Contributor Author

wbh1 commented Sep 10, 2021

Resurfacing this discussion @yeya24 😄 Could you ping again?

@stale
Copy link

stale bot commented Nov 25, 2021

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Nov 25, 2021
@wbh1
Copy link
Contributor Author

wbh1 commented Nov 25, 2021

Still relevant. Needs review.

@stale stale bot removed the stale label Nov 25, 2021
@stale
Copy link

stale bot commented Mar 2, 2022

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Mar 2, 2022
@wbh1
Copy link
Contributor Author

wbh1 commented Mar 14, 2022

Still relevant. Needs review.

@stale stale bot removed the stale label Mar 14, 2022
@GiedriusS
Copy link
Member

Will take a look soon, sorry for the delay. Could you please rebase this?

Signed-off-by: Will Hegedus <whegedus@linode.com>
wbh1 added 2 commits April 20, 2022 15:34
Signed-off-by: Will Hegedus <whegedus@linode.com>
Signed-off-by: Will Hegedus <whegedus@linode.com>
Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your work! However, historically we've had many "broken" implementations of the prefix functionality. So, could you please add some test cases to TestQueryExternalPrefixWithoutReverseProxy or some similar test that would test out these changes? Namely, we are looking to automatically test whether /metrics and /debug endpoints are successfully accessible with these changes just like we would expect.

@pull-request-size pull-request-size bot added size/L and removed size/M labels May 8, 2022
wbh1 added 6 commits May 8, 2022 14:04
This adds additional endpoints that were not previously behind the routePrefix
to the testing suite and improves documentation of those tests.

Signed-off-by: Will Hegedus <whegedus@linode.com>
This will now return errors if an HTTP error (status >= 400) is returned by the
chrome driver.

Additionally, the Chrome context is now modified to inherit the deadline from the running test suite.

Signed-off-by: Will Hegedus <whegedus@linode.com>
Validate that the --web.route-prefix flag adjusts expected routes.

Signed-off-by: Will Hegedus <whegedus@linode.com>
Thanos Compact supports the flags for webconf, but previously was not actually doing anything with them.

Signed-off-by: Will Hegedus <whegedus@linode.com>
Refactors Thanos Store to use the registerFlag function to ensure it receives all
flags that are applicable to webConfig. Previously, only 3/4 were manually defined.

Signed-off-by: Will Hegedus <whegedus@linode.com>
Previously, routes were not getting the value of the --web.*-prefix flags applied.

Signed-off-by: Will Hegedus <whegedus@linode.com>
@wbh1 wbh1 force-pushed the webroute branch 2 times, most recently from e03c279 to e9e8bf0 Compare May 8, 2022 21:24
wbh1 added 3 commits May 9, 2022 09:20
Signed-off-by: Will Hegedus <whegedus@linode.com>
Signed-off-by: Will Hegedus <whegedus@linode.com>
Signed-off-by: Will Hegedus <whegedus@linode.com>
@wbh1
Copy link
Contributor Author

wbh1 commented May 9, 2022

Alright, @GiedriusS -- this should be ready for review now.

I've added additional e2e testing for the components that actually support the prefix (Compact, Query, Rule, and Store), which helped catch bugs in how Store and Compact were handling route prefixes so thank you for that 😁.


Unrelated to the merging of this PR, but related to the topic of it:

In working on this PR, I've realized how each component largely handles the setting up of its routing configuration which makes changes like this difficult. Which leads me to want to make an effort in the future to consolidate the HTTP server/routing configuration for all components to ensure they're all consistent with the way they serve their routes.

Part of this work is already done in how some components use the webConfig struct, and also the prometheus/common/route package (example in Store). This leads to a lot of repeated code (see the similarities in Rule and Query).

Is this an idea that should go through the proposal process? Looking for next steps before starting any effort towards it.

@stale
Copy link

stale bot commented Jul 10, 2022

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Jul 10, 2022
@wbh1
Copy link
Contributor Author

wbh1 commented Jul 11, 2022

Not stale. Just need to get around to rebasing it again.

@GiedriusS
Copy link
Member

Sorry for the delay, I was on vacation. Will take a look in a bit & respond to you

@stale stale bot removed the stale label Jul 11, 2022
@stale
Copy link

stale bot commented Sep 21, 2022

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Sep 21, 2022
@stale stale bot closed this Oct 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Compact Web: Missing --web.route-prefix and --web.external-prefix wrong
5 participants