-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
api: Added Proposal for Discovery Endpoint idea (InfoAPI) #3703
Conversation
4a2093f
to
a1913e5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! Very much looking forward to having this unified! I do think the alternative with the comments I added are more feasible, but I don't feel too strongly, I could see either working.
Minor side note, regardless of what we choose to do, we should make sure to document a setup once https://github.com/thanos-io/thanos/blob/master/docs/proposals/202005_scalable-rule-storage.md is also implemented, as that will be a bit odd of a setup, requiring potentially multiple --endpoint
flags as receive and rule component may be within the same Pod (in the case of the user using the fallback solution proposed in that other proposal).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing work, super clean design doc 💪🏽
Let's iterate over it. Feel free to disagree with my points, added few thoughts 🤗
@yeya24 @pracucci @kakkoyun @GiedriusS @squat feedback wanted - it's major API redesign so let's make it carefully
|
||
Add a new flag called `--endpoint` to Thanos query, and auto-discover what services that endpoint is serving based on metadata each gRPC server exposes. | ||
|
||
The plan would be for the initial discovery of gRPC services available to happen via gRPC reflection. To get further information, we would be adding an Info method to the Rules API. This would contain similar info or metadata information as we get right now from the Store Info method. Potentially in the future, we would implement this same Info method for other services in the, for example, the Targets, Examplers, etc. As mentioned the discovery would happen via reflection initially, the Info method in the Rules API would expose extra metadata about external labels for now, and we can always add more data to it in the future as needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I think that might work and is clean, however, there are few other downsides. Can list more ups & downs between this and alternatives? 🤗 Right now we talked only about upgradability.
-
For example what about overhead? In this case, what if the API server implements both Store and Rules (and maybe Exemplars and Metadata as well someday)? Which Info do we check for healthiness and metadata propagation? Both? That introduces 4x more very often made microservices call. 🤔
-
What about boilerplate when adding a new API?
Additionally: Important details that might be missing here is health checking
as well. We use Info
to check if the underlying API is available or not. Do we change that, or does it stays in?
|
||
### Questions to answer | ||
|
||
1. What guarantees do we give to the users, when migrating between versions? Can we just remove flags or do we need to fallback to existing ones? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can change flags, but requiring strict order of upgrade would be nice to avoid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, for now it then make more sense to keep the flags and upgrade would not need to be ordered and we can remove later on. I added this part to the work plan, let me know what you think!
|
||
### Work plan | ||
|
||
Start by deprecating the `--store=<address> --rule=<address>` and replace with `--endpoint=<address>` in the Thanos Query component itself and perform discovery in the Thanos Query component. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We definitely need to write down upgradability plan (or why it's not needed if it's not) for chosen option (:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added, let me know how that looks and if there is anything else I should add there? Thanks!
|
||
Currently, in Thanos Query, the discovery of rules APIs happens via Store API's Info method, this makes it harder if we ever want to not have a coupling to the Store API (which is already planned for [scalable ruler proposal](https://github.com/thanos-io/thanos/blob/master/docs/proposals/202005_scalable-rule-storage.md)). | ||
|
||
We also require passing two different flags to the Thanos Query component `--store=<address>` and `--rule=<address>` and perform DNS discovery on often the same address multiple times, this can cause occasional DNS issues. Which is especially confusing when one works and the other doesn't. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also require passing two different flags to the Thanos Query component `--store=<address>` and `--rule=<address>` and perform DNS discovery on often the same address multiple times, this can cause occasional DNS issues. Which is especially confusing when one works and the other doesn't. | |
We also require passing two different flags to the Thanos Query component `--store=<address>` and `--rule=<address>` and perform DNS discovery on often the same address multiple times, **this can cause occasional DNS issues**. Which is especially confusing when one works and the other doesn't. |
Maybe we can make this bold? Am I correctly understanding that this is basically the entire problem with the current approach? A sentence or two on what explicitly were the problems with DNS would be good to know/learn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main problem is that we have to perform calls often twice because in a lot of use cases the endpoints we pass for ruler and store to the querier component, tend to be the same address. This for us at least caused DNS resolving issues, and one got resolved another did not.
Another motivation is to also unite the discovery to be able to just pass an endpoint and not needing to couple the ruler to the store APIs as it is currently in the thanos querier, that is described in the initial issue and something we found as well when going through the code. So if just ruler is discovered we can at least potentially fetch rules and serve those. I believe this is also one of the uses cases for the work you are doing?
So I would say to sum up, both of those are the main motivation, not sure I would highlight one over the other, wdyt?
Are we ready for another review round? 🤗 |
No sorry, I was extremely busy with other company internal things and did not have a chance to address any of the comments yet. But will do it today! If this is super urgent to be implemented (e.g. this week) and someone has more time they can take over as well, let me know and I will close this out! Thanks for patience 😅 |
fd63713
to
510e9a0
Compare
510e9a0
to
dc9d993
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really like how short and precise this turned out. I think it's a testament to how well this was thought out. Nice work!
lgtm
Signed-off-by: Lili Cosic <cosiclili@gmail.com>
dc9d993
to
8f35ad2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solid. LGTM. Agree with @brancz - good discussion.
Minor typo only, not a blocker 🤗
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
* CHANGELOG.md: add v0.16.0 link (thanos-io#3697) This small commit fixes the changelog entry for release v0.16.0 so that the section title includes a link to the actual release. Signed-off-by: Lucas Servén Marín <lserven@gmail.com> * Delete deletion-mark.json at last when deleting a block (thanos-io#3661) * Delete deletion-mark.json at last when deleting a block Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fixed linter Signed-off-by: Marco Pracucci <marco@pracucci.com> * fixed broken links (thanos-io#3645) Signed-off-by: Namanl2001 <namanlakhwani@gmail.com> * chore(tutorials): fix some broken links in 2-lts (thanos-io#3702) Signed-off-by: Bradley <bwilsonhunt@gmail.com> * Fix race condition in BinaryReader.LookupSymbol() (thanos-io#3705) * Fix race condition in BinaryReader.LookupSymbol() Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fixed BinaryReader receiver and added CHANGELOG entry Signed-off-by: Marco Pracucci <marco@pracucci.com> * ui: make old bucket viewer UI work with vanilla blocks (thanos-io#3700) * ui: make old UI work with vanilla blocks Make the old bucket viewer UI work with vanilla Prometheus blocks by checking whether the `Thanos` part exists before formatting the HTML. Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * CHANGELOG: update Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * ui/bucket: fix according to @squat's suggestions Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * ui: update bindata after latest changes Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * Query-frontend CMD: use detailed text description Signed-off-by: dmaiocchi <sodar@riseup.net> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> * Upgrade Cortex and remove SwiftConfig (thanos-io#3708) Signed-off-by: Marco Pracucci <marco@pracucci.com> * chore(deps): upgrading hugo (v0.80.0) fixes thanos-io#3653 (thanos-io#3714) Signed-off-by: Bradley <bwilsonhunt@gmail.com> * store: Make more S3 http.Transport settings configurable. (thanos-io#3657) The http.Transport is not auto-tuning and one size does not seem to fit all cases. In order to respond to a query a store gateway might need to fetch a large (thousands) of postings, series and chunks. While the number of idle connections has been increased recently it can still be too low or expose bursty (opening, closing) behavior. Allow to tune most of the http.Transport parameters. I considered embedding the full Transport but there is already a Transport member. Signed-off-by: Holger Hans Peter Freyther <holger@freyther.de> * added definiton of chunk and referenced it in the docs (thanos-io#3629) * Update design.md Update store.md Update design.md Update storage.md Update troubleshooting.md Signed-off-by: Biswajit Ghosh <biswajitghosh98@iitkgp.ac.in> * Update design.md Update store.md Update design.md Update troubleshooting.md Update storage.md Signed-off-by: Biswajit Ghosh <biswajitghosh98@iitkgp.ac.in> * Upgraded to newest bingo. (thanos-io#3718) Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Update help messages with a consistent use of capitals and periods. (thanos-io#3727) * Update help messages with a consistent use of capitals and periods. Signed-off-by: Matt Whitney <mattjwhitney@gmail.com> * Update documentation to reflect message changes. Signed-off-by: Matt Whitney <mattjwhitney@gmail.com> * Run `make docs` to correct the documentation rather than manual edits. Signed-off-by: Matt Whitney <mattjwhitney@gmail.com> * docs: update examples/dashboards/dashboards.md (thanos-io#3733) Signed-off-by: Mert Acikportali <mertacikportali@gmail.com> * pkg/errutil: correct the multierror file suffix (thanos-io#3736) This commit changes the file suffix from .go.go to simply .go. Signed-off-by: Lucas Servén Marín <lserven@gmail.com> * Fixed website and added new step for release process (tmp). (thanos-io#3738) Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * api: Added Proposal for Discovery Endpoint idea (InfoAPI) (thanos-io#3703) * docs/proposals/210701_endpoint_discovery.md: Add initial proposal Signed-off-by: Lili Cosic <cosiclili@gmail.com> * Update docs/proposals/210701_endpoint_discovery.md Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> * Introduce "block-meta-fetch-concurrency" flag for compact and store (thanos-io#3752) * Introduce "block-meta-fetch-concurrency" flag for compact and store Furthermore consolidate the fetcher concurrency by moving the fetcherConcurrency const to the fetcher code of the block package. Signed-off-by: Johannes Frey <johannes.frey@daimler.com> * Document flags Signed-off-by: Johannes Frey <johannes.frey@daimler.com> * Add `--query-range.request-downsampled` flag to Query Frontend (thanos-io#2641) (thanos-io#3723) * Add `--query-range.request-downsampled` flag to Query Frontend (thanos-io#2641) Signed-off-by: Vladimir Kononov <krya-kryak@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> Signed-off-by: Vladimir Kononov <krya-kryak@users.noreply.github.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> * Pinned to newer busybox on quay. (thanos-io#3762) Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fix panic on concurrent index-header lazy reader usage and unload (thanos-io#3760) * Fix panic on concurrent index-header lazy reader usage and unload Signed-off-by: Marco Pracucci <marco@pracucci.com> * Addressed review comments Signed-off-by: Marco Pracucci <marco@pracucci.com> * Addressed review comments Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fixed Planner.Plan() description (thanos-io#3767) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fixed panic on concurrent index-header lazy load / unload (thanos-io#3759) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Pad compaction planner size check (thanos-io#3773) Pad the index size check in the compaction planner by 15% to avoid situations where the sum of index bytes bloats to be larger than the sum of the original index sizes. * thanos-io#3724 * thanos-io#3750 Signed-off-by: Ben Kochie <superq@gmail.com> * pkg/server/http: Allow passing http.ServeMux with a server option (thanos-io#3769) Signed-off-by: Matthias Loibl <mail@matthiasloibl.com> * Update Katacoda Official Link (thanos-io#3768) Signed-off-by: soniasingla <soniasingla.1812@gmail.com> * Nomitate kakkoyun for the v0.20.0 release (thanos-io#3778) Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Fix typo in query component documentation for store api link (thanos-io#3675) * Fix wrong store api link Signed-off-by: Junyoung, Sung <junyoung.sung@navercorp.com> * Fix store api link to absolute Signed-off-by: Junyoung, Sung <junyoung.sung@navercorp.com> * e2e: Resolved Flaky tests (thanos-io#3777) * added retry function Signed-off-by: Abhishek357 <abhisheksinghchauhan357@gmail.com> * fixed CI test Signed-off-by: Abhishek357 <abhisheksinghchauhan357@gmail.com> * restructured code Signed-off-by: Abhishek357 <abhisheksinghchauhan357@gmail.com> * Add an exempt label to ignore issue that has a PR Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * in memory cache for caching bucket (thanos-io#3579) * in memeory cache for caching bucket Signed-off-by: Sudharshann D <sudhar287@gmail.com> * addressing comments Signed-off-by: Sudharshann D <sudhar287@gmail.com> * Introduce "allow overlapping blocks" flag for Thanos receiver (thanos-io#3792) * add tsdb.allow-overlapping-blocks flag to receiver Signed-off-by: Max Chandler <max_chandler@apple.com> * make format Signed-off-by: Max Chandler <max_chandler@apple.com> * cleanup whitespace Signed-off-by: Max Chandler <max_chandler@apple.com> * cleanup whitespace Signed-off-by: Max Chandler <max_chandler@apple.com> * make docs Signed-off-by: Max Chandler <max_chandler@apple.com> * fix changelog conflict Signed-off-by: Max Chandler <max_chandler@apple.com> Co-authored-by: Matt Lawrence <m.law@apple.com> * Reduce memory allocations in bucketBlock.readChunkRange() (thanos-io#3796) Signed-off-by: Marco Pracucci <marco@pracucci.com> * update prometheus version Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * update cortex Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * add new property to LabelValuesRequest Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * set label matchers in LabelValueRequest Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * make docs Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * Allow to customise S3 SSE on a per-request basis (thanos-io#3783) * Allow to customise S3 SSE on a per-request basis Signed-off-by: Marco Pracucci <marco@pracucci.com> * Addressed review comments Signed-off-by: Marco Pracucci <marco@pracucci.com> * Refactoring: allow to pass BytesPool to store.NewBucketStore() (thanos-io#3801) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Allow to customise the partitioner used by the BucketStore (thanos-io#3802) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fix button display when there is no panels (thanos-io#3694) Signed-off-by: Namanl2001 <namanlakhwani@gmail.com> * fix tests Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * Allow downstream projects to customise the Partitioner (thanos-io#3808) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Makefile: Fix to command to find React source files (thanos-io#3805) Signed-off-by: Hitanshu Mehta <hitanshu99amehta@gmail.com> * Merge release 0.18 (thanos-io#3809) * CHANGELOG.md: release v0.18.0 Signed-off-by: Lucas Servén Marín <lserven@gmail.com> * VERSION,tutorials: bump versions Signed-off-by: Lucas Servén Marín <lserven@gmail.com> * Revert "VERSION,tutorials: bump versions" This reverts commit 5a27c70. The "thanos:" prefix for the container images was accidentally removed. Signed-off-by: Lucas Servén Marín <lserven@gmail.com> * VERSION,tutorials: bump version for all images Signed-off-by: Lucas Servén Marín <lserven@gmail.com> * CHANGELOG: bump release date Signed-off-by: Lucas Servén Marín <lserven@gmail.com> * pkg/rules/proxy: fix hotlooping when receiving client errors Currently, if we receive an error from the underlying client stream, we continue with trying to receive additional data. This causes a hotloop as we will receive the same error again. This fixes it by returning in the error case and adds a unit test for the proxy logic. Fixes thanos-io#3717 Signed-off-by: Sergiusz Urbaniak <sergiusz.urbaniak@gmail.com> * CHANGELOG.md: fix changelog to incorporate new fix (thanos-io#3737) This commit fixes the order of the changelog to properly reflect a recent change. Signed-off-by: Lucas Servén Marín <lserven@gmail.com> * Fixed website and added new step for release process (tmp). (thanos-io#3738) Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * CHANGELOG.md: update v0.18.0 release date Signed-off-by: Lucas Servén Marín <lserven@gmail.com> Co-authored-by: Sergiusz Urbaniak <sergiusz.urbaniak@gmail.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> * reloader: try to fix test flakiness (thanos-io#3798) This is an attempt of fixing the `TestReloader_DirectoriesApply` test flakiness. I call this an attempt because I cannot reproduce it locally. However, I have noticed that during runs where this fails the logs look like this: ``` --- FAIL: TestReloader_DirectoriesApply (3.04s) reloader_test.go:256: Performing step number 0 reloader_test.go:256: Performing step number 1 reloader_test.go:256: Performing step number 2 reloader_test.go:256: Performing step number 3 reloader_test.go:256: Performing step number 4 reloader_test.go:256: Performing step number 6 reloader_test.go:343: reloader_test.go:343: exp: 5 got: 6 ``` It immediately jumps to another value. This gave me a hint and I think that this is happening because potentially `i` can be written to/read from by multiple goroutines. On very resource constrained systems like the CircleCI runners, it could just happen that the `if` doesn't do what it is supposed to. Try to fix this problem by protecting the whole HTTP handler with a mutex. Since this is only a test and not a performance critical path, I think this is a reasonable change to do. Add extra check for reload failures. Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * addressing PR comments Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> * Block Viewer: Move overlapping blocks to separate rows (thanos-io#3729) * Block Viewer: overlapping blocks issue resolved Signed-off-by: Namanl2001 <namanlakhwani@gmail.com> * small changes and make assets Signed-off-by: Namanl2001 <namanlakhwani@gmail.com> * Add flag to set default step Signed-off-by: Hitanshu Mehta <hitanshu99amehta@gmail.com> * Add support to use default step in classic ui Signed-off-by: Hitanshu Mehta <hitanshu99amehta@gmail.com> * Add support to use default step in new ui Signed-off-by: Hitanshu Mehta <hitanshu99amehta@gmail.com> * Lint fixes Signed-off-by: Hitanshu Mehta <hitanshu99amehta@gmail.com> * change ParseStep function to be consistent with ui Signed-off-by: Hitanshu Mehta <hitanshu99amehta@gmail.com> * Improve description of query.default-step flag Signed-off-by: Hitanshu Mehta <hitanshu99amehta@gmail.com> * Minor fixes Signed-off-by: Hitanshu Mehta <hitanshu99amehta@gmail.com> * Use default value when flag is undefined Signed-off-by: Hitanshu Mehta <hitanshu99amehta@gmail.com> * minor fixes Signed-off-by: Hitanshu Mehta <hitanshu99amehta@gmail.com> * Minor fixes Signed-off-by: Hitanshu Mehta <hitanshu99amehta@gmail.com> * Update changelog Signed-off-by: Hitanshu Mehta <hitanshu99amehta@gmail.com> * Fix typo Signed-off-by: Hitanshu Mehta <hitanshu99amehta@gmail.com> * Receive: Improve handling of empty time series from clients (thanos-io#3815) * exit early if a request has no timeseries data Signed-off-by: Matt Lawrence <m.law@apple.com> Co-authored-by: Max Chandler <max_chandler@apple.com> * update changelog Signed-off-by: Matt Lawrence <m.law@apple.com> * update changelog Signed-off-by: Matt Lawrence <m.law@apple.com> * Move constant to right side of comparison Co-authored-by: Lucas Servén Marín <lserven@gmail.com> Signed-off-by: Matt Lawrence <m.law@apple.com> Co-authored-by: Max Chandler <max_chandler@apple.com> Co-authored-by: Lucas Servén Marín <lserven@gmail.com> * Reduced allocated memory by chunks reader in the store gateway at query time (thanos-io#3814) * Reduced allocated memory by chunks reader in the store gateway at query time Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fixed linter issues Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fixed linter (hopefully) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Renamed function Signed-off-by: Marco Pracucci <marco@pracucci.com> * Updated comment Signed-off-by: Marco Pracucci <marco@pracucci.com> * Updated code comment Signed-off-by: Marco Pracucci <marco@pracucci.com> * promclient: fix error's message (thanos-io#3824) Use the provided `method` in the error messages. I got scared reading the logs that the requests are still being sent using GET instead of POST which I had specified. Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * pkg/ui/react-app: update dependencies (thanos-io#3818) Fix lodash security issue. Signed-off-by: Simon Pasquier <spasquie@redhat.com> * Add objstore.List() recursive support (thanos-io#3823) * Add objstore.List() recursive support Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fixed CachingBucket Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fix Cortex compilation issue Signed-off-by: Marco Pracucci <marco@pracucci.com> * Tidy go.mod/sum Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fix linter Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fixed TestMetricBucket_Close Signed-off-by: Marco Pracucci <marco@pracucci.com> * Upgraded Cortex Signed-off-by: Marco Pracucci <marco@pracucci.com> * Inlined struct initialisation Signed-off-by: Marco Pracucci <marco@pracucci.com> * Upgrade Cortex again Signed-off-by: Marco Pracucci <marco@pracucci.com> * Upgrade Cortex (thanos-io#3828) Signed-off-by: Marco Pracucci <marco@pracucci.com> * Truncated S3 "get object" response is reported as error (thanos-io#3795) * Add test to ensure S3 truncated response is an error Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added missing copyright Signed-off-by: Marco Pracucci <marco@pracucci.com> * Upgraded Minio Signed-off-by: Marco Pracucci <marco@pracucci.com> * Upgraded Minio again Signed-off-by: Marco Pracucci <marco@pracucci.com> * Implement federated metric metadata API (thanos-io#3686) * support federated metadata API Signed-off-by: Ben Ye <yb532204897@gmail.com> * update comments Signed-off-by: yeya24 <yb532204897@gmail.com> * use parseInt Signed-off-by: yeya24 <yb532204897@gmail.com> * address Prem's comments Signed-off-by: yeya24 <yb532204897@gmail.com> * update proto comment Signed-off-by: yeya24 <yb532204897@gmail.com> * add changelog Signed-off-by: yeya24 <yb532204897@gmail.com> * Fixed TestBucketStore_ManyParts_e2e (thanos-io#3841) https://app.circleci.com/pipelines/github/thanos-io/thanos/5120/workflows/efd2a21d-13b7-4035-99e3-cb1af8023694/jobs/13809 This fail is only visible for anyone from Thanos Team proposing PR, due to extra tests against bucket providers. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Reinforcing errutil and fixed ugly bug which caused empty []error to te treated as error, (thanos-io#3836) Previous multi-error implementation could cause very ugly bug of returnig empty multi-error that should be treated as success not error by API, but if .Err() is not invoked it will be used as non nil error. Once we merge this, we can do cleaner solution that slighly change nesting behaviour: thanos-io#3833 There were 9 places where we had this bug in handler due to MultiError lib allowing to do so. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Export metadata fetcher metrics (thanos-io#3660) Signed-off-by: Marco Pracucci <marco@pracucci.com> * store: Cleaned up API for test/benchmark purposes. (thanos-io#3650) Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Allow Cortex to fully reuse FetcherMetrics (thanos-io#3842) Signed-off-by: Marco Pracucci <marco@pracucci.com> * downsample: ensure consistent order (thanos-io#3843) Ensure that we have a consistent order of blocks that we are going to downsample. `range` over maps doesn't enforce any particular order on purpose. This is needed for https://github.com/thanos-io/thanos/pull/3031/files. ATM in that PR before downsampling we delete all directories which do not match blocks ULIDs in the remote object storage. Ideally, we should only keep around the files of a block which we are about to downsample. It is impossible to do that properly ATM if during another iteration we'd start from a different block. Thus, let's have a consistent order. Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * [Fix] Replace unsupported %v macro in logs (thanos-io#3847) Signed-off-by: Stéphane Brillant <stephane.brillant@jive.com> * Update ui npm dependencies to latest versions (thanos-io#3813) * Update ui npm deps to latest Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Fix linting errors Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Change camelcase lint rule Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Fix CI configs and scripts to work with main branch (thanos-io#3848) Signed-off-by: Prem Saraswat <prmsrswt@gmail.com> * block: precalculate hashes if enabled and use them during compaction (downloading) (thanos-io#3031) * block: precalculate hashes if enabled and use them during compaction Added the possibility to ignore certain directories in objstore.{Download,DownloadDir}. Do not download files which have the same hash as in remote object storage. Wire up `--hash-func` so that writers could specify what hash function to use when uploading. There is no performance impact if no hash function has been explicitly specified. Clean up the removal of files logic in Thanos Compact to ensure we do not remove something that exists on disk already. Tested manually + new tests cover all of this more or less. Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * block: expose GatherFileStats and use it Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * Revert "block: expose GatherFileStats and use it" This reverts commit 259c70b. Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * block: do not calc hash for dirs, add locks Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * docs/tools: update Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * shipper: pass s.hashFunc Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * Fix according to Bartek's comments Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * compact: clean up comment Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * block: close with log on error Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * *: remove unused FNs Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * compact: add e2e test for new hash functionality Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * Fix according to Bartek's comments Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * mixin: Upgrade jsonnet tooling (thanos-io#3855) * Upgrade jsonnet tooling Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Fix bingo version issue Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Fix unintended removal issue Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Update reference of master to main in docs (thanos-io#3849) * Update reference of master to main in docs Signed-off-by: Prem Saraswat <prmsrswt@gmail.com> * Update image tag in tutorials/kubernetes-helm Signed-off-by: Prem Saraswat <prmsrswt@gmail.com> * Cut v0.19.0-rc.0 (thanos-io#3860) Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed missed changelog comments. (thanos-io#3865) Missed from: thanos-io#3860 Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * rename Metadata API to MetricMetadata API (thanos-io#3877) Signed-off-by: yeya24 <yb532204897@gmail.com> * Fix parseStep bug introduced in PR thanos-io#3740 (thanos-io#3887) (thanos-io#3889) * Fix bug introduced in PR thanos-io#3740 Signed-off-by: Hitanshu Mehta <hitanshu99amehta@gmail.com> * Minor change in test Signed-off-by: Hitanshu Mehta <hitanshu99amehta@gmail.com> Co-authored-by: Hitanshu Mehta <44025541+hitanshu-mehta@users.noreply.github.com> * Tools: rewrite delete can delete series only if it matches all matchers (thanos-io#3886) (thanos-io#3890) * rewrite delete should delete a series only if it matches all matchers in a deletion request Signed-off-by: yeya24 <yb532204897@gmail.com> * add test case Signed-off-by: yeya24 <yb532204897@gmail.com> Co-authored-by: Ben Ye <yb532204897@gmail.com> * cmd/thanos/receive.go: Receive client infers TLS (thanos-io#3899) Currently, the thanos Receiver infers whether it should use TLS for the gRPC clients that forwards time series to other receivers based on whether the remote-write HTTP server uses TLS. This is not correct, as the HTTP server may use TLS without the gRPC server using TLS. This commit fixes the inference. Longer-term, this will be taken care of by the receive/router split. Signed-off-by: Lucas Servén Marín <lserven@gmail.com> * Cut v0.19.0-rc.1 (thanos-io#3900) Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * tools: Fix partial and empty matchers in rewrite (thanos-io#3891) * Fix partial and empty matchers match Signed-off-by: yeya24 <yb532204897@gmail.com> * add testcase for non-equal matchers Signed-off-by: yeya24 <yb532204897@gmail.com> * v0.19.0 patch: Added receive benchmark; Fixed Receiver excessive mem usage introduced in 0.17 (thanos-io#3943) * Added receive benchmark, baseline. ``` goos: linux goarch: amd64 pkg: github.com/thanos-io/thanos/pkg/receive BenchmarkHandlerReceiveHTTP BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them. BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them./OK BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them./OK-12 22260 1550152 ns/op 1380340 B/op 6093 allocs/op BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them./conflict_errors BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them./conflict_errors-12 6619 6430408 ns/op 4522487 B/op 26118 allocs/op BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them. BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them./OK BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them./OK-12 2695 17208794 ns/op 15072963 B/op 60441 allocs/op BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them./conflict_errors BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them./conflict_errors-12 474 72533286 ns/op 46396932 B/op 260141 allocs/op BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/OK BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/OK-12 270 137050518 ns/op 226595379 B/op 132 allocs/op BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/conflict_errors BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/conflict_errors-12 21 1616025443 ns/op 698724321 B/op 408 allocs/op PASS Process finished with exit code 0 ``` Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Copy labels. ``` GOROOT=/home/bwplotka/.gvm/gos/go1.15 #gosetup GOPATH=/home/bwplotka/Repos/thanosgopath #gosetup /home/bwplotka/.gvm/gos/go1.15/bin/go test -c -o /tmp/___BenchmarkHandlerReceiveHTTP_in_github_com_thanos_io_thanos_pkg_receive github.com/thanos-io/thanos/pkg/receive #gosetup /tmp/___BenchmarkHandlerReceiveHTTP_in_github_com_thanos_io_thanos_pkg_receive -test.v -test.bench ^\QBenchmarkHandlerReceiveHTTP\E$ -test.run ^$ -test.benchmem -test.benchtime=30s goos: linux goarch: amd64 pkg: github.com/thanos-io/thanos/pkg/receive BenchmarkHandlerReceiveHTTP BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them/OK BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them/OK-12 25887 1537262 ns/op 1380023 B/op 6092 allocs/op BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them/conflict_errors BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_500_of_them/conflict_errors-12 4237 7547968 ns/op 4522583 B/op 26118 allocs/op BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them/OK BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them/OK-12 2205 16513380 ns/op 15071092 B/op 60420 allocs/op BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them/conflict_errors BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_5000_of_them/conflict_errors-12 525 67278233 ns/op 46396645 B/op 260141 allocs/op BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/OK BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/OK-12 285 148049189 ns/op 226596168 B/op 132 allocs/op BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/conflict_errors BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_10_of_them/conflict_errors-12 20 1731361499 ns/op 698722550 B/op 401 allocs/op PASS Process finished with exit code 0 ``` Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addded bench., Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fix. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Improved API. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Changelog. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed Lucas comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * compact: clean up directories thoroughly (thanos-io#3869) * compact: clean up directories properly I couldn't stop thinking about this code for some reason and I have figured that I had missed one case in thanos-io#3031. We need to also clean up the directories in compaction groups. A compaction could fail leaving some new directory with a random ULID on the disk. Before attempting to do another compaction loop, we need to remove it as well because the compaction process always produces a new, unique directory. Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com> * Remove all non expected dirs. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed comment. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> * Added benchmark, Moved minio-deps to fork without race fix we don't need. (thanos-io#3968) Fixes: thanos-io#3917 Long term fix: thanos-io#3967 Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Cut v0.19.0-rc.2 (thanos-io#3969) Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * pkg/rules: fix deduplication of equal alerts with different labels (thanos-io#3960) (thanos-io#3999) Currently, if an alerting rule having the same name with different severity labels is being returned from different replicas then they are being treated as separate alerts. Given the following alerts a1,a2 with severities s1,s2 returned from replicas r1,2: a1[s1,r1] a1[s2,r1] a1[s1,r2] a1[s2,r2] Then, currently, the algorithm deduplicates to: a1[s1] a1[s2] a1[s1] a1[s2] Instead of the intendet result: a1[s1] a1[s2] This fixes it by removing replica labels before sorting labels for deduplication. Signed-off-by: Sergiusz Urbaniak <sergiusz.urbaniak@gmail.com> # Conflicts: # CHANGELOG.md Co-authored-by: Sergiusz Urbaniak <sergiusz.urbaniak@gmail.com> * Cut v0.19.0 (thanos-io#3998) Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed VERSION file. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> Co-authored-by: Lucas Servén Marín <lserven@gmail.com> Co-authored-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Naman Lakhwani <namanlakhwani@gmail.com> Co-authored-by: Bradley <bwilsonhunt@gmail.com> Co-authored-by: Giedrius Statkevičius <giedriuswork@gmail.com> Co-authored-by: dmaiocchi <dmaiocchi@suse.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> Co-authored-by: Ben Ye <yb532204897@gmail.com> Co-authored-by: Holger Freyther <holger@freyther.de> Co-authored-by: Biswajit Ghosh <34703680+Biswajitghosh98@users.noreply.github.com> Co-authored-by: Matt W <mattjwhitney@gmail.com> Co-authored-by: Mert Açıkportalı <mertacikportali@gmail.com> Co-authored-by: Lili Cosic <cosiclili@gmail.com> Co-authored-by: Johannes Frey <johannes.frey@daimler.com> Co-authored-by: Vladimir Kononov <krya-kryak@users.noreply.github.com> Co-authored-by: Ben Kochie <superq@gmail.com> Co-authored-by: Matthias Loibl <mail@matthiasloibl.com> Co-authored-by: Sonia Singla <soniasingla.1812@gmail.com> Co-authored-by: Kemal Akkoyun <kakkoyun@users.noreply.github.com> Co-authored-by: Junyoung, Sung <junyoung.sung@navercorp.com> Co-authored-by: Abhishek Singh Chauhan <abhisheksinghchauhan357@gmail.com> Co-authored-by: Sudhar287 <sudhar287@gmail.com> Co-authored-by: Max Chandler <59826076+Max-Chandler@users.noreply.github.com> Co-authored-by: Matt Lawrence <m.law@apple.com> Co-authored-by: Mauro Stettler <mauro.stettler@gmail.com> Co-authored-by: Hitanshu Mehta <44025541+hitanshu-mehta@users.noreply.github.com> Co-authored-by: Sergiusz Urbaniak <sergiusz.urbaniak@gmail.com> Co-authored-by: Hitanshu Mehta <hitanshu99amehta@gmail.com> Co-authored-by: Matt Lawrence <79317313+m-lawre@users.noreply.github.com> Co-authored-by: Max Chandler <max_chandler@apple.com> Co-authored-by: Simon Pasquier <spasquie@redhat.com> Co-authored-by: Stephane Brillant <stephane.brillant@logmein.com> Co-authored-by: Saswata Mukherjee <saswataminsta@yahoo.com> Co-authored-by: Prem Saraswat <prmsrswt@gmail.com>
Changes
Add the proposal for unified endpoint discovery as per discussion in #2600.
I highlighted two options, but again up to the maintainers which one you choose, happy to work on any of the ones that we agree! 🎉
I was not sure what to put under work plan and there was no fixed format for proposals that I could find, might have missed it? I copied from some existing ones if there is anything missing let me know! 🙂