Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix pushdown with deduplication on #4987

Merged
merged 14 commits into from
Feb 4, 2022
Merged

Conversation

GiedriusS
Copy link
Member

@GiedriusS GiedriusS commented Dec 22, 2021

Add some cases for pushdown with deduplication to test how it works.

There are some bugs here :'/

Fix them by creating a new series iterator that performs the given function over all replicas.

@GiedriusS GiedriusS changed the title e2e: add dedup cases for pushdown e2e: add dedup cases for pushdown (shows bugs :/) Dec 22, 2021
Copy link
Member Author

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some test cases to show some bugs in pushdown, unfortunately :/ cc @fpetkovski

test/e2e/query_test.go Outdated Show resolved Hide resolved
test/e2e/query_test.go Outdated Show resolved Hide resolved
@GiedriusS GiedriusS changed the title e2e: add dedup cases for pushdown (shows bugs :/) e2e: add dedup cases for pushdown (shows bugs? :/) Dec 22, 2021
Copy link
Member Author

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think what happens is that query_range aligns the timestamps for us so then the deduplication doesn't work properly since they are all the same and the dedup depends on the timestamps :/

@fpetkovski
Copy link
Contributor

Hm, I wonder if we could inject a synthetic label when a query is evaluated in the sidecar.

@GiedriusS
Copy link
Member Author

I think we could leverage series response hints for that. I'd prefer not to add metadata to labels to avoid allocations. In my opinion, we could solve the problem shown in these tests with a custom iterator that performs these functions (max/min) on top of all series from each replica to replicate the old behavior. What do you think, @fpetkovski ?

@GiedriusS GiedriusS changed the title e2e: add dedup cases for pushdown (shows bugs? :/) Fix pushdown with deduplication on Dec 30, 2021
@GiedriusS
Copy link
Member Author

I have pushed a new pushdown series iterator that performs the given function over all replicas before returning the result. What do you think about this approach @bwplotka @fpetkovski ? Just need to decide how we'd like to indicate in SeriesResponse that results have been precomputed - via labels, via hints, or some other mechanism. I'd like to know your thoughts 🤗

@fpetkovski
Copy link
Contributor

Can we add a test for the case where Querier is pushing a query to 2 different sets of replicated sidecars. For example, s1 and s2 being replicas of each other, and s3 and s4 also being replicas of each other. I am wondering what would happen if s1 and s3 have overlapping series, would the new iterator deduplicate them?

pkg/query/querier.go Outdated Show resolved Hide resolved
pkg/pushdown/iter.go Outdated Show resolved Hide resolved
Add some cases for pushdown with deduplication to test how it works.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Add a pushdown series iterator that performs functions over replicas
before returning control to the promql engine.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
@GiedriusS
Copy link
Member Author

Can we add a test for the case where Querier is pushing a query to 2 different sets of replicated sidecars. For example, s1 and s2 being replicas of each other, and s3 and s4 also being replicas of each other. I am wondering what would happen if s1 and s3 have overlapping series, would the new iterator deduplicate them?

Could you please elaborate on this with some examples? Because the deduplication should work the same.

I have pushed v2 of this change:

  • Now the single dedup iterator handles interleaved cases no matter from where the data came from;
  • All pushed-down series are passed through dedupSeriesSet. All pushed down series are shoved into pushdownSeriesIterator and then it becomes a regular series in dedupSeries.

I think this approach is much cleaner and handles all cases. PTAL @bwplotka @fpetkovski 🙏

Do not reset the slice if that is not needed.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
@GiedriusS
Copy link
Member Author

@yeya24 maybe this would be interesting to you as well

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Change the name to not have clashes.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Don't call Seek() directly, iterate gently.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
@GiedriusS
Copy link
Member Author

Untitled-2022-01-17-1805

Made a small picture showing how it works. @fpetkovski @bwplotka friendly ping 🤗

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
@pull-request-size pull-request-size bot added size/XL and removed size/L labels Feb 2, 2022
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
@GiedriusS
Copy link
Member Author

GiedriusS commented Feb 3, 2022

Given the upcoming release and no other comments, I'm likely to merge this to avoid shipping a broken feature. Plus, everything is behind ifs so there should be no performance impact if a user does not have push down enabled. Next up on the list: #5069.

@GiedriusS GiedriusS merged commit d218e60 into thanos-io:main Feb 4, 2022
@GiedriusS GiedriusS deleted the weird_pushdown branch February 4, 2022 08:20
Nicholaswang pushed a commit to Nicholaswang/thanos that referenced this pull request Mar 6, 2022
* e2e: add dedup cases for pushdown

Add some cases for pushdown with deduplication to test how it works.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* query: add pushdown series iterator

Add a pushdown series iterator that performs functions over replicas
before returning control to the promql engine.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* pushdown: add package

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* pushdown: fix bug in iterator

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* pushdown: fix grammar mistake

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* *: rework pushdown iterator

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* dedup: cleanups

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* dedup: add if

Do not reset the slice if that is not needed.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* dedup: use helpers

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* e2e: fix test

Change the name to not have clashes.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* query: ensure pushdown label is at the end

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* dedup: avoid infinite loop

Don't call Seek() directly, iterate gently.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* dedup: fix bugs with gaps, add test cases

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: Nicholaswang <wzhever@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants