Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove CollectorManager#forSequentialExecution #13790

Merged

Conversation

javanna
Copy link
Contributor

@javanna javanna commented Sep 14, 2024

We have recently (see #13735) introduced this utility method that creates a collector manager which only works when a searcher does not have an executor set, otherwise it throws exception once we attempt to create a new collector for more than one slice.

While we discussed it should be safe to use in some specific scenarios like the monitor module, we should be careful exposing this utility publicly, because while we'd like to ease migration from the search(Query, Collector) method, we may end up making users life even worse, in that it exposes them to failures whenever an executor is set and there is more than one slice created, which is hard to follow and does not provide a good user experience. What is these users do have an executor set to their searcher and think that this method provides a collector manager that is able to turn off concurrency (like search(Query, Collector) does currently), which is not possible?

My proposal is that we use a similar collector manager locally, where safe and required, but we don't expose it to users. In most places, we should rather expose collector managers that do support search concurrency, rather than working around the lack of those. Even for us committers, it has not been entirely clear when this is safe and when it's not, which indicates that this is something that users will likely struggle with.

We have recently (see apache#13735) introduced this utility method that creates a
collector manager which only works when a searcher does not have an executor
set, otherwise it throws exception once we attempt to create a new collector
for more than one slice.

While we discussed it should be safe to use in some specific scenarios like the
monitor module, we should be careful exposing this utility publicly, because
while we'd like to ease migration from the search(Query, Collector) method, we
may end up making users like even worse, in that it exposes them to failures
whenever an executor is set and there are more than one slice created, which
is hard to follow and does not provide a good user experience.

My proposal is that we use a similar collector manager locally, where safe and
required, but we don't expose it to users. In most places, we should rather
expose collector managers that do support search concurrency, rather than working
around the lack of those.
@javanna javanna added this to the 9.12.0 milestone Sep 14, 2024
Copy link
Contributor

@ChrisHegarty ChrisHegarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@javanna
Copy link
Contributor Author

javanna commented Sep 16, 2024

@gsmiller how do you feel about this one?

@gsmiller
Copy link
Contributor

gsmiller commented Sep 16, 2024

tl;dr: I agree with removing this.

I was initially hesitant to add this for a lot of the same reasons, but was convinced when realizing we could at least explicitly fail by asserting that more than one collector is never created (so at least we wouldn't be silently creating tear-your-hair-out concurrency bugs). But looking at the bigger picture, the whole point of removing IndexSearcher#search(Query, Collector) is to help users avoid trappy situations where they think they're setup for concurrent search but not utilizing it. I agree it's a worse trap in a lot of ways to introduce this failure mode. I particularly do not like that this "helper" collector manager can't actually check if it's being used with a concurrency-ready IndexSearcher. So I can imagine a real failure case where IndexSearchers are being created with an Executor but only have a single segment and things appear to work—but then break later (e.g., users that are force-merging their segments, or users that have sparse test coverage that only includes testing over single segments, etc.).

@javanna
Copy link
Contributor Author

javanna commented Sep 17, 2024

Thanks for the feedback @gsmiller I will go ahead and merge this.

@javanna javanna merged commit a4c79c8 into apache:main Sep 17, 2024
3 checks passed
javanna added a commit that referenced this pull request Sep 17, 2024
We have recently (see #13735) introduced this utility method that creates a
collector manager which only works when a searcher does not have an executor
set, otherwise it throws exception once we attempt to create a new collector
for more than one slice.

While we discussed it should be safe to use in some specific scenarios like the
monitor module, we should be careful exposing this utility publicly, because
while we'd like to ease migration from the search(Query, Collector) method, we
may end up making users like even worse, in that it exposes them to failures
whenever an executor is set and there are more than one slice created, which
is hard to follow and does not provide a good user experience.

My proposal is that we use a similar collector manager locally, where safe and
required, but we don't expose it to users. In most places, we should rather
expose collector managers that do support search concurrency, rather than working
around the lack of those.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants