Remove CollectorManager#forSequentialExecution #13790

javanna · 2024-09-14T15:09:03Z

We recently (see #13735) introduced this utility method, which creates a collector manager that only works when the searcher does not have an executor set; otherwise it throws an exception as soon as we attempt to create a new collector for more than one slice.
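To make the failure mode concrete, here is a minimal sketch that runs without Lucene on the classpath. Everything in it is a hypothetical stand-in: `SketchCollectorManager` mimics only the shape of a collector manager, `CountingCollector` just counts hits, slices are plain `int` arrays, and `search` imitates a searcher that asks for one collector per slice. It is not the code from #13735, only an illustration of "works with one slice, throws on the second".

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for a collector manager (NOT the Lucene interface).
interface SketchCollectorManager<C, T> {
  C newCollector();

  T reduce(Collection<C> collectors);
}

// Hypothetical collector that only counts the documents it is handed.
final class CountingCollector {
  int count;

  void collect(int doc) {
    count++;
  }
}

// Hands out the wrapped collector exactly once; any further request fails loudly.
final class SequentialOnlyManager implements SketchCollectorManager<CountingCollector, Integer> {
  private final CountingCollector in;
  private boolean created;

  SequentialOnlyManager(CountingCollector in) {
    this.in = in;
  }

  @Override
  public CountingCollector newCollector() {
    if (created) {
      throw new IllegalStateException("newCollector called more than once");
    }
    created = true;
    return in;
  }

  @Override
  public Integer reduce(Collection<CountingCollector> collectors) {
    return in.count;
  }
}

final class SequentialOnlyDemo {
  // Imitates a searcher: one newCollector() call per slice, then reduce.
  static int search(SketchCollectorManager<CountingCollector, Integer> manager, int[][] slices) {
    List<CountingCollector> collectors = new ArrayList<>();
    for (int[] slice : slices) {
      CountingCollector collector = manager.newCollector();
      for (int doc : slice) {
        collector.collect(doc);
      }
      collectors.add(collector);
    }
    return manager.reduce(collectors);
  }

  public static void main(String[] args) {
    // One slice: the single collector is handed out once and the search succeeds.
    int hits = search(new SequentialOnlyManager(new CountingCollector()), new int[][] {{0, 1, 2}});
    System.out.println("single slice hits: " + hits);

    // Two slices: the second newCollector() call throws.
    try {
      search(new SequentialOnlyManager(new CountingCollector()), new int[][] {{0}, {1, 2}});
    } catch (IllegalStateException e) {
      System.out.println("multiple slices: " + e.getMessage());
    }
  }
}
```

Note that the manager itself has no way to tell up front how many slices the caller will create, so the failure only surfaces at search time.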

While we discussed that it should be safe to use in some specific scenarios like the monitor module, we should be careful about exposing this utility publicly. We'd like to ease migration from the search(Query, Collector) method, but we may end up making users' lives even worse: it exposes them to failures whenever an executor is set and more than one slice is created, which is hard to follow and does not provide a good user experience. What if these users do have an executor set on their searcher and think that this method provides a collector manager that is able to turn off concurrency (like search(Query, Collector) does currently), which is not possible?

My proposal is that we use a similar collector manager locally, where safe and required, but we don't expose it to users. In most places, we should rather expose collector managers that do support search concurrency, rather than working around the lack of those. Even for us committers, it has not been entirely clear when this is safe and when it's not, which indicates that this is something that users will likely struggle with.
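As a contrast, here is a rough sketch of the shape a concurrency-friendly collector manager takes: a fresh collector per slice, merged in `reduce`. Again this avoids Lucene entirely. `SliceCollectorManager`, `HitCounter`, `HitCountManager` and the `search` helper are hypothetical names invented for illustration, and the slices are plain `int` arrays processed on a `java.util.concurrent` executor.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a collector manager (NOT the Lucene interface).
interface SliceCollectorManager<C, T> {
  C newCollector();

  T reduce(Collection<C> collectors);
}

// Hypothetical per-slice collector; each instance is used by a single task.
final class HitCounter {
  private int count;

  void collect(int doc) {
    count++;
  }

  int count() {
    return count;
  }
}

// Creates an independent collector per slice and sums the counts at the end.
final class HitCountManager implements SliceCollectorManager<HitCounter, Integer> {
  @Override
  public HitCounter newCollector() {
    return new HitCounter();
  }

  @Override
  public Integer reduce(Collection<HitCounter> collectors) {
    int total = 0;
    for (HitCounter collector : collectors) {
      total += collector.count();
    }
    return total;
  }
}

final class ConcurrentSearchSketch {
  // Imitates a concurrent search: one collector and one task per slice.
  static int search(HitCountManager manager, int[][] slices, ExecutorService executor) {
    List<HitCounter> collectors = new ArrayList<>();
    List<Future<?>> tasks = new ArrayList<>();
    for (int[] slice : slices) {
      HitCounter collector = manager.newCollector();
      collectors.add(collector);
      tasks.add(executor.submit(() -> {
        for (int doc : slice) {
          collector.collect(doc);
        }
      }));
    }
    try {
      // Waiting on each future also makes the per-slice counts visible here.
      for (Future<?> task : tasks) {
        task.get();
      }
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    }
    return manager.reduce(collectors);
  }

  public static void main(String[] args) {
    ExecutorService executor = Executors.newFixedThreadPool(2);
    try {
      int total = search(new HitCountManager(), new int[][] {{0, 1, 2}, {3, 4}, {5}}, executor);
      System.out.println("total hits: " + total); // prints "total hits: 6"
    } finally {
      executor.shutdown();
    }
  }
}
```

Because no state is shared between slices until `reduce`, this shape behaves the same with one slice or many, with or without an executor.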

ChrisHegarty

LGTM

javanna · 2024-09-16T10:14:21Z

@gsmiller how do you feel about this one?

gsmiller · 2024-09-16T22:19:00Z

tl;dr: I agree with removing this.

I was initially hesitant to add this for a lot of the same reasons, but was convinced when I realized we could at least fail explicitly by asserting that more than one collector is never created (so at least we wouldn't be silently creating tear-your-hair-out concurrency bugs). But looking at the bigger picture, the whole point of removing IndexSearcher#search(Query, Collector) is to help users avoid trappy situations where they think they're set up for concurrent search but are not actually utilizing it. I agree that introducing this failure mode is a worse trap in a lot of ways. I particularly do not like that this "helper" collector manager can't actually check whether it's being used with a concurrency-ready IndexSearcher. So I can imagine a real failure case where IndexSearchers are being created with an Executor but only have a single segment, and things appear to work but then break later (e.g., users that are force-merging their segments, or users that have sparse test coverage that only includes testing over single segments, etc.).

javanna · 2024-09-17T07:41:03Z

Thanks for the feedback @gsmiller, I will go ahead and merge this.

javanna added this to the 9.12.0 milestone Sep 14, 2024

ChrisHegarty approved these changes Sep 15, 2024

gsmiller approved these changes Sep 16, 2024

gsmiller mentioned this pull request Sep 16, 2024

Remove all deprecated IndexSearcher#search(Query, Collector) usage / methods in the next major release #12892

Open

13 tasks

javanna merged commit a4c79c8 into apache:main Sep 17, 2024
3 checks passed
