Remove usage of IndexSearcher#Search(Query, Collector) from monitor package #13735

gsmiller · 2024-09-06T16:14:34Z

Relates to: #12892

…ackage

gsmiller · 2024-09-06T16:17:56Z

OK, I think this is still a safe implementation but the change is that multiple collection threads will now be concurrently calling CandidateMatcher#addMatch. I believe this is safe but need another set of eyes on it. The reason I think this works is that we should never be interleaving calls with the same doc value since these are global docIDs.

@romseygeek I think you originally authored most of this. Do you have time to have a look at this change?

javanna · 2024-09-06T18:46:22Z

I am not familiar with this code but I believe you are correct. This should work even though we use a plain ArrayList, because each thread should only get and compute its own docs. Testing would verify that, but I believe we never provide an executor to the searcher we use, neither in tests nor in the prod code. Should we try to add that?

javanna · 2024-09-06T18:46:49Z

Also, thanks a lot for the help here, much appreciated!

romseygeek

LGTM, thanks @gsmiller

Concurrency in the monitor is handled in Parallel/PartitionMatcher outside of the index searcher, and the searcher is always built internally and not exposed to clients so there's no opportunity for an executor to be set on it.

Separately, I wonder if it's worth adding some sugar methods to CollectorManager that wrap up single-threaded collectors? I'm guessing that there are going to be many instances of the pattern "return a collector from newCollector and null from reduce", and it seems a bit of a retrograde step to replace one line of code with 13, especially if you don't actually use multithreading.
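For readers unfamiliar with the pattern mentioned here ("return a collector from newCollector and null from reduce"), a minimal sketch follows. It is not the code from this PR: `SketchCollector`, `SketchCollectorManager`, `RecordingCollector`, and `HandWrapping` are hypothetical stand-ins so the snippet compiles without Lucene on the classpath, and only the two-method `newCollector`/`reduce` shape is taken from the discussion.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for Lucene's Collector (the real interface is richer
// and its methods can throw IOException).
interface SketchCollector {
  void collect(int doc);
}

// Hypothetical stand-in with the same two-method shape as CollectorManager<C, T>.
interface SketchCollectorManager<C extends SketchCollector, T> {
  C newCollector();

  T reduce(Collection<C> collectors);
}

// A collector that is NOT thread-safe: it appends to a plain ArrayList.
final class RecordingCollector implements SketchCollector {
  final List<Integer> docs = new ArrayList<>();

  @Override
  public void collect(int doc) {
    docs.add(doc);
  }
}

final class HandWrapping {
  // Always hands back the same collector instance and reduces to null.
  // Only sound if newCollector() ends up being called once per search.
  static SketchCollectorManager<RecordingCollector, Void> wrapByHand(
      final RecordingCollector collector) {
    return new SketchCollectorManager<RecordingCollector, Void>() {
      @Override
      public RecordingCollector newCollector() {
        return collector;
      }

      @Override
      public Void reduce(Collection<RecordingCollector> collectors) {
        return null;
      }
    };
  }
}
```

The wrapper itself carries no state; the caller keeps a reference to the collector and reads results from it after the search, which is why `reduce` has nothing useful to return.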

gsmiller · 2024-09-09T14:50:51Z

Thanks @romseygeek for providing more detail on the monitor implementation and confirming this is safe.

As for your suggestion of providing a sugar method in CollectorManager to do the wrapping for situations like this, I'm a bit conflicted. I took a crack at updating the PR here with a revision that does this for discussion. On one hand, I agree with you that it's pretty verbose (and annoying if I'm honest) for users to have to do their own Collector wrapping like this, so a convenience method would be nice. On the other hand, I'm concerned it could be trappy for users that don't read the "fine print" and end up creating concurrency bugs by incorrectly wrapping in a case where an IndexSearcher is using an Executor (and their Collector isn't thread-safe).

I think it's a balance of how often this single-threaded-collector pattern occurs and how potentially trappy this sugar method is. Right now, I'm leaning away from providing this sugar method. I want to avoid situations where separate user code is creating the CollectorManager and creating the IndexSearcher (which is easy to imagine). In that case, I worry that this could be done correctly at first, only to have the IndexSearcher instantiation logic change to use an Executor, introducing tricky bugs. Thoughts?

Or maybe there's a better way to do this that's still convenient but less trappy?

javanna · 2024-09-09T14:57:06Z

There is a test that does the following:

QueryMatch.SIMPLE_MATCHER.createMatcher(searcher);

This uses a public API and allows a searcher to be provided externally, doesn't it? Do we need to enforce that the searcher used in monitor should not have an executor and/or should not try to apply concurrency?

javanna · 2024-09-09T15:03:35Z

lucene/core/src/java/org/apache/lucene/search/CollectorManager.java

+   * {@link IndexSearcher#IndexSearcher(IndexReader, Executor)}), or the provided collector is
+   * threadsafe.
+   */
+  static <C extends Collector> CollectorManager<C, ?> wrap(C in) {


This is not fantastic but I suspect we may lean on this solution for other leftover usages of Collector vs CollectorManager. Shall we clarify in the name that this is for sequential execution / single threaded?

Makes sense. I'll revise the name (but if you have suggestions, I'm open to them; naming can be tricky)

Naming is the most difficult bit :) Some options: forSequentialExecution , createSequentialManager, singleThreadedManager , or something along those lines.

javanna · 2024-09-09T15:04:18Z

lucene/monitor/src/java/org/apache/lucene/monitor/CollectingMatcher.java

@@ -37,7 +38,7 @@ abstract class CollectingMatcher<T extends QueryMatch> extends CandidateMatcher<
  @Override
  public void matchQuery(final String queryId, Query matchQuery, Map<String, String> metadata)
      throws IOException {
-    searcher.search(matchQuery, new MatchCollector(queryId, scoreMode));
+    searcher.search(matchQuery, CollectorManager.wrap(new MatchCollector(queryId, scoreMode)));


I left a comment about whether this is entirely safe, I was under the impression that one can provide an external searcher but I am not familiar with monitor code ;)

I guess you could call it with an externally built searcher, but it wouldn't be very useful as the whole point is that it's a searcher over a batch of documents submitted to the monitor for checking. It's public to allow clients to build wrapping matchers.

There may be a nicer way to do this though, let me experiment.

I think we can make the public-facing methods all take DocumentBatch here rather than IndexSearcher, which will make building the index searcher an implementation detail. I'll put together a PR.

gsmiller · 2024-09-10T19:18:31Z

OK, I think the latest iteration of this is ready to merge, pending any remaining feedback, concerns, etc. I voiced some concerns earlier about the static wrapping method, wanting to make sure we don't provide some trappy implementation for users, but the suggestions around error checking/asserts convinced me this is reasonable (still don't love it, but I think it's reasonable right now).

I'll get this merged in the next couple days if I don't hear any additional concerns, otherwise happy to iterate.

javanna

ship it! :) thanks again for picking this up. I think we will reuse the method you added in a couple of other places.

javanna · 2024-09-10T22:05:20Z

Heads up: I saw there were merge conflicts caused by my recent push to main, so I went ahead and addressed those.

…ackage (#13735)

gsmiller · 2024-09-11T13:39:51Z

Thanks @javanna, @romseygeek ! FYI @msfroh just merged and back ported. I saw you had another PR waiting on this.

javanna · 2024-09-14T14:56:03Z

I've been changing my mind a few times on the topic, but I am concluding that it probably was not a good idea to expose the sequential collector manager. I think we are ok using it internally at our own risk, for situations where we know for sure that we can't possibly have a searcher with an executor set. But exposing it publicly makes it too easy for users to get unexpected failures, and the disclaimer "don't use this with a searcher that has an executor set" is kind of odd. What if those users do have other queries that already make use of concurrency, and use the same searcher for these others that get converted to leveraging the sequential collector manager? It is also trappy that they get errors only once there's more than one slice, so the behaviour may be hard to follow for users.

I would propose that we make the new collector manager private with a big disclaimer on when it should be used, and make a plan to remove it perhaps in the medium to long term. We should really design all new features with concurrency in mind, and migrate old ones to support concurrency.

There is some urgency around this especially for the 9.12 release, I am happy to open a PR.

We have recently (see apache#13735) introduced this utility method that creates a collector manager which only works when a searcher does not have an executor set; otherwise it throws an exception once we attempt to create a new collector for more than one slice. While we discussed it should be safe to use in some specific scenarios like the monitor module, we should be careful exposing this utility publicly, because while we'd like to ease migration from the search(Query, Collector) method, we may end up making users' lives even worse, in that it exposes them to failures whenever an executor is set and more than one slice is created, which is hard to follow and does not provide a good user experience. My proposal is that we use a similar collector manager locally, where safe and required, but we don't expose it to users. In most places, we should rather expose collector managers that do support search concurrency, rather than working around the lack of those.
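To make the described behaviour concrete, here is a rough sketch of a manager that hands out its collector once and fails on any further request, which is what happens when a searcher asks for one collector per slice. This is an illustration under assumptions, not the merged Lucene code: `SketchCollector` and `SketchCollectorManager` are hypothetical stand-ins for Lucene's interfaces, and the use of `AtomicBoolean` and `IllegalStateException` is a guess at one reasonable way to implement the check.

```java
import java.util.Collection;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-ins for Lucene's Collector and CollectorManager<C, T>.
interface SketchCollector {
  void collect(int doc);
}

interface SketchCollectorManager<C extends SketchCollector, T> {
  C newCollector();

  T reduce(Collection<C> collectors);
}

final class SequentialManagers {
  // Wraps a single collector for sequential use. A second newCollector() call
  // means more than one slice is being searched, so the sketch fails fast
  // instead of sharing a possibly non-thread-safe collector across slices.
  static <C extends SketchCollector> SketchCollectorManager<C, Void> forSequentialExecution(
      final C in) {
    return new SketchCollectorManager<C, Void>() {
      private final AtomicBoolean handedOut = new AtomicBoolean(false);

      @Override
      public C newCollector() {
        // compareAndSet returns false if the collector was already handed out.
        if (!handedOut.compareAndSet(false, true)) {
          // Exception type and message are assumptions for illustration.
          throw new IllegalStateException(
              "sequential collector manager used for more than one slice");
        }
        return in;
      }

      @Override
      public Void reduce(Collection<C> collectors) {
        return null;
      }
    };
  }
}
```

The failure only shows up once a second slice exists, which is the "hard to follow" behaviour the comment above is concerned about: the same code works on a small index or without an executor and breaks later.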

Remove usage of IndexSearcher#Search(Query, Collector) from monitor p…

a98cd1d

…ackage

gsmiller requested a review from romseygeek September 6, 2024 16:18

gsmiller mentioned this pull request Sep 6, 2024

Remove all deprecated IndexSearcher#search(Query, Collector) usage / methods in the next major release #12892

Open

13 tasks

romseygeek approved these changes Sep 9, 2024

version with wrapping factory method in CM

3d5fd89

javanna reviewed Sep 9, 2024

pr feedback

46b96b9

msfroh mentioned this pull request Sep 9, 2024

Remove usage of IndexSearcher#search(Query, Collector) from join package #13747

Open

renaming

e79cc89

changes entry

c78e3b4

javanna approved these changes Sep 10, 2024

javanna added 2 commits September 10, 2024 23:26

Merge branch 'main' into GH/collector-dep/monitor

09f0f81

Merge branch 'main' into GH/collector-dep/monitor

9ea40f0

gsmiller merged commit c26c2d6 into apache:main Sep 11, 2024
3 checks passed

gsmiller deleted the GH/collector-dep/monitor branch September 11, 2024 13:26

gsmiller added a commit that referenced this pull request Sep 11, 2024

Remove usage of IndexSearcher#Search(Query, Collector) from monitor p…

b08252b

…ackage (#13735)

gsmiller added this to the 9.12.0 milestone Sep 11, 2024

javanna mentioned this pull request Sep 14, 2024

Remove CollectorManager#forSequentialExecution #13790

Merged
