Remove usage of IndexSearcher#search(Query, Collector) from join package #13747

Open
wants to merge 3 commits into
base: main

Conversation

msfroh
Contributor

@msfroh msfroh commented Sep 9, 2024

Description

Relates to #12892

For global ordinal-based join, we can support concurrent search. For numeric and term-based joins, we fail if we're called from a multithreaded searcher.

I can implement concurrent versions of the other join collectors, but wanted to get a first pass that removes the uses of IndexSearcher#search(Query, Collector).

Note that for the cases that still assume a single-threaded searcher, I used a version of CollectorManager#wrap(Collector) from #13735, with my guess for where it will end up based on feedback so far.
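For readers following along, here is a minimal sketch of what such a single-threaded wrapper could look like. It is not the code from this PR or from #13735: `Manager` is a simplified stand-in for Lucene's `CollectorManager` (no `IOException`, no `Collector` bound), and `wrapSequential` is a hypothetical name. The idea it illustrates is that the wrapper hands out the wrapped collector once and fails if a searcher asks for a second one, which is what a multithreaded searcher would do for additional slices.

```java
import java.util.Collection;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

class SequentialWrapSketch {

  /** Simplified stand-in for Lucene's CollectorManager (assumption, not the real interface). */
  interface Manager<C, T> {
    C newCollector();

    T reduce(Collection<C> collectors);
  }

  /**
   * Hypothetical helper: hands out the wrapped collector exactly once, so a searcher that
   * asks for a second collector fails fast instead of sharing non-thread-safe state.
   */
  static <C> Manager<C, C> wrapSequential(C collector) {
    AtomicBoolean handedOut = new AtomicBoolean(false);
    return new Manager<C, C>() {
      @Override
      public C newCollector() {
        if (!handedOut.compareAndSet(false, true)) {
          throw new IllegalStateException("collector does not support concurrent search");
        }
        return collector;
      }

      @Override
      public C reduce(Collection<C> collectors) {
        // Only one collector was ever created, so it already holds the full result.
        return collector;
      }
    };
  }

  public static void main(String[] args) {
    StringBuilder collector = new StringBuilder();
    Manager<StringBuilder, StringBuilder> manager = wrapSequential(collector);

    // Single-threaded path: one collector, then reduce.
    StringBuilder first = manager.newCollector();
    first.append("doc1");
    System.out.println(manager.reduce(List.of(first)));

    // A multithreaded searcher would request another collector and hit the guard.
    try {
      manager.newCollector();
    } catch (IllegalStateException e) {
      System.out.println("second request rejected: " + e.getMessage());
    }
  }
}
```

The guard lives in `newCollector()` rather than in `reduce()` so that the failure happens before any documents are collected on a second thread.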

Contributor

@javanna javanna left a comment

I think it's a good approach. Let's prioritize removing those leftover usages and add the missing collector managers as a next step.

@javanna
Contributor

javanna commented Sep 10, 2024

@msfroh thanks a lot for picking this up!!!

@msfroh
Contributor Author

msfroh commented Sep 11, 2024

@javanna -- I've managed to get the remaining numeric / terms collectors in the Join module working with multiple search threads.

I can add them to this PR, but the diff is pretty massive. I'm thinking of holding off for another PR, but I'm happy to go either way.

There is probably value in "atomically" jumping from the current "always single-threaded" mode straight to "everything works with a multithreaded searcher", versus this PR's current state where global ordinal-based joins work with a multithreaded searcher but numeric/term-based joins don't.

Thanks a lot for the review!

Contributor

@javanna javanna left a comment

I left a couple more comments. I am fine with getting this in and focusing on the missing parallel collector managers as a follow-up. Thanks for the work you put into this!

For global ordinal-based join, we can support concurrent search. For
others, we fail if we're called from a multithreaded searcher.

Since I plan to implement numeric and term collectors that support
merging all collectors into a single collector, it makes sense to
move MergeableCollectorManager into its own top-level class.

This change removes MergeableCollector (and MergeableCollectorManager),
wraps all of the custom Collector classes in their own CollectorManager,
and removes all remaining occurrences of
CollectorManager.forSequentialExecution from the tests.

This also adds all of the other join collectors, bringing the join
module fully into the CollectorManager club.
@msfroh
Contributor Author

msfroh commented Sep 20, 2024

Okay -- I wrapped all of the Collectors in CollectorManagers, and managed to remove all uses of CollectorManager.forSequentialExecution. I also went ahead and added the remaining Collectors to this PR.
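As a rough illustration of the pattern (a sketch under stated assumptions, not the code in this PR): each search slice gets its own collector, and the manager merges the per-slice results in `reduce`. `Manager` below is a simplified stand-in for Lucene's `CollectorManager`, and `TermCollector` and `TermCollectorManager` are hypothetical names that only record join terms as strings.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class MergingManagerSketch {

  /** Simplified stand-in for Lucene's CollectorManager (assumption, not the real interface). */
  interface Manager<C, T> {
    C newCollector();

    T reduce(Collection<C> collectors);
  }

  /** Hypothetical per-slice collector: records the join terms seen by one search thread. */
  static final class TermCollector {
    final Set<String> terms = new HashSet<>();

    void collect(String term) {
      terms.add(term);
    }
  }

  /** Creates one collector per slice and merges their term sets once all slices finish. */
  static final class TermCollectorManager implements Manager<TermCollector, Set<String>> {
    @Override
    public TermCollector newCollector() {
      // A fresh collector per call, so no state is shared across threads.
      return new TermCollector();
    }

    @Override
    public Set<String> reduce(Collection<TermCollector> collectors) {
      // Sorted set only to make the merged output deterministic for the demo.
      Set<String> merged = new TreeSet<>();
      for (TermCollector collector : collectors) {
        merged.addAll(collector.terms);
      }
      return merged;
    }
  }

  public static void main(String[] args) {
    TermCollectorManager manager = new TermCollectorManager();
    TermCollector sliceA = manager.newCollector();
    TermCollector sliceB = manager.newCollector();
    sliceA.collect("id-1");
    sliceA.collect("id-2");
    sliceB.collect("id-2");
    sliceB.collect("id-3");
    System.out.println(manager.reduce(List.of(sliceA, sliceB))); // prints [id-1, id-2, id-3]
  }
}
```

Because each slice writes only to its own collector, no locking is needed during collection; the merge cost is paid once in `reduce`.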


github-actions bot commented Oct 5, 2024

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Oct 5, 2024