Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Search Pipelines] Split processor namespace by type (request vs response vs phase) #7576

Closed
msfroh opened this issue May 15, 2023 · 0 comments · Fixed by #7597
Closed

[Search Pipelines] Split processor namespace by type (request vs response vs phase) #7576

msfroh opened this issue May 15, 2023 · 0 comments · Fixed by #7597
Labels
enhancement Enhancement or improvement to existing feature or request Search Search query, autocomplete ...etc v2.8.0 'Issues and PRs related to version v2.8.0'

Comments

@msfroh
Copy link
Collaborator

msfroh commented May 15, 2023

Is your feature request related to a problem? Please describe.
In my initial PR for search pipelines, I copied the example of IngestService that maintains a map from processor names to their factories.

Of course with search pipelines, we already have two different kinds of processors -- request processors and response processors. Thinking about something like the RenameResponseProcessor (that we may rename to RenameFieldResponseProcessor) with a possible key of rename or rename_field, I was thinking "What would we call a request processor that replaces all occurrences of a given field name with another name?" We would probably also call it something like rename_field. Right now, you wouldn't be able to register two processors (of different types) with the same name.

Describe the solution you'd like

I think SearchPipelineService should maintain separate maps for request and response processors (or any future processor type), effectively giving them their own namespaces. This also simplifies some of the logic around pipeline construction, since we no longer have to do instanceof checks to make sure that a processor with a given name corresponds to the correct type.

The biggest challenge I see is that search pipeline code went out with OpenSearch 2.7 behind a feature flag. That included a NodeInfo change so that nodes provide a flat list of supported processors (so we can check on pipeline creation that all nodes in the cluster have the required processors, since they may be provided by plugins). With this change, we would also need to split that output to list supported processors by type. Since we're not going to change 2.7 (since it's been released), we would probably need to flatten the list in the unlikely event that we're called from a 2.7 node. Similarly, if we receive a NodeInfo from a 2.7 node, I think we should pretend that all processors are (potentially) both request and response processors (since we have no way of knowing which they are).

Describe alternatives you've considered

The main alternative (I think) is to let my existing mistake stand. Going forward, we would never be able to register two search pipeline processors with the same name, regardless of what part of the search flow is being acted upon.

We could also establish (and enforce) a naming convention where all processors are named according to their type (with a prefix or suffix).

@msfroh msfroh added enhancement Enhancement or improvement to existing feature or request untriaged Search Search query, autocomplete ...etc labels May 15, 2023
@msfroh msfroh removed the untriaged label May 15, 2023
@msfroh msfroh moved this from 🆕 New to Now(This Quarter) in Search Project Board May 15, 2023
@msfroh msfroh added the v2.8.0 'Issues and PRs related to version v2.8.0' label May 15, 2023
msfroh added a commit to msfroh/OpenSearch that referenced this issue May 17, 2023
In the initial search pipelines commit, I threw request and response
processor factories into one combined map. I think that was a mistake.

We should embrace type-safety by making sure that the kind of processor
is clear from end to end. As we add more processor types (e.g. search
phase processor), throwing them all in one big map would get messier.

As a bonus, we'll be able to reuse processor names across different
types of processor.

Closes opensearch-project#7576

Signed-off-by: Michael Froh <froh@amazon.com>
@msfroh msfroh moved this from Now(This Quarter) to 🏗 In progress in Search Project Board May 17, 2023
msfroh added a commit to msfroh/OpenSearch that referenced this issue May 17, 2023
In the initial search pipelines commit, I threw request and response
processor factories into one combined map. I think that was a mistake.

We should embrace type-safety by making sure that the kind of processor
is clear from end to end. As we add more processor types (e.g. search
phase processor), throwing them all in one big map would get messier.

As a bonus, we'll be able to reuse processor names across different
types of processor.

Closes opensearch-project#7576

Signed-off-by: Michael Froh <froh@amazon.com>
@msfroh msfroh moved this from 🏗 In progress to 👀 In review in Search Project Board May 18, 2023
msfroh added a commit to msfroh/OpenSearch that referenced this issue May 19, 2023
In the initial search pipelines commit, I threw request and response
processor factories into one combined map. I think that was a mistake.

We should embrace type-safety by making sure that the kind of processor
is clear from end to end. As we add more processor types (e.g. search
phase processor), throwing them all in one big map would get messier.

As a bonus, we'll be able to reuse processor names across different
types of processor.

Closes opensearch-project#7576

Signed-off-by: Michael Froh <froh@amazon.com>
msfroh added a commit to msfroh/OpenSearch that referenced this issue May 22, 2023
In the initial search pipelines commit, I threw request and response
processor factories into one combined map. I think that was a mistake.

We should embrace type-safety by making sure that the kind of processor
is clear from end to end. As we add more processor types (e.g. search
phase processor), throwing them all in one big map would get messier.

As a bonus, we'll be able to reuse processor names across different
types of processor.

Closes opensearch-project#7576

Signed-off-by: Michael Froh <froh@amazon.com>
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in Search Project Board May 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request Search Search query, autocomplete ...etc v2.8.0 'Issues and PRs related to version v2.8.0'
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

1 participant