[Search Pipelines] Split processor namespace by type (request vs response vs phase) #7576

msfroh · 2023-05-15T23:01:19Z

Is your feature request related to a problem? Please describe.
In my initial PR for search pipelines, I copied the example of IngestService that maintains a map from processor names to their factories.

With search pipelines, though, we already have two kinds of processors: request processors and response processors. Consider something like the RenameResponseProcessor (which we may rename to RenameFieldResponseProcessor), with a possible key of rename or rename_field. I was thinking, "What would we call a request processor that replaces all occurrences of a given field name with another name?" We would probably also call it something like rename_field. Right now, you can't register two processors of different types under the same name.

Describe the solution you'd like

I think SearchPipelineService should maintain separate maps for request and response processors (or any future processor type), effectively giving them their own namespaces. This also simplifies some of the logic around pipeline construction, since we no longer have to do instanceof checks to make sure that a processor with a given name corresponds to the correct type.
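As a rough illustration of the idea (not the actual SearchPipelineService code), a registry with one factory map per processor type might look like the sketch below. The names `TypedProcessorRegistry`, `ProcessorFactory`, `SearchRequestProcessor`, and `SearchResponseProcessor`, and the string-based `apply` methods, are hypothetical stand-ins chosen for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the real processor interfaces (illustration only).
interface SearchRequestProcessor { String apply(String request); }
interface SearchResponseProcessor { String apply(String response); }

// Hypothetical factory shape; the real factory signature may differ.
interface ProcessorFactory<T> { T create(Map<String, Object> config); }

// One map per processor type, so each type gets its own namespace.
final class TypedProcessorRegistry {
    private final Map<String, ProcessorFactory<SearchRequestProcessor>> requestFactories = new HashMap<>();
    private final Map<String, ProcessorFactory<SearchResponseProcessor>> responseFactories = new HashMap<>();

    void registerRequest(String name, ProcessorFactory<SearchRequestProcessor> factory) {
        // Names only need to be unique within a single type.
        if (requestFactories.putIfAbsent(name, factory) != null) {
            throw new IllegalArgumentException("Request processor [" + name + "] is already registered");
        }
    }

    void registerResponse(String name, ProcessorFactory<SearchResponseProcessor> factory) {
        if (responseFactories.putIfAbsent(name, factory) != null) {
            throw new IllegalArgumentException("Response processor [" + name + "] is already registered");
        }
    }

    // The lookup is typed by construction, so no instanceof check is needed.
    SearchRequestProcessor createRequest(String name, Map<String, Object> config) {
        ProcessorFactory<SearchRequestProcessor> factory = requestFactories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("No request processor named [" + name + "]");
        }
        return factory.create(config);
    }

    SearchResponseProcessor createResponse(String name, Map<String, Object> config) {
        ProcessorFactory<SearchResponseProcessor> factory = responseFactories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("No response processor named [" + name + "]");
        }
        return factory.create(config);
    }
}

final class TypedProcessorRegistryDemo {
    public static void main(String[] args) {
        TypedProcessorRegistry registry = new TypedProcessorRegistry();
        // The same key is registered once per type without a conflict.
        registry.registerRequest("rename_field", config -> request -> "request:" + request);
        registry.registerResponse("rename_field", config -> response -> "response:" + response);
        System.out.println(registry.createRequest("rename_field", new HashMap<>()).apply("q"));
        System.out.println(registry.createResponse("rename_field", new HashMap<>()).apply("hits"));
    }
}
```

In this sketch a duplicate name is rejected only when it collides within the same type, which is the namespace behavior described above.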

The biggest challenge I see is that search pipeline code went out with OpenSearch 2.7 behind a feature flag. That included a NodeInfo change so that nodes provide a flat list of supported processors (so we can check on pipeline creation that all nodes in the cluster have the required processors, since they may be provided by plugins). With this change, we would also need to split that output to list supported processors by type. Since 2.7 has already been released and we're not going to change it, we would probably need to flatten the list in the unlikely event that we're called from a 2.7 node. Similarly, if we receive a NodeInfo from a 2.7 node, I think we should pretend that all processors are (potentially) both request and response processors, since we have no way of knowing which they are.
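The two compatibility rules above could be sketched roughly as follows. This is only an illustration in plain Java: `ProcessorInfoCompat`, `ProcessorType`, and both method names are hypothetical, and the real change would live in the NodeInfo serialization code behind a version check that is not shown here.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper illustrating the 2.7 compatibility rules (not the real NodeInfo code).
final class ProcessorInfoCompat {
    enum ProcessorType { REQUEST, RESPONSE }

    // Talking to a 2.7 node: collapse the per-type sets into one flat,
    // de-duplicated, sorted list of processor names.
    static List<String> flattenForLegacyNode(Map<ProcessorType, Set<String>> byType) {
        Set<String> names = new TreeSet<>();
        for (Set<String> typeNames : byType.values()) {
            names.addAll(typeNames);
        }
        return new ArrayList<>(names);
    }

    // Reading from a 2.7 node: the flat list carries no type information,
    // so assume every processor may be available as every type.
    static Map<ProcessorType, Set<String>> expandFromLegacyNode(Collection<String> flatNames) {
        Map<ProcessorType, Set<String>> byType = new EnumMap<>(ProcessorType.class);
        for (ProcessorType type : ProcessorType.values()) {
            byType.put(type, new TreeSet<>(flatNames));
        }
        return byType;
    }
}
```

Treating legacy processors as available for both types is the permissive choice: pipeline creation would not be rejected just because a 2.7 node cannot say which type a processor is.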

Describe alternatives you've considered

The main alternative (I think) is to let my existing mistake stand. Going forward, we would never be able to register two search pipeline processors with the same name, regardless of what part of the search flow is being acted upon.

We could also establish (and enforce) a naming convention where all processors are named according to their type (with a prefix or suffix).


In the initial search pipelines commit, I threw request and response processor factories into one combined map. I think that was a mistake. We should embrace type-safety by making sure that the kind of processor is clear from end to end. As we add more processor types (e.g. search phase processor), throwing them all in one big map would get messier. As a bonus, we'll be able to reuse processor names across different types of processor.

Closes opensearch-project#7576

Signed-off-by: Michael Froh <froh@amazon.com>

msfroh added the enhancement, untriaged, and Search labels May 15, 2023

github-project-automation bot moved this to 🆕 New in Search Project Board May 15, 2023

github-project-automation bot added this to Search Project Board May 15, 2023

msfroh removed the untriaged label May 15, 2023

msfroh moved this from 🆕 New to Now(This Quarter) in Search Project Board May 15, 2023

msfroh added the v2.8.0 'Issues and PRs related to version v2.8.0' label May 15, 2023

msfroh mentioned this issue May 17, 2023

[Search Pipelines] Split search pipeline processor factories by type #7597

Merged

6 tasks

msfroh moved this from Now(This Quarter) to 🏗 In progress in Search Project Board May 17, 2023

msfroh moved this from 🏗 In progress to 👀 In review in Search Project Board May 18, 2023

nknize closed this as completed in #7597 May 22, 2023

github-project-automation bot moved this from 👀 In review to ✅ Done in Search Project Board May 22, 2023
