[Search Pipelines] Modify search pipeline behavior based on authenticated user/role #11053

Open
msfroh opened this issue Nov 1, 2023 · 7 comments
Labels
enhancement Enhancement or improvement to existing feature or request Search:Query Capabilities

Comments

@msfroh
Collaborator

msfroh commented Nov 1, 2023

Is your feature request related to a problem? Please describe.
In a search pipeline request processor, it would be nice to be able to customize behavior based on the particular user/role making the call.

In particular, while #10938 proposes granular restrictions on specific query types, it would be even nicer if we could say "Users with role X can only run queries that satisfy these constraints, while admin users with role Y have these less-stringent constraints."

Describe the solution you'd like
I think I'd like to be able to access the current Subject from a search processor, if the Identity feature is enabled. Since the SearchPipelineService.resolvePipeline() method runs on the request thread (IIRC), we could probably inject IdentityService into SearchPipelineService, which could call identityService.getSubject() and add the Subject into the PipelinedRequestContext that we're adding in #9405. Then it would be up to a specific processor to decide if they want to look at the Subject and do something with it.
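To make the flow concrete, here is a minimal sketch in Python rather than the actual Java classes. Everything in it is hypothetical: `RequestContext`, `resolve_context`, and `cap_size_processor` are illustrative names, and the size cap is just a stand-in for whatever a real processor might do with the Subject.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RequestContext:
    """Hypothetical stand-in for the per-request pipeline context."""
    subject: Optional[str] = None


def resolve_context(get_subject: Optional[Callable[[], str]]) -> RequestContext:
    """Capture the caller once, on the request thread, if identity is enabled.

    `get_subject` plays the role of identityService.getSubject(); passing
    None models the Identity feature being disabled.
    """
    subject = get_subject() if get_subject is not None else None
    return RequestContext(subject=subject)


def cap_size_processor(request: dict, ctx: RequestContext) -> dict:
    """A processor that opts in to reading the subject; others can ignore it."""
    limit = 1000 if ctx.subject == "admin" else 100  # illustrative limits only
    return {**request, "size": min(request.get("size", 10), limit)}
```

The point of the sketch is that the context carries the Subject passively; only processors that care about it need any auth-related logic.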

Alternatively, maybe I'm thinking about the problem from the wrong angle -- perhaps we should add authorization to use search pipelines within a search request. Under that model, you could define a search pipeline that imposes some restrictions on a search request, and say that users with role X can only submit queries using that search pipeline. (Meanwhile, users with a more admin-like role can use a different search pipeline or maybe specify the _none pipeline explicitly.) That feels like it might be the cleaner solution, because you're separating the question of which users can access which pipelines from deciding what the individual pipelines can do.

I'm not sure what the right implementation is and would appreciate suggestions from folks who know more from the security angle, like @scrawfor99 and @peternied.

Describe alternatives you've considered
I've also toyed with the idea of adding a search pipeline parameter to an index alias, similar to the existing filter parameter that can be used to make an index alias behave like a "view" over a subset of the index. You could create multiple aliases for each index, where each alias has a search pipeline that enables/disables some capabilities. Then you would grant specific users/roles access to the various aliases, rather than authorizing them to access specific indices. I kind of like that idea, because even as an admin, I might want to query my_index_with_guardrails most of the time, only querying my_index_that_shoots_me_in_the_foot when I absolutely have to.

Additional context
See also #10938.

@peternied
Member

@msfroh Could you help me understand the scenario from the customer perspective, such as a set of http requests?

@cwperks
Member

cwperks commented Nov 2, 2023

Users with role X can only run queries that satisfy these constraints

@msfroh Can you elaborate on what these constraints may be?

@msfroh
Collaborator Author

msfroh commented Nov 3, 2023

Users with role X can only run queries that satisfy these constraints

@msfroh Can you elaborate on what these constraints may be?

Oh, sure -- that's what's described in #10938, which is a (possible) driver for this issue (but the features are orthogonal). Essentially, in that issue, I propose using a search pipeline request processor to provide a finer-grained alternative to the all-or-nothing allow_expensive_queries cluster setting.

Let's use a subset of one of my examples from that issue:

PUT /_search/pipeline/allowed_queries_pipeline
{
  "request_processors": [
    {
      "restrict_queries" : {
        "query_types": {
          "prefix" : [
            {
              "field": "foo",
              "value_length" : ">=3",
              "allow": true
            },
            {
              "deny" : true
            }
          ]
        }
      }
    }
  ]
}

So, with this pipeline, I'm saying that I generally don't want to allow prefix queries, because they can be expensive and I don't want to risk bad queries overloading my cluster, but I'll allow them for field foo as long as the prefix has length at least 3 (so foo:cat* is cool, foo:c* is not).
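As a rough sketch of how such rules might be evaluated (in Python, purely for illustration): the restrict_queries processor is only proposed in #10938, so the first-match-wins semantics and the helper names `length_ok` and `is_allowed` are assumptions, not an existing API.

```python
import operator
import re

# Comparison operators assumed to be legal in "value_length" (hypothetical).
_OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt,
        "<": operator.lt, "=": operator.eq}


def length_ok(constraint, value):
    """Check a constraint such as '>=3' against len(value)."""
    match = re.fullmatch(r"(>=|<=|>|<|=)(\d+)", constraint)
    if match is None:
        raise ValueError(f"unsupported constraint: {constraint!r}")
    op, bound = match.groups()
    return _OPS[op](len(value), int(bound))


def is_allowed(rules, field, value):
    """Return the verdict of the first rule that matches; deny if none match."""
    for rule in rules:
        if "field" in rule and rule["field"] != field:
            continue
        if "value_length" in rule and not length_ok(rule["value_length"], value):
            continue
        return bool(rule.get("allow", False)) and not rule.get("deny", False)
    return False


# The "prefix" rules from the pipeline definition above.
PREFIX_RULES = [
    {"field": "foo", "value_length": ">=3", "allow": True},
    {"deny": True},
]
```

With these rules, `is_allowed(PREFIX_RULES, "foo", "cat")` is true, while `foo:c*` and any prefix query on another field fall through to the catch-all deny rule.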

In case you're not familiar with search pipelines, this can be activated on search either explicitly:

POST /my_index/_search?search_pipeline=allowed_queries_pipeline
{
  "query" : {
    "query_string": {
      "query": "foo:cat*"
    }
  }
}

Or it can be attached to a specific index:

PUT /my_index/_settings
{
  "index.search.default_pipeline": "allowed_queries_pipeline"
}

Now, in this (#11053) issue, I'm wondering if (orthogonally) there's a good way to do this with some authZ/authN if there's an authenticated user, so we can specifically say "these users can run these queries, but other users cannot".

I see three possible solutions (but maybe none of them is right).

Processor receives auth context

In this scenario, we propagate some (waving hands) auth context into the request processor and based on the processor's configuration, it makes decisions based on that auth context.

The PUT pipeline request could be something like:

PUT /_search/pipeline/allowed_queries_pipeline
{
  "request_processors": [
    {
      "restrict_queries" : {
        "query_types": {
          "prefix" : [
            {
              "field": "bar",
              "value_length" : ">=2",
              "auth_role": "poweruser", <----------
              "allow": true
            },
            {
              "field": "foo",
              "value_length" : ">=3",
              "allow": true
            },
            {
              "deny" : true
            }
          ]
        }
      }
    }
  ]
}

This might be the most user-friendly API (at least for this specific use-case), since we can restrict individual rules to specific users/roles, but it also means building the logic into the pipeline processor itself (which means that every processor that wants to deal with users would need its own logic to do so).

IMO, that con makes it feel a bit icky from a "separation of concerns" standpoint.
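For illustration, a minimal Python sketch of what role-aware rule matching could look like. It assumes first-match-wins evaluation and simplifies "value_length": ">=N" to a hypothetical integer `min_length`; none of these names come from an existing processor.

```python
def rule_applies(rule, field, value, roles):
    """A rule applies only if every condition it states is satisfied."""
    if "auth_role" in rule and rule["auth_role"] not in roles:
        return False
    if "field" in rule and rule["field"] != field:
        return False
    if "min_length" in rule and len(value) < rule["min_length"]:
        return False
    return True


def is_allowed(rules, field, value, roles):
    """First matching rule decides; no match means deny."""
    for rule in rules:
        if rule_applies(rule, field, value, roles):
            return rule.get("allow", False)
    return False


# Same shape as the pipeline above: the "bar" rule is limited to powerusers.
RULES = [
    {"field": "bar", "min_length": 2, "auth_role": "poweruser", "allow": True},
    {"field": "foo", "min_length": 3, "allow": True},
    {"allow": False},
]
```

Note how the role check lives inside the rule matcher itself, which is exactly the coupling described above: every processor that wants role-specific behavior would need its own version of it.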

Authorized access to search pipelines

Another approach could involve adding authorization to search pipelines and then using different pipelines for different users/roles:

PUT /_search/pipeline/poweruser_allowed_queries_pipeline
{
  "request_processors": [
    {
      "restrict_queries" : {
        "query_types": {
          "prefix" : [
            {
              "field": "bar",
              "value_length" : ">=2",
              "allow": true
            },
            {
              "field": "foo",
              "value_length" : ">=3",
              "allow": true
            },
            {
              "deny" : true
            }
          ]
        }
      }
    }
  ]
}

Then you might limit specific roles to specific pipelines:

PUT _plugins/_security/api/roles/regular_user
{
  "index_permissions": [{
    "index_patterns": [
      "my_index"
    ],
    "allowed_actions": [
      "read"
    ],
    "search_pipelines": [
      "allowed_queries_pipeline"
    ]
  }]
}

PUT _plugins/_security/api/roles/poweruser
{
  "index_permissions": [{
    "index_patterns": [
      "my_index"
    ],
    "allowed_actions": [
      "read"
    ],
    "search_pipelines": [
      "allowed_queries_pipeline",
      "poweruser_allowed_queries_pipeline"
    ]
  }]
}

In this case, the search pipeline processor doesn't know anything about users or roles, which is nice. It's more complicated for the user, since they need to manage different pipelines for different roles. We can improve that situation by allowing pipeline delegation (i.e. the pipeline processor, which currently exists for ingest pipelines, but not search pipelines).
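A minimal Python sketch of the check this would imply, under the assumption that a role's permissions are matched by index pattern and then by pipeline name. The `search_pipelines` key is the proposed (not existing) permission, and `permitted_pipelines` and `can_use_pipeline` are hypothetical helpers.

```python
from fnmatch import fnmatchcase

# Role definitions mirroring the two example roles above.
ROLES = {
    "regular_user": [{
        "index_patterns": ["my_index"],
        "search_pipelines": ["allowed_queries_pipeline"],
    }],
    "poweruser": [{
        "index_patterns": ["my_index"],
        "search_pipelines": ["allowed_queries_pipeline",
                             "poweruser_allowed_queries_pipeline"],
    }],
}


def permitted_pipelines(user_roles, index):
    """Union of pipelines granted by any of the user's roles for this index."""
    allowed = set()
    for role in user_roles:
        for perm in ROLES.get(role, []):
            if any(fnmatchcase(index, p) for p in perm["index_patterns"]):
                allowed.update(perm["search_pipelines"])
    return allowed


def can_use_pipeline(user_roles, index, pipeline):
    """True if the requested pipeline is granted for this index."""
    return pipeline in permitted_pipelines(user_roles, index)
```

Under this model an explicit `_none` would also have to be granted like any other pipeline name, otherwise a restricted user could simply bypass the guardrails.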

Make it an aliasing problem

This is kind of like the second solution ("Authorized access to search pipelines"), but we use aliases to avoid needing to add any new authorization logic.

If we add a search_pipeline parameter to an index alias (and modify the current logic so that it cannot be overridden at query time), we can achieve the same sort of thing as above:

PUT /my_index/_alias/my_index_alias
{
  "search_pipeline": "allowed_queries_pipeline"
}

PUT /my_index/_alias/poweruser_my_index_alias
{
  "search_pipeline": "poweruser_allowed_queries_pipeline"
}

PUT _plugins/_security/api/roles/regular_user
{
  "index_permissions": [{
    "index_patterns": [
      "my_index_alias"
    ],
    "allowed_actions": [
      "read"
    ]
  }]
}

PUT _plugins/_security/api/roles/poweruser
{
  "index_permissions": [{
    "index_patterns": [
      "my_index_alias",
      "poweruser_my_index_alias"
    ],
    "allowed_actions": [
      "read"
    ]
  }]
}

This way, both roles only have permission to query via the aliases, but the aliases attach the search pipelines.

This may be the easiest to implement and has the advantage that we're just reusing existing controls. It has all the downsides of the second option, though, plus the extra cognitive burden of the alias hop.
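A small Python sketch of the resolution order this implies. It assumes the alias-bound pipeline silently wins over the request parameter (rejecting the request would be an equally plausible design); `ALIASES` and `resolve_pipeline` are hypothetical.

```python
# Aliases from the example above, each bound to a search pipeline.
ALIASES = {
    "my_index_alias": {
        "index": "my_index",
        "search_pipeline": "allowed_queries_pipeline",
    },
    "poweruser_my_index_alias": {
        "index": "my_index",
        "search_pipeline": "poweruser_allowed_queries_pipeline",
    },
}


def resolve_pipeline(target, requested_pipeline=None, index_default=None):
    """Pick the pipeline for a search against `target` (alias or index)."""
    alias = ALIASES.get(target)
    if alias is not None and "search_pipeline" in alias:
        # Alias-bound pipeline cannot be overridden at query time.
        return alias["search_pipeline"]
    # Plain index: the existing behavior, request parameter before default.
    return requested_pipeline or index_default
```

So a regular user passing `search_pipeline=_none` against my_index_alias would still get allowed_queries_pipeline applied.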

@msfroh
Collaborator Author

msfroh commented Nov 3, 2023

Note that a search pipeline processor can do almost anything you want with a search request or response, so a good auth mechanism could also control access to pipelines so that e.g. only certain roles can use a RAG pipeline to call out to a (potentially expensive) LLM.

@peternied
Member

@msfroh Thanks for the details. I've got some ideas about how we handle this space that I've been trying to capture. Maybe this is the right time, so please forgive any hand waving.

I don't think a GET /index-*/_search request has enough context to properly capture its intentions. For the most important search use cases, there are locked-in parameters and some flexible ones; there is a whole domain language for OpenSearch that can be used to improve its accuracy and performance. The effort to dial this in isn't really captured all in one place; it's spread across several systems within OpenSearch that are a union of index configuration, security settings, user roles, search parameters, search templates, search pipelines, etc. I think there should be a new concept that bundles these together.

I've been thinking of this as a View [1], where all of the metadata is centralized in a single place. For this kind of scenario, I think associating the search pipeline with the view might be a way to manage access, so if a user has access to GET /_view/salesTransactions/_search, you don't need the same individually granular permissions.

I'm in the process of carving out time to dedicate to this effort 🤞, as it would help the security team vastly with caching and performance, as well as some scenarios that are hard to manage, such as conflicting permissions between different roles [2]. I think this would be the way I'd recommend making access controls available for search pipelines too. How does that sound to you?


Why not just allow ACLs on search pipelines?
My core problem is about predictability and conveying failure state. If a user makes a search request to 'index-*', the permissions evaluation has to expand that name to all the matching indices and make sure the user has access. Then, for each index, any subcomponents that could have their own permissions, such as search pipelines, need to be resolved too. At a single point in time this is possible and understandable. However, what happens over time if the default search pipeline associated with an index is changed? The authorization might have a different result. If it were to fail, how would the user making the request understand why? Often, I would say, they won't, unless they were the one who modified the search pipeline.

Being able to cache authorization information isn't possible if the configuration is embedded inside resources that need to be resolved.
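To illustrate the concern, a small Python sketch under simplified assumptions: each index has at most one default pipeline, permissions are plain sets, and `authorize` is a hypothetical helper that returns the reasons a request would be denied.

```python
from fnmatch import fnmatchcase


def authorize(pattern, index_defaults, allowed_indices, allowed_pipelines):
    """Return the reasons a search on `pattern` would be denied (empty = ok)."""
    failures = []
    for index, pipeline in sorted(index_defaults.items()):
        if not fnmatchcase(index, pattern):
            continue
        if index not in allowed_indices:
            failures.append(f"no access to index {index}")
        elif pipeline is not None and pipeline not in allowed_pipelines:
            failures.append(
                f"no access to pipeline {pipeline} (default of {index})")
    return failures


# Hypothetical cluster state: index name -> default search pipeline.
defaults = {"index-1": "basic", "index-2": None}
before = authorize("index-*", defaults, {"index-1", "index-2"}, {"basic"})

# Someone else changes the default pipeline; the user's request is unchanged.
defaults["index-1"] = "rag"
after = authorize("index-*", defaults, {"index-1", "index-2"}, {"basic"})
```

Here `before` is empty and `after` contains a pipeline failure, even though neither the request nor the user's roles changed, so a cached "allowed" decision for this user and pattern would now be stale.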

@cwperks
Member

cwperks commented Nov 3, 2023

Thank you for the rich detail @msfroh. I'll try to provide some context around how some other plugins have implemented attribute-based access control (ABAC) within a plugin and outside of the security plugin.

Another approach could involve adding authorization to search pipelines and then using different pipelines for different users/roles

I think this approach may align with how I have seen plugins secure resources created by the plugin. I'm most familiar with how anomaly detection and ml-commons secure detectors or models created by their plugins, but I think other plugins follow a similar pattern as well. If I understand correctly, in your example the pipeline is the resource to be shared and it can be provisioned using a pattern similar to how AD or ml-commons protect the resources created in their plugins.

  1. Documentation on AD Security: https://opensearch.org/docs/latest/observing-your-data/ad/security/#advanced-limit-access-by-backend-role
  2. Documentation on ML commons Security: https://opensearch.org/docs/latest/ml-commons-plugin/model-access-control/#model-access-control-prerequisites

Edit: It would be great if the security plugin could be used for such cases. AD and ml-commons implement ABAC independently for their plugins' use cases, but the security plugin should be able to provide some mechanisms that plugins can take advantage of to secure resources created within the plugin.

@msfroh
Collaborator Author

msfroh commented Nov 7, 2023

Thanks, @peternied and @cwperks!

I added a comment on #6181, since that really sounds like the best way to unify some of these security + search pipelines ideas and where I think the two work streams can really help each other out.

That is, if we can have a "view" concept that has access controls and binds to a specific search pipeline (without letting folks override the pipeline in their request), we can implicitly limit access to specific pipelines (both in terms of preventing folks from using pipelines unless authorized and in terms of preventing folks from bypassing pipelines).

In exchange, search pipelines may help the security plugin by taking on DLS duties (and maybe FLS).

Projects
Status: Later (6 months plus)
Development

No branches or pull requests

4 participants