Data projection with views #6181

peternied · 2023-02-03T23:11:58Z

Is your feature request related to a problem? Please describe.

Sometimes there are clear relationships between indices, e.g. http-logs-2023-01-20 and http-logs-2023-01-21. As data gets reshaped or physically moved there is a desire to preserve how the data is referenced. OpenSearch Dashboards has a feature around this called index patterns that doesn't exist in the backend.

If there was a way to create a logical grouping of these physical storage mediums the responsibilities between data usage and ingestion could be separated. I think this would be a big win for lower maintenance of OpenSearch clusters over time.

Describe the solution you'd like

In SQL there are tables and views. Views offer flexibility and centralized management; see the great answers on the Stack Overflow question "What is a good reason to use SQL views?". Pulling from the great answer by user210748, I'd suggest this system does the following:

Views can join and simplify multiple indices into a single virtual index
Views can act as aggregated tables, where the database engine aggregates data (sum, average etc) and presents the calculated results as part of the data
Views can hide the complexity of data; for example, a view could appear as Sales2000 or Sales2001, transparently partitioning the actual underlying indices
Views take very little space to store; the database contains only the definition of a view, not a copy of all the data it presents
Depending on the SQL engine used, views can provide extra security
Views can limit the degree of exposure of an index or indices to the outer world

Describe alternatives you've considered

Aliases

OpenSearch already has aliases that represent a virtualized view; maybe they could be built up to offer these additional features. However, there are some quirks, like is_write_index, that we might want to be careful around.

https://opensearch.org/docs/latest/opensearch/index-alias/

Data streams

Data streams are a virtualized view focused on managing the physical storage; maybe they could be built up to handle data projection and filtering.

https://opensearch.org/docs/latest/opensearch/data-streams/

Additional context

Coming from the security plugin, there are features for document level security (DLS), field level security, and field masking. These features are built into index permissions and they are kind of clunky: a query to apply DLS has to be double-encoded in the JSON body. Views could easily encompass these scenarios. Modeling view creation and management separately from managing permissions to the views is a cleaner separation compared to what is available in the security plugin.

anasalkouz · 2023-02-07T21:25:08Z

@peternied Thanks for the proposal. Could you elaborate more on the use case? I don't see the connection between rotated indexes and materialized views. Can you also elaborate on the security feature use cases?

nandi-github · 2023-04-28T17:19:53Z

@peternied Reading through your description, it does look like an access control / security use case. Could you help us understand the permissions side better?

peternied · 2023-10-18T17:11:21Z

Attended the Search Relevance - Triage & Backlog Review Triage meeting today and had an opportunity to bring up this issue - thanks @macohen

@austintlee Views are often used in the context of data lakes for security features. Materialized views have uses, but they have a substantial implementation cost. Would it be worthwhile to users who would want this feature? Are there partners who would be interested in testing how this works and could give feedback? That would be important for figuring out a release/investment plan.
@reta Have you looked into https://opensearch.org/docs/1.2/opensearch/search-template/, could that be used for this scenario?

I (@peternied) will continue to iterate on this issue as I get more information. Thanks!

msfroh · 2023-11-06T20:11:36Z

I really like this idea and I think it can combine with search pipelines to open some really exciting possibilities.

I'm imagining a scenario where:

A view has a simple name so it's easy to resolve (just a lookup, no pattern matching) -- which would hopefully help with access-control resolution.
A view cannot be used as the target of an index / bulk / update request. (I could see some value in letting a view be the "source" of a reindex request, as a way of creating a materialized view. Personally, I would delay that to a later release, though.)
A view can front an arbitrary index pattern.
A view can have an attached search pipeline, which is not overridable. We could reasonably replace DLS with a FilterQueryRequestProcessor. If you want to use a different search pipeline or no search pipeline, use a different view. (This would address @besha100's comment on [Search Pipelines] Add a processor that provides fine-grained control over what queries are allowed #10938 about disallowing disabling of a search pipeline.)
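
A minimal sketch of the scenario above, in Python for illustration only. The `View` class, the `VIEWS` registry, and the `required_filter` field are hypothetical names, not OpenSearch APIs; the attached, non-overridable search pipeline is reduced here to a single filter clause that is always added to the caller's query.

```python
from dataclasses import dataclass


@dataclass
class View:
    name: str              # simple name, resolved by plain lookup
    index_pattern: str     # arbitrary index pattern the view fronts
    required_filter: dict  # hypothetical stand-in for the attached search pipeline


# Hypothetical in-memory registry keyed by view name.
VIEWS = {}


def register_view(view: View) -> None:
    VIEWS[view.name] = view


def build_search(view_name: str, user_query: dict) -> tuple:
    """Return (index pattern, request body) for a search through a view."""
    view = VIEWS[view_name]  # just a lookup, no pattern matching
    body = {
        "query": {
            "bool": {
                "must": [user_query],
                # The caller has no way to remove or replace this clause.
                "filter": [view.required_filter],
            }
        }
    }
    return view.index_pattern, body


def reject_write(target: str) -> None:
    """Views are read-only here: refuse index / bulk / update requests against them."""
    if target in VIEWS:
        raise ValueError(f"'{target}' is a view and cannot be written to")


register_view(View("public-logs", "http-logs-*", {"term": {"visibility": "public"}}))
pattern, body = build_search("public-logs", {"match": {"path": "/login"}})
```

To get a different filter, or none, a caller would have to be given a different view, which is the "use a different view" rule from the list.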

peternied · 2023-11-30T16:24:29Z

I'll be digging into this feature more, tracking with [Spike] Proposal document for Views feature security#3561

peternied · 2023-12-22T01:36:00Z

Wanted to provide an update before the holidays arrived - I've got a functional POC in OpenSearch [1] and the Security Plugin [2], alongside a breakdown of the design and implementation for an experimental release [3]. After returning from the break we will see about a demo and more feedback.

peternied · 2024-01-26T18:57:11Z

Projected Views #11957 is in draft to add a basic version of this feature, and I've copied the high level design for how the feature works:

sequenceDiagram
    participant Client
    participant HTTP_Request as ActionHandler
    participant Cluster_Metadata as Cluster Metadata Store
    participant Data_Store as Indices

    Client->>HTTP_Request: View List/Get/Update/Create/Delete<BR>/views or /views/{view_id}
    HTTP_Request->>Cluster_Metadata: Query Views
    alt Update/Create/Delete
        Cluster_Metadata->>Cluster_Metadata: Refresh Cluster
    end
    Cluster_Metadata-->>HTTP_Request: Return
    HTTP_Request-->>Client: Return

    Client->>HTTP_Request: Search View<br>/views/{view_id}/search
    HTTP_Request->>Cluster_Metadata: Query Views
    Cluster_Metadata-->>HTTP_Request: Return
    HTTP_Request->>HTTP_Request: Rewrite Search Request
    HTTP_Request->>HTTP_Request: Validate Search Request
    HTTP_Request->>Data_Store: Search indices
    Data_Store-->>HTTP_Request: Return
    HTTP_Request-->>Client: Return
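
The search half of the diagram can be sketched roughly as follows. This is an illustrative Python model, not the code in #11957: the in-memory `CLUSTER_METADATA` and `INDICES` dictionaries, the `search_view` function, and the specific validation rule are all assumptions made for the example.

```python
import fnmatch

# Hypothetical stand-ins for the cluster metadata store and the indices.
CLUSTER_METADATA = {"views": {"http-logs": {"targets": ["http-logs-*"]}}}
INDICES = {
    "http-logs-2023-01-20": [{"status": 200}],
    "http-logs-2023-01-21": [{"status": 404}],
    "metrics-2023-01-20": [{"cpu": 0.5}],
}


def search_view(view_id: str, request: dict) -> list:
    # Query Views: look the view up in cluster metadata.
    view = CLUSTER_METADATA["views"].get(view_id)
    if view is None:
        raise KeyError(f"no such view: {view_id}")

    # Rewrite Search Request: the view, not the caller, decides the targets.
    rewritten = {**request, "indices": list(view["targets"])}

    # Validate Search Request (assumed rule): the caller must not have
    # tried to name its own indices in the original request.
    if "indices" in request:
        raise ValueError("a view search may not specify indices")

    # Search indices: collect documents from every index matching a target pattern.
    hits = []
    for name, docs in sorted(INDICES.items()):
        if any(fnmatch.fnmatch(name, p) for p in rewritten["indices"]):
            hits.extend(docs)
    return hits


hits = search_view("http-logs", {"query": {"match_all": {}}})
```

Only the two http-logs indices contribute documents; the metrics index is never touched because the view's targets do not match it.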

jainankitk · 2024-02-16T02:02:36Z

@peternied - Thank you for proposing this idea. While some of the aspects around access control / security make sense to me, I am unable to see other benefits of views compared to alias/index-pattern for OpenSearch. Does it make more sense to extend aliases for this purpose instead of introducing another concept, views, as a first-class citizen of OpenSearch? Can you help me understand how we are designing the functionality suggested by user210748:

Views can join and simplify multiple indices into a single virtual index.

SQL literally joins multiple indices for querying related information across multiple indices using the single virtual index. I don't see any such correlation across indices or shards in OpenSearch. The only join supported by OpenSearch uses the parent/child relation, which is limited to a single shard, not even an index.

Views can act as aggregated tables, where the database engine aggregates data (sum, average etc) and presents the calculated results as part of the data

Views in SQL are nothing but named queries, and schema-on-write makes it easy to translate queries on views into bigger queries on the original datasets. I am unable to understand how we are planning to aggregate information across potentially unrelated indices.

Views can hide the complexity of data; for example, a view could appear as Sales2000 or Sales2001, transparently partitioning the actual underlying indices
Views take very little space to store; the database contains only the definition of a view, not a copy of all the data it presents
Views can limit the degree of exposure of an index or indices to the outer world

We should be able to achieve this using alias!?

Depending on the SQL engine used, views can provide extra security

Can you expand more on this?

The above diagram gives some idea about the request/response flow for the CRUD API, but I am really interested in how we are planning to compose the result together from potentially completely different indices, without tying them together to a specific schema.

jainankitk · 2024-02-19T21:11:32Z

There is great work in search performance - and views have an impact on reducing overhead for privilege evaluation. When a search request is created, such as GET indexes-*/_search, the permissions evaluation system has to transform the wildcard pattern into a list of concrete indices, then validate that the user making the request has permission to those indices. This result cannot trivially* be cached because 1) what the wildcard expression resolves to can change over time and 2) the permissions of a user to an index can change.

This is where the alias fits perfectly. We can restrict the permissions to an alias for specific users/groups without worrying about the underlying indices the alias is getting mapped to.

When considering API design and the kinds of information that need to be available to a permissions system, ensuring that all reasoning on a request is visible in the request itself amplifies that cacheability. By building an API which ensures that for an identity the permission evaluation will always be the same, it can save considerably on CPU/memory overhead. These benefits are only viable with a new access model, as alias and index patterns as designed work in opposition.
* [*] The union of this information _could_ be cached, but it hasn't been pursued deeply because A indexes * (B identities * C role mappings * D roles) varies greatly across cluster configurations, so this cache could be considerable in size, and it would often be flushed when an index is added or removed.

Can you please expand on this to help me understand why this is viable only with a new access model, compared to alias or index patterns? Also, even if there are minor limitations with using aliases, we should be looking to augment them instead of introducing a completely new concept called "views".

I'm going to say something strange - OpenSearch uses joins all the time.

OpenSearch uses joins even though conceptually we don't think of them in the SQL architecture sense. In OpenSearch each index is a 'table', and when we make a query GET reports-2024-*/_search we are joining all of those 'tables' together in a projected output. Often these indexes have mapping properties that are aligned, but they don't have to. When we use index patterns with a wildcard, we are implicitly performing joins on these queries.

I believe you're confusing unions with joins, especially if you consider the common log analytics use case of OpenSearch. If I am looking for a monthly aggregation of 4xx/5xx HTTP status codes within the log* indices, it is nothing but unioning the results from different indices. Whereas SQL joins are used to run operations on related, albeit very different, data sets.
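
To make the union/join distinction concrete, here is a small Python sketch over made-up data. The index contents, the `owners` table, and the helper names are hypothetical and only model the two operations; they say nothing about how OpenSearch executes a wildcard search.

```python
# Two "indices" with aligned fields, as a wildcard like log* would match.
logs_jan = [{"status": 404, "path": "/a"}, {"status": 200, "path": "/b"}]
logs_feb = [{"status": 500, "path": "/a"}]

# A different, related data set, keyed by path.
owners = [{"path": "/a", "team": "search"}]


def union(*indices):
    """Wildcard-style search: concatenate the documents of every index."""
    return [doc for index in indices for doc in index]


def inner_join(left, right, key):
    """SQL-style join: combine rows from two data sets that share a key value."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


# Union: 4xx/5xx documents across both monthly indices.
errors = [doc for doc in union(logs_jan, logs_feb) if doc["status"] >= 400]

# Join: enrich each log document with the owning team.
joined = inner_join(union(logs_jan, logs_feb), owners, "path")
```

The union only stacks documents of the same shape, while the join produces new rows that carry fields from both data sets.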

peternied · 2024-02-19T21:30:05Z

I believe you're confusing unions with joins

Yup - I did! I'll correct that in the previous comment

msfroh · 2024-02-21T22:16:21Z

This is where the alias fits perfectly. We can restrict the permissions to an alias for specific users/groups without worrying about the underlying indices the alias is getting mapped to.

@peternied -- This is a good point. Can we manage permissions on aliases?

I feel like there was some other reason why aliases are not a good fit, but I'm struggling to remember.

peternied · 2024-02-21T22:51:53Z

@msfroh maybe this sparks something; in the Security Plugin - aliases don't have permissions concepts around them. When you use an alias, or an index pattern foo-* the security plugin resolves these to the concrete indexes and then checks the user permissions on that concrete list.

So a user could run a query GET my-alias/_search that returns 200, and then an admin changes the underlying pointer and it starts to return 403 - not because you don't have access to the alias, but because the alias -> index mapping changed.
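
The 200-then-403 behavior can be sketched in a few lines of Python. This is a toy model of the resolution step described above, not the Security Plugin's code; `ALIASES`, `USER_INDEX_PERMISSIONS`, and `search_status` are hypothetical names.

```python
# Hypothetical alias table and per-user index permissions.
ALIASES = {"my-alias": ["logs-1"]}
USER_INDEX_PERMISSIONS = {"alice": {"logs-1"}}


def search_status(user: str, target: str) -> int:
    """Resolve the target to concrete indices, then check each one."""
    concrete = ALIASES.get(target, [target])
    allowed = USER_INDEX_PERMISSIONS.get(user, set())
    return 200 if all(index in allowed for index in concrete) else 403


before = search_status("alice", "my-alias")

# An admin repoints the alias; alice's permissions are untouched.
ALIASES["my-alias"] = ["logs-2"]
after = search_status("alice", "my-alias")
```

Nothing about alice or the alias name changed between the two calls; only the alias -> index mapping did.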

In my mind, there is an existing conceptual model that users are aware of: GET BLANK/_search works in a very specific way, and if we change that it will create confusion around security controls. If there is a new thing, GET _views/BLANK/_search, it creates the opportunity to change the permissions model and the implications of that model in a clear opt-in pattern. Query authors won't mistake a view for an alias, or index, or index pattern - even though they can perform many of the same operations on them.

I can see the argument that an opt-in model is not a feature, but a bug. There are other manageability issues and historical features that we might not want to support, but I think those concerns can be built up and mitigated.

jainankitk · 2024-02-22T01:09:55Z

So a user could run a query GET my-alias/_search that returns 200, and then an admin changes the underlying pointer and it starts to return 403 - not because you don't have access to the alias, but because the alias -> index mapping changed.

IMO, that is the correct behavior. Even in the SQL world, permissions need to be granted to the person executing the query for every object referenced by the view, except if the referenced object is owned by the view owner, in which case the authorization decision is made using ownership chains. Should we introduce the concept of ownership to views and indices in OpenSearch?

reta · 2024-02-22T14:16:01Z

IMO, that is the correct behavior. Even in the SQL world, permissions need to be granted to the person executing the query for every object referenced by the view.

+1, the views are "moving targets" and non-designated users may gain unexpected permissions

Should we introduce the concept of ownership to views and indices in OpenSearch?

I think this may not be applicable to OpenSearch at large; it may change in the future, but identity is optional. And it still opens up a hole in the system, since the view could be created with a * pattern.

jainankitk · 2024-02-22T17:36:09Z

In my mind, there is an existing conceptual model that users are aware of: GET BLANK/_search works in a very specific way, and if we change that it will create confusion around security controls.

I have been thinking of security as new feature for alias, instead of changing existing model.

If there is a new thing GET _views/BLANK/_search it creates that opportunity to change the permissions model and the implications of that model in a clear opt-in pattern.

I am wondering if there should be an explicit _views qualifier while querying them. Probably, the end user should be agnostic of whether they are querying a view or an index?

jainankitk · 2024-02-22T17:40:19Z

So a user could run a query GET my-alias/_search that returns 200, and then an admin changes the underlying pointer and it starts to return 403 - not because you don't have access to the alias, but because the alias -> index mapping changed.
+1, the views are "moving targets" and not designated users may gain unexpected permissions

Does that mean views can run into a similar scenario as aliases if the view -> index mapping changed?

reta · 2024-02-22T19:39:08Z

Does that mean views can run into a similar scenario as aliases if the view -> index mapping changed?

The alias model does not bundle permissions - the individual indices behind the alias are checked

msfroh · 2024-02-22T19:45:27Z

IIRC, the specific point of a view in the context of this issue is that the permissions are on the view.

If the view is updated to point to a different index (or indices), then, yes, the user would be able to query that different index (through the view).

reta · 2024-02-22T19:47:36Z

If the view is updated to point to a different index (or indices), then, yes, the user would be able to query that different index (through the view).

I believe a view could use index patterns, right? If yes - no updates are needed

jainankitk · 2024-02-22T19:52:28Z

If the view is updated to point to a different index (or indices), then, yes, the user would be able to query that different index (through the view).

I am wondering how we are enforcing permissions on the index in this case. Can this result in some escalation of privilege? A user might have been explicitly denied access to index A, but might have access to a view pointing to index A. IAM resolves this by having a pass-role permission; I guess ownership chains do something similar in the SQL world. But I might be wrong.
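
A toy Python model of the view-level permission idea and the escalation concern raised here. Every name (`VIEWS`, `VIEW_GRANTS`, `INDEX_DENIES`, and the two functions) is hypothetical; the sketch assumes, as discussed in this thread, that access is decided on the view alone and does not describe any shipped behavior.

```python
# Hypothetical view definitions, view-level grants, and index-level denies.
VIEWS = {"sales": ["index-b"]}
VIEW_GRANTS = {"bob": {"sales"}}
INDEX_DENIES = {"bob": {"index-a"}}


def can_search_view(user: str, view: str) -> bool:
    """Permission is checked on the view only, not on the indices behind it."""
    return view in VIEW_GRANTS.get(user, set())


def reachable_denied_indices(user: str, view: str) -> list:
    """Indices the user is explicitly denied but could still reach through the view."""
    return sorted(set(VIEWS[view]) & INDEX_DENIES.get(user, set()))


allowed_before = can_search_view("bob", "sales")
gap_before = reachable_denied_indices("bob", "sales")

# The view is updated to point at an index bob was explicitly denied.
VIEWS["sales"] = ["index-a"]
allowed_after = can_search_view("bob", "sales")
gap_after = reachable_denied_indices("bob", "sales")
```

A check like `reachable_denied_indices`, run when a view is created or updated, is one place an ownership or pass-role style rule could hook in; whether that is desirable is exactly the open question.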

peternied · 2024-02-22T23:44:37Z

@msfroh @reta @jainankitk This is really good discussion. I've created an RFC [1] to discuss the problem space - I think that will lead to better alignment before I jump into low level implementation details.

[1] [RFC] Aligning Access and Visibility in OpenSearch security#4069

jainankitk · 2024-02-23T20:15:50Z

@msfroh @reta @jainankitk This is really good discussion. I've created an RFC [1] to discuss the problem space - I think that will lead to better alignment before I jump into low level implementation details.
* [1] [[RFC] Aligning Access and Visibility in OpenSearch security#4069](https://github.com/opensearch-project/security/issues/4069)

Thanks @peternied for getting this started. Can we also add below question to the above or separate issue?

I am wondering if there should be an explicit _views qualifier while querying them. Probably, the end user should be agnostic of whether they are querying a view or an index?

peternied · 2024-02-23T20:34:34Z

if there should be explicit views qualifier while querying them.

Until we are aligned on the problem being solved I don't think we can reason over this implementation detail; let's circle back around to this one. I think doing a broader API/name review will be required and this topic will come up during those discussions.

peternied added enhancement Enhancement or improvement to existing feature or request untriaged Indexing & Search labels Feb 3, 2023

anasalkouz added discuss Issues intended to help drive brainstorming and decision making feature New feature or request and removed untriaged labels Feb 7, 2023

anasalkouz added Migration:ReqReview and removed Migration:ReqReview labels Mar 16, 2023

peternied mentioned this issue Jun 5, 2023

[FEATURE] Extend field level security opensearch-project/security#2834

Open

This was referenced Aug 30, 2023

[Search Pipelines] How should we handle default pipelines for multiple indices? Aliases? Wildcards? #7512

Closed

[BUG] "Field level security" and "Field masking definitions" don't work together with "Document level security" opensearch-project/security#3274

Open

peternied mentioned this issue Oct 17, 2023

[Spike] Proposal document for Views feature opensearch-project/security#3561

Closed

3 tasks

peternied mentioned this issue Nov 3, 2023

[Search Pipelines] Modify search pipeline behavior based on authenticated user/role #11053

Open

msfroh mentioned this issue Nov 6, 2023

[Search Pipelines] Apply default search pipelines to requests that don't target a single index #11058

Closed

This was referenced Nov 13, 2023

[Search Pipelines] Add a processor that provides fine-grained control over what queries are allowed #10938

Open

Configurable query logging using search pipelines #11188

Open

peternied mentioned this issue Nov 30, 2023

[BUG] Role with Document-level security (DLS) masks more generic permissions opensearch-project/security#3773

Closed

This was referenced Jan 19, 2024

Projected Views #11957

Merged

[BUG] dfm_empty_overrides_all removes DLS also if other role has no explicit data/read permission opensearch-project/security#3963

Open

msfroh mentioned this issue Feb 2, 2024

[PROPOSAL][Query Sandboxing] Query Sandboxing high level approach. #11173

Open

peternied mentioned this issue Feb 6, 2024

[DOC] Views opensearch-project/documentation-website#6363

Closed

4 tasks

peternied mentioned this issue Feb 14, 2024

[Feature Request] Views features wishlist #12322

Closed

5 tasks

msfroh mentioned this issue Feb 19, 2024

[RFC] Search Query Sandboxing: User Experience #12342

Open

mgodwan added Search Search query, autocomplete ...etc and removed Indexing & Search labels Feb 21, 2024

github-project-automation bot added this to Search Project Board Feb 21, 2024

github-project-automation bot moved this to 🆕 New in Search Project Board Feb 21, 2024

peternied mentioned this issue Feb 21, 2024

[RFC] View feature wishlist #12424

Open

5 tasks

peternied mentioned this issue Mar 25, 2024

Unable to update index pattern opensearch-project/security#4112

Closed

msfroh mentioned this issue Apr 2, 2024

Qsb framework changes #13007

Closed

8 tasks

This was referenced Apr 6, 2024

[RFC] Improving Dls/Fls with Views #13108

Open

[RFC] Expanding DLS/FLS using Views #13137

Closed

msfroh mentioned this issue Apr 23, 2024

[Feature Request][RFC] Multi-tenancy as a construct in OpenSearch #13341

Open

getsaurabh02 added this to OpenSearch Roadmap May 31, 2024

github-project-automation bot moved this to Planned work items in OpenSearch Roadmap May 31, 2024

getsaurabh02 moved this from 🆕 New to Later (6 months plus) in Search Project Board Aug 15, 2024

Bukhtawar mentioned this issue Aug 19, 2024

Adding @jainankitk as a Maintainer #15304

Merged
