
RFC: Object level security, Phase 1 #93115

Merged — 18 commits — Apr 12, 2021
Conversation

legrego
Member

@legrego legrego commented Mar 1, 2021

Summary

This RFC describes the first phase of Object Level Security (#82725)

View rendered RFC

@legrego legrego added the RFC label Mar 5, 2021
@legrego legrego marked this pull request as ready for review March 5, 2021 14:48
@legrego legrego added the Team:Security Team focused on: Auth, Users, Roles, Spaces, Audit Logging, and more! label Mar 5, 2021
@elasticmachine
Contributor

Pinging @elastic/kibana-security (Team:Security)

@legrego legrego added release_note:skip Skip the PR/issue when compiling release notes v8.0.0 labels Mar 5, 2021
@legrego legrego requested review from jportner, azasypkin and a team March 5, 2021 14:51
@mattkime
Contributor

mattkime commented Mar 5, 2021

There's one more use case that I think should be accounted for: telemetry. I think the needs regarding collecting telemetry on saved objects would be filled by some sort of administrator role, but it's worth specifically calling out just to be safe.

@legrego
Member Author

legrego commented Mar 6, 2021

There's one more use case that I think should be accounted for: telemetry. I think the needs regarding collecting telemetry on saved objects would be filled by some sort of administrator role, but it's worth specifically calling out just to be safe.

@mattkime to clarify, are you interested in making sure that private object telemetry is reportable? If so, I don't think any of the changes in this RFC will impact telemetry collection one way or another. As far as I'm aware, telemetry uses a bare saved objects client, which does not include our security wrapper. This wrapper is what would be responsible for securing access to private objects.

If that addresses your concern, then I'll update the RFC to make note of this. Otherwise I'm happy to discuss further
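The layering legrego describes can be sketched roughly as follows. All names here (`BareRepository`, `SecurityWrapper`) are illustrative, not the actual Kibana interfaces: telemetry reads through the bare client and is unaffected, while end-user requests pass through the security wrapper.

```typescript
interface SavedObjectSketch {
  id: string;
  type: string;
  accessControl?: { owner: string };
}

// The "bare" saved objects client: no security checks, which is why
// telemetry collection would be unaffected by this RFC.
class BareRepository {
  private store = new Map<string, SavedObjectSketch>();
  create(obj: SavedObjectSketch): SavedObjectSketch {
    this.store.set(obj.id, obj);
    return obj;
  }
  get(id: string): SavedObjectSketch | undefined {
    return this.store.get(id);
  }
}

// The security wrapper enforces ownership of `private` objects before
// delegating to the repository.
class SecurityWrapper {
  constructor(
    private readonly repo: BareRepository,
    private readonly user: string
  ) {}
  get(id: string): SavedObjectSketch | undefined {
    const obj = this.repo.get(id);
    if (obj?.accessControl && obj.accessControl.owner !== this.user) {
      throw new Error('Forbidden: private object owned by another user');
    }
    return obj;
  }
}
```

In this model, a telemetry collector holding the bare repository sees every object, while the same `get` through the wrapper is rejected for a non-owner.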

@mattkime
Contributor

mattkime commented Mar 8, 2021

@legrego That sounds good to me. My concern comes from working with spaces where there's not really a way to work with saved objects across spaces.

Contributor

@jportner jportner left a comment


Overall I like it and it sounds straightforward. Nits/feedback below.

We haven't really said this anywhere but this proposal is basically to implement Discretionary Access Control (DAC) for Saved Objects. In future phases we would allow an object's owner (subject) to grant read/write to other users or roles (subjects).

Another thing to consider: do we ever want to make a private object owned by a role instead of a user? Maybe that would be a totally separate classification -- not 'public' or 'private' but something else ('privateGroup').

Comment on lines 148 to 149
#### Performance considerations
When overwriting existing objects, the security wrapper must first retrieve all of the existing `private` objects to ensure that the user is authorized. This requires another round-trip to `get`/`bulk-get` all `private` objects so we can authorize the operation.
Contributor


In the SOR, we already do pre-flight requests for multi-namespace object types when creating, updating, or deleting objects, because we need to ensure that the object exists in the context of the current space.

Rather than adding an additional pre-flight request to the Security wrapper, perhaps we should consider exposing the existing pre-flight requests from the SOR to its wrappers? Maybe allow the wrapper to pass in a handler/callback for those requests. Maybe that's an optimization we can make in a later phase, but I wanted to put it forward for consideration.
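The extra round-trip under discussion can be sketched like this (illustrative only, not Kibana's implementation): before an overwrite, the security wrapper bulk-gets the existing objects so it can verify ownership of any that are `private`. `bulkGet` stands in for the additional request to Elasticsearch.

```typescript
interface ExistingObject {
  id: string;
  owner?: string; // present only on `private` objects
}

function authorizeOverwrite(
  ids: string[],
  currentUser: string,
  bulkGet: (ids: string[]) => ExistingObject[]
): boolean {
  const existing = bulkGet(ids); // the extra pre-flight round-trip
  // Public objects (no owner) pass; private objects must belong to the user.
  return existing.every(
    (o) => o.owner === undefined || o.owner === currentUser
  );
}
```

Exposing the repository's existing pre-flight result to wrappers, as suggested above, would let this check piggyback on a fetch that already happens for multi-namespace types.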

Member Author


Thanks for calling this out, I meant to explain my thinking here. I agree that an additional pre-flight request from the security wrapper is less than ideal, but I'm hesitant to over-engineer an optimal solution in this first phase, since I don't expect that we'll make heavy use of private multi-space objects in the near future.

Refresh my memory: why do the current set of pre-flight namespace checks live in the repository? I want to make sure I'm not overlooking a scenario where the repository should also be validating the accessControl.

Contributor

@jportner jportner Mar 10, 2021


When you take an action on an object, the repository has to retrieve it and make sure the object exists in the current space context. This was not needed when we had single-namespace objects (whose raw document IDs contained the namespace), but now that we have multi-namespace objects we need to do that preflight check.

At the moment the preflight check only occurs on multi-namespace object types, which is rare, but in the future most object types will be multi-namespace...

Edit: agree that we don't need to necessarily over-engineer something for Phase 1, but I also think we should plan ahead for the future when the performance hit will be more egregious.

Member Author


Sorry I should have been clearer: why does the repository have to perform the check, as opposed to say, the spaces wrapper?

Contributor


Because if the Spaces plugin is disabled, the repository should still behave this way (e.g., you should only be able to interact with objects that were in the default space before the Spaces plugin was disabled)

Member Author


It seems like this auditing limitation might come up in a couple of other places. Do we have the same problem when:

  • updating a `namespaceType: 'multiple'` saved object that exists, but not in the current namespace?
  • getting or updating a saved object that doesn't exist?
  • performing an operation which doesn't succeed because of a network error?

Yes, these situations are all possible today. We "mitigate" this by explaining that the audit log records an outcome of unknown for these scenarios. If you want a definitive answer, then you have to correlate Kibana audit logs with the corresponding Elasticsearch audit logs to be sure that the request actually completed. We can definitively tell you when the request has failed due to authorization checks in the wrapper, but beyond that the audit log will effectively tell the auditor that the user was authorized to perform this action, but we don't know if it completed successfully or not.
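The outcome semantics described above can be sketched as follows (the event shape is illustrative, not Kibana's actual audit schema): a denial by the wrapper is definitive, but once an authorized request is handed off, the wrapper cannot observe whether Elasticsearch actually completed it.

```typescript
type AuditOutcome = 'failure' | 'unknown' | 'success';

function auditOutcome(
  authorized: boolean,
  observedCompletion?: boolean
): AuditOutcome {
  // Denied by the wrapper's authorization check: a definitive failure.
  if (!authorized) return 'failure';
  // Authorized, but completion is only known if something reported it back;
  // otherwise the auditor must correlate with Elasticsearch audit logs.
  return observedCompletion === true ? 'success' : 'unknown';
}
```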

I'm quite nervous about the complexity of saved objects as a whole. My impression is that there's only a handful of people that somewhat grasp how it all fits together (and I find myself constantly learning some new behaviour like audit logging limitations). This implementation complexity also comes with a very complex set of tests where the amount of permutations keep steadily growing. So anything that could lower complexity feels worthwhile to explore.

No disagreement here on any of this!

We can always decide to adopt some technical debt / complexity in the short term, but I think a large part of the value of an RFC is starting to shape a more concrete long term technical vision so that if we adopt additional complexity we have some fleshed out ideas for how we can pay it down.

I can spend some time on this if that'd be helpful. Long-term, it seems like the structure of the saved objects service may change somewhat drastically (i.e. potentially deprecating the SOC and generic APIs). I know we've had some long-form discussions, but do we have anything more concise that explains what we're targeting? We probably do, I just can't find it 🙂.
If I know what we're targeting, then I can try to tailor the approach to match that vision.
If not, I can enumerate these other ideas with the assumption that the current architecture will remain largely in place, and then we can figure out what that looks like in the evolved architecture.

Contributor


do we have anything more concise that explains what we're targeting

We need another RFC 🙈

I'm working on writing down a longer-term strategy in 7.13. I think the most impactful change for auditing might be deprecating the saved objects client and generic APIs in favour of plugins building custom HTTP APIs. Previously, auditing had to happen at the saved objects level, but if we're adding another layer into our architecture, is that still the best place?

Maybe as a first step we can start by thinking through the ideal end state of audit logging and work back from that. This might be better suited to a sync discussion, but some questions we could explore:

  • If all data access happened through HTTP APIs which responded with 401, error, or success, could the URL and status code be an adequate replacement for generic saved_object_create/get/find/delete events?
  • There seem to be two classes of audit logs: generic saved_object_* events and events that are specific to the alerting domain. Do we anticipate wanting to add more domain-specific audit events?
  • From an auditing perspective, what do we consider as interesting security events that need to be logged? If a plugin has a hidden saved object exposed through an HTTP API and rejects a request because of some domain-specific business rule, should this create an event? E.g. "only users with super-powers are allowed to break through steel doors"

Member Author


Maybe as a first step we can start by thinking through the ideal end state of audit logging and work back from that. This might be better suited to a sync discussion, but some questions we could explore:

I forked this off into a shared doc to continue the discussion in a more sane medium than GH comment threads

Member Author


@rudolf I started describing some alternate approaches in 682f6c8 (#93115)

Member Author


Of the alternatives, I'm currently favoring 5.2.2 - Pluggable authorization, but not necessarily coupled to this first phase of OLS. I think this model could work within the context of our "Persistence layer"/"Saved objects are data" discussions.

I'm not opposed to reevaluating if our timelines end up converging, though

Member

@azasypkin azasypkin left a comment


ACK: started to review, but haven't managed to finish it yet, will continue tomorrow morning.

#### Attaching Access Controls

This wrapper will be responsible for attaching an access control specification to all private objects before they are created.
It will also allow users to provide their own access control specification in order to support the import/create use cases.
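The behavior quoted above might be sketched like this (hypothetical names, not the final API): by default the wrapper stamps the current user as owner, but callers such as import may supply their own access control specification to preserve the original owner.

```typescript
interface AccessControl {
  owner: string;
}

interface CreateOptionsSketch {
  // Supplied during import to preserve ownership from the exported object.
  accessControl?: AccessControl;
}

function attachAccessControl(
  currentUser: string,
  options: CreateOptionsSketch = {}
): AccessControl {
  // Default: the creating user owns the private object.
  return options.accessControl ?? { owner: currentUser };
}
```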
Member


question: how exactly is this supposed to support the import use case? You mean it will provide a way to specify what owner should be set for all private objects during import?

Member Author


Each private object will have its access specification attached during export, so that it may be preserved on import

Comment on lines 148 to 149
#### Performance considerations
When overwriting existing objects, the security wrapper must first retrieve all of the existing `private` objects to ensure that the user is authorized. This requires another round-trip to `get`/`bulk-get` all `private` objects so we can authorize the operation.
Contributor


Agree with @kobelb. We decided to build the namespace infrastructure into the repository and the same might make sense for private objects. Especially now that licensing is not the primary driver for how we architect these layers.

This wrapper will be responsible for attaching an access control specification to all private objects before they are created.
It will also allow users to provide their own access control specification in order to support the import/create use cases.

Similar to the way we treat `namespaces`, it will not be possible to change an access control specification via the `update`/`bulk_update` functions in this first phase. We may consider adding a dedicated function to update the access control specification, similar to what we've done for sharing to spaces.
Contributor


Why should changing the ACL / owner be a separate API? Could we just prevent changing owners from the client but allow changing the ACL from the repository?

This is very much forward-looking, but in a world where we've deprecated the saved objects client and plugins only consume the repository from behind their API controllers, it feels like it could simplify things if a plugin is able to just update a saved object and its ACL in a single API method.

Member Author


I don't expect to need either of these mechanisms for this first phase of work, so this is something we can discuss in more detail when the time comes. My primary motivation for making this a separate API is audit logging. Auditors might want to know when an object has changed owners, or when a user has chosen to share their object with another user.

The normal update API doesn't record this level of granularity in the audit logs. We could add conditional logic to audit ACL updates differently from other updates, but my preference would be to treat this as a distinct operation.

Secondly, it's plausible that a user would be authorized to update a saved object, but unauthorized to adjust the ACL. Allowing the ACL to be updated via update would further complicate the authorization logic here.

/**
 * * `public` (default): instances of this saved object type will be accessible to all users within the given namespace, who are authorized to act on objects of this type.
 * * `private`: instances of this saved object type will belong to the user who created them, and will not be accessible by other users, except for administrators.
 */
export type SavedObjectsAccessClassification = 'public' | 'private';
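A hypothetical sketch of where the classification would be declared — at type registration, alongside fields like `namespaceType`. The registration shape below is illustrative, not the final API.

```typescript
type SavedObjectsAccessClassification = 'public' | 'private';

interface SavedObjectsTypeSketch {
  name: string;
  namespaceType: 'single' | 'multiple' | 'agnostic';
  // Applies to every instance of the type; defaults to 'public'.
  accessClassification?: SavedObjectsAccessClassification;
}

// A plausible private type: settings that belong to a single user.
const userSettingsType: SavedObjectsTypeSketch = {
  name: 'user-settings',
  namespaceType: 'single',
  accessClassification: 'private',
};
```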
Contributor


When a saved object is public, it means we can quickly skip the performance cost of ACL checking. But how does this fit into the longer-term vision of OLS? I understand that transitioning from private to public is out of scope, but if this is something that's applied to the saved object type instead of a single object, it wouldn't be possible to later share this saved object.

It feels like restricting this to a type moves us away from true OLS, where the permissions are solely determined by the object itself. Are private types just technical debt until we can solve the performance problems, or do we expect to always have this distinction?

Member Author


Good question. When I discussed this first phase with @arisonl, he cautioned that the future of OLS is uncertain at this point. While the meta issue describes a potential end state, it's not clear if/when sharing "public" objects (such as dashboards) will be possible.

Having said that, I could envision a third type of classification for objects that are sharable to other users. At that point, we could design the transition for moving public objects to become sharable. My current thinking is that private objects will never be sharable. They contain information that is specific to the current user, so it doesn't make sense to allow them to be shared.


# 6. Adoption strategy

Adoption for net-new features is hopefully straightforward. Like most saved object features, the saved objects service will transparently handle all authorization and auditing of these objects, so long as they are properly registered.
Contributor


Thinking through our security model, we would only be able to enforce security if Elasticsearch is behind a firewall preventing users from connecting directly, and users don't have access to the developer console. Since the same is true of spaces, is this something users are already familiar with? I couldn't find anything in our documentation.

Member Author


We expect that a vast majority of Kibana users will not have direct access to the .kibana index. We have a note in the breaking changes doc which implies this, but we don't do a great job of documenting this today 🙈.


Comment on lines 25 to 26
OLS allows saved objects to be owned by individual users. This allows Kibana to store information that is specific
to each user, which enables further customization and collaboration throughout our solutions.
Contributor


I see the value in being able to designate a saved object as belonging to a specific user identity. But it's probably worth asking, do we also need security? Just like a user is able to completely mess up or delete another user's dashboard would it matter if there wasn't a hard control to prevent accessing a saved object belonging to another user, but this was simply enforced in the UI layer?

Member Author


It's a good question. I think there are some cases where recording identity would be enough, but there are other cases where we'd want to enforce security.

Take user preferences as an example: There isn't a ton of value in preventing Alice from reading Bob's preferences, but it would be really troubling if Alice could modify or delete Bob's preferences. Since all users would be authorized to update/delete these SOs, even a traditionally "readonly" user might be authorized to bulk delete the preferences of all other users. That's not something that an administrator could easily fix, unless they happened to have a snapshot of the Kibana index that they were willing to restore.
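The asymmetry in this example can be sketched with a hypothetical model: reads of another user's preferences may be tolerable, but writes must be hard-blocked rather than merely hidden in the UI.

```typescript
interface UserPrefs {
  owner: string;
  theme: string;
}

function updatePrefs(prefs: UserPrefs, actor: string, theme: string): UserPrefs {
  if (prefs.owner !== actor) {
    // Even a traditionally "readonly" role must not be able to clobber
    // another user's preferences — UI-only enforcement would not stop this.
    throw new Error(`${actor} may not modify preferences owned by ${prefs.owner}`);
  }
  return { ...prefs, theme };
}
```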

@legrego legrego requested a review from rudolf April 2, 2021 12:28
Contributor

@rudolf rudolf left a comment


Thanks for fleshing this out, I'm 👍 for the design.

If we're deprecating saved objects in the long term, then having some additional technical debt that's isolated to the wrapper might be ideal. That way, if we move it to the Kibana Persistence service at a later stage, it's easy to remove all of the private-objects technical debt. It also doesn't feel like it makes sense to try to improve the architecture in Saved Objects directly; we should rather use these ideas in how we build the persistence service.

Contributor

@jportner jportner left a comment


LGTM!

A few notes:

  • 3.2 Saved Objects API: Looking at create / bulk_create again, in the implementation I think we'll want to ensure that the object's owner is maintained even in the event of an overwrite (e.g., user "larry" created a profile, then user "admin" overwrites that profile -- the profile should still be owned by "larry"). I'm guessing you already intended this behavior, and I don't think it necessarily has to be in the RFC, but I wanted to point it out!
  • 5.2 Re-using the repository's pre-flight checks: I thought of another option as an alternative to a callback as described in 5.2.3. Instead of that, any client wrapper could fetch the object/s on its own and pass that down to the repository in an options field (preflightObject/s?) so the repository can reuse that result if it's defined, instead of initiating an entire additional preflight check. That resolves our problem without much additional complexity.
    Of course we don't want consumers (mis)using this field; we can either mark it as @internal or explore creating a separate "internal SOC" interface that is only meant to be used by the SOC wrappers.
  • 8.3 Behavior when security is disabled: I think this is resolved, private objects should be accessible if the Security plugin is disabled

@legrego
Member Author

legrego commented Apr 8, 2021

3.2 Saved Objects API: Looking at create / bulk_create again, in the implementation I think we'll want to ensure that the object's owner is maintained even in the event of an overwrite (e.g., user "larry" created a profile, then user "admin" overwrites that profile -- the profile should still be owned by "larry"). I'm guessing you already intended this behavior, and I don't think it necessarily has to be in the RFC, but I wanted to point it out!

I've always considered an overwrite to be the same as a delete + create. In that scenario, I wouldn't expect the object's owner to be maintained.

5.2 Re-using the repository's pre-flight checks: I thought of another option as an alternative to a callback as described in 5.2.3. Instead of that, any client wrapper could fetch the object/s on its own and pass that down to the repository in an options field (preflightObject/s?) so the repository can reuse that result if it's defined, instead of initiating an entire additional preflight check. That resolves our problem without much additional complexity.
Of course we don't want consumers (mis)using this field, we can either mark it as @internal or we could explore creating a separate "internal SOC" interface that is only meant to be used by the SOC wrappers.

I'll update the alternatives to add this suggestion - thanks!

8.3 Behavior when security is disabled: I think this is resolved, private objects should be accessible if the Security plugin is disabled

++ I'll update the RFC to reflect this as well

Member

@azasypkin azasypkin left a comment


Looks great, thanks!


The security wrapper will ensure that the user is authorized to share/unshare all existing `private` objects.
#### Performance considerations
Similar to the "create / override" scenario above, the security wrapper must first retrieve all of the existing `private` objects to ensure that the user is authorized. This requires another round-trip to `get`/`bulk-get` all `private` objects so we can authorize the operation.
Member


question:

This requires another round-trip to get/bulk-get all private objects

If I read it correctly, addToNamespaces / deleteFromNamespaces work with single objects, so we don't need to do a bulk-get here? Feel free to ignore this comment if we plan to extend these APIs to work with multiple objects in the future.

Member Author


This will be changing in the future to support bulk operations via a new updateObjectsSpaces function (https://github.com/jportner/kibana/blob/4d0b48e9828e27bc645a2a0b4883acc3cc4a3cc7/src/core/server/saved_objects/service/lib/repository.ts#L1271-L1271).

Otherwise, you're right that the current implementation does not support bulk operations

Member


Good to know, thanks!

@legrego legrego added v7.13.0 auto-backport Deprecated - use backport:version if exact versions are needed labels Apr 12, 2021
@legrego
Member Author

legrego commented Apr 12, 2021

Thanks all for your feedback and thoughtful comments!

@legrego legrego merged commit 7e2ffc0 into elastic:master Apr 12, 2021
@legrego legrego deleted the rfc/ols-phase-1 branch April 12, 2021 16:27
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Apr 12, 2021
Co-authored-by: Joe Portner <5295965+jportner@users.noreply.github.com>
@kibanamachine
Contributor

💚 Backport successful

Branch: 7.x

This backport PR will be merged automatically after passing CI.

kibanamachine added a commit that referenced this pull request Apr 12, 2021
Co-authored-by: Joe Portner <5295965+jportner@users.noreply.github.com>

Co-authored-by: Larry Gregory <larry.gregory@elastic.co>
Co-authored-by: Joe Portner <5295965+jportner@users.noreply.github.com>