Append-only privilege for untrusted endpoints #68414

Open
danhermann opened this issue Feb 2, 2021 · 18 comments
Assignees
Labels
:Data Management/Data streams Data streams and their lifecycles >enhancement :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC Team:Data Management Meta label for data/management team Team:Security Meta label for security team

Comments

@danhermann
Contributor

danhermann commented Feb 2, 2021

In order to grant minimal permissions to untrusted endpoints, we need a privilege that permits append-only indexing, auto-creation of target indices or data streams only if there is an existing template, and prohibits mapping changes.

@danhermann danhermann added >enhancement :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC :Data Management/Data streams Data streams and their lifecycles labels Feb 2, 2021
@danhermann danhermann self-assigned this Feb 2, 2021
@elasticmachine elasticmachine added Team:Security Meta label for security team Team:Data Management Meta label for data/management team labels Feb 2, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (Team:Core/Features)

@elasticmachine
Collaborator

Pinging @elastic/es-security (Team:Security)

@ph
Contributor

ph commented Feb 3, 2021

@danhermann ping us when it's ready for testing.

@bytebilly
Contributor

@danhermann have you already considered using create_doc? In 7.x it allows mapping updates, but it won't anymore in 8.x. For the index creation, is create_index too permissive?

@danhermann danhermann changed the title Append-only role for untrusted endpoints Append-only privilege for untrusted endpoints Feb 4, 2021
@scunningham

@danhermann, I remain concerned that auto-creation of data streams or target indices is an issue for untrusted endpoints. Since we have built-in templates for data streams such as "metrics-*-*", "logs-*-*", and "synthetics-*-*", an attacker with append-only privileges could create thousands of data streams without restriction.

Just to see what would happen, I created an apikey with only "create_doc" and "indices:admin/auto_create" privs. I was surprised to see that the data streams are created, but the indexing failed:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "security_exception",
        "reason" : "action [indices:admin/mapping/auto_put] is unauthorized for API key id [1UWzaXcBN6DMbwHTWSLc] of user [elastic] on indices [.ds-logs-attack-003-2021.02.03-000001], this action is granted by the privileges [auto_configure,manage,write,all]"
      }
    ],
    "type" : "security_exception",
    "reason" : "action [indices:admin/mapping/auto_put] is unauthorized for API key id [1UWzaXcBN6DMbwHTWSLc] of user [elastic] on indices [.ds-logs-attack-003-2021.02.03-000001], this action is granted by the privileges [auto_configure,manage,write,all]"
  },
  "status" : 403
}

Elastic log:

│ info [o.e.c.m.MetadataMappingService] [MacBook-Pro.local] [.ds-logs-mydosattack-002-2021.02.03-000001/AcH7vGL1RVC0TdX-3mrDtw] update_mapping [_doc]
   │ info [o.e.c.m.MetadataMappingService] [MacBook-Pro.local] [.ds-logs-attack-002-2021.02.03-000001/RiiKWUPdQBeSgPjgZFGXNg] update_mapping [_doc]
   │ info [o.e.c.m.MetadataCreateIndexService] [MacBook-Pro.local] [.ds-logs-attack-003-2021.02.03-000001] creating index, cause [initialize_data_stream], templates [logs], shards [1]/[1]
   │ info [o.e.c.m.MetadataCreateDataStreamService] [MacBook-Pro.local] adding data stream [logs-attack-003] with write index [.ds-logs-attack-003-2021.02.03-000001] and backing indices []
   │ info [o.e.x.i.IndexLifecycleTransition] [MacBook-Pro.local] moving index [.ds-logs-attack-003-2021.02.03-000001] from [null] to [{"phase":"new","action":"complete","name":"complete"}] in policy [logs]
   │ info [o.e.x.i.IndexLifecycleTransition] [MacBook-Pro.local] moving index [.ds-logs-attack-003-2021.02.03-000001] from [{"phase":"new","action":"complete","name":"complete"}] to [{"phase":"hot","action":"unfollow","name":"wait-for-indexing-complete"}] in policy [logs]
   │ info [o.e.x.i.IndexLifecycleTransition] [MacBook-Pro.local] moving index [.ds-logs-attack-003-2021.02.03-000001] from [{"phase":"hot","action":"unfollow","name":"wait-for-indexing-complete"}] to [{"phase":"hot","action":"unfollow","name":"wait-for-follow-shard-tasks"}] in policy [logs]

@ruflin and I have been discussing a model where Kibana uses the data stream API to pre-create the target data streams before dispatching the policies to the agents. In that scenario, an append-only privilege would be very restrictive: no data stream/index creation, no new mapping. I.e., only add a document if the target exists.

Thoughts?

@danhermann
Contributor Author

@scunningham, it sounds like create_doc is all you need if your data streams are created in advance by something else and they do not have dynamic mappings?
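
For illustration, here is a minimal sketch of what such a create_doc-only key request might look like, assuming the target data streams already exist. The helper name, the role name "append-only", and the example data stream names are hypothetical; the body shape follows Elasticsearch's create API key endpoint (POST /_security/api_key). As discussed in this thread, indexing with such a key would still be expected to fail if a document triggers a dynamic mapping update.

```python
import json


def append_only_key_request(key_name, data_streams):
    """Build a request body for POST /_security/api_key granting only create_doc.

    Hypothetical helper: names must be fully qualified so the key cannot
    reach anything beyond the pre-created targets.
    """
    if not data_streams:
        raise ValueError("at least one data stream name is required")
    for name in data_streams:
        if "*" in name:
            # A wildcard would widen the key beyond the pre-created targets.
            raise ValueError(f"wildcards are not allowed: {name!r}")
    return {
        "name": key_name,
        "role_descriptors": {
            "append-only": {  # hypothetical role name
                "index": [
                    {
                        "names": sorted(data_streams),
                        "privileges": ["create_doc"],
                    }
                ]
            }
        },
    }


# Example data stream names are made up for this sketch.
body = append_only_key_request(
    "agent-key", ["metrics-system.cpu-default", "logs-nginx.access-default"]
)
print(json.dumps(body, indent=2))
```

Rejecting wildcards in the helper is a design choice for the sketch, not something Elasticsearch enforces: it keeps the key tied to targets that something more trusted has already created.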

@ruflin
Contributor

ruflin commented Feb 5, 2021

@danhermann The templates contain dynamic mappings. We could probably remove them from some of the templates, but not all of them. I wonder if runtime fields could come to our rescue here. If we disable dynamic mapping, would the data still be queryable through runtime fields?

@martijnvg
Member

Perhaps we just need to introduce a new role that allows dynamic mapping updates or modify the existing create_doc privilege to also grant dynamic mapping updates?

@scunningham

IMHO, dynamic mappings are too dangerous a privilege to grant to an untrusted endpoint:

  1. An attacker could overwhelm the index with a bunch of bogus mappings intended to prevent new valid mappings from being created, hitting the mapping limits.
  2. An attacker could, if the timing is right, purposely mis-map a field such that subsequent valid documents would fail due to mapping exceptions.

To support only "create_doc", the dynamic mapping would need to be removed from the data stream templates, and all data streams would have to be created before the agents start streaming data.

The built-in index templates for logs*, metrics*, synthetics*, etc. all contain a dynamic template:

"dynamic_templates" : [
  {
    "strings_as_keyword" : {
      "mapping" : {
        "ignore_above" : 1024,
        "type" : "keyword"
      },
      "match_mapping_type" : "string"
    }
  }
],

Removing dynamic templates to lock down untrusted endpoints would be a significant departure from current behaviors.
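
To make that departure concrete, here is a rough sketch of what a locked-down mapping could look like. It assumes the template's mappings are available as a plain dictionary; the helper name and the "message" property are hypothetical. With "dynamic" set to false, Elasticsearch keeps unmapped fields in _source but does not index them or add them to the mapping ("strict" would reject the document instead).

```python
import copy


def lock_down_mappings(mappings):
    """Return a copy of `mappings` that never triggers dynamic mapping updates.

    Hypothetical helper: it only handles the top level; nested objects that
    set their own "dynamic" parameter would need the same treatment.
    """
    locked = copy.deepcopy(mappings)
    # Without dynamic mapping the dynamic templates are never consulted.
    locked.pop("dynamic_templates", None)
    locked["dynamic"] = False
    return locked


# The dynamic template is the one quoted above; "message" is made up.
template_mappings = {
    "dynamic_templates": [
        {
            "strings_as_keyword": {
                "mapping": {"ignore_above": 1024, "type": "keyword"},
                "match_mapping_type": "string",
            }
        }
    ],
    "properties": {"message": {"type": "text"}},
}

locked = lock_down_mappings(template_mappings)
print(locked)
```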

@tvernum
Contributor

tvernum commented Feb 8, 2021

I don't think this is a problem that should be solved primarily through security.

If there are specific endpoints (agent policies?) that should never use dynamic mapping or dynamic index creation then it's reasonable not to grant them those privileges (auto_configure). In fact this is why we made it an explicit privilege for data streams - so that you can have an ingestion key that has create_doc only, and nothing else.

But, if you leave it at that, then it's just setting things up to fail. Within ES security, in general we don't decide whether to do something based on whether it is allowed by security. We attempt to do it because the system is configured to do that thing, and then we fail if security prevents it.
So, if an index/data stream is configured with dynamic templates, and a new field is ingested that matches that template we will attempt to perform a mapping update, and will fail if the user is not allowed to perform auto-mapping updates.

If there is a system configuration that says to do something, and alongside that is a security configuration that says to prevent something, then there is a conflict and it will typically result in an error.

I think what we need to be talking about is how we configure the system so that these data streams never attempt to perform dynamic mapping changes; removing that privilege from the ingestion key is then straightforward.
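
As a rough illustration of that conflict, the sketch below checks a mapping configuration against a key's privileges. The function names are hypothetical; the privilege set is copied from the security_exception quoted earlier in this thread, and the check assumes Elasticsearch's default of treating an unset "dynamic" parameter as true.

```python
# Privileges that grant indices:admin/mapping/auto_put, taken from the
# security_exception message quoted earlier in this thread.
MAPPING_UPDATE_PRIVILEGES = {"auto_configure", "manage", "write", "all"}


def attempts_dynamic_mapping(mappings):
    """True if new fields would lead to a mapping update attempt."""
    # Elasticsearch treats a missing "dynamic" parameter as true.
    return mappings.get("dynamic", True) not in (False, "false", "strict")


def find_conflict(mappings, key_privileges):
    """Return a description of the conflict, or None if there is none."""
    if not attempts_dynamic_mapping(mappings):
        return None  # the system never asks for a mapping update
    if MAPPING_UPDATE_PRIVILEGES & set(key_privileges):
        return None  # the key is allowed to perform the update
    return (
        "mappings allow dynamic updates but the key cannot perform them; "
        "documents that introduce new fields would be rejected"
    )


print(find_conflict({}, ["create_doc"]))
print(find_conflict({"dynamic": False}, ["create_doc"]))
```

The first call reports a conflict and the second returns None, which matches the point above: align the system configuration first, and the restricted key then stops being a source of errors.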

@jpountz
Contributor

jpountz commented Feb 8, 2021

I'd be curious to get more context around this feature request.

Dynamic mapping updates are needed for some data sources that can't provide a schema up-front, and are also important for the onboarding of new data sources, so I wonder how we plan to make this work if we start disallowing dynamic mapping updates on untrusted endpoints. Will we need to have a concept of trusted endpoints too, and if so, on what criterion would an endpoint be trusted or not?

> @ruflin and I have been discussing a model where Kibana uses the data stream api to pre-create the target data streams before dispatching the policies to the agents

I like that it removes the need for untrusted endpoints to create data streams but I wonder how it works in the case of a standalone agent setup. Or do we not foresee a need for doing standalone deployments of untrusted endpoints?

For my understanding, would it be an option to only give privileges to a finite set of data streams to untrusted endpoints to avoid letting them create thousands of data streams the way that @scunningham described?

> I wonder if runtime fields could come to our rescue here. We disable dynamic mapping but because of runtime fields it will still be queryable?

Configuring mappings with dynamic:false and defining runtime fields as part of search requests would "work", but this has limitations too e.g. such fields would not be suggested via Kibana so it's unclear how users would learn about them in the first place, and they could be slow to search or aggregate.

@scunningham

Let me attempt to frame the problem a bit better.

In the field, we expect Fleet Agents to execute in various environments along a broad risk spectrum:

  • Trusted: Systems behind layered defenses with robust security controls and limited access; high value servers etc.
  • Untrusted: Systems in largely uncontrolled environments with minimal security controls; think laptops in a coffee shop, undergraduate computer labs, or systems accessible by a rogue employee.

The Fleet system, as implemented today, prioritizes supporting trusted environments. For 7.11, the Fleet implementation (in Kibana at the moment) generates a default Elasticsearch API key which it provides to each of its integrations. This default key has broad privileges:

{
	"fleet-output": {
		"cluster": ["monitor"],
		"index": [{
			"names": [
				"logs-*",
				"metrics-*",
				"traces-*",
				".ds-logs-*",
				"ds-metrics-*",
				"ds-traces-*",
				".logs-endpoint.diagnostic.collection-*",
				".ds-.logs-endpoint.diagnostic.collection-*"
			],
			"privileges": [
				"write",
				"create_index",
				"indices:admin/auto_create"
			]
		}]
	}
}

The types of attacks that are possible, per privilege:

  • write
    • Update or delete existing records, effectively corrupting any document in any of the wildcarded indices
    • Arbitrary dynamic mapping updates; allowing attacker to corrupt or exhaust mappings as previously described in discussion
  • create_index
    • Create any index in the above wildcarded namespace, either
      • Starving the system by creating thousands of indices
      • Racing the system to create a known index and corrupting its mappings, settings, etc.
  • indices:admin/auto_create
    • Similar to create_index; can create arbitrary data streams in the index simply by indexing a document

[Note that normal denial of service attacks are not discussed here. DOS attacks leveraging legitimate operations remain an issue, but are outside the scope of this document. This discussion is limited to attacks that could corrupt data or destabilize the system.]
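
The list above can be turned into a small audit, sketched here under the assumption that role descriptors are available as plain dictionaries. The function and table names are hypothetical, the risk descriptions paraphrase this comment, and the auto_configure entry reflects that privilege's documented scope (auto-creation and automatic mapping updates).

```python
# Risk notes paraphrased from the list above; auto_configure is included
# because it permits auto-creation and automatic mapping updates.
RISKS_BY_PRIVILEGE = {
    "write": "update or delete existing documents; arbitrary dynamic mapping updates",
    "create_index": "create arbitrary indices in the wildcarded namespace",
    "indices:admin/auto_create": "create arbitrary data streams by indexing a document",
    "auto_configure": "auto-create data streams and apply dynamic mapping updates",
}


def audit(role_descriptors):
    """Return (role, privilege, risk) tuples for every flagged privilege."""
    findings = []
    for role_name, role in role_descriptors.items():
        for entry in role.get("index", []):
            for privilege in entry.get("privileges", []):
                risk = RISKS_BY_PRIVILEGE.get(privilege)
                if risk:
                    findings.append((role_name, privilege, risk))
    return findings


# Abbreviated version of the 7.11 default key shown above.
fleet_7_11 = {
    "fleet-output": {
        "cluster": ["monitor"],
        "index": [
            {
                "names": ["logs-*", "metrics-*", "traces-*"],
                "privileges": ["write", "create_index", "indices:admin/auto_create"],
            }
        ],
    }
}

for role_name, privilege, risk in audit(fleet_7_11):
    print(f"{role_name}: {privilege}: {risk}")
```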

For 7.12, the privileges have been locked down a bit, however, data stream creation attacks and dynamic mapping attacks are still possible:

{
	"fleet-output": {
		"cluster": ["monitor"],
		"index": [{
			"names": [
				"logs-*",
				"metrics-*",
				"traces-*",
				".logs-endpoint.diagnostic.collection-*"
			],
			"privileges": [
				"auto_configure",
				"create_doc"
			]
		}]
	}
}

At the extreme untrusted edge of the risk spectrum, ideally an agent would only have append privileges. This should be the default behavior of the agent in a high risk environment; i.e., the system fails closed if additional privileges have not been explicitly granted. This is different from the current behavior, which fails open for our default installation.

However, we do have legitimate cases where an integration may require dynamic mapping and potentially dynamic data stream creation as well. Perhaps in those cases, we can generate a more specific api_token that grants the required permission for a set of fully qualified indices. That would limit the attack surface to the indices that require this functionality. In an environment on the trusted end of the spectrum, this may be an acceptable risk.

Fundamentally, the problem we are trying to address is that there is currently no one security posture that will accommodate the needs of all the applications, as well as provide reasonable defense against known attacks.
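
One way to picture the fail-closed idea is the sketch below. It is only an illustration: the function name, role layout, and example data stream names are hypothetical, and it assumes every data stream in a policy is fully qualified. Streams get create_doc unless they have been explicitly opted in to the broader auto_configure privilege.

```python
def role_descriptor_for_policy(stream_names, opted_in=frozenset()):
    """Build one role descriptor that fails closed.

    Every stream is append-only (create_doc) unless it is explicitly listed
    in `opted_in`, in which case it also receives auto_configure.
    """
    stream_names = set(stream_names)
    opted_in = set(opted_in)
    unknown = opted_in - stream_names
    if unknown:
        raise ValueError(f"opted-in streams not in policy: {sorted(unknown)}")
    if any("*" in name for name in stream_names):
        raise ValueError("stream names must be fully qualified")

    index = []
    append_only = sorted(stream_names - opted_in)
    if append_only:
        index.append({"names": append_only, "privileges": ["create_doc"]})
    if opted_in:
        index.append(
            {"names": sorted(opted_in), "privileges": ["auto_configure", "create_doc"]}
        )
    return {"index": index}


# Example names are made up; only the second stream is opted in.
descriptor = role_descriptor_for_policy(
    ["logs-nginx.access-default", "logs-kubernetes.container-default"],
    opted_in={"logs-kubernetes.container-default"},
)
print(descriptor)
```

Keeping the elevated entry separate limits the attack surface to the streams that were opted in, which is the trade-off described above for environments on the trusted end of the spectrum.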

@ph
Contributor

ph commented Feb 10, 2021

@mostlyjason @urso @andresrc ^ please have a look.

@mostlyjason

mostlyjason commented Feb 11, 2021

> we can generate a more specific api_token that grants the required permission for a set of fully qualified indices

I imagine a common workflow is that a security operations team tests a monitoring solution in an internal environment first before deploying to an untrusted environment? In this case, the internal environment can fully qualify the indices before the untrusted environment sends data. The downside is that it's extra steps for the operator to bootstrap those dynamic indices, but this could be seen as a more advanced use case. I'm not sure whether rollover indices initialize with the same dynamic mapping as the prior one. If not, that might be a good addition so it continues working on rollover.

@tvernum
Contributor

tvernum commented Feb 16, 2021

> Perhaps in those cases, we can generate a more specific api_token that grants the required permission for a set of fully qualified indices.

From a least-privilege point of view, that seems wise. Even if we were to solve the mapping & index creation problem described above, there would still be residual risks if we gave untrusted endpoints the ability to append to an unnecessarily wide range of indices.

@jpountz
Contributor

jpountz commented Feb 23, 2021

Thanks @scunningham, this makes sense to me, and the explanation of how we are thinking about trusted vs. untrusted endpoints was particularly helpful. One aspect I'm interested in is how we know whether an endpoint is trusted or not, e.g. does it require manual action from the user or is it something that can be inferred from the datasets that are enabled on that endpoint?

@scunningham

@jpountz We've not come up with a way to know from the agent's standpoint whether it is trusted. Fleet would have to be told somehow; only the customer can really make that assertion. The reality is that the customer is in a difficult position to make that assessment. Security operators are often dealing with huge populations of endpoints in a very dynamic environment, with new endpoints arriving and old ones dropping off constantly. Defaulting to a trusted mode, and asking the customer to manually take an action when agents should be untrusted, is risky. For that reason, in my opinion, Fleet agents should be untrusted by default.

We should be able to infer from the integration definitions whether or not an integration is considered "untrusted" and adjust the privileges accordingly. If the customer adds an integration to a policy that requires higher privileges, we should notify the user and ask them explicitly to opt in. The user will be responsible for maintaining the list of agents associated with this policy.

There's a lot of subtlety here around permissions and how "risky" they actually are. It is probably a mistake to blanket mark a policy as "untrusted" if, for example, we only add read permissions to a specific innocuous index. We shouldn't underestimate the UX complexity here.

@ruflin has put forth a proposal which I think is a good compromise of least privilege per data stream. I am hopeful that this approach, coupled with pre-creation of all but the dynamically created data streams, will mitigate many of the concerns described above. However, we've yet to come up with a solution to the dynamic mapping denial of service attack short of disabling dynamic mapping entirely in the untrusted case.

@ruflin
Contributor

ruflin commented Feb 25, 2021

Whether an endpoint is trusted or not should not depend on the dataset. The same dataset (data stream) can be used in different contexts. Let's take a simplified nginx example. In one case, we monitor nginx on an untrusted machine. Because of this, the permissions we ship down are append-only, with no dynamic fields and no creation of data streams. This nginx monitoring cannot add any dynamic fields which were not predefined. On the other hand, we monitor nginx services in k8s. There the namespace might be dynamic and the labels added to each event are dynamic too. This requires the Elastic Agent to run in a trusted environment as we ship down more permissions. The resulting data stream for both events could be the same; what differs is the policy and the permissions on it.

Projects
None yet
Development

No branches or pull requests

10 participants