Load multiple policy bundles #721

stan-podolski · 2018-04-26T12:25:27Z

I'm trying to find a way to load policies from different bundles into the same OPA instance. So far I was unable to do so as you can only specify 1 bundle in the config file (https://www.openpolicyagent.org/docs/bundles.html). However, it is possible to specify multiple number of services from where bundle can be downloaded.

If this is not supported - what is the recommended way to load policies from different sources into the same OPA instance?

Thanks!

tsandall · 2018-04-26T14:21:29Z

hi @stan-podolski. OPA supports multiple services so that different backends can be used for bundle downloading, status updates, decision logs, etc.

Can you provide a bit more detail about why you need to load bundles from multiple sources into OPA?

FWIW, the bundle that you serve to OPA could contain many different policies (as well as the data they depend on). If you do have multiple sources of policy and data, one option would be to combine them in your bundle service implementation.

stan-podolski · 2018-04-26T14:32:43Z

Hi @tsandall, my use case is such that I have a number of independent teams, which manage their individual policies. They also have independent release cycles.
I wanted to set up an instance of OPA so it will go and pick up the latest release bundles for all teams.
Agree that it is possible to combine all policies into a single bundle ( thats what we gonna end up doing most likely), was just wondering if we could avoid that additional step.

tsandall · 2018-04-26T15:00:51Z

That makes sense.

By restricting the agent to a single bundle, we ensure that the bundle management is largely orthogonal to other capabilities in OPA. For example, OPA does not have to map policy decisions => bundles in the event log. Another example, is optimization; administrators could reasonably expect OPA to be optimize across bundles (e.g., by deduplicating the data.) Hope this helps.

I'm going to close this issue for now. If you have other questions feel free to post issues or ask us on Slack.

tsandall · 2019-04-25T16:53:10Z

I'm re-opening this issue because a few users have requested it. Now that we let bundles declare the root(s) they own, we could have multiple (non-overlapping) bundles loaded at once. The status and decision log plugins will need to take multiple bundles into account:

Status should report the status of each bundle activation
Decision Logs should include the revision of each bundle in events

patrick-east · 2019-06-19T21:25:43Z

I've been poking around at this and have a proposal for some changes, would be great to get some feedback from folks who would want to use this. I want to make sure we can handle the desired use-cases nicely.

Proposed Changes

Config

Currently we configure bundles with a map like:

services:
  acmecorp:
    url: https://example.com/service/v1

bundle:
  name: authz/bundle.tar.gz
  prefix: somedir
  service: acmecorp

We already support multiple services like:

services:
  acmecorp:
    url: https://example.com/service/v1
  hooli:
    url: https://hooli.com/service/

bundle:
  name: authz/bundle.tar.gz
  prefix: somedir
  service: acmecorp

The new config supported for bundles would change to use a map
similar to what the services is using.

services:
  acmecorp:
    url: https://example.com/service/v1
  hooli:
    url: https://hooli.com/service/

bundles:
  authz/bundle.tar.gz:
    resource: somedir/authz/bundle.tar.gz
    service: acmecorp
  hooli-bundle:
    resource: somedir/authz/bundle.tar.gz
    service: hooli

In this format the yaml key is the new name which would probably only surface in logs and status updates. Internally this becomes the new name for the bundle. The change here breaks the connection between the name and the URL we pull the resource from. As shown in the example you could have the name be an arbitrary string like hooli-bundle or keep it the same as the older style and keep using the filename and partial url.

On that note, the other big change is that instead of a prefix+name with default prefix magic we just let them configure the resource path. The URL then is simply service.url + bundle.resource.

Update: We will allow for an empty resource option and for backwards compatibility will default to using service.url + /bundles/ + bundle.name for the URL. For example a config like:

services:
  example:
    url: https://example.com/policy/

bundles:
  authz/bundle.tar.gz:
    service: example

Will utilize the default behavior and result in queries going to https://example.com/policy/bundles/authz/bundle.tar.gz

Note that the service option used there is also optional iff you had a single service defined. This behavior stays the same as the older bundle config defaults.

Bundle Root config

We already allow for setting a root for the bundle to manage in
their manifest. However with multiple bundles, potentially managed
by different teams (as described in one of the use-cases for this),
it might be kind of hard to coordinate or enforce which bundle gets
which roots. It could be really bad if some bundle took over a different
ones roots, either accidentally or maliciously.

For the initial implementation of this we will document the dangers that
go with using >1 bundle. In the future we can potentially implement some
features/functionality to give better control over how conflicts should
be handled.

/health Changes

We have a parameter to allow for including bundle status in the health
check. We will have to update it to wait for all bundles to have been
ready for the first time.

We could potentially add another parameter to specify which bundles to
consider.. This will not be supported in the first version of this.

Status API Changes

The status API defines a payload that looks very similar to the original
bundles config where everything is a flat map. We will have to adjust this
to use a newer format with the bundle maps or send multiple status updates
(one per bundle).

The current proposal is to include all bundles configured in the status payload
to make it easier to get status on each (less work for the caller). This will
add a new field to the payload called bundles which corresponds 1:1 with
the bundles configuration.

Old Request:

POST /status[/<partition_name>] HTTP/1.1
Content-Type: application/json

{
    "labels": {
        "app": "my-example-app",
        "id": "1780d507-aea2-45cc-ae50-fa153c8e4a5a",
        "version": "v0.12.0"
    },
    "bundle": {
        "name": "http/example/authz",
        "active_revision": "TODO",
        "last_successful_download": "2018-01-01T00:00:00.000Z",
        "last_successful_activation": "2018-01-01T00:00:00.000Z"
    }
}

New Request:

POST /status[/<partition_name>] HTTP/1.1
Content-Type: application/json

{
    "labels": {
        "app": "my-example-app",
        "id": "1780d507-aea2-45cc-ae50-fa153c8e4a5a",
        "version": "v0.12.0"
    },
    "bundles": {
        "authz/bundle.tar.gz": {
            "active_revision": "123",
            "last_successful_download": "2018-01-01T00:00:00.000Z",
            "last_successful_activation": "2018-01-01T00:00:00.000Z"
        },
        "hooli-bundle": {
            "active_revision": "876",
            "last_successful_download": "2018-01-01T00:00:00.000Z",
            "last_successful_activation": "2018-01-01T00:00:00.000Z"
        }
    }
}

The older version of the payload with bundle will be sent when the configuration
is using bundle. Even if there is only a single bundle configured in bundles
we will use the newer style payload.

Decision logs

As-is we report a revision with the decision. We will now report a map of bundles with
revision values. For backwards compatibility we will keep the old revision field iff
there is a single bundle configured (via the bundle config option).

Old Payload:

[
  {
    "labels": {
      "app": "my-example-app",
      "id": "1780d507-aea2-45cc-ae50-fa153c8e4a5a",
      "version": "v0.12.0"
    },
    "decision_id": "4ca636c1-55e4-417a-b1d8-4aceb67960d1",
    "revision": "W3sibCI6InN5cy9jYXRhbG9nIiwicyI6NDA3MX1d",
    "path": "http/example/authz/allow",
    "input": {
      "method": "GET",
      "path": "/salary/bob"
    },
    "result": "true",
    "requested_by": "[::1]:59943",
    "timestamp": "2018-01-01T00:00:00.000000Z"
  }
]

New Payload:

[
  {
    "labels": {
      "app": "my-example-app",
      "id": "1780d507-aea2-45cc-ae50-fa153c8e4a5a",
      "version": "v0.12.0"
    },
    "decision_id": "4ca636c1-55e4-417a-b1d8-4aceb67960d1",
    "bundles": {
        "authz/bundle.tar.gz": {
            "revision": "W3sibCI6InN5cy9jYXRhbG9nIiwicyI6NDA3MX1d"
        },
        "hooli-bundle": {
            "revision": "7747ea16-ce75-45d9-be0f-7105926c303e"
        }
    },
    "path": "http/example/authz/allow",
    "input": {
      "method": "GET",
      "path": "/salary/bob"
    },
    "result": "true",
    "requested_by": "[::1]:59943",
    "timestamp": "2018-01-01T00:00:00.000000Z"
  }
]

The bundles are an object with a revision key so that in the future we
could more easily add in additional information about them.

REST API provenance parameter

Update the response given when the provenance url parameter is set to include
information about each bundle. This will match the changes to the Status API.

Request

POST /v1/data/example?provenance=true HTTP/1.1

Old Response:

{
  "provenance": {
    "build_commit": "1955fc4d",
    "build_host": "foo.com",
    "build_timestamp": "2019-04-29T23:42:04Z",
    "revision": "ID-b1298a6c-6ad8-11e9-a26f-d38b5ceadad5",
    "version": "0.10.8-dev"
  },
  "result": true
}

New Response:

{
  "provenance": {
    "build_commit": "1955fc4d",
    "build_host": "foo.com",
    "build_timestamp": "2019-04-29T23:42:04Z",
    "bundles": {
        "authz/bundle.tar.gz": {
            "revision": "W3sibCI6InN5cy9jYXRhbG9nIiwicyI6NDA3MX1d"
        },
        "hooli-bundle": {
            "revision": "7747ea16-ce75-45d9-be0f-7105926c303e"
        }
    },
    "version": "0.10.8-dev"
  },
  "result": true
}

The new format will only be send if the bundles config option is used (same proposed behavior as Status API)

Internal Changes of note

Update discovery code to handle the new bundles config
Update bundle plugin to handle managing >1 downloader (including aggregating status for each)
Change where we store the bundle manifests from /system/bundle/manifest to something
like /system/bundle/<bundle name>/manifest. And update the helper functions accordingly
Update the server to handle >1 bundles
Make bundle manifest/revision helpers public golang API's for external consumption

Will update the proposal as investigation/coding continues.

====
Edits:

~~Switch to~~ Add bundles (plural) in config
Remove prefix & name in bundles only use resource for path
Add service to status API payload
Removed the allowed_roots stuff from this initial implementation
Added changes for internal storage of bundle manifest(s)
Added changes for decision log
Added changes for provenance
Removed extra "prefix" field on newer config example.
Change example bundle names
Clarify what the name can be
Remove resource and service from the status API payload
Updated the internal changes to still only use a single plugin instance with >1 downloaders internally.
Fix typo in status API example
Add details about default resource value
Clarify default service value

tsandall · 2019-06-21T16:28:06Z

Config option (2) looks best to me. The 'bundle' and 'bundles' keys can be mutually exclusive. The docs can be updated to use 'bundles' throughout. 'bundle' can be supported until we decide to remove it (pre-1.0?)
Bundle Root config. This seems like a good idea. Can we treat it as follow-on work?
/health changes. Also seems like a good idea. Agreed that this can be follow-on work.
Status API changes. I didn't expect to see the "version" field since these changes are additive. My expectation was that if a single "bundle" was configured, the status update would have the existing "bundle" field. If multiple "bundles" were configured, the status update would have the new "bundles" field.
We'll also need to update the decision log events to support multiple revisions.
The bundle manifest is written into the storage at /system/bundle/manifest. There are helper functions in internal/manifest to read the bundle revision and roots out of storage. The server uses these helpers to get the revision to include in decision log events and check for write conflicts. We'll need to update the server to deal with multiple revisions/roots. We should consider exposing the revision lookup so that the opa-istio[1] and other Go embeddings[2] do not have to re-implement the logic 😱

[1] https://github.com/open-policy-agent/opa-istio-plugin/blob/master/internal/internal.go#L35
[2] https://github.com/open-policy-agent/example-api-authz-go/blob/master/internal/opa/opa.go#L86

patrick-east · 2019-06-21T18:52:00Z

Thanks @tsandall ! I had totally missed the decision logger changes and that we read the manifest out of storage like that. Will update to account for them and the manifest storage stuff.

I'll remove the version field from the status API, IIRC the first write up of this I re-used the bundle key and changed the format then later switched to bundles. If we go with the new key we don't need it. I'm all in favor of that approach.

For the bundle root config I'm not sure if we want to wait on this. The solution is partially for security but also for ordering. For example, if we have two bundles with overlapping roots we can't guarantee the order we download/process/activate the bundles. So one bundle might get downloaded and take some root over, then the second one tries and fails to activate.. OPA might still be handling requests but not giving the same responses as if the second bundle had been activated first.

I spent some time thinking through other ways to handle this.. we could batch together activations and check roots on all configured bundles before activating any of them. If we find any conflicts we could de-activate/remove the older (currently being used) version of all conflicting bundles. The downside is that we lose some flexibility with different configured poll rates, or if say one server is temporarily offline or slow it could block all the other bundles from updating. It also means OPA keeps running but is giving bad (probably error) responses for requests it could have otherwise been handling.

Another crazy idea is having some sort of hierarchy or policy to decide which bundle in conflicting bundles wins... The complexity seemed to heavily outweigh the benefits though.

The crux of the problem is that the bundle gets to decide what it owns. IMO any approach we take with this should push the final decision for that to the OPA service (whether through config or internal policy). We just need to make sure bundles don't come in and declare they own some root that overlaps with someone elses.

So, the solution I like is that the admin instead draws lines in the sand and says bundle A owns namespace foo and bundle B owns namespace bar. They can do whatever they want in those namespaces, but if they break out of them they are in trouble (and only that bundle). We also don't have to worry about timing. If each of those configured roots updates at different times we can be assured that they won't just remove/rewrite everything in someone elses. It might still affect the evaluation results if they are importing from some other namespace.. but IMO that is on the policy author and deployer to coordinate.

Edit: Follow up after thinking on it for a little while.. I'm more ok with leaving the bundle roots stuff for the next iteration unless we get some input from users on it. Me making assumptions and guessing at problems is probably going to give a less good solution. For now we can just document the risks run by using multiple bundles and how to avoid them with best practices.

tsandall · 2019-06-21T20:48:45Z

Sounds good. Thanks for thinking through the edge cases/attack vectors nonetheless!

tsandall · 2019-06-25T06:16:50Z

Read through the updates. I have a couple thoughts.

Joining the path prefix and bundle name into resource might be problematic for Status services because if they want just the bundle name then they have to strip the prefix (...but they can't differentiate between the two.) We could tell people define multiple services (each with a different path) however then we have to duplicate the credentials and other service-level config.
Can we avoid the extra identifier in the bundle config?

Proposal (list format):

bundles:
  - name: authz/bundle.tar.gz
    prefix: somedir
    service: acmecorp

OR (object format):

bundles:
  authz/bundle.tar.gz:
    prefix: somedir
    service: acmecorp

In the current proposal the Decision Log messages use the alias and the Status messages use the alias AND the resource. Regardless of where we land, is it possible to make these consistent? I think changing the naming to the above would deal with this.
No strong opinion about the new format of the bundle structure in the Decision Log messages but another option would be:

{
  "revisions": {
     "authz/bundle.tar.gz": "kjsfkjahfdsasdhfaljfskdf",
     "authz/context": "kjsafhsfaksksksjsdjsksa"
  }
}

EDIT: Just to clarify, we ought to use the object format for the bundle config, I just wanted to cover both options.

patrick-east · 2019-06-25T18:01:12Z

Thanks @tsandall !

Joining the path prefix and bundle name into resource might be problematic for Status services because if they want just the bundle name then they have to strip the prefix (...but they can't differentiate between the two.) We could tell people define multiple services (each with a different path) however then we have to duplicate the credentials and other service-level config.

I'm not 100% sure I follow why they would need to parse the resource. The name is given to them in the proposed status api payload as the key for the bundle details and could be whatever they need (my examples show only simple strings but it can still be authz/bundle.tar.gz if someone wanted). Maybe I should update the names to not match the services.. I'll change that in the next update.

From my point of view I don't like using that older style path+filename as the name or key for a couple of reasons. First, as keys, and probably just being a bundle.tar.gz, it pushes deployers to use some path prefix (not the path prefix, but the like auth/ part) to keep them unique. This IMO makes the names less friendly. Second, and IMO the most important part, it confuses the matter of where/how the stuff in them gets loaded. Other places with OPA the path/directory structure of things loaded implies some evaluation details about where things are loaded. With the bundles it seems to be arbitrary for naming to avoid collisions (and you still need to watch out for collisions with the roots defined in the manifests)... I don't like that amount of complexity with what is effectively just a string we use as the name. Internally we need a URL and a name we log/report status with. Which is exactly what the new proposal is, a name and a resource URL to download the bundle from.

If the deployer needed that name to be authz/bundle.tar.gz they can still do that, its a valid string 😄 But if they just want to call it authz or auth-bundle or whatever and point it to a URL (which might change over time) they can. I'm proposing that we stop imposing some implicit meaning to what the name is in favor of (IMO) more explicit and decoupled configuration options.

In the current proposal the Decision Log messages use the alias and the Status messages use the alias AND the resource. Regardless of where we land, is it possible to make these consistent?

Yep, thats easy enough. I would opt for removing it from the status API if we stick with it. I had figured it might be useful for the status API to get more of the config (since it is monitoring bundles) vs the other auditing logs that just needed the revision for the bundle to correlate later on. My reasoning being that someone/something watching the results of the status API would see an error and, if in an error state, would want to know where it is without looking up the opa config. The auditing use-case may not need that shortcut and is also much more frequently sent so I left out the extra info. All that being said, we didn't report anything other than the name and revision previously so we can stick with that.

Re: (4) and the format, I opted to break the "name" -> "revision" style info in favor of a map of "name" -> bundle object with "revision" field so that we can more easily add things later on. If we group them under the revisions key I think we re-introduce that difficulty to add more info. We might have additional details to report and I wouldn't want to have like:

{
  "revisions": {
     "authz/bundle.tar.gz": "kjsfkjahfdsasdhfaljfskdf",
     "authz/context": "kjsafhsfaksksksjsdjsksa"
  },
  "other_bundle_thing": {
     "authz/bundle.tar.gz": "the thing",
     "authz/context": "another thing"
  },
  "yet_another_bundle_thing": {
     "authz/bundle.tar.gz": "more thing",
     "authz/context": "another thing"
  }
}

versus something like:

{
    "authz/bundle.tar.gz": {
        "revision": "kjsfkjahfdsasdhfaljfskdf",
        "other_bundle_thing": "the thing",
        "yet_another_bundle_thing": "more thing"
    },
    "authz/context": {
        "revision": "kjsfkjahfdsasdhfaljfskdf",
        "other_bundle_thing": "another thing",
        "yet_another_bundle_thing": "another thing"
    }
}

IMO the second one is much easier to parse (programmatically and eyeballing json)

tsandall · 2019-06-26T04:55:40Z

Closing the loop here from the offline conversation. There was confusion on my part about the new "resource" field (I thought it was replacing the existing "name" field as the bundle identifier.) Decoupling the bundle identifier from the location where the bundle is stored could be useful and explicitly specifying the path is preferrable to the current implicit style using the "prefix".

The proposal looks good to me on re-reading.

I noticed one small mistake in the Status API section. The bundle name is "acmecorp" but I think it should be "authz/bundle.tar.gz".
I just realized that due to the provenance API changes this enhancement will impact services that query OPA for decisions (it didn't occur to me on first reading.) I don't think there is a lot to be done about this it's just a bit unfortunate because usually control plane changes are decoupled from query API changes (and vice versa).

patrick-east · 2019-06-26T17:15:28Z

I also updated to include the default behavior for resource like we discussed offline.

I noticed one small mistake in the Status API section. The bundle name is "acmecorp" but I think it should be "authz/bundle.tar.gz".

Fixed in latest update.

I just realized that due to the provenance API changes this enhancement will impact services that query OPA for decisions (it didn't occur to me on first reading.) I don't think there is a lot to be done about this it's just a bit unfortunate because usually control plane changes are decoupled from query API changes (and vice versa).

Yea.. I wasn't sure how best to tackle that one. If you have any ideas on handling it better I'm all ears.
One take-away for me is that I should be more careful on changes like that when reviewing to help push them to be more future-proof. In retrospect we knew this multi-bundle thing was coming and could have made the transition a little more smooth.

This change brings in support for multiple bundles to be downloaded and activated OPA. This is enabled by using the new config option `bundles` to define the bundles, and deprecates the older `bundle` option. The new `bundles` keyword and structure is propagated through to the decision logs, status API, provenance, stored manifests, etc. Check out the doc changes for all the updated structures. That being said any existing configuration using `bundle` will *not* see the new structure, everything is intended to be backwards compatible (almost to a fault). Fixes: open-policy-agent#721 Signed-off-by: Patrick East <east.patrick@gmail.com>

This change brings in support for multiple bundles to be downloaded and activated OPA. This is enabled by using the new config option `bundles` to define the bundles, and deprecates the older `bundle` option. The new `bundles` keyword and structure is propagated through to the decision logs, status API, provenance, stored manifests, etc. Check out the doc changes for all the updated structures. That being said any existing configuration using `bundle` will *not* see the new structure, everything is intended to be backwards compatible (almost to a fault). Fixes: #721 Signed-off-by: Patrick East <east.patrick@gmail.com>

tsandall closed this as completed Apr 26, 2018

tsandall reopened this Apr 25, 2019

tsandall added the enhancement label Apr 25, 2019

This was referenced Jun 10, 2019

Download bundles from OCI-compatible registries #1464

Closed

OPA Status - expose as prometheus metrics #1472

Closed

patrick-east self-assigned this Jun 18, 2019

patrick-east mentioned this issue Jul 16, 2019

Add support for multiple bundles #1566

Merged

tsandall closed this as completed in #1566 Jul 31, 2019

eatwithforks mentioned this issue Sep 16, 2020

Document how to configure bundle and services open-policy-agent/gatekeeper#829

Closed

ashutosh-narkar added this to Open Policy Agent Aug 5, 2024

ashutosh-narkar moved this to Done in Open Policy Agent Aug 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Load multiple policy bundles #721

Load multiple policy bundles #721

stan-podolski commented Apr 26, 2018

tsandall commented Apr 26, 2018

stan-podolski commented Apr 26, 2018

tsandall commented Apr 26, 2018

tsandall commented Apr 25, 2019

patrick-east commented Jun 19, 2019 •

edited

Loading

tsandall commented Jun 21, 2019

patrick-east commented Jun 21, 2019 •

edited

Loading

tsandall commented Jun 21, 2019

tsandall commented Jun 25, 2019 •

edited

Loading

patrick-east commented Jun 25, 2019

tsandall commented Jun 26, 2019

patrick-east commented Jun 26, 2019

Load multiple policy bundles #721

Load multiple policy bundles #721

Comments

stan-podolski commented Apr 26, 2018

tsandall commented Apr 26, 2018

stan-podolski commented Apr 26, 2018

tsandall commented Apr 26, 2018

tsandall commented Apr 25, 2019

patrick-east commented Jun 19, 2019 • edited Loading

Proposed Changes

Config

Bundle Root config

/health Changes

Status API Changes

Decision logs

REST API provenance parameter

Internal Changes of note

tsandall commented Jun 21, 2019

patrick-east commented Jun 21, 2019 • edited Loading

tsandall commented Jun 21, 2019

tsandall commented Jun 25, 2019 • edited Loading

patrick-east commented Jun 25, 2019

tsandall commented Jun 26, 2019

patrick-east commented Jun 26, 2019

patrick-east commented Jun 19, 2019 •

edited

Loading

patrick-east commented Jun 21, 2019 •

edited

Loading

tsandall commented Jun 25, 2019 •

edited

Loading