Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load multiple policy bundles #721

Closed
stan-podolski opened this issue Apr 26, 2018 · 12 comments · Fixed by #1566
Closed

Load multiple policy bundles #721

stan-podolski opened this issue Apr 26, 2018 · 12 comments · Fixed by #1566
Assignees

Comments

@stan-podolski
Copy link

I'm trying to find a way to load policies from different bundles into the same OPA instance. So far I was unable to do so as you can only specify 1 bundle in the config file (https://www.openpolicyagent.org/docs/bundles.html). However, it is possible to specify multiple number of services from where bundle can be downloaded.

If this is not supported - what is the recommended way to load policies from different sources into the same OPA instance?

Thanks!

@tsandall
Copy link
Member

hi @stan-podolski. OPA supports multiple services so that different backends can be used for bundle downloading, status updates, decision logs, etc.

Can you provide a bit more detail about why you need to load bundles from multiple sources into OPA?

FWIW, the bundle that you serve to OPA could contain many different policies (as well as the data they depend on). If you do have multiple sources of policy and data, one option would be to combine them in your bundle service implementation.

@stan-podolski
Copy link
Author

Hi @tsandall, my use case is such that I have a number of independent teams, which manage their individual policies. They also have independent release cycles.
I wanted to set up an instance of OPA so it will go and pick up the latest release bundles for all teams.
Agree that it is possible to combine all policies into a single bundle ( thats what we gonna end up doing most likely), was just wondering if we could avoid that additional step.

@tsandall
Copy link
Member

That makes sense.

By restricting the agent to a single bundle, we ensure that the bundle management is largely orthogonal to other capabilities in OPA. For example, OPA does not have to map policy decisions => bundles in the event log. Another example, is optimization; administrators could reasonably expect OPA to be optimize across bundles (e.g., by deduplicating the data.) Hope this helps.

I'm going to close this issue for now. If you have other questions feel free to post issues or ask us on Slack.

@tsandall
Copy link
Member

I'm re-opening this issue because a few users have requested it. Now that we let bundles declare the root(s) they own, we could have multiple (non-overlapping) bundles loaded at once. The status and decision log plugins will need to take multiple bundles into account:

  • Status should report the status of each bundle activation
  • Decision Logs should include the revision of each bundle in events

@patrick-east
Copy link
Contributor

patrick-east commented Jun 19, 2019

I've been poking around at this and have a proposal for some changes, would be great to get some feedback from folks who would want to use this. I want to make sure we can handle the desired use-cases nicely.

Proposed Changes

Config

Currently we configure bundles with a map like:

services:
  acmecorp:
    url: https://example.com/service/v1

bundle:
  name: authz/bundle.tar.gz
  prefix: somedir
  service: acmecorp

We already support multiple services like:

services:
  acmecorp:
    url: https://example.com/service/v1
  hooli:
    url: https://hooli.com/service/

bundle:
  name: authz/bundle.tar.gz
  prefix: somedir
  service: acmecorp

The new config supported for bundles would change to use a map
similar to what the services is using.

services:
  acmecorp:
    url: https://example.com/service/v1
  hooli:
    url: https://hooli.com/service/

bundles:
  authz/bundle.tar.gz:
    resource: somedir/authz/bundle.tar.gz
    service: acmecorp
  hooli-bundle:
    resource: somedir/authz/bundle.tar.gz
    service: hooli

In this format the yaml key is the new name which would probably only surface in logs and status updates. Internally this becomes the new name for the bundle. The change here breaks the connection between the name and the URL we pull the resource from. As shown in the example you could have the name be an arbitrary string like hooli-bundle or keep it the same as the older style and keep using the filename and partial url.

On that note, the other big change is that instead of a prefix+name with default prefix magic we just let them configure the resource path. The URL then is simply service.url + bundle.resource.

Update: We will allow for an empty resource option and for backwards compatibility will default to using service.url + /bundles/ + bundle.name for the URL. For example a config like:

services:
  example:
    url: https://example.com/policy/

bundles:
  authz/bundle.tar.gz:
    service: example

Will utilize the default behavior and result in queries going to https://example.com/policy/bundles/authz/bundle.tar.gz

Note that the service option used there is also optional iff you had a single service defined. This behavior stays the same as the older bundle config defaults.

Bundle Root config

We already allow for setting a root for the bundle to manage in
their manifest. However with multiple bundles, potentially managed
by different teams (as described in one of the use-cases for this),
it might be kind of hard to coordinate or enforce which bundle gets
which roots. It could be really bad if some bundle took over a different
ones roots, either accidentally or maliciously.

For the initial implementation of this we will document the dangers that
go with using >1 bundle. In the future we can potentially implement some
features/functionality to give better control over how conflicts should
be handled.

/health Changes

We have a parameter to allow for including bundle status in the health
check. We will have to update it to wait for all bundles to have been
ready for the first time.

We could potentially add another parameter to specify which bundles to
consider.. This will not be supported in the first version of this.

Status API Changes

The status API defines a payload that looks very similar to the original
bundles config where everything is a flat map. We will have to adjust this
to use a newer format with the bundle maps or send multiple status updates
(one per bundle).

The current proposal is to include all bundles configured in the status payload
to make it easier to get status on each (less work for the caller). This will
add a new field to the payload called bundles which corresponds 1:1 with
the bundles configuration.

Old Request:

POST /status[/<partition_name>] HTTP/1.1
Content-Type: application/json
{
    "labels": {
        "app": "my-example-app",
        "id": "1780d507-aea2-45cc-ae50-fa153c8e4a5a",
        "version": "v0.12.0"
    },
    "bundle": {
        "name": "http/example/authz",
        "active_revision": "TODO",
        "last_successful_download": "2018-01-01T00:00:00.000Z",
        "last_successful_activation": "2018-01-01T00:00:00.000Z"
    }
}

New Request:

POST /status[/<partition_name>] HTTP/1.1
Content-Type: application/json
{
    "labels": {
        "app": "my-example-app",
        "id": "1780d507-aea2-45cc-ae50-fa153c8e4a5a",
        "version": "v0.12.0"
    },
    "bundles": {
        "authz/bundle.tar.gz": {
            "active_revision": "123",
            "last_successful_download": "2018-01-01T00:00:00.000Z",
            "last_successful_activation": "2018-01-01T00:00:00.000Z"
        },
        "hooli-bundle": {
            "active_revision": "876",
            "last_successful_download": "2018-01-01T00:00:00.000Z",
            "last_successful_activation": "2018-01-01T00:00:00.000Z"
        }
    }
}

The older version of the payload with bundle will be sent when the configuration
is using bundle. Even if there is only a single bundle configured in bundles
we will use the newer style payload.

Decision logs

As-is we report a revision with the decision. We will now report a map of bundles with
revision values. For backwards compatibility we will keep the old revision field iff
there is a single bundle configured (via the bundle config option).

Old Payload:

[
  {
    "labels": {
      "app": "my-example-app",
      "id": "1780d507-aea2-45cc-ae50-fa153c8e4a5a",
      "version": "v0.12.0"
    },
    "decision_id": "4ca636c1-55e4-417a-b1d8-4aceb67960d1",
    "revision": "W3sibCI6InN5cy9jYXRhbG9nIiwicyI6NDA3MX1d",
    "path": "http/example/authz/allow",
    "input": {
      "method": "GET",
      "path": "/salary/bob"
    },
    "result": "true",
    "requested_by": "[::1]:59943",
    "timestamp": "2018-01-01T00:00:00.000000Z"
  }
]

New Payload:

[
  {
    "labels": {
      "app": "my-example-app",
      "id": "1780d507-aea2-45cc-ae50-fa153c8e4a5a",
      "version": "v0.12.0"
    },
    "decision_id": "4ca636c1-55e4-417a-b1d8-4aceb67960d1",
    "bundles": {
        "authz/bundle.tar.gz": {
            "revision": "W3sibCI6InN5cy9jYXRhbG9nIiwicyI6NDA3MX1d"
        },
        "hooli-bundle": {
            "revision": "7747ea16-ce75-45d9-be0f-7105926c303e"
        }
    },
    "path": "http/example/authz/allow",
    "input": {
      "method": "GET",
      "path": "/salary/bob"
    },
    "result": "true",
    "requested_by": "[::1]:59943",
    "timestamp": "2018-01-01T00:00:00.000000Z"
  }
]

The bundles are an object with a revision key so that in the future we
could more easily add in additional information about them.

REST API provenance parameter

Update the response given when the provenance url parameter is set to include
information about each bundle. This will match the changes to the Status API.

Request

POST /v1/data/example?provenance=true HTTP/1.1

Old Response:

{
  "provenance": {
    "build_commit": "1955fc4d",
    "build_host": "foo.com",
    "build_timestamp": "2019-04-29T23:42:04Z",
    "revision": "ID-b1298a6c-6ad8-11e9-a26f-d38b5ceadad5",
    "version": "0.10.8-dev"
  },
  "result": true
}

New Response:

{
  "provenance": {
    "build_commit": "1955fc4d",
    "build_host": "foo.com",
    "build_timestamp": "2019-04-29T23:42:04Z",
    "bundles": {
        "authz/bundle.tar.gz": {
            "revision": "W3sibCI6InN5cy9jYXRhbG9nIiwicyI6NDA3MX1d"
        },
        "hooli-bundle": {
            "revision": "7747ea16-ce75-45d9-be0f-7105926c303e"
        }
    },
    "version": "0.10.8-dev"
  },
  "result": true
}

The new format will only be send if the bundles config option is used (same proposed behavior as Status API)

Internal Changes of note

  • Update discovery code to handle the new bundles config
  • Update bundle plugin to handle managing >1 downloader (including aggregating status for each)
  • Change where we store the bundle manifests from /system/bundle/manifest to something
    like /system/bundle/<bundle name>/manifest. And update the helper functions accordingly
  • Update the server to handle >1 bundles
  • Make bundle manifest/revision helpers public golang API's for external consumption

Will update the proposal as investigation/coding continues.

====
Edits:

  1. Switch to Add bundles (plural) in config
  2. Remove prefix & name in bundles only use resource for path
  3. Add service to status API payload
  4. Removed the allowed_roots stuff from this initial implementation
  5. Added changes for internal storage of bundle manifest(s)
  6. Added changes for decision log
  7. Added changes for provenance
  8. Removed extra "prefix" field on newer config example.
  9. Change example bundle names
  10. Clarify what the name can be
  11. Remove resource and service from the status API payload
  12. Updated the internal changes to still only use a single plugin instance with >1 downloaders internally.
  13. Fix typo in status API example
  14. Add details about default resource value
  15. Clarify default service value

@tsandall
Copy link
Member

  • Config option (2) looks best to me. The 'bundle' and 'bundles' keys can be mutually exclusive. The docs can be updated to use 'bundles' throughout. 'bundle' can be supported until we decide to remove it (pre-1.0?)

  • Bundle Root config. This seems like a good idea. Can we treat it as follow-on work?

  • /health changes. Also seems like a good idea. Agreed that this can be follow-on work.

  • Status API changes. I didn't expect to see the "version" field since these changes are additive. My expectation was that if a single "bundle" was configured, the status update would have the existing "bundle" field. If multiple "bundles" were configured, the status update would have the new "bundles" field.

  • We'll also need to update the decision log events to support multiple revisions.

  • The bundle manifest is written into the storage at /system/bundle/manifest. There are helper functions in internal/manifest to read the bundle revision and roots out of storage. The server uses these helpers to get the revision to include in decision log events and check for write conflicts. We'll need to update the server to deal with multiple revisions/roots. We should consider exposing the revision lookup so that the opa-istio[1] and other Go embeddings[2] do not have to re-implement the logic 😱

[1] https://github.com/open-policy-agent/opa-istio-plugin/blob/master/internal/internal.go#L35
[2] https://github.com/open-policy-agent/example-api-authz-go/blob/master/internal/opa/opa.go#L86

@patrick-east
Copy link
Contributor

patrick-east commented Jun 21, 2019

Thanks @tsandall ! I had totally missed the decision logger changes and that we read the manifest out of storage like that. Will update to account for them and the manifest storage stuff.

I'll remove the version field from the status API, IIRC the first write up of this I re-used the bundle key and changed the format then later switched to bundles. If we go with the new key we don't need it. I'm all in favor of that approach.

For the bundle root config I'm not sure if we want to wait on this. The solution is partially for security but also for ordering. For example, if we have two bundles with overlapping roots we can't guarantee the order we download/process/activate the bundles. So one bundle might get downloaded and take some root over, then the second one tries and fails to activate.. OPA might still be handling requests but not giving the same responses as if the second bundle had been activated first.

I spent some time thinking through other ways to handle this.. we could batch together activations and check roots on all configured bundles before activating any of them. If we find any conflicts we could de-activate/remove the older (currently being used) version of all conflicting bundles. The downside is that we lose some flexibility with different configured poll rates, or if say one server is temporarily offline or slow it could block all the other bundles from updating. It also means OPA keeps running but is giving bad (probably error) responses for requests it could have otherwise been handling.

Another crazy idea is having some sort of hierarchy or policy to decide which bundle in conflicting bundles wins... The complexity seemed to heavily outweigh the benefits though.

The crux of the problem is that the bundle gets to decide what it owns. IMO any approach we take with this should push the final decision for that to the OPA service (whether through config or internal policy). We just need to make sure bundles don't come in and declare they own some root that overlaps with someone elses.

So, the solution I like is that the admin instead draws lines in the sand and says bundle A owns namespace foo and bundle B owns namespace bar. They can do whatever they want in those namespaces, but if they break out of them they are in trouble (and only that bundle). We also don't have to worry about timing. If each of those configured roots updates at different times we can be assured that they won't just remove/rewrite everything in someone elses. It might still affect the evaluation results if they are importing from some other namespace.. but IMO that is on the policy author and deployer to coordinate.

Edit: Follow up after thinking on it for a little while.. I'm more ok with leaving the bundle roots stuff for the next iteration unless we get some input from users on it. Me making assumptions and guessing at problems is probably going to give a less good solution. For now we can just document the risks run by using multiple bundles and how to avoid them with best practices.

@tsandall
Copy link
Member

Sounds good. Thanks for thinking through the edge cases/attack vectors nonetheless!

@tsandall
Copy link
Member

tsandall commented Jun 25, 2019

Read through the updates. I have a couple thoughts.

  1. Joining the path prefix and bundle name into resource might be problematic for Status services because if they want just the bundle name then they have to strip the prefix (...but they can't differentiate between the two.) We could tell people define multiple services (each with a different path) however then we have to duplicate the credentials and other service-level config.

  2. Can we avoid the extra identifier in the bundle config?

Proposal (list format):

bundles:
  - name: authz/bundle.tar.gz
    prefix: somedir
    service: acmecorp

OR (object format):

bundles:
  authz/bundle.tar.gz:
    prefix: somedir
    service: acmecorp
  1. In the current proposal the Decision Log messages use the alias and the Status messages use the alias AND the resource. Regardless of where we land, is it possible to make these consistent? I think changing the naming to the above would deal with this.

  2. No strong opinion about the new format of the bundle structure in the Decision Log messages but another option would be:

{
  "revisions": {
     "authz/bundle.tar.gz": "kjsfkjahfdsasdhfaljfskdf",
     "authz/context": "kjsafhsfaksksksjsdjsksa"
  }
}

EDIT: Just to clarify, we ought to use the object format for the bundle config, I just wanted to cover both options.

@patrick-east
Copy link
Contributor

Thanks @tsandall !

Joining the path prefix and bundle name into resource might be problematic for Status services because if they want just the bundle name then they have to strip the prefix (...but they can't differentiate between the two.) We could tell people define multiple services (each with a different path) however then we have to duplicate the credentials and other service-level config.

I'm not 100% sure I follow why they would need to parse the resource. The name is given to them in the proposed status api payload as the key for the bundle details and could be whatever they need (my examples show only simple strings but it can still be authz/bundle.tar.gz if someone wanted). Maybe I should update the names to not match the services.. I'll change that in the next update.

From my point of view I don't like using that older style path+filename as the name or key for a couple of reasons. First, as keys, and probably just being a bundle.tar.gz, it pushes deployers to use some path prefix (not the path prefix, but the like auth/ part) to keep them unique. This IMO makes the names less friendly. Second, and IMO the most important part, it confuses the matter of where/how the stuff in them gets loaded. Other places with OPA the path/directory structure of things loaded implies some evaluation details about where things are loaded. With the bundles it seems to be arbitrary for naming to avoid collisions (and you still need to watch out for collisions with the roots defined in the manifests)... I don't like that amount of complexity with what is effectively just a string we use as the name. Internally we need a URL and a name we log/report status with. Which is exactly what the new proposal is, a name and a resource URL to download the bundle from.

If the deployer needed that name to be authz/bundle.tar.gz they can still do that, its a valid string 😄 But if they just want to call it authz or auth-bundle or whatever and point it to a URL (which might change over time) they can. I'm proposing that we stop imposing some implicit meaning to what the name is in favor of (IMO) more explicit and decoupled configuration options.

In the current proposal the Decision Log messages use the alias and the Status messages use the alias AND the resource. Regardless of where we land, is it possible to make these consistent?

Yep, thats easy enough. I would opt for removing it from the status API if we stick with it. I had figured it might be useful for the status API to get more of the config (since it is monitoring bundles) vs the other auditing logs that just needed the revision for the bundle to correlate later on. My reasoning being that someone/something watching the results of the status API would see an error and, if in an error state, would want to know where it is without looking up the opa config. The auditing use-case may not need that shortcut and is also much more frequently sent so I left out the extra info. All that being said, we didn't report anything other than the name and revision previously so we can stick with that.

Re: (4) and the format, I opted to break the "name" -> "revision" style info in favor of a map of "name" -> bundle object with "revision" field so that we can more easily add things later on. If we group them under the revisions key I think we re-introduce that difficulty to add more info. We might have additional details to report and I wouldn't want to have like:

{
  "revisions": {
     "authz/bundle.tar.gz": "kjsfkjahfdsasdhfaljfskdf",
     "authz/context": "kjsafhsfaksksksjsdjsksa"
  },
  "other_bundle_thing": {
     "authz/bundle.tar.gz": "the thing",
     "authz/context": "another thing"
  },
  "yet_another_bundle_thing": {
     "authz/bundle.tar.gz": "more thing",
     "authz/context": "another thing"
  }
}

versus something like:

{
    "authz/bundle.tar.gz": {
        "revision": "kjsfkjahfdsasdhfaljfskdf",
        "other_bundle_thing": "the thing",
        "yet_another_bundle_thing": "more thing"
    },
    "authz/context": {
        "revision": "kjsfkjahfdsasdhfaljfskdf",
        "other_bundle_thing": "another thing",
        "yet_another_bundle_thing": "another thing"
    }
}

IMO the second one is much easier to parse (programmatically and eyeballing json)

@tsandall
Copy link
Member

Closing the loop here from the offline conversation. There was confusion on my part about the new "resource" field (I thought it was replacing the existing "name" field as the bundle identifier.) Decoupling the bundle identifier from the location where the bundle is stored could be useful and explicitly specifying the path is preferrable to the current implicit style using the "prefix".

The proposal looks good to me on re-reading.

  • I noticed one small mistake in the Status API section. The bundle name is "acmecorp" but I think it should be "authz/bundle.tar.gz".

  • I just realized that due to the provenance API changes this enhancement will impact services that query OPA for decisions (it didn't occur to me on first reading.) I don't think there is a lot to be done about this it's just a bit unfortunate because usually control plane changes are decoupled from query API changes (and vice versa).

@patrick-east
Copy link
Contributor

I also updated to include the default behavior for resource like we discussed offline.

I noticed one small mistake in the Status API section. The bundle name is "acmecorp" but I think it should be "authz/bundle.tar.gz".

Fixed in latest update.

I just realized that due to the provenance API changes this enhancement will impact services that query OPA for decisions (it didn't occur to me on first reading.) I don't think there is a lot to be done about this it's just a bit unfortunate because usually control plane changes are decoupled from query API changes (and vice versa).

Yea.. I wasn't sure how best to tackle that one. If you have any ideas on handling it better I'm all ears.
One take-away for me is that I should be more careful on changes like that when reviewing to help push them to be more future-proof. In retrospect we knew this multi-bundle thing was coming and could have made the transition a little more smooth.

patrick-east added a commit to patrick-east/opa that referenced this issue Jul 30, 2019
This change brings in support for multiple bundles to be downloaded
and activated OPA.

This is enabled by using the new config option `bundles` to define
the bundles, and deprecates the older `bundle` option.

The new `bundles` keyword and structure is propagated through to the
decision logs, status API, provenance, stored manifests, etc. Check
out the doc changes for all the updated structures.

That being said any existing configuration using `bundle` will *not*
see the new structure, everything is intended to be backwards
compatible (almost to a fault).

Fixes: open-policy-agent#721

Signed-off-by: Patrick East <east.patrick@gmail.com>
tsandall pushed a commit that referenced this issue Jul 31, 2019
This change brings in support for multiple bundles to be downloaded
and activated OPA.

This is enabled by using the new config option `bundles` to define
the bundles, and deprecates the older `bundle` option.

The new `bundles` keyword and structure is propagated through to the
decision logs, status API, provenance, stored manifests, etc. Check
out the doc changes for all the updated structures.

That being said any existing configuration using `bundle` will *not*
see the new structure, everything is intended to be backwards
compatible (almost to a fault).

Fixes: #721

Signed-off-by: Patrick East <east.patrick@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants