Method to Synchronize Schema Versions with Git #346

jakeyheath · 2024-08-02T21:36:33Z

We are an organization that has heavily adopted gitops-driven workflows. All our applications use gitops to check in the configuration of their deployed applications. Every application can be directly correlated to a git commit. Ideally, we'd like a way to synchronize our authorization_model_id values with git so that we can use this same development practice to inform applications what version of the schema they are on. If we ever needed to debug a particular schema version, we could go back to that commit and see how the schema was defined.

Right now, authorization_model_id values are randomly generated after the schema is submitted. It would be nice to have a flag or option to set the value of the authorization_model_id (e.g. --set-id <value>). That way we could pass in the git SHA when submitting schema migrations, and configure our applications to select that version of the schema by git SHA during deployment. We take a very similar approach with Docker build tags.

Here are some solutions we've thought of implementing to work around this:

- Check in the authorization_model_id to the git repo
- Run some service in the backend that keeps a mapping of authorization_model_id to git SHA and magically injects this into applications
- Have our teams manage their own schema migrations and authorization_model_id values (we would love to have a paved road here, so we aren't keen on this solution as it puts a lot of burden on the development teams to do this properly)

aaguiarz · 2024-08-14T23:01:32Z

Maintainer

Hi @jakeyheath

What if we let you specify a model Name or ID when writing the model?

e.g.

POST /stores/<string>/authorization-models 
{
"name" : "77d4c98a51ca75c50d714f10b6fa7c8ee91202ac",
"type_definitions": [
 { ... } ]
}

And provide a way to query by name

GET /stores/<string>/authorization-models?name=77d4c98a51ca75c50d714f10b6fa7c8ee91202ac&page_size=<integer>&continuation_token=<string>

You'll need to query the model ID based on the name.
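A minimal client-side sketch of that lookup, assuming the proposed `name` field and `?name=` filter existed. Neither is part of the API today; `build_lookup_url` and `resolve_model_id` are hypothetical helpers, and the response shape (an `authorization_models` list with an added `name` field) is an assumption.

```python
from urllib.parse import urlencode


def build_lookup_url(api_url: str, store_id: str, name: str, page_size: int = 1) -> str:
    """Build the proposed (hypothetical) query-by-name URL."""
    query = urlencode({"name": name, "page_size": page_size})
    return f"{api_url}/stores/{store_id}/authorization-models?{query}"


def resolve_model_id(response_body: dict, name: str) -> str:
    """Return the ID of the model whose assumed `name` field matches."""
    for model in response_body.get("authorization_models", []):
        if model.get("name") == name:
            return model["id"]
    raise LookupError(f"no authorization model named {name!r}")


# Example response; its shape is an assumption about the proposal.
git_sha = "77d4c98a51ca75c50d714f10b6fa7c8ee91202ac"
sample = {"authorization_models": [{"id": "01EXAMPLEMODELID", "name": git_sha}]}

print(build_lookup_url("http://localhost:8080", "01EXAMPLESTOREID", git_sha))
print(resolve_model_id(sample, git_sha))
```

An application would run this once at startup, then use the resolved ID for every subsequent request.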

Would that help?

3 replies

danielloader Aug 16, 2024

Firstly, being able to grab a model by a name string would be an improvement over what we have now, but without the same functionality on stores you still have per-instance unique identifiers in the system that prevent multi-environment workflows.

Secondly it's worth having a quick evaluation of how OpenFGA wants to deal with what is essentially metadata on the objects it stores.

Metadata

Depending on appetite there's all sorts of approaches OpenFGA can take with metadata. So here's some examples pulled from prior art I've worked with before.

Kubernetes

I'm quite biased as a platform engineer in a Kubernetes context, but the way Kubernetes objects handle metadata appeals here.

I'm just thinking out loud here about designs and use cases, and whether they are justified, because any additional complexity has to be paid for.

{
  "metadata": {
    "name": "77d4c98a51ca75c50d714f10b6fa7c8ee91202ac",
    "labels": {
      "gitHash": "dee1e34",
      "version": "1.3.2"
    },
    "annotations": {
      "gitRepo": "https://github.com/example/repo"
    }
  },
  "type_definitions": [
    { ... }
  ]  
}

The rationale here would be a name is something you can select directly, akin to the query you have where ?name=77d4c98a51ca75c50d714f10b6fa7c8ee91202ac but you could have labels (something you can query over for label matching) and annotations (something you can't query but can access on the OpenFGA client object once initialised and the model is retrieved).

You could potentially then log out the annotations to stdout/stderr so when a client finishes initialising there's a log of useful metadata for operations people to be able to track.

Additionally labels could be useful for selection of models, for example ?labelSelector=version%3D1.3.2 or returning all models via a filter, such as ?labelSelector=tested%3Dtrue&limit=5 to get the last 5 tested=true models.
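To make the label matching concrete, here is a rough sketch of equality-only selectors applied to an in-memory list of models. The `labelSelector` parameter, the `metadata.labels` layout, and the helper names are all hypothetical; this covers only the `key=value` subset of Kubernetes selector syntax and assumes the list is ordered oldest to newest.

```python
from urllib.parse import unquote


def parse_selector(selector: str) -> dict:
    """Parse 'k1=v1,k2=v2' (optionally percent-encoded) into a dict."""
    pairs = {}
    for clause in unquote(selector).split(","):
        key, sep, value = clause.partition("=")
        if not sep or not key:
            raise ValueError(f"invalid selector clause: {clause!r}")
        pairs[key.strip()] = value.strip()
    return pairs


def select_models(models: list, selector: str, limit=None) -> list:
    """Return models whose labels match every clause, newest first."""
    wanted = parse_selector(selector)
    matches = [
        m for m in models
        if all(m["metadata"]["labels"].get(k) == v for k, v in wanted.items())
    ]
    # Assumption: input is ordered oldest to newest, so reverse for "latest N".
    matches.reverse()
    return matches if limit is None else matches[:limit]


models = [
    {"metadata": {"name": "a", "labels": {"version": "1.3.1", "tested": "true"}}},
    {"metadata": {"name": "b", "labels": {"version": "1.3.2"}}},
    {"metadata": {"name": "c", "labels": {"version": "1.3.3", "tested": "true"}}},
]

latest_tested = select_models(models, "tested%3Dtrue", limit=5)
print([m["metadata"]["name"] for m in latest_tested])
```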

OCI

Away from the Kubernetes flavoured examples, we could take inspiration from OCI containers.

{
  "tags": [
    "1.3.2",
    "dee1e34"
  ],
  "type_definitions": [
    { ... }
  ]  
}

In which case you could query on the tag values directly e.g. ?tag=1.3.2 but this would require copying the OCI registry semantics of having either immutable tags or mutable tags.

With immutable tags you'd tie a tag to a model once and only once. With mutable tags you'd have a database table in which you simply move the pointer to a different model when a tag is reused.
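The difference between the two semantics can be sketched with a small in-memory registry. `TagRegistry` is a hypothetical illustration, not anything OpenFGA provides; a real implementation would sit on a database table.

```python
class TagRegistry:
    """Maps tag strings to model IDs, with immutable or mutable semantics."""

    def __init__(self, mutable: bool = False):
        self.mutable = mutable
        self._tags = {}

    def tag(self, name: str, model_id: str) -> None:
        current = self._tags.get(name)
        # Immutable mode: a tag may only ever point at one model.
        if current is not None and current != model_id and not self.mutable:
            raise ValueError(f"tag {name!r} already points at {current}")
        # Mutable mode (or first use): set or move the pointer.
        self._tags[name] = model_id

    def resolve(self, name: str) -> str:
        return self._tags[name]


mutable = TagRegistry(mutable=True)
mutable.tag("latest", "model-1")
mutable.tag("latest", "model-2")  # pointer moves to the newer model
print(mutable.resolve("latest"))

immutable = TagRegistry()
immutable.tag("1.3.2", "model-1")
try:
    immutable.tag("1.3.2", "model-2")
except ValueError as err:
    print(err)
```

Immutable tags suit git hashes and release versions; mutable tags suit moving aliases such as a branch name.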

SemVer

You could ditch arbitrary tags for a highly opinionated field using SemVer:

{
  "version": "1.3.2",
  "type_definitions": [
    { ... }
  ]  
}

The rationale here would be for auto promotions of versions, within semantic version range clamping.

An example would look like ?version=%5E1.2.3 (^1.2.3): each time you evaluate a model lookup (once per client initialisation), pick the latest version that is at least 1.2.3 but lower than 2.0.0.
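As a rough sketch of that range clamping, the resolver below handles only caret ranges on plain MAJOR.MINOR.PATCH strings with a major version of 1 or higher. Pre-release tags and the special caret rules for 0.x versions are deliberately left out, and `resolve_caret` is a hypothetical helper.

```python
def parse_version(text: str) -> tuple:
    """Turn '1.2.3' into (1, 2, 3) so versions compare numerically."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def resolve_caret(range_spec: str, available: list) -> str:
    """Pick the highest version satisfying '^X.Y.Z' (X >= 1 assumed)."""
    if not range_spec.startswith("^"):
        raise ValueError("only caret ranges are handled in this sketch")
    floor = parse_version(range_spec[1:])
    if floor[0] < 1:
        raise ValueError("caret ranges below 1.0.0 follow different rules")
    ceiling = (floor[0] + 1, 0, 0)
    candidates = [v for v in available if floor <= parse_version(v) < ceiling]
    if not candidates:
        raise LookupError(f"no version satisfies {range_spec}")
    return max(candidates, key=parse_version)


published = ["1.2.2", "1.2.3", "1.3.2", "1.10.0", "2.0.0"]
print(resolve_caret("^1.2.3", published))
```

Comparing parsed tuples rather than raw strings matters here: as strings, "1.10.0" would sort below "1.3.2".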

Name

As proposed already above this reply, just having a simple string alias on a model in the database offers a lot of flexibility in what is stored at the cost of less flexibility in the query capabilities.

Since it's a free-form string, you can store SemVer, CalVer, integer versions or git hashes in this field and use it as an alias for change promotion. You'd be unable to do comparisons, ranges or sorting easily, but for selecting a model using a string that can be passed between environments, this would fit the bill.

Note on Idempotency

It might be worth considering storing a hash of the entire model as metadata, so submission of the same model with an identical model payload can either issue a soft warning it's a duplicate, or prevent submission and issue an appropriate error.

I can't think of a use case for having multiple identical models available with different model IDs but this might be by design so open to hearing any.
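A sketch of what hash-based idempotency could look like, assuming the model is submitted as JSON. `model_digest` and `ModelStore` are hypothetical; the canonical form here only normalises object key order and whitespace, so two models that differ in the order of `type_definitions` would still hash differently.

```python
import hashlib
import json


def model_digest(model: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, no whitespace)."""
    canonical = json.dumps(model, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ModelStore:
    """In-memory stand-in for a server that deduplicates identical models."""

    def __init__(self):
        self._by_digest = {}
        self._counter = 0

    def write(self, model: dict) -> tuple:
        """Return (model_id, created); created is False for a duplicate."""
        digest = model_digest(model)
        if digest in self._by_digest:
            return self._by_digest[digest], False
        self._counter += 1
        model_id = f"model-{self._counter}"
        self._by_digest[digest] = model_id
        return model_id, True


store = ModelStore()
first = store.write({"schema_version": "1.1", "type_definitions": [{"type": "user"}]})
# Same content, different key order: treated as the same model.
second = store.write({"type_definitions": [{"type": "user"}], "schema_version": "1.1"})
print(first, second)
```

Whether a duplicate returns the existing ID, a warning, or an error is a policy choice; the digest makes any of them possible.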

danielloader Aug 16, 2024

Rationale

So to add some rationale to why this is important to me as a use-case I'll document where this has been most painful as a functionality gap.

Let's take a standard environment setup before moving onto the pitfalls and problems:

Production environment
Staging Environment
Development Environment

Problem Case

I run this environment using Argo CD (and Kargo) for handling change promotion and reconciliation into environments.
Microservices all need to be configured to talk to OpenFGA (including OIDC) for the SDKs to work in each service.
Each service has to reference a model (and store ID) for a predictable deployment outcome.
With OpenFGA, Store IDs and Model IDs aren't predictable by design.
Promotion of changes in the system is predicated on a single source of truth and aliases to bundles or packages of state (usually an image container with SemVer tags).

As with other runtimes (Lambda, ECS, etc), Kubernetes makes environment variables immutable in the deployment manifests - and changing them triggers a re-deployment of the pod. This is actually ideal behaviour in this scenario as I'd like a service to restart and re-evaluate the FGA_MODEL_ID (and to a lesser extent the FGA_STORE_ID) values to reflect a change in the system.
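For illustration, this is roughly how a service might read those two values once at startup and refuse to run without them. `FgaSettings` and `load_settings` are hypothetical names; only the FGA_STORE_ID and FGA_MODEL_ID variable names come from the discussion above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class FgaSettings:
    """Immutable for the life of the process, like the pod's environment."""
    store_id: str
    model_id: str


def load_settings(environ: Mapping[str, str] = os.environ) -> FgaSettings:
    """Read both IDs once; fail fast if either is missing or empty."""
    missing = [k for k in ("FGA_STORE_ID", "FGA_MODEL_ID") if not environ.get(k)]
    if missing:
        raise RuntimeError(f"missing required settings: {', '.join(missing)}")
    return FgaSettings(environ["FGA_STORE_ID"], environ["FGA_MODEL_ID"])


# Explicit mapping used here so the example runs without a real deployment.
settings = load_settings(
    {"FGA_STORE_ID": "01EXAMPLESTOREID", "FGA_MODEL_ID": "01EXAMPLEMODELID"}
)
print(settings)
```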

As we already know, if you publish a model (or create a store) you get a unique identifier back (including on repeated submission of the same model because it's immutable but not idempotent).

So from a CI/CD perspective you have a bit of a problem case with trade-offs.

Multiple OpenFGA Instances

You deploy the same model to multiple OpenFGAs (and databases behind them), a pipeline that pushes to development, staging and production in one go so that a single git hash is the progenitor of the model.

Pros

You have network isolation on your side, OpenFGA can be made fully private and in the internal networking layer of your environment.
You can test new versions of OpenFGA in isolation before production and perform functionality and performance regression testing for quality gating of new versions.
You have database isolation so you can fiddle with tuples without blowing up production users' experiences. (This is also a subtle con if you have to duplicate tuples all over the place to do feature testing)

Cons

In doing so, you have 3 different FGA_MODEL_ID values you need to propagate to the deployments, and you need to trigger them to restart to utilise them.
Additionally, you almost certainly want to stagger this process so that you can make use of the various environments to test these models. Which leads you into having gated pipelines that you have to manually approve to re-run again and again using the same commit hash (dev > staging > production).
You could also utilise a long lived branch system where you have production, staging and development branches and the act of doing a pull request across the branches triggers the deployment of a model.
All of the above choices are heavy on manual actions and would represent a lot of maintenance overhead for systems management, bearing in mind that you might need to repeat this process for 3 environments, and then 20 microservices in each environment.
Furthermore, you're on the hook to handle 3 instances of Postgres (and likely a HA cluster with read replicas and failover) because of the isolation/duplication of resources.

Regardless of the pros and cons, the ultimate issue is that you can't "promote" state (here, the FGA_MODEL_ID) between environments. You end up needing to keep track of the model version in 3 environments, and this model version doesn't have a representation in the source to compare against. If you lose any metadata tying them together, you'll be relying on manually pairing up a model to its source representation when you need to debug things going wrong. This is especially pertinent when you're doing local testing and want to roll back to an older model to find out when things went wrong between a known working state and now.

Note

Since the latest model specs allow you to put comments in the model, you could abuse them to put some metadata into the source code but then you'd be on the hook for maintaining a system that embeds a git tag into the comments of the model for quick correlation between a model and the source.
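A sketch of what that embedding system might amount to in a CI step, assuming the model source accepts a `#` comment line at the top of the file. The `# git-ref:` convention and the helper names are made up for illustration and are not an OpenFGA feature.

```python
STAMP_PREFIX = "# git-ref: "  # hypothetical convention, not an OpenFGA feature


def strip_stamp(dsl: str) -> str:
    """Remove an existing stamp line from the top of the model, if present."""
    lines = dsl.splitlines(keepends=True)
    if lines and lines[0].startswith(STAMP_PREFIX):
        lines = lines[1:]
    return "".join(lines)


def stamp_model(dsl: str, git_ref: str) -> str:
    """Prepend (or replace) a one-line git reference comment."""
    return f"{STAMP_PREFIX}{git_ref}\n{strip_stamp(dsl)}"


def read_stamp(dsl: str):
    """Return the stamped git reference, or None if the model is unstamped."""
    lines = dsl.splitlines()
    if lines and lines[0].startswith(STAMP_PREFIX):
        return lines[0][len(STAMP_PREFIX):].strip()
    return None


source = "model\n  schema 1.1\n\ntype user\n"
stamped = stamp_model(source, "dee1e34")
print(stamped)
print(read_stamp(stamped))
```

The stamp lives only in the source file, so it helps correlate a checked-in model with a commit but does not by itself make the ID predictable.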

Single OpenFGA Instance

On the flip side you could run a single OpenFGA instance to service your environments, akin to how you'd run a single Flagsmith instance (which itself has an internal data model and representation of environments).

Pros

The Store ID and Model ID inconsistency issue melts away, you just publish a model to a single OpenFGA instance and find a way of automating the result into your deployment and just bump those through your environments.
You can go as far as actually embedding these IDs into the microservice code at the point of a model release and trigger a rebuild - so that if you released a new model a service would increment, auto patch notes and release a new build. Furthermore this would allow you to roll back application code in lockstep with the model ID it's expecting to do checks against.
You can run a larger more high availability deployment cost efficiently if it's re-used in multiple environments.
If you have a single internet-accessible OpenFGA instance you can reach it easily from GitHub Actions or other CI using the OAuth2 client credentials flow, making CI much simpler than punching a hole into your private deployment networks.

Cons

Congratulations, you now have to deal with networking, and both options carry a reasonable amount of overhead compared with locally available private OpenFGA deployments per environment:
- Either you cross the internet via TLS and OIDC authenticated connections (which has caused some issues this week for me)
- Or you use some out of band networking magic to cross network boundaries across the internet in an encapsulated way e.g. Tailscale, or you use the cloud vendor option for routing via their networking without hair-pinning out the NAT e.g. AWS VPC Lattice, VPC peering or Transit Gateway.
You're using production for everything, which means if there's a performance regression (caused by a test, an OpenFGA deployment of a new version, a Postgres issue, etc) it will affect everyone equally.
You can't easily test OpenFGA instances ahead of time. You could in theory connect multiple OpenFGA deployments to the same Postgres database and run traffic mirroring of requests, and check the responses but at the cost of doubling your database load.
Since granular access to OpenFGA with tokens and claims isn't implemented yet, you get production-grade access to the data from any environment, and while this isn't a PII leaking issue for us, it's still a concern.

Summary

All of this boils down to toil and safety in promoting changes between environments. How we achieve that, be it metadata, a name field or idempotency (via a model hash field), is less important than tackling the underlying problem: being confident that when an application starts, it starts with the model the developer expected.

I've tried both options; in both cases the cons were concerning enough that I ended up at this discussion with these notes.

Wishlist of Preferred Outcomes

What I'd actually like to be able to do:

Configure applications at source to use a model (and embed that into the packaged artifact of an image container so it's immutable and people can't change the model of a deployment without rebuilding and incrementing the version - this immutability is important to me).
Be able to deploy models to multiple OpenFGA instances in a way this information can be used across any environment as my deployment artifacts are promoted between environments, not rebuilt and pushed (again for immutability reasons).
Even having this with Model IDs only solves half the question if Store IDs are different per instance, since they are core to the API delineation, so being able to reference a Store ID by a predictable identifier that can be shared between environments would help close this user story.

As a future gazing wishlist that might be a big step too far but here's some thoughts anyway:

Alternative state storage for models - in addition to the database, being able to store models as OCI artifacts (ORAS), S3 objects, or even volume mounted on the local filesystem of the OpenFGA process would have some benefits around immutability and flexibility of moving models around a system.
Idempotency on models - you could utilise a Kubernetes operator pattern to define models in a custom object and have deployments referencing those if the controller produced an accompanying ConfigMap of FGA_STORE_ID and FGA_MODEL_ID values. Related Slack thread. Additionally you could upload a model to a local OpenFGA container on a dev laptop and confirm it's working as expected, encode that unique identifier in your source code and be confident the same model would be selected remotely or locally in any OpenFGA instance.
Direct Git storage of models - Felt like it's worth its own suggestion rather than the first one, but reading the models from git itself and being able to select the one you want via the git ref (hash, branch, tag etc).

And finally, the above are all musings and throwing ideas out there rather than fully fleshed out feature requests - I'll take a name field on Store and Model objects to resolve the issue in the short term if that expedites processes.

jakeyheath Aug 16, 2024
Author

@aaguiarz I think that would help a lot yes. It definitely handles my use case.

For my use case, the developers are already querying the OpenFGA server with an authorization_model_id, so it wouldn't be much of a stretch to ask them to query it using the name instead of the ID. This would allow us to name the schema version, associate it with the git branch or commit SHA, and instruct users to always use the git SHA to query for the schema.

As an addition to this request, I think it would be nice to have several names associated with a schema version rather than just one. This is similar to the above example of using container tags. That way we could tag with the git SHA, but also the git branch or another name.

aaguiarz · 2024-08-16T14:14:19Z

Maintainer

Thanks @danielloader for the context!

1 reply

danielloader Aug 16, 2024

Was a hell of a morning, the coffee hit just right.
