Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

blueprint: abstract on-disk json/toml keys from struct data #983

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

mvo5
Copy link
Contributor

@mvo5 mvo5 commented Oct 15, 2024

This is a draft to show how we could move to a more unified/consistent "blueprint" package where we deprecate some keys and move to a single and consistent view of blueprints.

The end-goal of this is that this code will be part of a new "gitub.com/osbuild/bueprints" but that repo is not quite ready and patches would apply equally well. Here the concept can be discussed more easily.

The basic idea is that:

  1. blueprint.Blueprint itself does not contain any json/toml keys
  2. The parsing is done via blueprint.Load() and the json/toml structs are not expose
  3. We will use "plural" for all arrays, e.g. "user" -> "users" and "group" -> "groups"
  4. We will use underscore ("_") instead of dash ("-") consitently, this means "destination-path" -> "destination_path" etc

Unfortunately this is a lot of busywork, I'm not sure if there is an easier way, we could look into reflection to see if we can make this more elegant.

[draft because this needs at least toml loading and we need to adjust the rest of the code to not use the struct directly for unmarshal]

P.S. Naming ideas welcome too - fooOnDisk, fooJsonToml, fooJT, fooForParser, fooPF all feel cumbersome so far

This commit separates the `bp.Blueprint` struct from the actual
json/toml keys associated with it. This will allow us to deprecate
keys and rename keys without having to change the bp.Blueprint
structure in the code. There is also only a single parsing function
that will be used by the callers that can ensure compatibility
and consistency.
This commit makes some keys deprecated and replaces them with
more consistent keys. This includes:
- user -> users
- group -> groups
- containers-storage -> containers_stoarge

and start with the needed tests.
Copy link
Member

@supakeen supakeen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts that popped up:

Some typos but I won't care about those since this is draft.

I like the explicit conversions here between the different formats; even if it is a lot of code it is easily readable.

Does Go have the idea of warnings? Do we want function signatures that include a way to pass on warnings about deprecated keys (or other mistakes)?

Could we have a Version field with a default 0?

@lzap
Copy link
Contributor

lzap commented Oct 17, 2024

I very much like the idea of this and I would like to push it even further. What bothers me that we have several blueprints:

  • image-builder
  • composer / bib
  • images (JSON/TOML)

This is a great chance to not only find the good common blueprint format, but also to prepare ground for required transformations that will open doors for major OpenAPI changes later. Since I will have some spare time in the next quarter, I would like to take this idea and build on top of this draft. Let me know how you feel about the following plan.

First off, a terminology. I think since we will be dealing with many blueprint words I suggest the following naming conventions. Let’s discuss them, at least for the rest of this github comment I will use these:

  • common blueprint - the ultimate common type with extensions (more about them later)
  • distributed blueprint - on-disk blueprint format (I feel like on-disk is not a good choice for APIs)

First off, I would like to think about versioning and also prepare for transition in composer and image-builder by building an initial version of the common blueprint. Let’s call it common blueprint v0, it would have zero changes, just extracted type from the images library.

Now, versioning is an important one. I am thinking not leveraging Go native modules versioning but instead keep the module on the v1 forever (or maybe even never creating any git tag leaving it on sha-date v0 versioning scheme) and simply putting the common blueprint into properly named package. For example: github.com/osbuild/blueprint/cb0 would be a package for common blueprint v0 and db0 would be distributed blueprint v0.

The API would provide two interfaces: BlueprintReader and BlueprintWriter providing Load/Save capabilities as proposed in this draft PR. Instead of direct filenames, it would use Reader and Writer interfaces so it can be used easily in composer/image-builder. But the idea remains the same: distributed blueprint is separated from the common blueprint. And other project can build their own readers and writers into their native types or formats.

Now, I assume that the native blueprint in the images library has 1:1 mapping, but that is not the case for composer and image-builder I was told. But I would like to design the common blueprint to be bi-directional. Meaning, a distributed blueprint from image-builder could be converted to a common blueprint and back without loosing any data. This consistency will ensure the solution is future-proof and also will allow some nice test scenarios.

For that, I propose to create an "extension" field in the common blueprint, likely a slice or a map. Readers and writers could use the extension field to add own arbitrary data. For example, image-builder writer could put openscap tailoring information (insights uuid) there and reader could extract it from there if needed. This extension data would be only recognized by image-builder and ignored by other implementations.

Once v0 common blueprint and respective reader/writer are implemented, it can be relatively easily plugged into composer and image-builder while keeping their current APIs. Once that is done, only then I would suggest to do a common blueprint "cleanup" as proposed here and bump the version to v1. New readers/writers will need to be created for each version but this will allow us to start messing around with composer or image-builder APIs at its own pace. A huge benefit is what @ondrejbudai mentioned on our Prague meetup - have an API transformation for deprecated v1 API to v2 API.

The ultimate goal would be to use the "native" common blueprint readers and writers everywhere leading to one common format (JSON/TOML). But thanks to the separated types, we can gradually rollout major format upgrades.

I think that the new library should only provide the common blueprint, extensions, testing support, JSONSpec schema and interfaces with a default (reference) readers and writers. Individual readers and writers should be part of images, composer and image-builder repositories together with tests. Tho we could also explore the idea of hosting also all readers and writers in the new repository - having everything in one place is indeed a tempting idea.

Now, there is so much I do not know. Maybe this plan is too crazy, wrong or just too optimistic. Please let me know how you feel about this @mvo5 @supakeen @ondrejbudai @croissanne

@achilleas-k
Copy link
Member

achilleas-k commented Oct 17, 2024

I very much like the idea of this and I would like to push it even further. What bothers me that we have several blueprints:

  • image-builder
  • composer
  • images (JSON/TOML)

This is a great chance to not only find the good common blueprint format, but also to prepare ground for required transformations that will open doors for major OpenAPI changes later. Since I will have some spare time in the next quarter, I would like to take this idea and build on top of this draft. Let me know how you feel about the following plan.

Just ftr, in my mind, the one in osbuild-composer is the canonical blueprint format.
The one in images should have branched off and been renamed to something else when the split happened. It would have avoided a lot of confusion about what "blueprint" means and where it's defined.

First off, a terminology. I think since we will be dealing with many blueprint words I suggest the following naming conventions. Let’s discuss them, at least for the rest of this github comment I will use these:

  • common blueprint - the ultimate common type with extensions (more about them later)
  • distributed blueprint - on-disk blueprint format (I feel like on-disk is not a good choice for APIs)

I would reserve the name "blueprint" for the schema we end up defining that will be used in osbuild-composer and eventually image builder. The one that gets documented and we expect users to write and use to interact with all frontends of image builder.
Any other build configuration should be named something else to avoid confusion. A "build config" for example that either adds extra bits of options or simplifies the schema because it has less requirements for backwards compatibility.

Let me illustrate both these extra bits and simplifications with examples:

  1. In osbuild-composer, the blueprint needs to support two customizations that overlap: ssh keys and users. When a user starts a build with a blueprint that contains both customizations, the ssh keys get converted to user customizations. The ssh keys customization does not exist in the internal representation of the build configuration, but it exists in the user-facing blueprint for backwards compatibility. So, here in osbuild/images, the sshkey customization does not exist and instead osbuild-composer merges the two customizations before passing the config on to osbuild/images.
  2. Some options are defined on the command line when using osbuild-composer, like ostree stuff or the desired size of the image. An internal build config should include these (if they will still exist), because osbuild/images or any other image generator shouldn't care about the distinction between options and blueprint.

If we had a BuildConfig (the name is not final, just an example) in osbuild/images that did both of the above, the interface to osbuild/images would be simpler and we would only need to do the conversion from Blueprint to BuildConfig internally. It would also avoid confusion about the nature of this thing in osbuild/images called Blueprint and its relation to the osbuild-composer Blueprint.

The problem with this approach is that projects importing osbuild/images and wanting to support Blueprint would have to implement the conversion.

First off, I would like to think about versioning and also prepare for transition in composer and image-builder by building an initial version of the common blueprint. Let’s call it common blueprint v0, it would have zero changes, just extracted type from the images library.

I would extract it from osbuild-composer because that's really the canonical version right now.

Now, versioning is an important one. I am thinking not leveraging Go native modules versioning but instead keep the module on the v1 forever (or maybe even never creating any git tag leaving it on sha-date v0 versioning scheme) and simply putting the common blueprint into properly named package. For example: github.com/osbuild/blueprint/cb0 would be a package for common blueprint v0 and db0 would be distributed blueprint v0.

I think it would be nicer for adoption if we defined it as a schema. Either OpenAPI or jsonschema or something like that. Then structs for languages can be generated automatically.

The API would provide two interfaces: BlueprintReader and BlueprintWriter providing Load/Save capabilities as proposed in this draft PR. Instead of direct filenames, it would use Reader and Writer interfaces so it can be used easily in composer/image-builder. But the idea remains the same: distributed blueprint is separated from the common blueprint. And other project can build their own readers and writers into their native types or formats.

Now, I assume that the native blueprint in the images library has 1:1 mapping, but that is not the case for composer and image-builder I was told. But I would like to design the common blueprint to be bi-directional. Meaning, a distributed blueprint from image-builder could be converted to a common blueprint and back without loosing any data. This consistency will ensure the solution is future-proof and also will allow some nice test scenarios.

Yes, we should definitely do this for the service blueprints.

For that, I propose to create an "extension" field in the common blueprint, likely a slice or a map. Readers and writers could use the extension field to add own arbitrary data. For example, image-builder writer could put openscap tailoring information (insights uuid) there and reader could extract it from there if needed. This extension data would be only recognized by image-builder and ignored by other implementations.

Once v0 common blueprint and respective reader/writer are implemented, it can be relatively easily plugged into composer and image-builder while keeping their current APIs. Once that is done, only then I would suggest to do a common blueprint "cleanup" as proposed here and bump the version to v1. New readers/writers will need to be created for each version but this will allow us to start messing around with composer or image-builder APIs at its own pace. A huge benefit is what @ondrejbudai mentioned on our Prague meetup - have an API transformation for deprecated v1 API to v2 API.

The ultimate goal would be to use the "native" common blueprint readers and writers everywhere leading to one common format (JSON/TOML). But thanks to the separated types, we can gradually rollout major format upgrades.

I think that the new library should only provide the common blueprint, extensions, testing support, JSONSpec schema and interfaces with a default (reference) readers and writers. Individual readers and writers should be part of images, composer and image-builder repositories together with tests. Tho we could also explore the idea of hosting also all readers and writers in the new repository - having everything in one place is indeed a tempting idea.

Now, there is so much I do not know. Maybe this plan is too crazy, wrong or just too optimistic. Please let me know how you feel about this @mvo5 @supakeen @ondrejbudai @croissanne

I had the opposite idea, kind of. I think the "official" schema should contain everything, the union of all Blueprints everywhere. Sub-schemas of the common Blueprint would then be defined for different implementations, of even for different parts of projects. For example, bootc-image-builder doesn't support the whole Blueprint, only a subset. That would be up to bootc-image-builder to define. Also, in osbuild/images, different image types support different parts of the Blueprint already. The [customizations.installer] section only works for Anaconda installers. [customizations.filesystem] doesn't make sense for a lot of image types (containers, ostree commits, installers). Right now these restrictions are handled poorly by large, hard to follow functions in each distribution implementation. A way to import or define a subschema to support would be a great way to unify this behaviour.

@supakeen
Copy link
Member

I very much like the idea of this and I would like to push it even further. What bothers me that we have several blueprints:

  • image-builder
  • composer
  • images (JSON/TOML)

Images should theoretically be the one in osbuild-composer (which is leading), we didn't split it out when images got split out and got a bunch of duplication. Hence I'd say we have two current incompatible blueprints: service (image-builder here), and local (composer here).

  • common blueprint - the ultimate common type with extensions (more about them later)
  • distributed blueprint - on-disk blueprint format (I feel like on-disk is not a good choice for APIs)

Just "blueprint" should be enough, I'd really dislike to split it out further than that.

First off, I would like to think about versioning and also prepare for transition in composer and image-builder by building an initial version of the common blueprint. Let’s call it common blueprint v0, it would have zero changes, just extracted type from the images library.

I don't understand this distinction is it what you want to have in the future?

Now, versioning is an important one. I am thinking not leveraging Go native modules versioning but instead keep the module on the v1 forever (or maybe even never creating any git tag leaving it on sha-date v0 versioning scheme) and simply putting the common blueprint into properly named package. For example: github.com/osbuild/blueprint/cb0 would be a package for common blueprint v0 and db0 would be distributed blueprint v0.

I'm not particularly well versed on Go module versioning but if that's what's commonly done then it sounds good.

For that, I propose to create an "extension" field in the common blueprint, likely a slice or a map. Readers and writers could use the extension field to add own arbitrary data. For example, image-builder writer could put openscap tailoring information (insights uuid) there and reader could extract it from there if needed. This extension data would be only recognized by image-builder and ignored by other implementations.

We can have implementations using osbuild/blueprints pass a slice of supported sections. What is the difference between sections (top-level keys) and having them under extensions?

Once v0 common blueprint and respective reader/writer are implemented, it can be relatively easily plugged into composer and image-builder while keeping their current APIs. Once that is done, only then I would suggest to do a common blueprint "cleanup" as proposed here and bump the version to v1. New readers/writers will need to be created for each version but this will allow us to start messing around with composer or image-builder APIs at its own pace. A huge benefit is what @ondrejbudai mentioned on our Prague meetup - have an API transformation for deprecated v1 API to v2 API.

Yes, a new version of blueprints (with a required version field) would be a nice idea.

The ultimate goal would be to use the "native" common blueprint readers and writers everywhere leading to one common format (JSON/TOML). But thanks to the separated types, we can gradually rollout major format upgrades.

I'd say that we should only support a single serialization format if at all possible (it might be difficult with the API).

@lzap
Copy link
Contributor

lzap commented Oct 21, 2024

Thanks for your comments.

I see that both of you do not like the newly proposed terms, I just could not dare to suggest two separate terms. I agree that "blueprint" term should be only used for the "main schema", and we should probably find better suited names for sub-formats.

The only thing that complicates this is that a new struct name will require a big patch in each and every project. Maybe the renaming can happen as a separate commit once the extraction is done. But that is just a technical detail.

So you are both saying that osbuild-composer should be the "main" schema source, not here. I see, I had no idea.

Either OpenAPI or jsonschema or something like that. Then structs for languages can be generated automatically.

I was planning schema, but I thought it is something that is not important now. I was planning to suggest "bottom-to-top" approach we utilized in launch service - we actually generate OpenAPI components (schema) from Go structs. This ensures that the schema is 100% Go "friendly" and there are no features which are difficult to work with. For example, enums. Other language bindings can be generated too as a follow-up step.

Then it looks like you both do not like the extensions idea. I am all for making the "main" blueprint a "catch-all" structure and then there is no need for an extension. The only drawback is that if we ever convert an API or "on-disk" format to the "main" blueprint in the future, there will always be some irrelevant fields that will appear in documentation (if we take advantage of schemas). That was my only concern and that is why I went down this idea. If you think exposing everything is better then I am all for it.

I don't understand this distinction is it what you want to have in the future?

I am not sure if I get your concern, but the only idea around extracting a v0 with zero changes first is just technical thing - I want to make sure that extraction goes smooth and everything works before I would start renaming or cleaning up the data structures. But this problem can be covered with tests nicely, so maybe doing it in one go is better option.

Copy link

This PR is stale because it has been open 30 days with no activity. Remove "Stale" label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale label Nov 21, 2024
@lzap
Copy link
Contributor

lzap commented Nov 21, 2024

Called for a meeting next Monday about common blueprint format.

@github-actions github-actions bot removed the Stale label Nov 22, 2024
mvo5 added a commit to mvo5/images that referenced this pull request Nov 22, 2024
This commit decouples the marshaling representation of the
`FilesystemCustomization` from the actual struct. This means that
we can reuse custom unmarshaling via datasizes.Size and also
have our external represenation "pure".

If we continue using this pattern is also means that we can rename
fields in the marshaling and provide compatbility easily.

Thanks to Thozza, c.f.
osbuild#1049 (comment)
and
osbuild#983
mvo5 added a commit to mvo5/images that referenced this pull request Nov 22, 2024
This commit decouples the marshaling representation of the
`FilesystemCustomization` from the actual struct. This means that
we can reuse custom unmarshaling via datasizes.Size and also
have our external represenation "pure".

If we continue using this pattern is also means that we can rename
fields in the marshaling and provide compatbility easily.

Thanks to Thozza, c.f.
osbuild#1049 (comment)
and
osbuild#983
mvo5 added a commit to mvo5/images that referenced this pull request Nov 22, 2024
This commit decouples the marshaling representation of the
`FilesystemCustomization` from the actual struct. This means that
we can reuse custom unmarshaling via datasizes.Size and also
have our external represenation "pure".

If we continue using this pattern is also means that we can rename
fields in the marshaling and provide compatbility easily.

Thanks to Thozza, c.f.
osbuild#1049 (comment)
and
osbuild#983
github-merge-queue bot pushed a commit that referenced this pull request Nov 22, 2024
This commit decouples the marshaling representation of the
`FilesystemCustomization` from the actual struct. This means that
we can reuse custom unmarshaling via datasizes.Size and also
have our external represenation "pure".

If we continue using this pattern is also means that we can rename
fields in the marshaling and provide compatbility easily.

Thanks to Thozza, c.f.
#1049 (comment)
and
#983
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants