Registry - Package layout/contents #3266

Closed
majastrz opened this issue Jun 18, 2021 · 7 comments

Comments

@majastrz
Member

majastrz commented Jun 18, 2021

Overview

Regardless of the choice of a package manager in #2128 we will need to choose how the packages that are published to the registry are organized internally. This is an impactful design decision and the rationale needs to be documented, so here we are 🙂.

Proposal

At a high level, we need the following to be stored in a package:

  1. Entry point module
  2. Additional modules that are not published anywhere
  3. A readme.md file
  4. A metadata file (if applicable)
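As a rough illustration only, the four items above could be modeled as a simple record. The class and field names here (`ModulePackage`, `entry_point`, and so on) are hypothetical and not part of any Bicep API; the storage format of items 1 and 2 is exactly what the options below try to decide.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ModulePackage:
    """Hypothetical in-memory view of a published package (illustrative names)."""
    entry_point: str  # 1. entry point module content (format still undecided)
    additional_modules: Dict[str, str] = field(default_factory=dict)  # 2. unpublished modules by relative path
    readme: str = ""  # 3. readme.md content
    metadata: Optional[dict] = None  # 4. metadata file, only if applicable

    def has_metadata(self) -> bool:
        """Report whether the optional metadata file is present."""
        return self.metadata is not None


pkg = ModulePackage(entry_point="{}", readme="# storage module")
```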

Considerations

  • The most interesting part is the decision about the format in which 1 and 2 are stored in Bicep packages.
  • The type checking and validation quality MUST have parity with what we have in local modules today.

Option 1 - Source

We would store module content as .bicep files. This would require the consumer of the module to compile the contents.

Pros:

  • No changes to compiler pipelines.
  • Language server navigation features (go to def, find refs, etc.) are simpler to implement.

Cons:

  • Recompiling published modules may fail due to a different compiler version or different linter settings, which would prevent module consumption.

Option 2 - Intermediate Language self-contained

The module and its local or external dependencies are compiled into a single JSON file (like local modules) and included in the package. The package can optionally be annotated with the modules it depends on. The annotation is not required to consume the module, but may be useful for auditing purposes.

Pros:

  • Consumer of a module is decoupled from compiler and linter settings used to create the module.

Cons:

  • Requires an additional metadata file for type checking and validation parity given the limitations of the IL.
  • [NuGet only] Unnatural pattern with a package bundling in all its dependencies. Package size grows with the number of direct and transitive dependencies.
  • Code navigation features will be more complicated to implement.

Option 3 - Intermediate Language with dependencies

The module and its local dependencies are compiled into a single JSON file. The external dependencies in the JSON are replaced with placeholders to be filled in during code generation on the module consumer side. The package is annotated with dependencies.
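A minimal sketch of what filling in placeholders on the consumer side might look like. The placeholder shape (`{"$externalModule": "<name>:<version>"}`) and the function name are assumptions made for illustration; the proposal does not define the actual IL format. The sketch also assumes the dependency graph has no cycles.

```python
# Hypothetical placeholder key; the real IL representation is not specified here.
PLACEHOLDER_KEY = "$externalModule"


def fill_placeholders(node, resolved):
    """Return a copy of `node` in which every placeholder object is replaced
    by the compiled template of the external module it names."""
    if isinstance(node, dict):
        if set(node) == {PLACEHOLDER_KEY}:
            ref = node[PLACEHOLDER_KEY]
            if ref not in resolved:
                raise KeyError(f"unresolved external module: {ref}")
            # A dependency may itself contain placeholders, so recurse into it.
            return fill_placeholders(resolved[ref], resolved)
        return {key: fill_placeholders(value, resolved) for key, value in node.items()}
    if isinstance(node, list):
        return [fill_placeholders(item, resolved) for item in node]
    return node


package_il = {"resources": [{"name": "storage", "template": {PLACEHOLDER_KEY: "storage:1.0.0"}}]}
resolved = {"storage:1.0.0": {"resources": []}}
filled = fill_placeholders(package_il, resolved)
```

The input is left untouched, so the same packaged JSON could be filled in against different resolved dependency sets.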

Pros:

  • Consumer of a module is decoupled from compiler and linter settings used to create the module.
  • Standard packaging pattern where the contents of a package and its dependencies are decoupled.

Cons:

  • Requires an additional metadata file for type checking and validation parity given the limitations of the IL.
  • Code navigation features will be more complicated to implement.
  • We are planning to decouple modules from nested deployments in the IL in the future. To make codegen simpler we may need to retain the IL language scoping capabilities provided by linked deployments without incurring the overhead of a separate deployment.
@majastrz
Member Author

We feel pretty strongly that the module consumer should not care how the module producer created the module and what settings they used. This makes option 1 not viable. The code navigation con of options 2 and 3 can be mitigated by also bundling up the module sources.

@bmoore-msft
Contributor

We talked about having to check-in main.json for the module to be consumed by another user/build... made me think of a few things (since the model of requiring an intermediate artifact in source control seems potentially problematic). Also note that I'm not sure we are completely analogous to, say, a C# NuGet package (not sure we aren't either, just not convinced that's a good bar for us).

We talked about needing the json because the consumer's version of bicep may be different/incompatible. A couple thoughts:

  • is the service going to be compatible with the json? we have breaking changes there (rare but how does that get resolved)
  • could I have different behavior in different bicep files due to not using the bicep from source? (not sure it's transparent if I do, but makes me go "hmmm....")
  • if a module requires a specific version of bicep, then the consumer should be required to use that version (note I don't think this incompatibility will be common as we stabilize)
  • if a bicep file requires a specific version of the compiler, then should bicep support a way to indicate that in the source file?
  • if in the module scenario we require the json file (for some scenario/reason) that's a difference between using a module locally and from a registry - locally we compile and registry we do not
  • if we require the json, maybe we should not put the bicep in the registry? (which is really odd if git is our "repository" for modules)
  • if both must exist in source then how do I block changes to main.json so they don't drift...
  • is it obvious to users that they must check-in a file they cannot directly edit?
  • if I have multiple bicep files then I have multiple json files as well?

Will add more if I think of 'em...

@majastrz
Member Author

We talked about having to check-in main.json for the module to be consumed by another user/build... made me think of a few things (since the model of requiring an intermediate artifact in source control seems potentially problematic). Also note that I'm not sure we are completely analogous to, say, a C# NuGet package (not sure we aren't either, just not convinced that's a good bar for us).

We talked about needing the json because the consumer's version of bicep may be different/incompatible. A couple thoughts:

  • is the service going to be compatible with the json? we have breaking changes there (rare but how does that get resolved)

Bicep should not be generating JSON that is incompatible with the service. If we end up causing a breaking change in the service, it should be reverted or fixed.

  • could I have different behavior in different bicep files due to not using the bicep from source? (not sure it's transparent if I do, but makes me go "hmmm....")

I'm not understanding the question. What is "source" in this context?

  • if a module requires a specific version of bicep, then the consumer should be required to use that version (note I don't think this incompatibility will be common as we stabilize)

No, such a tight coupling between versions would pretty much stop people from upgrading. We really need to support files produced by same or lower versions.

  • if a bicep file requires a specific version of the compiler, then should bicep support a way to indicate that in the source file?

This seems like it would fit better in a bicepconfig.json file rather than in every .bicep file.

  • if in the module scenario we require the json file (for some scenario/reason) that's a difference between using a module locally and from a registry - locally we compile and registry we do not

If the JSON contents are different, that seems like the right behavior. Why is the file different?

  • if we require the json, maybe we should not put the bicep in the registry? (which is really odd if git is our "repository" for modules)

We will eventually have go to definition for module parameters. With local modules, it would take you to the local bicep file. For an external module, we either take you to the JSON, which may be jarring, or to the sidecar bicep file.

  • if both must exist in source then how do I block changes to main.json so they don't drift...

That would require a mechanism to enforce self-consistency between all the files that make a Git module. If someone references a malformed module like that, they would get an error explaining the issue. This is pretty much required for a Git-based approach and is less necessary for NuGet and OCI.

  • is it obvious to users that they must check-in a file they cannot directly edit?

I'm guessing you're asking this in context of a Git-based workflow... Yes, that's what makes the workflow awkward with Git. The issue doesn't exist with NuGet or OCI because the JSON only lives as a build output artifact and inside .nupkg file or the OCI artifact that was pushed to a registry.

  • if I have multiple bicep files then I have multiple json files as well?

Most likely, the JSON would be needed only for the module that is being packaged. Any local module references would get inlined into the resulting JSON just like today.

Will add more if I think of 'em...

@bmoore-msft
Contributor

We talked about having to check-in main.json for the module to be consumed by another user/build... made me think of a few things (since the model of requiring an intermediate artifact in source control seems potentially problematic). Also note that I'm not sure we are completely analogous to, say, a C# NuGet package (not sure we aren't either, just not convinced that's a good bar for us).
We talked about needing the json because the consumer's version of bicep may be different/incompatible. A couple thoughts:

  • is the service going to be compatible with the json? we have breaking changes there (rare but how does that get resolved)
    Bicep should not be generating JSON that is incompatible with the service. If we end up causing a breaking change in the service, it should be reverted or fixed.

But we're not using json generated from bicep, we're using static json that was generated at the time of check-in. Agree that such a change would be rare, but it happens occasionally in RPs moreso than ARM. Though I'm not sure such a change wouldn't require a bicep change either...

  • could I have different behavior in different bicep files due to not using the bicep from source? (not sure it's transparent if I do, but makes me go "hmmm....")
    I'm not understanding the question. What is "source" in this context?

I have a bicep file in the registry, that I'm not actually using, that was written against an "old" version of bicep. That has one syntax/behavior, that I'm not able to use in my bicep files because of the versioning changes (that are causing us to use the json from the registry instead of the bicep). Functionally this will work fine, but if I'm reasoning over the code, it will be harder to understand why things are working the way they are... Overall this (not this particular issue but the overall thread) feels like a function of mixing SCC with a registry where we're not consuming source code but the intermediate lang.

  • if a module requires a specific version of bicep, then the consumer should be required to use that version (note I don't think this incompatibility will be common as we stabilize)
    No, such a tight coupling between versions would pretty much stop people from upgrading. We really need to support files produced by same or lower versions.

I think we would do as in other languages where we specify a min version, not an exact version - because we do need to be backward compat.

  • if a bicep file requires a specific version of the compiler, then should bicep support a way to indicate that in the source file?
    This seems like it would fit better in a bicepconfig.json file rather than in every .bicep file.

Maybe, but right now such a config file is not part of a bicep "project/template/deployment" so this may be a bigger step and users will "get it wrong" in perpetuity - e.g. as bicep files get moved from one place to another the information becomes lost. I don't think we'd require it, but simply use it if it were present. Over the long haul we probably need to solve this somehow regardless of the registry discussion... In JSON we have the $schema property (which is the closest thing), in bicep we don't have anything yet...

  • if in the module scenario we require the json file (for some scenario/reason) that's a difference between using a module locally and from a registry - locally we compile and registry we do not
    If the JSON contents are different, that seems like the right behavior. Why is the file different?

Sorry I don't think I was clear... when using the module syntax with a local file, we don't require/use the compiled json for anything in the dev environment. When using the module syntax with a file in the registry, we don't use the bicep file and use the json. That won't be intuitive behavior as people copy/paste things around. The files can get out of sync, I have some json files in my scc and some I don't (and probably don't know why).

  • if we require the json, maybe we should not put the bicep in the registry? (which is really odd if git is our "repository" for modules)
    We will eventually have go to definition for module parameters. With local modules, it would take you to the local bicep file. For an external module, we either take you to the JSON, which may be jarring, or to the sidecar bicep file.

Ok, so sometimes we use the bicep and sometimes not? (in the dev experience?)

  • if both must exist in source then how do I block changes to main.json so they don't drift...
    That would require a mechanism to enforce self-consistency between all the files that make a Git module. If someone references a malformed module like that, they would get an error explaining the issue. This is pretty much required for a Git-based approach and is less necessary for NuGet and OCI.

So we would have to build something in the pipeline? (thinking about what customers would have to do on their own registries) - we have this model now (or a similar model) and it's awkward... one thing I can't actually do is make changes to the bicep file in the PR, because the json would then become out of sync. We could commit during the PR but that's a bit of an anti-pattern and increases the complexity for a customer's solution...

  • is it obvious to users that they must check-in a file they cannot directly edit?
    I'm guessing you're asking this in context of a Git-based workflow... Yes, that's what makes the workflow awkward with Git. The issue doesn't exist with NuGet or OCI because the JSON only lives as a build output artifact and inside .nupkg file or the OCI artifact that was pushed to a registry.

Yeah, I think so - it's the overarching thing I touched on above - that we're mixing source and build artifacts under scc itself. Where it feels like, if we can solve the "this bicep file needs a newer version than the bicep I have for VS Code to make sense of it" then we don't need the json under scc and most (all?) of this just goes away.

  • if I have multiple bicep files then I have multiple json files as well?
    Most likely, the JSON would be needed only for the module that is being packaged. Any local module references would get inlined into the resulting JSON just like today.

makes sense - but then thinking about something like the aad module that's a lot of extra files...

@majastrz
Member Author

We talked about having to check-in main.json for the module to be consumed by another user/build... made me think of a few things (since the model of requiring an intermediate artifact in source control seems potentially problematic). Also note that I'm not sure we are completely analogous to, say, a C# NuGet package (not sure we aren't either, just not convinced that's a good bar for us).
We talked about needing the json because the consumer's version of bicep may be different/incompatible. A couple thoughts:

  • is the service going to be compatible with the json? we have breaking changes there (rare but how does that get resolved)
    Bicep should not be generating JSON that is incompatible with the service. If we end up causing a breaking change in the service, it should be reverted or fixed.

But we're not using json generated from bicep, we're using static json that was generated at the time of check-in. Agree that such a change would be rare, but it happens occasionally in RPs moreso than ARM. Though I'm not sure such a change wouldn't require a bicep change either...

I am talking about that. With NuGet or OCI, the external module reference would resolve to the compiled JSON packaged in with the module, so the problem doesn't exist. With Git, the JSON file is a build artifact that's committed to the repo and could be modified independently, which requires some integrity checks when such a module is being consumed.

  • could I have different behavior in different bicep files due to not using the bicep from source? (not sure it's transparent if I do, but makes me go "hmmm....")
    I'm not understanding the question. What is "source" in this context?

I have a bicep file in the registry, that I'm not actually using, that was written against an "old" version of bicep. That has one syntax/behavior, that I'm not able to use in my bicep files because of the versioning changes (that are causing us to use the json from the registry instead of the bicep). Functionally this will work fine, but if I'm reasoning over the code, it will be harder to understand why things are working the way they are... Overall this (not this particular issue but the overall thread) feels like a function of mixing SCC with a registry where we're not consuming source code but the intermediate lang.

Yeah I agree. Something like go to definition that takes me to a module written in older .bicep may be confusing.

  • if a module requires a specific version of bicep, then the consumer should be required to use that version (note I don't think this incompatibility will be common as we stabilize)
    No, such a tight coupling between versions would pretty much stop people from upgrading. We really need to support files produced by same or lower versions.

I think we would do as in other languages where we specify a min version, not an exact version - because we do need to be backward compat.

Yup, makes sense.
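A minimal sketch of the "minimum version, not exact version" check agreed on above. It assumes plain dotted numeric versions (for example `0.4.63`) and hypothetical function names; it is not how Bicep itself records or compares versions.

```python
def parse_version(text):
    """Parse a dotted numeric version such as '0.4.63' into a tuple of ints,
    padding missing components with zeros so '0.4' compares equal to '0.4.0'."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def can_consume(consumer_version, module_min_version):
    """A consumer may use a module when its compiler version is at least
    the minimum version the module declares."""
    return parse_version(consumer_version) >= parse_version(module_min_version)
```

Comparing integer tuples rather than strings matters here: as strings, `"0.10.0"` would sort before `"0.9.1"`.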

  • if a bicep file requires a specific version of the compiler, then should bicep support a way to indicate that in the source file?
    This seems like it would fit better in a bicepconfig.json file rather than in every .bicep file.

Maybe, but right now such a config file is not part of a bicep "project/template/deployment" so this may be a bigger step and users will "get it wrong" in perpetuity - e.g. as bicep files get moved from one place to another the information becomes lost. I don't think we'd require it, but simply use it if it were present. Over the long haul we probably need to solve this somehow regardless of the registry discussion... In JSON we have the $schema property (which is the closest thing), in bicep we don't have anything yet...

I don't think we have sufficient justification to require something like that in every .bicep file. If someone moves a file to a directory with a stricter bicepconfig.json, it's fine if the code fails to compile. It should be pretty easy to figure out why it worked in the old location and not in the new.

  • if in the module scenario we require the json file (for some scenario/reason) that's a difference between using a module locally and from a registry - locally we compile and registry we do not
    If the JSON contents are different, that seems like the right behavior. Why is the file different?

Sorry I don't think I was clear... when using the module syntax with a local file, we don't require/use the compiled json for anything in the dev environment. When using the module syntax with a file in the registry, we don't use the bicep file and use the json. That won't be intuitive behavior as people copy/paste things around. The files can get out of sync, I have some json files in my scc and some I don't (and probably don't know why).

The reference syntax is not going to point to the JSON file by name. It will just point to the package name and version.

  • if we require the json, maybe we should not put the bicep in the registry? (which is really odd if git is our "repository" for modules)
    We will eventually have go to definition for module parameters. With local modules, it would take you to the local bicep file. For an external module, we either take you to the JSON, which may be jarring, or to the sidecar bicep file.

Ok, so sometimes we use the bicep and sometimes not? (in the dev experience?)

For local modules, both are supported today and local module syntax involves a relative path to either a .bicep or a JSON file (with the set of file name extensions we support). External module references will not point directly to files but to the package name and version. (See #3186 for a more detailed discussion.) The user shouldn't know or care that we're consuming the JSON file in the module to make it work. Bicep will manage the local package cache and resolve paths accordingly. Most package managers we use today (npm, NuGet, etc.) follow this approach of abstracting out the location of the module on the local file system.
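To make the "package name and version, not a file path" idea concrete, here is a rough sketch of how a reference might be mapped into a local cache. The cache layout (`<root>/<name>/<version>/main.json`) and the function name are assumptions for illustration only; the thread does not specify how Bicep lays out its cache.

```python
from pathlib import PurePosixPath


def cache_path(cache_root, package_name, version):
    """Map a package reference to the compiled JSON inside a local cache.
    The directory layout used here is hypothetical."""
    if not package_name or not version:
        raise ValueError("package name and version are required")
    return PurePosixPath(cache_root) / package_name / version / "main.json"
```

The consumer only ever writes the name and version; where the JSON lives on disk stays an implementation detail of the tool.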

  • if both must exist in source then how do I block changes to main.json so they don't drift...
    That would require a mechanism to enforce self-consistency between all the files that make a Git module. If someone references a malformed module like that, they would get an error explaining the issue. This is pretty much required for a Git-based approach and is less necessary for NuGet and OCI.

So we would have to build something in the pipeline? (thinking about what customers would have to do on their own registries) - we have this model now (or a similar model) and it's awkward... one thing I can't actually do is make changes to the bicep file in the PR, because the json would then become out of sync. We could commit during the PR but that's a bit of an anti-pattern and increases the complexity for a customer's solution...

Yes, I think you pointed this out elsewhere that it's the mixing of build artifacts and source that is causing this problem. The workflow is definitely awkward and problematic. Having a recommended solution for users to use would be a bonus, so they don't have to figure it out, but regardless the consistency check has to be implemented in bicep to validate modules before they are consumed.
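One possible shape for such a consistency check, sketched under heavy assumptions: the compiled JSON is imagined to carry a digest of the source it was built from under a hypothetical `metadata.sourceDigest` key. Nothing in this thread says Bicep records such a digest; the point is only to show how drift between the .bicep file and main.json could be detected without recompiling.

```python
import hashlib


def source_digest(bicep_source: str) -> str:
    """SHA-256 digest of the module source text."""
    return hashlib.sha256(bicep_source.encode("utf-8")).hexdigest()


def is_consistent(bicep_source: str, compiled: dict) -> bool:
    """True when the digest recorded in the compiled JSON (hypothetical
    `metadata.sourceDigest` key) matches the current source."""
    recorded = compiled.get("metadata", {}).get("sourceDigest")
    return recorded == source_digest(bicep_source)


src = "param location string"
compiled = {"metadata": {"sourceDigest": source_digest(src)}}
```

This only catches edits to the source after compilation. Detecting hand edits to the JSON itself would need something more, such as a digest over the JSON body or a recompile-and-compare step.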

  • is it obvious to users that they must check-in a file they cannot directly edit?
    I'm guessing you're asking this in context of a Git-based workflow... Yes, that's what makes the workflow awkward with Git. The issue doesn't exist with NuGet or OCI because the JSON only lives as a build output artifact and inside .nupkg file or the OCI artifact that was pushed to a registry.

Yeah, I think so - it's the overarching thing I touched on above - that we're mixing source and build artifacts under scc itself. Where it feels like, if we can solve the "this bicep file needs a newer version than the bicep I have for VS Code to make sense of it" then we don't need the json under scc and most (all?) of this just goes away.

This isn't just about the bicep version. Consider this:

  • Person A builds a module with all the linter rules disabled and all the type warnings suppressed and publishes it to a registry.
  • Person B consumes a module but has all the linter rule severities set to error and has no warnings suppressed.

It must be possible for a module published by A to be consumed by B and it MUST NOT involve B matching the settings of A. The intermediate language (aka JSON) serves as the boundary.

  • if I have multiple bicep files then I have multiple json files as well?
    Most likely, the JSON would be needed only for the module that is being packaged. Any local module references would get inlined into the resulting JSON just like today.

makes sense - but then thinking about something like the aad module that's a lot of extra files...

I'm not following. How does the AAD module lead to a lot of extra files?

@yufeih

yufeih commented Jul 23, 2021

This isn't just about the bicep version. Consider this:
Person A builds a module with all the linter rules disabled and all the type warnings suppressed and publishes it to a registry.
Person B consumes a module but has all the linter rule severities set to error and has no warnings suppressed.
It must be possible for a module published by A to be consumed by B and it MUST NOT involve B matching the settings of A. The intermediate language (aka JSON) serves as the boundary.

Is it possible to scope the settings locally? Say a linter rule applies only to my working directory, and the transition to external modules would suppress linting? Presumably people don't want to see linting results on other people's code.

@majastrz
Member Author

Linter rules reuse the same semantic model as the core compilation pipeline. In the rule code it's very easy to access the semantic models of referenced modules. We could require that rule authors check whether a particular semantic model is external or not, but it would really be opt-in by the rule author. Without a centralized solution, this will be a constant source of bugs. Referencing the JSON creates a boundary and prevents the entire class of bugs from existing.
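A toy illustration of the bug class described above. `SemanticModel` and `run_rule` are hypothetical stand-ins, not the Bicep compiler's types: the sketch just shows that when the external check is left to each rule author, a rule that forgets it reports diagnostics from someone else's module.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SemanticModel:
    """Hypothetical stand-in for a compiled module's semantic model."""
    name: str
    is_external: bool
    issues: List[str] = field(default_factory=list)
    references: List["SemanticModel"] = field(default_factory=list)


def run_rule(model, skip_external):
    """Collect diagnostics from a model and every module it references.
    `skip_external` models the opt-in check a rule author may forget."""
    if skip_external and model.is_external:
        return []
    found = list(model.issues)
    for ref in model.references:
        found.extend(run_rule(ref, skip_external))
    return found


external = SemanticModel("registry-module", True, ["unused-param"])
local = SemanticModel("main", False, ["hardcoded-location"], [external])
careful = run_rule(local, skip_external=True)
forgetful = run_rule(local, skip_external=False)
```

With the JSON boundary, the external module never appears as a source-level semantic model in the first place, so no rule needs to remember the check.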

@majastrz majastrz closed this as completed Nov 3, 2021
@ghost ghost locked as resolved and limited conversation to collaborators May 27, 2023
3 participants