
Support OpenTF style artifacts #3313

Open
caniszczyk opened this issue Sep 1, 2023 · 18 comments

Comments

@caniszczyk

No description provided.

@caniszczyk
Author

https://opentf.org

@tegioz
Collaborator

tegioz commented Sep 2, 2023

Thanks @caniszczyk, OpenTF artifacts would be a nice addition to Artifact Hub 🙂

Hi @omry-hay 👋

Let me explain a bit how Artifact Hub indexes content. Please note that AH does not host or serve any of the artifact kinds supported; we just collect and index some metadata about them periodically.

Any organization or user can add repositories to Artifact Hub. At the moment we support several repository kinds, like Helm charts, OLM operators, etc. The tracker component polls them periodically and collects metadata as needed. Depending on the repository kind, metadata is extracted one way or another. For kinds that have already defined how a repository should be structured and served, like Helm charts or OLM operators, we have specialized tracker sources. This allowed publishers to start adding existing repositories without requiring any additional work. For other kinds, we have a generic tracker source based on our own metadata file. To keep AH as simple to maintain and extend as possible, we are trying to push this tracker as the recommended way to go whenever possible (most of the artifact kinds supported actually use it).
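
To make the tracker-source idea above more concrete, here is a hypothetical sketch in Go (the language Artifact Hub is written in). All names and signatures below are invented for illustration; they are not AH's actual internal API.

```go
package tracker

// Package is the minimal metadata a tracker source could collect for one
// version of one artifact (illustrative only, not AH's real data model).
type Package struct {
	Name        string
	Version     string
	Description string
	RepoURL     string
}

// Source abstracts where package metadata comes from. A specialized source
// understands a kind's native layout (e.g. a Helm repository index file),
// while the generic source walks a repository looking for
// artifacthub-pkg.yml metadata files.
type Source interface {
	// GetPackagesAvailable returns the packages found in the repository
	// during the current tracking pass.
	GetPackagesAvailable() ([]Package, error)
}
```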

The generic tracker relies on a custom metadata file and a flexible directory structure that supports one or more packages per repository, including multiple versions per package if needed. Data unique to the artifact kind can be added in the form of custom annotations.

Some examples of how other projects organize the Artifact Hub metadata for their artifacts:

Now we can move to OpenTF specifics 🙂

What artifact kinds would you like to start with? OpenTF providers and OpenTF modules, maybe? Once we have that, we can start working on the new kind(s) (it shouldn't take long to have them ready). On your side, it'd be great if you could think about what your strategy to handle the Artifact Hub metadata would be (you can take a look at the examples above for inspiration if you'd like).

IMHO the best approach would be to add the AH metadata file to each of the providers/modules repositories and let them list themselves on Artifact Hub. As an example, this would mean adding an artifacthub-pkg.yml file to https://github.com/oracle/terraform-provider-oci. This decentralized approach works better for a number of reasons: each publisher maintains their artifacts, receives the corresponding errors, can apply individually for the verified and official statuses, latest versions are better kept up to date, etc. An alternative approach would be to create a single repository and add the metadata for all the artifacts you'd like to list on AH to it. And of course you can mix both as well, which can be helpful if there are a number of repositories to which you may not be able to add the metadata file.

You can start working on this as soon as you'd like. That way when we have AH ready for the new kinds you can list them straight away 😉

Please let us know if you have any questions!

@roni-frantchi

roni-frantchi commented Sep 2, 2023

Hey @tegioz, thanks for the warm welcome and the very concise and clear intro!
I've scanned through some of your docs, and along with your explanation here it helped me get the lay of the land.

For kinds that have already defined how a repository should be structured and served, like Helm charts or OLM operators, we have specialized tracker sources. This allowed publishers to start adding existing repositories without requiring any additional work.

I think I may have not fully understood the above ☝️, though.

We have a couple of use cases that we're looking into right now, which perhaps what you mentioned above could help with, or maybe a different direction within ArtifactHub could.
Let me try and describe them:

  1. OpenTF aims to stay 100% backwards compatible with Terraform and work with existing private registries. That means we would need the registry/hub used to resolve or list the artifacts to comply with the modules and providers APIs.
  2. Since OpenTF is looking to be a drop-in replacement, we would like to be able to cater to all existing providers and modules from day 1. Since there is a strict naming convention imposed, mapping provider/module names to their respective GitHub repositories containing their GitHub release artifacts, we were looking to resolve those based on that - so that, for instance, even if an artifact is not "registered" to be listed on ArtifactHub, we would still be able to list its versions via the API described in (1), as well as fetch the links allowing the CLI to download it from GitHub releases (a sketch of this by-convention mapping follows below).
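
For illustration, a minimal sketch of the by-convention mapping mentioned in (2), assuming the goreleaser-style asset naming commonly used by Terraform providers (terraform-provider-NAME_VERSION_OS_ARCH.zip); the owner, version, and platform values below are made up:

```go
package main

import "fmt"

// releaseAssetURL builds the conventional GitHub releases download URL for
// a provider artifact. It assumes the common goreleaser naming scheme;
// individual repositories may deviate from it.
func releaseAssetURL(owner, name, version, os, arch string) string {
	return fmt.Sprintf(
		"https://github.com/%s/terraform-provider-%s/releases/download/v%s/terraform-provider-%s_%s_%s_%s.zip",
		owner, name, version, name, version, os, arch,
	)
}

func main() {
	// Illustrative values only.
	fmt.Println(releaseAssetURL("oracle", "oci", "5.12.0", "darwin", "amd64"))
}
```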

We would love your thoughts or pointers as to whether, and if so how, we could extend ArtifactHub to support these use cases.

@tegioz
Collaborator

tegioz commented Sep 2, 2023

No worries @roni-frantchi!

I think we may be talking about two different goals, maybe there was a misunderstanding 😇

  1. Artifact Hub aims to provide a UI where users can discover and inspect content from many different artifact kinds. In this particular case, AH could replace what the Terraform registry UI provides, by adding support for more kinds like OpenTF providers or modules. This could be easily achieved by using the mechanism suggested in my previous comment.

  2. But as I understand it, you seem more interested in the other aspect of the Terraform registry: the APIs that allow the CLI to interact with it. Please note that Artifact Hub does not support this kind of functionality for any of the artifact kinds supported; that's beyond its scope. As an example, AH can list OCI container images by relying on some metadata that the image can include (AH talks to the registries periodically to collect this information). But it's not possible to use AH as an OCI registry, because it neither stores the image blobs nor provides the APIs needed to interact with the content. It'd be similar for OpenTF artifacts.

We'd be super happy to help with 1, but for 2 AH would probably not be a good fit. It'd be great to have more open source TF registry alternatives, but IMHO not embedded in Artifact Hub or even tightly coupled to it. I think this is something that should probably be handled by a tool focused on that task (like citizen).

However, even if 2 is the priority at the moment, which makes sense, it'd be great to list OpenTF artifacts on AH at some point. This has been requested in the past by a few users, but it wasn't possible because AH only supports vendor-neutral artifact kinds.

Maybe AH could even be some sort of provider for the OpenTF registry in order to discover content to serve (just thinking out loud). Harbor, for example, uses Artifact Hub as a provider to replicate content from. This is implemented on the Harbor side by relying on the Artifact Hub API.

Hope this helps 🙂

@roni-frantchi

Thank you @tegioz !

Yes, indeed, being able to resolve artifacts is our first priority ATM, and we were thinking maybe AH would consider that to be in scope (as opposed to hosting the artifacts themselves, which is clearly out of scope).
Thanks for clarifying that providing an API to resolve artifacts via different toolsets is not something you're looking to bring into scope right now.
I think you're spot on in saying that AH would be a perfect candidate for (1), we'd love to see that.

Maybe AH could even be some sort of provider for the OpenTF registry in order to discover content to serve (just thinking out loud). Harbor, for example, uses Artifact Hub as a provider to replicate content from. This is implemented on the Harbor side by relying on the Artifact Hub API.

This is also an approach to consider - thing is, from our end it means adaptations to the CLI, which we would want to refrain from before getting feedback from the community that this is the direction OpenTF should take, which is why we try to preserve the resolution API (aside from the target host, due to the T&C rug pull, clearly).

@tegioz
Collaborator

tegioz commented Sep 3, 2023

This is also an approach to consider - thing is, from our end it means adaptations to the CLI, which we would want to refrain from before getting feedback from the community that this is the direction OpenTF should take, which is why we try to preserve the resolution API (aside from the target host, due to the T&C rug pull, clearly).

I think I didn't explain myself well, let me try again 😇

When I mentioned that AH could be a provider for the OpenTF registry, I meant for the new tool that is yet to be written.

I was seeing the full picture as follows:

  • Artifact Hub. We could add support for OpenTF artifacts as I suggested yesterday (or even add a custom tracker source). This would allow you to list most of the existing providers and modules repositories on AH. Users and organizations would also be able to list their own. Hopefully the momentum would help get more on board in an unattended way. Users would be able to explore and discover OpenTF artifacts on the AH website, which could bring more eyes to OpenTF.

  • OpenTF registry. This would be a new open source TF registry service. This is the part I suggested shouldn't be part of AH. This service would expose the providers and modules APIs as defined in the Terraform documentation (a minimal sketch of one such endpoint follows after this list). OpenTF would run one public instance and anyone would be able to run their private one if they wish. This component would need to feed from somewhere to read the data that'd be served from the APIs just mentioned. This could be another registry, a yaml file, etc (I was thinking of them as providers or data sources). This is where I thought that AH could be one of those providers, allowing the registry to discover OpenTF artifacts repositories listed on AH (by using the AH API). This would be similar to what Harbor does.

  • OpenTF CLI. Nothing would change here. This tool would talk to the new OpenTF registry that supports the same APIs as the current TF public registry, so all should be good (provided that the API spec is accurate and up to date).
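
As a rough illustration of the registry piece described above, here is a minimal sketch of the "list available versions" endpoint. The response shape follows the publicly documented Terraform provider registry protocol; the hardcoded data stands in for whichever backing data source (another registry, a yaml file, the AH API, ...) the service would actually use.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// Types mirroring the response of the documented provider registry
// protocol endpoint: GET /v1/providers/{namespace}/{type}/versions.
type platform struct {
	OS   string `json:"os"`
	Arch string `json:"arch"`
}

type providerVersion struct {
	Version   string     `json:"version"`
	Protocols []string   `json:"protocols"`
	Platforms []platform `json:"platforms"`
}

type versionsResponse struct {
	Versions []providerVersion `json:"versions"`
}

func main() {
	http.HandleFunc("/v1/providers/", func(w http.ResponseWriter, r *http.Request) {
		// A real implementation would parse {namespace}/{type} from
		// r.URL.Path and look the provider up in its data source.
		resp := versionsResponse{Versions: []providerVersion{{
			Version:   "1.2.5", // placeholder data
			Protocols: []string{"5.0"},
			Platforms: []platform{{OS: "linux", Arch: "amd64"}},
		}}}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(resp)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```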

Hope this clarifies it 🙂

@roni-frantchi

Got it, thanks @tegioz.

So I get the upside of using AH to discover providers and modules, of course.

On the other hand, if there's a need to deploy a dedicated TF registry service providing the package resolution API to match that of the CLI, and seeing there is a clear imposed convention as to how to resolve them (see quote below), what would be the benefit of using AH over letting the convention hit the GitHub API and fetching the stored packages there?

Since there is a strict naming convention imposed, mapping provider/module names to their respective GitHub repositories containing their GitHub release artifacts, we were looking to resolve those based on that - so that, for instance, even if an artifact is not "registered" to be listed on ArtifactHub, we would still be able to list its versions via the API described in (1), as well as fetch the links allowing the CLI to download it from GitHub releases.

@tegioz
Collaborator

tegioz commented Sep 4, 2023

I was thinking that AH could be useful in that case to discover repositories (i.e. list all OpenTF provider repository URLs, list all the official OpenTF module repositories, the ones published by some given organizations, etc). As I was seeing it, it'd be just to fetch the repository URLs and some minimal metadata, the entry point that would let you process the repositories as if they had been provided manually in a config file. So it'd be an optional mechanism to discover available repositories, not an alternative way of processing them. Once you had those URLs, you could read their content from GitHub following those conventions.

Artifact Hub lets users and organizations add repositories automatically from the UI, so more and more are added every day in an unattended way. You could achieve the same by maintaining a list of repositories in a file somewhere and letting users submit PRs to keep it up to date, for example, or by any other mechanism. It's not a huge win and probably not a priority, but maybe it could be useful at some point, both for the public registry and for private deployments 🙂

@roni-frantchi

Hey @tegioz thanks and apologies it took me so long to circle back.

So, looking beyond our initial alpha and its interim registry (which, as described, is more of a by-convention proxy to GitHub releases and their artifacts):
ArtifactHub would be amazing for package promotion and discovery, as a catalog.

Going back to what you mentioned earlier, and trying to explore that a little more with you - what are your thoughts on OpenTF making an enhancement on its client side to also be able to use the ArtifactHub API, rather than just the official API (which will continue to be supported, for the sake of private registries)?

  • Once a package is added to ArtifactHub, it keeps track of new tags/releases of that repo, right?
  • IIRC packages may be submitted to ArtifactHub not only by those who own the repo (and there's a badge confirming when it is the author). Is that right?
  • To be able to support all ~6K providers available on GitHub (since they must comply with a naming convention, they're easy to find), we would index those and add them to ArtifactHub - and their authors may claim the "verified" badge later.
  • The OpenTF CLI would need to hit the ArtifactHub API and get a list of available versions by owner/package, and for a given package get the GitHub repository that holds the source/release binaries.

Would love your thoughts here

@tegioz
Collaborator

tegioz commented Sep 15, 2023

No worries 👍

Going back to what you mentioned earlier, and trying to explore that a little more with you - what are your thoughts on OpenTF making an enhancement on its client side to also be able to use the ArtifactHub API, rather than just the official API (which will continue to be supported, for the sake of private registries)?

Sure, that'd be great!

Just one thing I'd like to share with you first: I've been talking to @cynthia-sg and @mattfarina (the other Artifact Hub maintainers) about this new kind and we all agreed that we'd need to hold on a bit until OpenTF is part of the Linux Foundation / CNCF. Everything listed at the moment on AH is part of a foundation, as it's been a requirement since the beginning of the project, and we'd like to continue honoring this requirement. Hope this makes sense 😇

  • Once a package is added to ArtifactHub, it keeps track of new tags/releases of that repo, right?

Yes, that's right. Artifact Hub visits all the registered repositories periodically and indexes new versions available automatically. I'm thinking it's likely that we'll add a new tracker source for this artifact kind. The current ecosystem is quite large and it'd make things easier on your side. But we'll need to think about it a bit more 🙂

  • IIRC packages may be submitted to ArtifactHub not only by those who own the repo (and there's a badge confirming when it is the author). Is that right?

Correct. Owners can even claim ownership of repositories in an automated way. So you could add all repositories (let's say under the OpenTF org in AH) and their respective owners could eventually claim ownership if they wish. That'd allow them to request the verified publisher and official badges (when applicable), as well as receive notifications when something goes wrong processing their repo, among other things.

  • The OpenTF CLI would need to hit the ArtifactHub API and get a list of available versions by owner/package, and for a given package get the GitHub repository that holds the source/release binaries.

One way of handling this would be to create a new single endpoint for this integration on AH. That endpoint would return a list of the OpenTF repositories listed on AH, including the information you'd need to operate on them. The OpenTF CLI tool could cache this data locally (we'll make sure it's properly cached in our CDN as well) and refresh it when it's older than X hours (something we can agree on once it's ready). We've used this approach with success in other similar cases (please see the issues below for more information).
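
A minimal sketch of what the client side of that could look like, assuming a hypothetical dump endpoint; the URL and the JSON shape below are placeholders, since no such endpoint exists yet.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// repoEntry is a guess at the minimal record the dump could contain.
type repoEntry struct {
	Name string `json:"name"`
	URL  string `json:"url"`
}

// loadRepos returns the cached dump if it is fresher than maxAge, and
// re-downloads it otherwise. Note that net/http transparently
// decompresses gzip responses when it negotiates the encoding itself.
func loadRepos(cachePath string, maxAge time.Duration) ([]repoEntry, error) {
	if fi, err := os.Stat(cachePath); err == nil && time.Since(fi.ModTime()) < maxAge {
		if data, err := os.ReadFile(cachePath); err == nil {
			var repos []repoEntry
			if json.Unmarshal(data, &repos) == nil {
				return repos, nil
			}
		}
	}
	// Placeholder URL: the real endpoint would be agreed on later.
	resp, err := http.Get("https://artifacthub.io/api/v1/hypothetical-opentf-dump")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, errors.New("unexpected status: " + resp.Status)
	}
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	_ = os.WriteFile(cachePath, data, 0o644) // best-effort cache refresh
	var repos []repoEntry
	err = json.Unmarshal(data, &repos)
	return repos, err
}

func main() {
	repos, err := loadRepos("opentf-repos.json", 6*time.Hour)
	fmt.Println(len(repos), err)
}
```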

@roni-frantchi

Just one thing I'd like to share with you first: I've been talking to @cynthia-sg and @mattfarina (the other Artifact Hub maintainers) about this new kind and we all agreed that we'd need to hold on a bit until OpenTF is part of the Linux Foundation / CNCF. Everything listed at the moment on AH is part of a foundation, as it's been a requirement since the beginning of the project, and we'd like to continue honoring this requirement. Hope this makes sense 😇

Absolutely!
All I can say about that is it won't be long now 😇

Thanks for the responses validating my thinking!

To be able to support all ~6K providers available on GitHub (since they must comply with a naming convention, they're easy to find), we would index those and add them to ArtifactHub - and their authors may claim the "verified" badge later.

Wondering what your thoughts are on that one too 🙏

One way of handling this would be to create a new single endpoint for this integration on AH. That endpoint would return a list of the OpenTF repositories listed on AH, including the information you'd need to operate on them. The OpenTF CLI tool could cache this data locally (we'll make sure it's properly cached in our CDN as well) and refresh it when it's older than X hours (something we can agree on once it's ready). We've used this approach with success in other similar cases (please see the issues below for more information).

That endpoint would return a list of the OpenTF repositories listed on AH, including the information you'd need to operate on them

Maybe I'm missing something - but it sounds like you're suggesting a single endpoint for all OpenTF packages, which sounds like a huge response right there - if indeed it is meant to contain all of the thousands of providers/modules, each with possibly hundreds of versions?

On the other hand, from looking at the issues you've shared, it seems as if such endpoints were added for specific popular packages, for their clients to use?
As in, not an endpoint for a dump of all Helm charts, but one that lists all versions of the Harbor chart?
Is the expectation to add one for each popular provider/module?
Wouldn't an endpoint with a dynamic path, where the provider/module is used as a key, work?
(That's what we plan to be rolling out with on our first pre-release.)

@tegioz
Collaborator

tegioz commented Sep 17, 2023

Wondering what your thoughts are on that one too 🙏

Oh, I thought the paragraph that starts with "Correct. Owners can.." was answering that as well 🙂 Yes, that wouldn't be a problem. We should just coordinate their addition a bit, to measure how everything goes as more and more are added.

Maybe I'm missing something - but it sounds like you're suggesting a single endpoint for all OpenTF packages, which sounds like a huge response right there - if indeed it is meant to contain all of the thousands of providers/modules, each with possibly hundreds of versions?

Well, this would depend on the amount of information needed. If we were able to keep this to a minimum, a list of 6k URLs plus some extra details wouldn't be that big. Including all the versions would probably be too much, but I wonder whether that would really be needed, or whether the OpenTF CLI tool could interact directly with the repositories once they've been located. I was thinking of something like the repository name, maybe a short description, and the URL. I was hoping that would be enough as a starting point that the tool would be able to follow on its own. A file like that could be cached locally for a few hours, and would allow the tool to use that information even if AH was down.

On the other hand, from looking at the issues you've shared, it seems as if such endpoints were added for specific popular packages, for their clients to use? As in, not an endpoint for a dump of all Helm charts, but one that lists all versions of the Harbor chart?

Oh no, those endpoints are actually listing all Helm charts 😉 They were added for their clients to use, but listing all content available. We did it this way because those tools required either doing a lot of concurrent searches or fetching a lot of packages each time they run. So instead of hitting the AH API thousands of times (per each user of that tool), we prepared the information they needed in a single endpoint (easy for them to fetch, and easy as well for us to serve and cache).

Wouldn't an endpoint with a dynamic path, where the provider/module is used as a key, work?

The AH API exposes some endpoints to fetch a single package, or to search for packages or repositories. That's already available. The problem is that, depending on how tools use those endpoints, there could be an important impact on the AH service. And when we receive a lot of requests, the rate limits start kicking in, as we need to keep AH healthy for all users.

The OpenTF tool has the potential to have a lot of users, and those users may build tools on top of it, tools that may require hitting the AH API a lot. So given that the datasets in AH are relatively small and that we can afford to add some special endpoints in some situations, relying on a data dump is an option we can consider sometimes 😇 As an example, I've run a test, and a gzipped JSON file with the names and URLs of 6k repositories would be ~110KB. This can indeed get bigger as we add more data (i.e. the Nova dump gzipped is about 1.9MB).

But some other tools are hitting the AH API directly without relying on a data dump, like the Helm CLI, and that's perfectly fine too. That tool has a search hub subcommand that directly uses the AH search endpoint. Given how it's used, it's never been a problem.

My intention was to give you more options, so that if the OpenTF tool required heavier use of the AH API for its operations (like the other tools we provided those endpoints for), we could consider providing a data dump for your specific use case if that would help (I wish I had that option sometimes with some of the APIs we rely on!). On the other hand, if you only expect sporadic searches and fetching some information for a single package, then you could rely on the API as is, as those endpoints are already implemented.

@RLRabinowitz

Hey @tegioz, Arel from OpenTofu here (the new name of OpenTF, under the Linux Foundation) 👋🏻
I'll continue the work of @roni-frantchi here 😄

Currently, the OpenTofu project has a requirement set for the OpenTofu Registry, marked in this comment. I'd like to continue this discussion here and see how ArtifactHub could help us with creating this Registry, and afterwards I'll create an RFC issue in OpenTofu with a suggestion as to how to accomplish that.

So I'd like to continue the conversation from where it left off. From what I understand, creating a dump endpoint that provides the names and URLs of all repositories is possible on the AH side. However, currently tofu uses the registry to:

  1. List available versions of the providers/modules
  2. Ask for the download URL for a provider/module version, based on the platform (darwin_amd64, for example)

Having tofu itself try to figure out what the available versions of a repository are, or figure out the download URL for a specific artifact, would be problematic. The reason is that this would require GitHub API requests, and those would get throttled pretty quickly.

Would there be any possible way of dealing with this requirement of OpenTofu? It seems like, at the very least, we'd need to get the available versions of the providers and modules from AH; we might be able to generate the download URL by convention from the GitHub release of the repository that AH has directed us towards. However, ideally we'd prefer to be able to determine the release artifact download URL easily without resorting to that (as right now, all providers and modules are forced to be on GitHub).

Regarding a few of the other requirements here:

It must make available a way for authors to submit/revoke/update their public keys, in a way compatible with existing provider signatures

Can AH store public GPG keys used for artifact signature validation? We need to be able to get the GPG key to validate the signature of the downloaded artifact.

Must allow for warnings to be attached to provider version metadata. E.g. the registry currently serves a custom warning when a user tries to fetch the long-deprecated terraform provider. See opentofu/registry#108 for more details.

If the dump endpoint could include versions, would it also be able to include deprecation warnings / information from somewhere?

It must support a single “identity” running thousands of concurrent executions of Tofu (that list and fetch providers and modules) without getting rate-limited.
An identity could be an IP address, a logged-in user (if a login-gate is involved), a company, etc.

Would this amount of load from a single IP address be OK for AH (from the standpoint of a dump endpoint)?

Thank you very much for the help 🙏🏻

@tegioz
Collaborator

tegioz commented Oct 17, 2023

Hi @RLRabinowitz 👋

The main goal of Artifact Hub is to provide a UI where users can explore and discover multiple kinds of artifacts. However, it was never meant to be a potential SPOF that could block an entire ecosystem. We intentionally don't store any artifacts, just some metadata about them. As of today, if Artifact Hub were down, users shouldn't be blocked from interacting with any of the artifacts/repositories listed on it. And we'd like to keep it this way 🙂

However, Artifact Hub could provide an alternative to the UI side of the Terraform Registry for OpenTofu. But this would be much easier to achieve the other way around: AH being able to index content from OpenTofu-based registries instead of AH collecting information from the GH repositories. The main reason for this is that we are not immune to GH rate limits either 😅 If we were obtaining this metadata from you, this wouldn't be a problem (for us 😇).

As you mentioned, the Terraform Registry requires repositories to be on GitHub and, IIRC, they rely on GH webhook notifications for updates. Artifact Hub does not require repositories to be hosted on GitHub, so publishers can use GitLab, Bitbucket, Gitea, etc. For git based repositories, we use a poll model that uses some git operations to detect changes and process repositories when needed. So if we were processing the GitHub repositories ourselves, having a very large number of them on the same external provider could be problematic for us as well. As mentioned in a previous comment, there are around ~6k TF provider/module repositories, so if we were to process them, we'd need to roll this out progressively over time.

The suggestion of adding a dump endpoint was precisely to provide you with an API that was less likely to be rate limited. Our top priority is to keep the web application at https://artifacthub.io/ up and running for all users. So when we detect any API usage that has a considerable impact on the service, we're forced to apply rate limits. This is less of a problem for a dump endpoint, as we can cache aggressively at the CDN level and it becomes a cost problem (something that we may need to deal with as well and rate limit at some point).

But when I proposed this dump endpoint, it was with the intention of providing an optional way of discovering repositories that the OpenTofu CLI tool would process itself. As an example, the Helm CLI has a search hub subcommand that allows users to search for charts listed on AH. But if AH is down, Helm users can continue installing and upgrading their charts as usual, as the Helm CLI talks directly to the Helm repositories, and AH is just a way of discovering them (as Google or GitHub could be, for example). Ideally, OpenTofu should be able to do something similar and not depend on AH to operate.

Hope this makes sense (and helps!) 🙂

@RLRabinowitz

Thank you @tegioz 😄

So, the use case of tofu is a bit different from helm's. With helm, you install a chart with a link to the specific chart's repository, and optionally provide a specific version you'd like to install. helm search hub is mostly a command that is run manually to find the repository you'd like to install from.

In tofu, you create a required_providers block that contains the names of the providers and their version constraints. The version constraints can be exact (= 1.0.0) or broader (>= 1.2.0, < 2.0.0). Moreover, multiple provider blocks can be defined when using modules, and then tofu merges those version constraints together. When running tofu init, tofu lists the available versions of the provider and finds the latest one that matches the constraints. When found, it attempts to download the artifact.
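
To illustrate that resolution step, here is a sketch using github.com/hashicorp/go-version; whether tofu uses this exact library internally is an implementation detail, and the versions and constraints below are made up.

```go
package main

import (
	"fmt"
	"sort"

	version "github.com/hashicorp/go-version"
)

// newestMatching merges several constraint sets and returns the newest
// available version that satisfies all of them, mimicking the behaviour
// described above.
func newestMatching(available []string, constraintSets ...string) (*version.Version, error) {
	var all version.Constraints
	for _, cs := range constraintSets {
		c, err := version.NewConstraint(cs)
		if err != nil {
			return nil, err
		}
		all = append(all, c...)
	}
	var versions version.Collection
	for _, s := range available {
		v, err := version.NewVersion(s)
		if err != nil {
			return nil, err
		}
		versions = append(versions, v)
	}
	sort.Sort(sort.Reverse(versions)) // newest first
	for _, v := range versions {
		if all.Check(v) {
			return v, nil
		}
	}
	return nil, fmt.Errorf("no version satisfies %s", all)
}

func main() {
	v, err := newestMatching(
		[]string{"1.1.0", "1.2.5", "1.9.3", "2.0.1"},
		">= 1.2.0, < 2.0.0", // constraint from the root module
		"< 1.9.0",           // constraint contributed by a child module
	)
	fmt.Println(v, err) // prints 1.2.5 <nil>
}
```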

So, listing the versions is an important part of tofu's flow. Getting the available versions of the providers is an API call that we would have to make, and tofu will make it constantly (in tofu init runs).
Would that be something that could be handled in any way by AH, in a dump endpoint that's built over time, or in some other manner? Ideally, if this dump endpoint could also hold the versions of the providers (and this data could be cached in the CDN for an hour or a few hours), that would help tofu. But as you've said:

The main goal of Artifact Hub is to provide a UI where users can explore and discover multiple kinds of artifacts. However, it was never meant to be a potential SPOF that could block an entire ecosystem

So yeah, if such a service / endpoint were down, the main flow of tofu would be blocked.

However, Artifact Hub could provide an alternative to the UI side of the Terraform Registry for OpenTofu. But this would be much easier to achieve the other way around: AH being able to index content from OpenTofu-based registries instead of AH collecting information from the GH repositories. The main reason for this is that we are not immune to GH rate limits either 😅 If we were obtaining this metadata from you, this wouldn't be a problem (for us 😇).

That's an interesting approach for a different RFC, where we statically host information about providers and modules (repositories, their versions, download links, and other metadata). The documentation for that RFC is WIP, and maybe AH could be a nice option there. In such a case, would users be able to use the AH UI to find information about the providers (and their docs)? Would it still require the providers to be added manually (or via API) to AH?

@tegioz
Collaborator

tegioz commented Oct 18, 2023

No worries 😄

So, listing the versions is an important part of tofu's flow. Getting the available versions of the providers is an API call that we would have to make, and tofu will make it constantly (in tofu init runs). Would that be something that could be handled in any way by AH, in a dump endpoint that's built over time, or in some other manner? Ideally, if this dump endpoint could also hold the versions of the providers (and this data could be cached in the CDN for an hour or a few hours), that would help tofu. But as you've said:

The main goal of Artifact Hub is to provide a UI where users can explore and discover multiple kinds of artifacts. However, it was never meant to be a potential SPOF that could block an entire ecosystem

So yeah, if such a service / endpoint were down, the main flow of tofu would be blocked.

TBH, this is a position we'd rather not be in for any artifact kind supported. In this particular case, we could be rate limited at any point given the number of repositories expected; we don't have any guarantees, so we wouldn't be able to provide any either. And IMHO, building and supporting a solution that may be critical for many organizations on top of these uncertainties wouldn't be right. AH can't be a potential blocker for the OpenTofu ecosystem.

That's an interesting approach for a different RFC, where we statically host information about providers and modules (repositories, their versions, download links, and other metadata). The documentation for that RFC is WIP, and maybe AH could be a nice option there. In such a case, would users be able to use the AH UI to find information about the providers (and their docs)? Would it still require the providers to be added manually (or via API) to AH?

Yes, users would be able to use the AH UI to find information about the providers, read their docs, etc. We could also display the warnings you mentioned in your previous comment, or even other OpenTofu-specific views.

The easiest way to achieve this would be, once you've collected all the information about the providers and modules available, to generate (and keep up to date over time) the required AH metadata files for all of them in a git repository (this can be easily automated; see the sketch below). Artifact Hub would periodically visit the metadata repository and index new/updated/deleted content as needed. This way, you could create an OpenTofu organization in AH and publish that content under it. There'd be no need to list providers individually, but users/orgs would have the ability to do so if they wished. You can see more details about how this would work in my first comment in this issue.
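
As a sketch of how that generation could be automated: the struct below covers only a minimal subset of artifacthub-pkg.yml fields (the real file supports many more), and it assumes the one-directory-per-package-version layout the generic tracker expects. The concrete values are illustrative.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"

	"gopkg.in/yaml.v3"
)

// pkgMetadata maps to a minimal subset of artifacthub-pkg.yml fields.
type pkgMetadata struct {
	Version     string `yaml:"version"`
	Name        string `yaml:"name"`
	DisplayName string `yaml:"displayName"`
	CreatedAt   string `yaml:"createdAt"`
	Description string `yaml:"description"`
	HomeURL     string `yaml:"homeURL"`
}

// writePkgFile writes <root>/<name>/<version>/artifacthub-pkg.yml.
func writePkgFile(root string, md pkgMetadata) error {
	dir := filepath.Join(root, md.Name, md.Version)
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	data, err := yaml.Marshal(md)
	if err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(dir, "artifacthub-pkg.yml"), data, 0o644)
}

func main() {
	err := writePkgFile("metadata", pkgMetadata{
		Version:     "5.12.0", // illustrative
		Name:        "oci",
		DisplayName: "Oracle Cloud Infrastructure Provider",
		CreatedAt:   time.Now().UTC().Format(time.RFC3339),
		Description: "Example entry generated for Artifact Hub",
		HomeURL:     "https://github.com/oracle/terraform-provider-oci",
	})
	fmt.Println(err)
}
```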

IMHO this is the right way to integrate OpenTofu artifacts in Artifact Hub. This is something we can have ready pretty quickly, so once you are ready on your side please let us know and we'll get it done 😇

@RLRabinowitz

TBH, this is a position we'd rather not be in for any artifact kind supported. In this particular case, we could be rate limited at any point given the number of repositories expected; we don't have any guarantees, so we wouldn't be able to provide any either. And IMHO, building and supporting a solution that may be critical for many organizations on top of these uncertainties wouldn't be right. AH can't be a potential blocker for the OpenTofu ecosystem.

OK. So that would mean that we'd need to go down a different route for the "discovery" part of tofu, as it is a mandatory flow that happens constantly when using the tool. And we wouldn't want to make AH a SPOF for tofu if it is not designed to address those kinds of concerns. Thank you for the explanation 😄

Now, regarding using AH UI:

IMHO the best approach would be to add the AH metadata file to each of the providers/modules repositories and let them list themselves on Artifact Hub. As an example, this would mean adding an artifacthub-pkg.yml file to https://github.com/oracle/terraform-provider-oci

This is possible, especially for first-party providers, but a lot of providers/modules are 3rd party and would require someone to contribute that there. I don't think we'll get many 3rd-party providers onto AH this way, at least not initially. Would using a specialized tracker source help on that front? (Though I assume you'd prefer to use the generic tracker.)

An alternative approach would be to create a single repository and add the metadata for all the artifacts you'd like to list on AH to it

This approach is more likely IMO, as it does not require all 3rd-party providers or modules to handle the AH integration themselves. However, would that mean that those repositories wouldn't be able to apply for the verified and official statuses?
Either way, it seems like we can mix and match those two approaches.

Also, I have a question regarding documentation. Today, the documentation in AH is basically the README.md, right? In tofu, the documentation is based on a website/docs folder in the repository containing markdown files for the provider, plus a markdown file per resource (example). Does AH support a more dynamic method of displaying documentation, one that would allow multiple different pages?

Thanks again for all the help 🙏🏻

@tegioz
Collaborator

tegioz commented Oct 18, 2023

I agree, I think it'd be best if you generated the metadata for all providers once you have processed them. A specialized tracker could talk to an API provided by the OpenTofu registry but, as you said, we'd prefer to use the generic tracker, as it makes everything easier to maintain down the line. We support quite a few artifact kinds at the moment and the number keeps growing, so we need to try to keep things as simple as possible 😇

Regarding the verified and official statuses, we'll need to think about something for this. It doesn't help final users that packages that should be official aren't marked as such, so this is something we need to find a solution for, not just for OpenTofu. We have a similar situation with OLM operators, where we get most of them from a single repository as well.

The README file is quite important in AH due to the way the UI displays it. But we'd be open to including something specific for OpenTofu that allows displaying some extra information in a special way. We've done this for some artifact kinds, like Helm. Some examples:

https://artifacthub.io/packages/helm/artifact-hub/artifact-hub?modal=values-schema
https://artifacthub.io/packages/helm/artifact-hub/artifact-hub?modal=template&template=db_migrator_job.yaml
https://artifacthub.io/packages/helm/artifact-hub/artifact-hub?modal=values
