Artifact Hub API rate limit #214

tegioz · 2023-05-05T09:12:14Z

Hi! 👋

This is Sergio, from Artifact Hub 🙂

We've noticed some frequent API latency spikes and after looking a bit into it we've seen that most of the requests causing them are coming from Nova. We are now receiving ~120K requests per hour from Nova installations (and growing fast! see screenshot below please), and some of them seem to be doing a lot of concurrent requests -that require database access- simultaneously. So we have had to take action to protect the service at https://artifacthub.io, by adjusting the rate limits and blocking some requests. I wanted to give you a heads up as this is going to impact your users.

(requests per hour from Nova installations over the last 3 months)

When Nova switched to Artifact Hub, we suggested to use a different approach when using the Artifact Hub API as we anticipated this might happen (#61 (comment), #61 (comment)). The suggested approach has worked great so far in other similar scenarios like Harbor and Helm exporter, so I still think it's the way to go and I would strongly recommend you to switch to it.

I'd be happy to work with you on this next week so that the impact for your users is minimal. We should be able to prepare and deploy a new endpoint for you quickly as soon as we agree on what's needed 😉

Related to #61, #94

rbren · 2023-05-05T15:53:31Z

Thanks for the heads up!

I just emailed you a link to my calendar--hopefully we can figure out a better solution going forward.

codatory · 2023-05-06T13:33:54Z

I'm guessing this is why I started getting errors from artifacthub today?

Error getting artifacthub package repos: failed to search for packages for term metallb

My little cluster only has 10 releases currently, so if this is related to the rate limiting that's probably going to affect most users...

HariSekhon · 2023-05-07T12:09:44Z

This has broken my CI/CD using Nova too:

https://github.com/HariSekhon/Kubernetes-configs/actions/runs/4904719846/jobs/8758019857#step:14:9

tegioz · 2023-05-08T09:10:14Z

Hi @rbren

My suggestion would be to fetch a snapshot of all the data you may possibly need using a single request to a new endpoint. This new endpoint would basically return a dump of all Artifact Hub packages containing only the fields you need to make Nova work.

On our side, we'd add this new integration endpoint, something similar to the one we added for Harbor. On your side, you'd need to change how you talk to the Artifact Hub API to fetch this data once when the program starts and use it as you need for all your operations. Caching this data on your side and only refresh it if it's older than a couple of hours (that's probably how frequently we'll regenerate it) would be a nice thing to have to make the tool faster in those cases where the local cache can help and save some requests.

Going this way, we can generate this data dump once every couple of hours and make sure it's cached and served from the CDN, which would absorb essentially all traffic from Nova installations.

So to get this started, we've prepared a sample data file based on a conversation we had with @lucasreed around the time of #61. We can remove any data you don't need from it or add an additional field if needed, that's the part we need to agree on. The fewer data fields we have the better, as the resulting dump will be lighter.

The data dump would be a json file containing an array of entries, one per package. At the moment each entry looks like this:

{
    "name": "vault-secrets-operator",
    "description": "Official Vault Secrets Operator Chart",
    "links": [
        {
            "url": "https://github.com/hashicorp/vault-secrets-operator",
            "name": "source"
        }
    ],
    "repository": {
        "name": "hashicorp",
        "url": "https://helm.releases.hashicorp.com",
        "official": true,
        "verified": true
    },
    "version": {
        "latest": "0.1.0-beta",
        "all": [
            "0.1.0-beta"
        ]
    }
}

You can inspect the fields you may receive (some packages may not have some of the data) in the sql function used to generate the data dump:

create or replace function get_nova_dump()
returns setof json as $$
    select coalesce(json_agg(json_strip_nulls(json_build_object(
        'name', p.name,
        'description', s.description,
        'links', s.links,
        'maintainers', (
            select json_agg(json_build_object(
                'name', m.name
            ))
            from maintainer m
            join package__maintainer pm using (maintainer_id)
            where pm.package_id = p.package_id
        ),
        'official', p.official,
        'repository', json_build_object(
            'name', r.name,
            'url', r.url,
            'official', (
                case when r.official=true then r.official else null end
            ),
            'verified', (
                case when r.verified_publisher=true then r.verified_publisher else null end
            )
        ),
        'version', json_build_object(
            'latest', p.latest_version,
            'all', (
                select json_agg(version) from snapshot where package_id = p.package_id
            )
        )
    ))), '[]')
    from package p
    join repository r using (repository_id)
    join snapshot s using (package_id)
    where p.latest_version = s.version
    and r.repository_kind_id = 0;
$$ language sql;

Using the data fields shown above, the full dump size is ~5.4MB as of today and it's generated in about 200ms. I will share it by email with you now so that you can start working with it when you have a chance. Once you can confirm that you are happy with it, we'll make it available at https://artifacthub.io/api/v1/nova (it should be ready and deployed by the next working day).

Hope this helps. If you have any question please let us know 🙂

rbren · 2023-05-08T11:52:04Z

This is great, thanks! I think there's just a few more fields we'd need:

Icon
HomeURL
For each version, we need to know if it's Deprecated (we could possibly make a separate request to get this for only the installed version)
For each version, we need to know AppVersion as well (ditto)

FWIW, here's the struct for Nova's output, which we'd want to maintain backward compatibilty with. I've annotated what comes from the cluster vs AH.

VersionInfo contains both chart and app version.

type ReleaseOutput struct {
    ReleaseName string `json:"release"` (cluster)
    ChartName   string `json:"chartName"` (AH)
    Namespace   string `json:"namespace,omitempty"` (cluster)
    Description string `json:"description"` (AH)
    Home        string `json:"home,omitempty"` (AH)
    Icon        string `json:"icon,omitempty"` (AH)
    Installed   VersionInfo (cluster)
    Latest      VersionInfo (AH)
    IsOld       bool   `json:"outdated"` (cluster/nova)
    Deprecated  bool   `json:"deprecated"` (AH)
    HelmVersion string `json:"helmVersion"` (cluster/nova)
    Overridden  bool   `json:"overridden"` (cluster/nova)
}

I'll get working on a draft PR that ingests your sample JSON

rbren · 2023-05-08T11:54:15Z

Also I don't think we need links

rbren · 2023-05-08T12:14:15Z

Actually--we do need links, as well as maintainers, for some internal matching logic

tegioz · 2023-05-08T12:23:21Z

Done 👍

The structure has changed a bit to accommodate the changes around versions:

{
    "name": "secrets-store-csi-driver-provider-gcp",
    "description": "A Helm chart for Google Secret Manager Provider for Secret Store CSI Driver",
    "links": [
        {
            "url": "https://github.com/portefaix/portefaix-hub/tree/master/charts/secrets-store-csi-driver-provider-gcp",
            "name": "source"
        },
        {
            "url": "https://github.com/kubernetes-sigs/secrets-store-csi-driver",
            "name": "secrets-store-csi-driver"
        },
        {
            "url": "https://github.com/GoogleCloudPlatform/secrets-store-csi-driver-provider-gcp",
            "name": "secrets-store-csi-driver-provider-gcp"
        },
        {
            "url": "https://portefaix.xyz",
            "name": "Portefaix"
        }
    ],
    "home_url": "https://charts.portefaix.xyz",
    "maintainers": [
        {
            "name": "nlamirault"
        }
    ],
    "repository": {
        "name": "portefaix-hub",
        "url": "https://charts.portefaix.xyz/"
    },
    "versions": [
        {
            "pkg": "0.1.0",
            "app": "1.3.5"
        },
        {
            "pkg": "0.2.0",
            "app": "1.3.5"
        },
        {
            "pkg": "0.3.0",
            "app": "1.3.5"
        },
        {
            "pkg": "0.4.0",
            "app": "1.0.0"
        },
        {
            "pkg": "0.5.0",
            "app": "1.0.0"
        }
    ]
}

The deprecated field will appear only when the value is true. A logo field has been added as well. The maintainers where already there, except the email which has been intentionally omitted.

The changes have increased the size of the data file to 9.8MB. Will send the new version to you by email right now.

tegioz · 2023-05-08T12:24:31Z

Do you need the link's name? I can remove it to save some space.

tegioz · 2023-05-08T12:29:46Z

(we could possibly make a separate request to get this for only the installed version)

Let's try to not make additional requests please 😇

rbren · 2023-05-08T12:47:50Z

Thanks! We do need the link name.

A few more fields 🙏

Deprecated
LogoImageID
LatestVersion (I think we lost this with the rewrite)
Repository.VerifiedPublisher

rbren · 2023-05-08T12:53:25Z

Here's a WIP PR: #217

tegioz · 2023-05-08T12:55:14Z

Thanks! We do need the link name.

No worries! 🙂

A few more fields 🙏

Deprecated

This one is already there, I left a note after the example :)

LogoImageID

I have already added the logo url (field named logo), I thought you wanted that one 😅 May I ask why do you need the logo_image_id and how are you using it? (just in case you are making any assumptions about it that do not align with how it works)

LatestVersion (I think we lost this with the rewrite)

It's there but not explicitly labeled, you can get it from the list of versions.

Repository.VerifiedPublisher

That's already there too, but only visible when set to true. Same applies to official.

Please take a look at the sql function generating it for all the possible options:

create or replace function get_nova_dump()
returns setof json as $$
    select coalesce(json_agg(json_strip_nulls(json_build_object(
        'name', p.name,
        'description', s.description,
        'links', s.links,
        'home_url', s.home_url,
        'logo', s.logo_url,
        'maintainers', (
            select json_agg(json_build_object(
                'name', m.name
            ))
            from maintainer m
            join package__maintainer pm using (maintainer_id)
            where pm.package_id = p.package_id
        ),
        'official', p.official,
        'repository', json_build_object(
            'name', r.name,
            'url', r.url,
            'official', (
                case when r.official=true then r.official else null end
            ),
            'verified', (
                case when r.verified_publisher=true then r.verified_publisher else null end
            )
        ),
        'versions', (
            select json_agg(json_build_object(
                'pkg', version,
                'app', app_version,
                'deprecated', nullif(deprecated, false)
            ))
            from snapshot
            where package_id = p.package_id
        )
    ))), '[]')
    from package p
    join repository r using (repository_id)
    join snapshot s using (package_id)
    where p.latest_version = s.version
    and r.repository_kind_id = 0;
$$ language sql;

rbren · 2023-05-08T13:01:33Z

Ahh thanks, I was looking at the JSON output--the SQL is helpful.

logo_url is all we need--disregard the logo_image_id.

For latest version, how should we grab this? Is it the last element of the array? Or do we need to do a version comparison on everything in the list to find the max?

tegioz · 2023-05-08T13:05:54Z

logo_url is all we need--disregard the logo_image_id.

Cool 👍

For latest version, how should we grab this? Is it the last element of the array? Or do we need to do a version comparison on everything in the list to find the max?

You'd need to do a version comparison. I was just trying to save some bytes here and there, but if it's a problem I can add it back.

rbren · 2023-05-08T13:14:10Z

Probably better to add it back--the different version comparison tools out there yield different results for a bunch of edge cases, so better if we're consistent w/ what AH has as latest

rbren · 2023-05-08T13:14:51Z

Also--we can drop both official fields, not using that today

rbren · 2023-05-08T13:15:14Z

Oh one more--can we get maintainer.email?

tegioz · 2023-05-08T13:19:36Z

Oh one more--can we get maintainer.email?

I don't think it's a good idea to have all Helm charts maintainers emails in a single file like this. Sorry but I can't add this one.

I'll drop the official fields and add the latest version a bit later, no problem 👍

rbren · 2023-05-08T13:24:23Z

That's fair. I'll adjust our matching algorithm accordingly

sudermanjr · 2023-05-08T13:38:14Z

This is all really great, thanks for working on this everyone. Only one comment:

I think we should keep the official and verified fields to add them to the matching algorithm if we don’t have that already.

tegioz · 2023-05-08T14:16:14Z

The following changes have been applied:

Added latest_version field at the root of the entry
Renamed home_url field to home

File size is now 10MB (1.7 compressed).

I haven't removed yet the official fields as per @sudermanjr comment. If during your day you think this will be the final version I could work on the new endpoint tomorrow in my morning, so you would have it ready by yours :)

rbren · 2023-05-08T16:05:54Z

Aweseome! I assume latest_version is a string?

Updated my PR with the latest changes, including use of the official fields

tegioz · 2023-05-08T16:06:59Z

Aweseome! I assume latest_version is a string?

Yes, that's right 👍

rbren · 2023-05-08T16:12:08Z

OK--as best I can tell, we're good to go! If you're able to provide the full JSON output as it stands right now (e.g. by emailing me the file) that will make it easier to be sure

tegioz · 2023-05-08T16:14:33Z

I sent v3 by email a couple of hours ago. This time it was compressed, maybe it ended up in the spam folder..

rbren · 2023-05-08T17:17:41Z

Oh that's great thank you. I think it's one or two versions behind though--this is what I have from the email. E.g. the version field I think is now versions and differently structured.

This one was from 7h ago, so maybe I'm missing one, but not seeing it in spam

    {
        "name": "cp4d-deployer",
        "description": "A Helm chart to initiate CP4D deployment using the Cloud Pak Deployer",
        "repository": {
            "name": "cloud-native-toolkit",
            "url": "https://charts.cloudnativetoolkit.dev"
        },
        "version": {
            "latest": "1.0.0",
            "all": [
                "1.0.0"
            ]
        }
    },

tegioz · 2023-05-09T06:08:01Z

Something must have blocked it/them then (I didn't get any error back though). These were the subjects:

tegioz · 2023-05-09T11:17:01Z

New endpoint deployed!

https://artifacthub.io/docs/api/#/Integrations/getNovaDump

rbren · 2023-05-09T11:54:33Z

It's working! Will push my latest changes as soon as the GitHub API comes back up.

Seems like this particular endpoint won't be rate limited for the same UserAgent--is that right?

tegioz · 2023-05-09T12:05:56Z

It's working! Will push my latest changes as soon as the GitHub API comes back up.

Awesome! 🚀

Seems like this particular endpoint won't be rate limited for the same UserAgent--is that right?

Other general API rate limits will still apply, but those shouldn't be a problem given how it's used now. We'll keep an eye on it and if it becomes a problem at some point we'll let you know 🙂

rbren · 2023-05-16T16:01:52Z

This is fixed as of Nova 3.6.3. Thanks all for the help and the quick turnaround!

HariSekhon · 2023-05-16T19:06:05Z

Confirmed, my CI/CD is working again, thanks for everybody's work on this issue!

tegioz added the triage This bug needs triage label May 5, 2023

sudermanjr added enhancement Adding additional functionality or improvements and removed triage This bug needs triage labels May 5, 2023

sudermanjr assigned rbren May 5, 2023

tegioz mentioned this issue May 9, 2023

Add API endpoint for Fairwinds Nova artifacthub/hub#3032

Merged

rbren closed this as completed May 16, 2023

tegioz mentioned this issue Sep 15, 2023

Support OpenTF style artifacts artifacthub/hub#3313

Open

tegioz mentioned this issue Jul 11, 2024

Pagination-Total-Count Header Inaccessible in API Response Headers artifacthub/hub#3933

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Artifact Hub API rate limit #214

Artifact Hub API rate limit #214

tegioz commented May 5, 2023

rbren commented May 5, 2023

codatory commented May 6, 2023

HariSekhon commented May 7, 2023 •

edited

Loading

tegioz commented May 8, 2023

rbren commented May 8, 2023 •

edited

Loading

rbren commented May 8, 2023

rbren commented May 8, 2023

tegioz commented May 8, 2023

tegioz commented May 8, 2023 •

edited

Loading

tegioz commented May 8, 2023

rbren commented May 8, 2023

rbren commented May 8, 2023

tegioz commented May 8, 2023

rbren commented May 8, 2023

tegioz commented May 8, 2023

rbren commented May 8, 2023

rbren commented May 8, 2023

rbren commented May 8, 2023

tegioz commented May 8, 2023

rbren commented May 8, 2023

sudermanjr commented May 8, 2023

tegioz commented May 8, 2023

rbren commented May 8, 2023 •

edited

Loading

tegioz commented May 8, 2023

rbren commented May 8, 2023

tegioz commented May 8, 2023

rbren commented May 8, 2023 •

edited

Loading

tegioz commented May 9, 2023

tegioz commented May 9, 2023

rbren commented May 9, 2023

tegioz commented May 9, 2023

rbren commented May 16, 2023

HariSekhon commented May 16, 2023

Artifact Hub API rate limit #214

Artifact Hub API rate limit #214

Comments

tegioz commented May 5, 2023

rbren commented May 5, 2023

codatory commented May 6, 2023

HariSekhon commented May 7, 2023 • edited Loading

tegioz commented May 8, 2023

rbren commented May 8, 2023 • edited Loading

rbren commented May 8, 2023

rbren commented May 8, 2023

tegioz commented May 8, 2023

tegioz commented May 8, 2023 • edited Loading

tegioz commented May 8, 2023

rbren commented May 8, 2023

rbren commented May 8, 2023

tegioz commented May 8, 2023

rbren commented May 8, 2023

tegioz commented May 8, 2023

rbren commented May 8, 2023

rbren commented May 8, 2023

rbren commented May 8, 2023

tegioz commented May 8, 2023

rbren commented May 8, 2023

sudermanjr commented May 8, 2023

tegioz commented May 8, 2023

rbren commented May 8, 2023 • edited Loading

tegioz commented May 8, 2023

rbren commented May 8, 2023

tegioz commented May 8, 2023

rbren commented May 8, 2023 • edited Loading

tegioz commented May 9, 2023

tegioz commented May 9, 2023

rbren commented May 9, 2023

tegioz commented May 9, 2023

rbren commented May 16, 2023

HariSekhon commented May 16, 2023

HariSekhon commented May 7, 2023 •

edited

Loading

rbren commented May 8, 2023 •

edited

Loading

tegioz commented May 8, 2023 •

edited

Loading

rbren commented May 8, 2023 •

edited

Loading

rbren commented May 8, 2023 •

edited

Loading