Bioregistry hit project size limit on PyPI #1100

Closed
cthoyt opened this issue Apr 18, 2024 · 2 comments

cthoyt commented Apr 18, 2024

During the automated update on April 17th, 2024, we finally hit the project size limit for PyPI. Here's the error log from the CI attempt to upload it to PyPI:

ERROR HTTPError: 400 Bad Request from https://upload.pypi.org/legacy/. Project size too large. Limit for project 'bioregistry' total size is 10 GB. See https://pypi.org/help/#project-size-limit

A few options for going forward:

  1. Remove some older versions, maybe on a schedule so that for releases older than 1 year, only 1 upload per month is kept
  2. Remove some old versions, and then make the update schedule less frequent (maybe once per week instead of nightly)
  3. Ask PyPI for an increase in the size limit? I don't think this is necessarily the best option
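
The thinning schedule in option 1 could be sketched roughly as follows. This is only an illustration, not what was actually run: the function name `versions_to_keep`, the `full_retention_days` parameter, and the upload history are all hypothetical, and the sketch only computes which versions would be candidates for deletion, leaving the deletion itself open.

```python
from datetime import date, timedelta


def versions_to_keep(uploads, today, full_retention_days=365):
    """Return the set of versions to keep under a thinning schedule.

    uploads maps a version string to its upload date. Every upload newer
    than the cutoff is kept; for older uploads, only the latest one in
    each calendar month is kept.
    """
    cutoff = today - timedelta(days=full_retention_days)
    keep = set()
    latest_per_month = {}  # (year, month) -> (upload date, version)
    for version, uploaded in uploads.items():
        if uploaded >= cutoff:
            keep.add(version)
            continue
        key = (uploaded.year, uploaded.month)
        if key not in latest_per_month or uploaded > latest_per_month[key][0]:
            latest_per_month[key] = (uploaded, version)
    keep.update(version for _, version in latest_per_month.values())
    return keep


# Hypothetical upload history, for illustration only
uploads = {
    "0.1.1": date(2022, 1, 3),
    "0.1.2": date(2022, 1, 20),
    "0.1.3": date(2022, 2, 5),
    "0.2.0": date(2024, 4, 1),
}
keep = versions_to_keep(uploads, today=date(2024, 4, 18))
to_delete = sorted(set(uploads) - keep)
print(to_delete)
```

With this made-up history, only the earlier of the two January 2022 uploads ends up in `to_delete`.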

In parallel, it might be worth checking if we can reduce the data that's packaged. For example, it's still nice to keep raw data in version control, but maybe we don't need raw data in the package.
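
One way to check what is taking up space in the package would be to sum file sizes per directory inside a built wheel (a wheel is a zip archive). The sketch below is an assumption about how one might do that, not existing project tooling: `size_by_directory` is a hypothetical helper, and the archive is a tiny in-memory stand-in with made-up file names rather than a real build.

```python
import io
import zipfile
from collections import Counter


def size_by_directory(archive, depth=2):
    """Sum uncompressed file sizes in a zip archive, grouped by path prefix."""
    totals = Counter()
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            parts = info.filename.split("/")
            # Group by at most `depth` leading directories of the file's parent path
            prefix = "/".join(parts[:-1][:depth]) or "."
            totals[prefix] += info.file_size
    return totals


# Build a tiny stand-in archive in memory (hypothetical file names and sizes)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("bioregistry/__init__.py", b"x" * 10)
    zf.writestr("bioregistry/data/external/raw.json", b"x" * 1000)
buffer.seek(0)

totals = size_by_directory(buffer)
for prefix, size in totals.most_common():
    print(f"{size:>10} bytes  {prefix}")
```

Pointing the same function at the path of a real wheel file would show which directories dominate the upload size.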

matentzn (Collaborator) commented

Wow! I think the package size probably has to be limited to be sustainable in the long term. Never knew about this limit!

cthoyt added a commit that referenced this issue Apr 18, 2024

cthoyt commented Apr 18, 2024

I just deleted about 50 old versions (from evenly spaced-out days). Between that, changing updates to weekly, and reducing the package size, this should be good until the end of the year, when I'll be back and can come up with a more long-term solution.

cthoyt closed this as completed Apr 18, 2024
cthoyt added a commit that referenced this issue Apr 18, 2024
This PR does the following:

1. Consolidates the external registry getters (in
`bioregistry.external`), the external registry alignment classes (in
`bioregistry.align`), the data artifacts (in `bioregistry.data.external`),
and a few (3) configuration files (in `bioregistry.data`) into a single
hierarchy in `bioregistry.external`.
2. Moves the metaregistry curation sheets and the raw data from the
repositories out of the `src/` structure. They're now in the
`/exports/alignment/` and `/exports/raw/` folders, respectively. The
point of this is to reduce the size of the package that gets sent to
PyPI, related to #1100
3. Bumps the minor version to the 0.11.X series

In theory, this shouldn't affect any downstream uses, since the
`bioregistry.align` submodule isn't really for external users.
cthoyt added a commit that referenced this issue Apr 18, 2024
Related to #1100, since we hit the PyPI size limit. Regardless of the
solution we pick going forward for reducing the existing size, we will
run into the limit again (eventually), so a less frequent update
schedule will slow that down.

- [x] Update cron schedule in GitHub actions
- [x] Update text on website
- [x] Add additional tutorial/guide on how to run this manually
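
To put rough numbers on how much a less frequent schedule slows the growth, here is a back-of-the-envelope sketch. Only the 10 GB limit comes from the error log above; the current usage and per-upload size are hypothetical placeholders, and `days_until_limit` is an illustrative helper rather than anything in the project.

```python
def days_until_limit(used_gb, limit_gb, upload_mb, days_between_uploads):
    """Estimate the days until the project size limit is reached.

    Assumes every upload has the same size and nothing is deleted.
    """
    remaining_mb = (limit_gb - used_gb) * 1000
    uploads_left = int(remaining_mb // upload_mb)
    return uploads_left * days_between_uploads


# Hypothetical usage and upload size; only the 10 GB limit is from the log
nightly = days_until_limit(used_gb=8.0, limit_gb=10.0, upload_mb=20.0, days_between_uploads=1)
weekly = days_until_limit(used_gb=8.0, limit_gb=10.0, upload_mb=20.0, days_between_uploads=7)
print(nightly, weekly)
```

Under these assumed figures, moving from nightly to weekly uploads stretches the remaining headroom by a factor of seven, which delays hitting the limit but does not avoid it.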