Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit repo size #21820

Open
wants to merge 151 commits into
base: main
Choose a base branch
from
Open

Conversation

DmitryFrolovTri
Copy link
Contributor

@DmitryFrolovTri DmitryFrolovTri commented Nov 15, 2022

The goal of this PR is to define a repo limit size. I think org and user level restriction could come later.
Thanks to @sapk as this is a continuation of his work that was started in #7833
Would address (except for LFS): #3658

**Screenshots:** global limit

global_repo_size_limit

individual repo size limit for admin

repository_settings_for_admin

app.ini with settings

app ini

how currently push is rejected if limit is reached
change_rejected_from_ui

TODO: (1)

  • Calculate Push Size (sapk had still have some corner-case to test mostly to not block deletion and some force push)
  • Edit max repo size
  • Enforce repo size
  • Fix tests
  • Add ability to have the feature on or off in the config (repository section in config), default - FALSE - A boolean parameter in config to enable the size-feature checking. If it is enabled then size limit is enforced. There is a field to edit size limit in the repo settings UI. The size limit is taken into account. If disabled then repo size limit is ignored. We can leave the size limit field in the repo settings UI and allow to edit it, however, the color of size limit value should be grayed. (telling that it is not working)
    the config parameter - ENABLE_SIZE_LIMIT = true/false
  • Add ability to have a global repo limit per installation that is enforced unless an individual repo size limit present (repository section in config) - DEFAULT: 0 - global default size limit for any repo where individual size limit is not defined. if 0 - no limit if >0 limit is set so that if a repo has 0/undefined sizelimit this configuration limit will be used instead. Config parameter REPO_SIZE_LIMIT = XXXX (should accept bytes, K, and M, and G, may be T like 1000 - 1000 bytes, 1K - 1 kilobyte. 1M - 1 M megabyte.)
  • UI to enable/disable set global repository limit (Note. The setting get's reset every gitea restart to what it was in the config)

TOFIX: (2)

  • UI operations trigger 500 error when repo is over - present a correct message instead
Deletion of file from UI trigger 500 when repo is over. -> TODO catch this specific error. 2019/08/16 05:23:58 ...uters/repo/editor.go:432:DeleteFilePost() [E] DeleteRepoFile: git push: remote: Gitea: new repo size is over limitation 10000 To /home/sapk/go/src/code.gitea.io/gitea/data/repositories/sapk/test.git ! [remote rejected] d9629b41f9c58da756cf806aabf5811b1ff45b50 -> master (pre-receive hook declined) error: impossible de pousser des références vers '/home/sapk/go/src/code.gitea.io/gitea/data/repositories/sapk/test.git'
Creation of branch from UI trigger 500 when repo is over. -> TODO catch this specific error. 2019/08/16 05:28:42 ...uters/repo/branch.go:287:CreateBranch() [E] CreateNewBranch: Push: exit status 1 - remote: Gitea: new repo size is over limitation 10000 To /home/sapk/go/src/code.gitea.io/gitea/data/repositories/sapk/test.git ! [remote rejected] test -> test (pre-receive hook declined) error: impossible de pousser des références vers '/home/sapk/go/src/code.gitea.io/gitea/data/repositories/sapk/test.git'

Note: If functionality to control size is enabled and a push triggers a size check (it's size would breach the limit) then push will be accepted only if total size of not referenced objects (removed size) is over or equal to total size of newly added objects in push. Since git controls when not referenced objects are purged and it's not fast this condition could last for a while. Administrator of instance could speed it up via following steps:

  1. Done from the data folder of the specific repository on the gitea server <path_to_gitea_server_folder>/data/gitea-repositories/<user>/<repository>:
git reflog expire --expire-unreachable=all --all
git gc --prune=now
  1. Execute Git GC from the UI.
    This would free up all not referenced objects and update repository size in UI. On next push (if push size is smaller then limit) adding new objects will be allowed.

===== All below is for another set of PRs once the above is accepted =====

  • Not agreed yet. Update Git GC logic to allow for faster space release
  • Review the need to test doDeleteCommitAndPush type of test - Not Needed - Reverted

NEXT PR (3)

  • LFS objects size should be added into repo size and calculated in .Size repo attr so I will skip LFS checks.
  • Enforce repo size with lfs added
  • adequate error messages to user upon lfs operation (if they are failed due to size)
  • add tests for lfs sizes
  • add lfs specific repo limit size configuration option and UI to edit
  • prevent migrating repositories that would overflow the limit

NEXT PR (4)

  • Add a hard limit on repository size that would cancel any operation: (config option with percentage of overage, special message). No commit can be accepted in such case. This is because the limit is "soft" i.e. it does allow to increase space, but it would already prevent push that moves the repository over.
  • Develop a go-git variant for size calculation

EVEN further PR (5)

  • Add a per user/organisation account size limit that can be set by administrator (Global per repo used everywhere unless a per repo limit is set. If an action crosses the border of such account limit the action should be denied)
  • implement an organization/user level size restriction

Related: #3658 #7833

@techknowlogick techknowlogick mentioned this pull request Nov 15, 2022
5 tasks
}

// CountObjectsWithEnv returns the results of git count-objects on the repoPath with custom env setup
func CountObjectsWithEnv(ctx context.Context, repoPath string, env []string) (*CountObject, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need a go-git variant too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @KN4CK3R would we need this into this PR or it's ok to add later?
@truecode112 fyi.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KN4CK3R could you please advise if it is still a "may" on this item or it is actually "must"?

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Nov 15, 2022
@lafriks
Copy link
Member

lafriks commented Nov 15, 2022

Does it count also LFS? (currently repo size is git repo size + LFS object size)

Imho we should finally split to have two repo sizes in db (git repo size and LFS object size) and both should have different limits settable.

@DmitryFrolovTri
Copy link
Contributor Author

@lafriks it does count the LFS now and the repo limit applies to both, which I think is correct.

@KN4CK3R
Copy link
Member

KN4CK3R commented Nov 15, 2022

LFS size is not counted. And LFS is tricky because we may end up with LFS files uploaded and the pointer push is denied because of the limit or (worse) the pointer push is allowed and the file upload fails afterwards because of the limit.

@DmitryFrolovTri
Copy link
Contributor Author

Ok will look into this further. Thank you!

@lafriks
Copy link
Member

lafriks commented Nov 15, 2022

Yes that's why my suggestion was for those to be different limits and splitting also how sizes are saved in database for repo

@lunny lunny added the type/enhancement An improvement of existing functionality label Nov 16, 2022
@DmitryFrolovTri
Copy link
Contributor Author

Hi @techknowlogick @kdumontnu I have fixed the opencollective link to make it easy to donate. I've moved the funds I was able to gather 5$ - there :) Here is the new link for this activity: https://opencollective.com/oss-code-ge/projects/gitea-limit-repository-size

@morevnaproject
Copy link

Hi @DmitryFrolovTri
We are interested in this feature and contributed to your campaign.
Will spread a word in our social medias. Keep it up! 💪

@techknowlogick
Copy link
Member

@DmitryFrolovTri I've heard back from OC, and because you are not using them as a fiscal sponsor that's why I couldn't make the transfer. So I will just mention it here that upon completion of this PR/issue we will pay out $500 from our collective for the bounty (this amount was chosen to limit tax burden to whoever gets paid out), and @sapk for your work so far, if you are interested, we can pay you for your work so far (reach out to me via email and I can get you sorted).

@DmitryFrolovTri
Copy link
Contributor Author

@techknowlogick I've since then re-registered and the link above is with the opencollective (OC host) now.

@a1012112796
Copy link
Member

a1012112796 commented Mar 14, 2024

maybe you can try add file size limit like github at first. ref: https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github#file-size-limits, looks it can be a node to split this big pull request.

@silverwind
Copy link
Member

silverwind commented Mar 28, 2024

Per-file limit would definitely help as a first step. I'm looking for ways to prevent users from pushing big files. Don't care if it works for LFS as I'm not using that and never will. Also, it would be nice to have a "warning threshold", as well at which point a warning message will be emitted via git protocol.

@silverwind
Copy link
Member

silverwind commented Mar 29, 2024

I think it's time to split out such smaller features like per-file limit into smaller PRs. This PR has gone unfinished for far too long.

@DmitryFrolovTri
Copy link
Contributor Author

Yes correct been a while. Issue is that lfs skews the measurement and needs to be excluded, but by default it is included by internal gitea mechanisms, so the check without lfs which is the main idea hear is incomplete without some lfs parsing. I've been struggling to find time so far. Still my goal to finish this. Too bad it takes time. If anybody willing to help I would definitely team up.

@DmitryFrolovTri
Copy link
Contributor Author

maybe you can try add file size limit like github at first. ref: https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github#file-size-limits, looks it can be a node to split this big pull request.

I would vote for this to be a separate feature. It can be even done differently.

@DmitryFrolovTri
Copy link
Contributor Author

Will be looking for a person to help me continue either here or somewhere else to speed up.

@Zankaria
Copy link

Looking forward to this: we just got DoS by a 700GB repo

@techknowlogick
Copy link
Member

Hi @DmitryFrolovTri I'm attempting to push conflict resolutions to this branch could you check the "allow edits from maintainers" checkbox? Otherwise, would you be able to add me as a contributor to your fork so I can push to this branch?

@DmitryFrolovTri
Copy link
Contributor Author

Hi @DmitryFrolovTri I'm attempting to push conflict resolutions to this branch could you check the "allow edits from maintainers" checkbox? Otherwise, would you be able to add me as a contributor to your fork so I can push to this branch?

The checkbox is already clicked. I've invited you to the fork

@github-actions github-actions bot added modifies/go Pull requests that update Go code modifies/templates This PR modifies the template files docs-update-needed The document needs to be updated synchronously labels Sep 10, 2024
@markkrj
Copy link

markkrj commented Sep 11, 2024

@techknowlogick couldn't https://codeberg.org/forgejo/forgejo/pulls/4212 be ported to Gitea? 🤔

@lunny
Copy link
Member

lunny commented Sep 11, 2024

@techknowlogick couldn't https://codeberg.org/forgejo/forgejo/pulls/4212 be ported to Gitea? 🤔

Looks interesting. We could have a better one.

@lunny lunny added this to the 1.24.0 milestone Dec 17, 2024
@lunny lunny mentioned this pull request Dec 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs-update-needed The document needs to be updated synchronously lgtm/blocked A maintainer has reservations with the PR and thus it cannot be merged modifies/api This PR adds API routes or modifies them modifies/docs modifies/go Pull requests that update Go code modifies/migrations modifies/templates This PR modifies the template files modifies/translation size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. type/docs This PR mainly updates/creates documentation type/feature Completely new functionality. Can only be merged if feature freeze is not active.
Projects
None yet
Development

Successfully merging this pull request may close these issues.