Container registry - layer drops #19586
Comments
Update: this repeated with other blobs when pushing another image that contains the same blob. The blob's file was removed from the FS, but its row is still present in the DB table package_blob, and because of that the blob can't be re-uploaded with docker push. |
Can't reproduce on [1.17.0+dev-568-gcbd45471b]; it seems to be fixed. |
Gitea [1.17.0+dev-587-g9ea920640] Directory if i restore It was when i build && push image with
Gitea log
|
I see a series of HTTP 500 responses, usually after some layers drop. Now I can't tell whether any layers are missing, so maybe it's a new bug. Just in case, I attach a full push log.
|
Could you enable dev mode in app.ini so that more detail is displayed? |
The problem is caused by concurrent requests to the database. In TryInsertPackage/GetOrInsertVersion: Get -> Insert. Then |
Ok |
Hi! With mode=dev. This appears every time while building cross-platform images with Builder log:
Gitea log:
|
I'm experiencing this problem on v1.17.0-rc1, running on Kubernetes. I have this symptom: #19586 (comment) |
I also experienced this issue with v1.17.0-rc1. When configuring Docker or Podman to use just one thread it works fine, as soon as you configure more parallel threads to push the container layers, error 500 appears. So it seems like a concurrency issue. |
nope not a concurrency issue, just tried this, |
I think this might solve the issue but I'm not sure if it's gonna cause the whole transaction to fail instead.
diff --git a/models/packages/package.go b/models/packages/package.go
index e39a7c4e4..f9bd6c765 100644
--- a/models/packages/package.go
+++ b/models/packages/package.go
@@ -128,14 +128,11 @@ func TryInsertPackage(ctx context.Context, p *Package) (*Package, error) {
LowerName: p.LowerName,
}
- has, err := e.Get(key)
- if err != nil {
- return nil, err
- }
- if has {
- return key, ErrDuplicatePackage
- }
- if _, err = e.Insert(p); err != nil {
+ if _, err := e.Insert(p); err != nil {
+ // Try to get the key again
+ if has, _ := e.Get(key); has {
+ return key, ErrDuplicatePackage
+ }
return nil, err
}
return p, nil
Another option is to parse the error returned back. |
It IS a concurrency issue - just an internal concurrency issue to do with transactional isolation etc. The |
The TryInsert* functions within the packages models make incorrect assumptions about transactional isolation within most databases. It is perfectly possible for a SELECT to return nothing but an INSERT fail with a duplicate in most DBs as it is only INSERT that the locking occurs. This PR changes the code to simply try to insert first and if there is an error then attempt to SELECT from the table. If the SELECT works then the INSERT error is assumed to have been related to the unique constraint failure. This technique avoids us having to parse the error returned from the DBs as these are varied and different. If the SELECT fails then the INSERT error is returned to the user. Fix go-gitea#19586 Signed-off-by: Andrew Thornton <art27@cantab.net>
But why won't any subsequent pushes work? I've tried many times and basically every second or third image pushed has a dropped layer in it. The database reports the layer exists but the layer doesn't exist in the FS; shouldn't there be some amount of validation? |
I've experienced this problem when testing a new Gitea instance in different configurations:
The result appears to be the same: Image pushes mostly fail, retrying them sometimes works. This occurred for a variety of public and private images. Is there a workaround for this issue available? Would building Gitea + the TryInsert PR help? |
It seems some layers are being silently dropped when pushing. When pulling a recently pushed image it kept getting stuck retrying one particular layer. I checked the contents of the named volume and there is nothing in the I did a push to an instance of the official Docker |
Hi, I did the same: recovered the layer file from backup. Image pulls became successful, but after some pushes with this layer it drops again. |
I'm not sure but it's probably because it doesn't actually check the FS but just checks the database to see if this layer was ever added. If the database has it recorded, it assumes it exists. There really should be an FS check added in there too, along with the database one, to make sure this problem doesn't come up again. |
Copy from https://codeberg.org/woodpecker-plugins/plugin-docker-buildx/issues/42 -- What I think the problem with two uploads uploading the same file is:
Now we have valid database entries from Upload 2 but the blob in the file storage is missing. Every upload afterwards is skipped because the deduplication only checks whether the blob hash is already present in the database. It does not check the file storage layer because there should never be a missing blob. So a reupload is only possible after all packages which reference this blob are deleted and the registry cleanup job removes the unreferenced blob entries. |
This PR addresses #19586 I added a mutex to the upload version creation which will prevent the push errors when two requests try to create these database entries. I'm not sure if this should be the final solution for this problem. I added a workaround to allow a reupload of missing blobs. Normally a reupload is skipped because the database knows the blob is already present. The workaround checks if the blob exists on the file system. This should not be needed anymore with the above fix so I marked this code to be removed with Gitea v1.20. Co-authored-by: Lunny Xiao <xiaolunwen@gmail.com>
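The mutex mentioned in this PR description can be illustrated with a small sketch. It assumes a single process and uses hypothetical names (`versionStore`, `getOrCreate`); it is not the code from the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// versionStore is a hypothetical stand-in for the package version table.
type versionStore struct {
	mu       sync.Mutex
	versions map[string]int64
	next     int64
}

func newVersionStore() *versionStore {
	return &versionStore{versions: map[string]int64{}}
}

// getOrCreate performs the lookup and the insert under one lock, so two
// concurrent uploads cannot both see "missing" and both try to insert.
func (s *versionStore) getOrCreate(name string) (int64, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if existing, ok := s.versions[name]; ok {
		return existing, false
	}
	s.next++
	s.versions[name] = s.next
	return s.next, true
}

func main() {
	s := newVersionStore()
	var wg sync.WaitGroup
	var mu sync.Mutex
	createdCount := 0
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if _, created := s.getOrCreate("myimage:latest"); created {
				mu.Lock()
				createdCount++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println(createdCount) // prints "1"
}
```

A process-local mutex like this only serialises requests handled by the same process, which may be one reason the author was unsure whether it should be the final solution.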
I've got this issue again today: #21736
What are the steps to fix this? Last time I had to write a script to delete all docker packages in the instance because the web UI on 1.17 only allows deleting a single version. Is there a way to tell which version hash
|
Deleting blob from postgres table as suggested in my issue did work after package was pushed again: delete from package_blob where hash_sha256 = 'c51da6ab853721f149e9dea36102d7aabe02e0958ad199661066c851e391ede2';
DELETE 1 |
I don't think this was caused by concurrent pushes. I am running a small instance with only a few docker containers. All containers are pushed by drone.
concurrency:
  limit: 1
and a few steps before / after docker push. Also, no other docker builds were running in drone at that time. |
is a pipeline setting and not a docker setting. |
It is. My point is that it could never push concurrently. And after checking all runs around that time, there were gaps of 10 to 20 minutes between any docker pushes. |
@KN4CK3R What is the fix for this? I get this all the time and on push, some layers show a retry at least once. |
This is fixed since Gitea 1.17.4 (#21862) |
Description
Hi!
The layer 4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1 is constantly disappearing; it contains 1024 zero bytes. The 'gitea/packages/4f/4f' folder remains empty. If I upload an archived version of the layer, everything works until any image containing this layer is deleted, and after that the layer disappears. At this time in the gitea logs
I attach a copy of the layer.
Gitea Version
1.17.0+dev-511-g71bafa026
Can you reproduce the bug on the Gitea demo site?
No
Log Gist
No response
Screenshots
No response
Git Version
No response
Operating System
No response
How are you running Gitea?
in docker
Database
MySQL