cron.update_mirrors broken in Gitea 1.16.0 #18607
Comments
I increased it to
and cron.update_mirrors now runs for ~8-10 minutes. The second time it took ~12 minutes. For me that means: 2 runs of 2000 mirrors each -> 4000 updated mirrors. But ~20 minutes for 4000 mirrors seems too fast to me. Questions:
Sometimes I can see what Gitea is doing and sometimes not; the git call is simply missing for a lot of projects. It would also be good to know that this cron ran (start date, finish date) and how many repos were updated. You show some information for some crons, but not for the mirror cron. I just need to know: is the process working fine or not? Can I lose data? |
Today I ran the cron manually. Gitea ran some mirror updates, but after the run the mirror.updated_unix column shows ZERO changes. Now I call the curl
But the whole mirror update process looks broken. |
After spending some hours with Gitea 1.16.0 I'm disappointed, because mirror sync is not working the way it did in 1.15.x. Yes, there were changes, but the changes shouldn't alter the existing behavior. I sent ~40,000 curl calls (4x for all mirrors) to get this mirror_update status 1785 mirrors are not up to date. This is my current config:
If I now start the update_mirrors cron manually, the oldest mirrors are not updated. Nothing happens. Why? If I open the repo with the oldest sync date and click on sync, it's not syncing. Why not? How long do I have to wait? |
Are there any logs? |
Hm ... if I switch to DEBUG I see a lot of SQL statements. Or this: #16982 (comment) I think the "problem" is the new feature implemented in #16982. Or you made other big changes. |
I added
and then
and restarted Gitea. After running the mirror cron manually I see no changes. And I bet that if I made my ~9000 curl calls, some of the mirrors would be synced. The one repo with mirror.updated_unix = 2022-02-06 is a new repo which I added today. |
I've held off replying to this issue because I'm struggling to completely understand what the problem is and what is actually happening. This is despite you having posted 11 comments... So can we please have a succinct three line description as to what is happening, what you want to happen and what your configuration is? (That includes cron configuration.) I will also remind you that you SPECIFICALLY asked for LIMIT_SIZE - I checked with you as to how it was supposed to work and I wrote a very long explanation as to what it was doing. There have been many changes to mirroring; this is not the only one. |
Sorry ... I was testing and posted every attempt. And there are some problems now. Started twice after the Gitea restart, but no last start date is set
Yes, this was a wish. But this implementation should not change the "normal process". It should only be a small optimization for people with a lot of mirrors on their Gitea instance.
Short version of my problem: I have 9000+ mirrors. The mirror sync cron ran once a week and needed 3h to update all mirrors. After the update to Gitea 1.16.0 it's not working anymore. The cron runs for only 8-10 seconds. To update ~90% of the repos I sent 4 times 9000+ curl calls yesterday
In this case the sync worked. And now, after flush-queue (because it could be corrupt), if I start the mirror cron manually, not a single mirror is updated. I tried this 3 times. My config now:
|
@zeripath when I change the loglevel to
then I see 9000+ select statements (one for every mirror) in the log
and then this
select, and not a single mirror is updated. These are the 8-10 seconds that cron.update_mirrors needs to run and finish. |
OK let's look at that configuration first. You have way too many mirrors for a persistable-channel queue to ever be the correct queue for you. Therefore change your update_checker queue to use a level queue and get rid of the channel queue related things.
[repository]
ROOT = /var/lib/gitea/repositories
[mirror]
DEFAULT_INTERVAL = 8h
[queue.mirror]
TYPE=level; <- You have way too many mirrors for a channel or persistent-channel to be the right queue for you
; Update mirrors
[cron.update_mirrors]
; Every day at 4AM
SCHEDULE = 0 0 4 * * *
PULL_LIMIT = -1
PUSH_LIMIT = -1
Next I think we need to address what the
It will then queue PULL_LIMIT pull mirrors and PUSH_LIMIT push mirrors for update. The updates will be done by the workers on the other end of the queue. This will depend on your general queue configuration but often this is a scaling worker pool up to a maximum of 10 workers. So ... If you are finding that nothing is being queued ... It would be useful to check the values of The PULL_LIMIT and PUSH_LIMIT code is working perfectly as described and so your problem is elsewhere. Now to explain why we changed the default PULL_LIMIT and PUSH_LIMIT. The vast majority of installs will benefit from this change as most people do not:
Your situation represents an edge case of edge cases. But I personally have tried to provide you with ways to make your personal situation easier. The change to use a normal queue for mirrors (#17326) gives you the option to use a level queue for your underlying queue, thus preventing Gitea from seizing up due to a blocked queue. The PULL_LIMIT and PUSH_LIMIT options, which you requested, give you other options: consider changing your cron configuration back to /10 minutes (but you might actually need PULL_LIMIT/PUSH_LIMIT to be percentages of the total number of mirrors; not sure here). You're not running Gitea in a normal way and that means you will always need to think carefully about things. In your situation you need to tune things properly and that is what we have provided for you. |
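The queueing step described in this comment can be sketched roughly as follows. This is a minimal Python illustration, not Gitea's Go implementation: the names `Mirror` and `queue_due_mirrors` and the in-memory `deque` are hypothetical, and the sketch assumes that a negative limit means "no limit" and that a limit of 0 queues nothing.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Mirror:
    repo_id: int
    next_update_unix: int  # time at which the mirror is next due


def queue_due_mirrors(mirrors, queue, now, pull_limit):
    """Enqueue mirrors that are due and return how many were queued.

    Assumption: pull_limit < 0 means no limit, 0 means queue nothing.
    The function only enqueues; it does not perform any sync itself.
    """
    queued = 0
    # Oldest due date first, so the most overdue mirrors are queued first.
    for mirror in sorted(mirrors, key=lambda m: m.next_update_unix):
        if mirror.next_update_unix > now:
            break  # list is sorted, so nothing after this is due
        if 0 <= pull_limit <= queued:
            break  # limit reached
        queue.append(mirror.repo_id)
        queued += 1
    return queued
```

In this sketch the cron step finishes as soon as the ids are in the queue, which is why its run time says little about how long the actual syncing takes.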
This is not defined in https://github.com/go-gitea/gitea/blob/main/custom/conf/app.example.ini and https://docs.gitea.io/en-us/config-cheat-sheet/#mirror-mirror
My current status for the mirrors This means, that all 9000+ mirrors should be synced now.
;)
Thx!
I added
and restarted my Gitea. But the cron does the same: nothing. Syncing via curl and directly in the repo settings works. |
Change from what? Info is one of the lowest log levels and it will not be giving us any special information.
[log]
MODE=console, traceconsole
LEVEL=info
[log.traceconsole]
MODE=console
LEVEL=trace
EXPRESSION=services/mirror
Would be more useful. OR even whilst gitea is running you can simply run:
./gitea manager logging add console --name traceconsole --level TRACE --expression services/mirror
And it will add a trace level console logger that will emit TRACE level logs from events in the services/mirror files
That is your 9000 repositories being loaded and then added to the update queue.
These are push mirrors and it is clear that you have none.
Yes this is correct because the cron.update_mirrors task is simply adding mirrors to the queue to be updated - that is all it has ever done. It has never represented the actual work of doing the updating. The work of updating a mirror will be done by workers on the queue. The mirror queue will scale its workers to account for the number of things in the queue.
I've tried to explain this to you before - the cron task update_mirrors DOES NOT represent the actual work of doing the updating. It has never done that. Previously you've had a proxy of this because in your situation you've been blocking the whole queue due to the number of mirrors you have. |
What do you mean by "You're not running Gitea in a normal way"? ;) If someone (not me) runs a Gitea instance that is used by a lot of people with a lot of repos and mirrors, then that person will get the same problems.
Is this a design "problem"? Understood or not, I don't understand what exactly is blocked. The workers for the mirror queue?
It works with Gitea 1.15.x. -> Is the old way not implemented anymore? And what is the solution now? At the moment I can send the curl calls once a week to update the mirrors. |
Thus you get a gitea/services/mirror/mirror.go Lines 67 to 70 in f393bc8
This will get pushed to the queue unless it is already in the queue. gitea/services/mirror/mirror.go Lines 92 to 99 in f393bc8
gitea/services/mirror/mirror.go Lines 101 to 104 in f393bc8
Now you assert that calling sync with curl works. So let's follow what that does: gitea/routers/api/v1/repo/mirror.go Line 17 in f393bc8
gitea/routers/api/v1/repo/mirror.go Line 51 in f393bc8
gitea/services/mirror/mirror.go Lines 151 to 165 in f393bc8
Which pushes to the same queue. Now you might argue that that push doesn't have a Has wrapped around it but... There's a Has internally in the push. So what have we found:
I guess the question I have is what is making you think this isn't working? So... One thing you could do is simply change the next_update_unix for a mirror manually. Put the tracer logger I suggested above on. And then click the Cron task button and follow what the logs do. |
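The point about both paths ending in the same deduplicating push can be illustrated with a small sketch. This is a hypothetical Python model, not Gitea's queue code: the class `UniqueQueue` and its methods are invented for the example, and the only behavior taken from the comment is that a push is skipped when the item is already in the queue.

```python
from collections import deque


class UniqueQueue:
    """Toy queue that refuses to hold the same key twice."""

    def __init__(self):
        self._items = deque()
        self._present = set()

    def has(self, key):
        return key in self._present

    def push(self, key):
        # The "Has" check lives inside push, so every caller gets it,
        # whether the push comes from the cron task or from an API request.
        if key in self._present:
            return False
        self._items.append(key)
        self._present.add(key)
        return True

    def pop(self):
        key = self._items.popleft()
        self._present.discard(key)
        return key
```

Under this model a second sync request for a mirror that is still waiting in the queue is silently dropped, regardless of which path issued it.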
Results for next_update_unix If I use curl, it does something, but not for all curl calls. I made 547 curl calls for 547 repos, and after Gitea was done, 266 repos still had not been updated. Then I made a second run for the 266 repos. This shows me that the mirror sync process isn't working, because otherwise the manual cron start would update all my mirrors. Before new manual start Manual start ... And these are the changes Can you explain this? Gitea updates only 4 mirrors with mirror_next_update_unix = '2022-02-06' And if I start the cron a 2nd time ... not a single repo is updated. |
Go to monitor and tell me how many workers you have in your mirror queue right now. (Not initial configuration) |
But the number of workers is not important. If I put 1000 items into the queue and have only 10 workers, then it just takes longer. And Gitea is not updating more than 1 repo in a row. |
That is initial configuration |
If I refresh the config page for the mirror queue I don't see any changes. |
So... Mirror-channel? |
I guess a trick for that is to wait until that worker was due to timeout and then add another worker manually yourself and see if that finishes off the work |
Actually a flush worker would be better |
Yup, I bet this is the problem. When the zero worker times out, the lack of a worker won't be noticed unless there's a push. Workaround: just set workers=1 in [queue.mirror] or flush the queue. I'll have a think - likely when the managedQueue loses its final worker and there's something in the queue it should zeroboost again. |
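A toy simulation may make the suspected failure mode easier to see. It is only a sketch under stated assumptions, not Gitea's worker pool: the function `drain` and its parameters are hypothetical, and the worker's timeout is modeled as a fixed number of items it can handle before it exits.

```python
from collections import deque


def drain(queue, timeout_items, restart_if_pending):
    """Simulate a pool whose only worker is a temporary boost worker.

    The worker handles at most `timeout_items` items and then times out.
    With restart_if_pending=True a new worker is started whenever the
    last worker exits while items are still waiting.
    Returns the number of items processed.
    """
    if timeout_items <= 0:
        raise ValueError("timeout_items must be positive")
    processed = 0
    workers = 1  # one boost worker, started by the initial push
    while workers > 0:
        handled = 0
        while queue and handled < timeout_items:
            queue.popleft()
            handled += 1
        processed += handled
        workers -= 1  # the worker times out
        if restart_if_pending and queue:
            workers += 1  # boost again because work is still waiting
    return processed
```

With `restart_if_pending=False` the leftover items stay in the queue until something else pushes to it, which would match the observation that repeated curl calls eventually move things along while a single cron run appears to do nothing.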
I added this
and set the limits to 100 (only for testing)
Then I restarted Gitea and started the cron manually ... no repo was synced. I see this and no other activity. After 8-10 seconds the task is finished. I made 4 tests
|
But what is the difference to curl? curl is an external call which puts the mirror into the queue. And it works. What is different about the cron call? If you try to explain this, you will find the "problem". ;) |
It is possible for the zero worker to timeout before all the work is finished. This may mean that work may take a long time to complete because a worker will only be induced on repushing. Fix go-gitea#18607 Signed-off-by: Andrew Thornton <art27@cantab.net>
* Restart zero worker if there is still work to do It is possible for the zero worker to timeout before all the work is finished. This may mean that work may take a long time to complete because a worker will only be induced on repushing. Also ensure that requested count is reset after pulls and push mirror sync requests and add some more trace logging to the queue push. Fix #18607 Signed-off-by: Andrew Thornton <art27@cantab.net>
Backport go-gitea#18658 It is possible for the zero worker to timeout before all the work is finished. This may mean that work may take a long time to complete because a worker will only be induced on repushing. Also ensure that requested count is reset after pulls and push mirror sync requests and add some more trace logging to the queue push. Fix go-gitea#18607 Signed-off-by: Andrew Thornton <art27@cantab.net>
* Restart zero worker if there is still work to do (#18658) Backport #18658 It is possible for the zero worker to timeout before all the work is finished. This may mean that work may take a long time to complete because a worker will only be induced on repushing. Also ensure that requested count is reset after pulls and push mirror sync requests and add some more trace logging to the queue push. Fix #18607 Signed-off-by: Andrew Thornton <art27@cantab.net> * Update modules/queue/workerpool.go
After Run:
Wait for it to finish. Shutdown Gitea and delete the /data/queues/common folder. Restart. Gitea is syncing only ~220 mirrors. |
Are you running 1.16-head? |
No. I can't run this on my instance. |
I'm not suggesting that you run 1.17/main - I am simply suggesting that you move to 1.16-dev or 1.16, which tracks the backports and bug fixes that will become 1.16.2 in the next week. Are you at least running with |
Powered by Gitea Version: 1.16.1
|
Yes. I restarted repeated flush-queues ... and started the mirror process again. |
It looks better now. And Gitea is syncing the mirrors. |
Just one mirror - if you want to stop all mirroring you'll need to stop the queue worker. |
Thx. I will try this another time. Some update ... After the last try:
with a manual cron start I could update all mirrors in one go. Good. Looks like this helps. But ... about 16-18 hours later (DEFAULT_INTERVAL = 8h) I wanted to see what happens if I start the cron again. My current configuration is:
But "nothing" happens. Gitea updates ~10 mirrors. |
Gitea Version
1.16.0
Git Version
2.25.1
Operating System
Ubuntu 20.04.3
How are you running Gitea?
Precompiled gitea-1.16.0-linux-amd64
Database
PostgreSQL
Can you reproduce the bug on the Gitea demo site?
No
Log Gist
No response
Description
I'm hosting a lot of mirrors on my Gitea instance. Some days ago I updated to 1.16.0.
I see -> it started at 04:00 AM, but ...
Last run with Gitea 1.15.x:
With Gitea 1.15.x, cron.update_mirrors needed ~3h to update all mirrors. The cron runs on Fridays at 04:00 AM, and today was the first run.
I saw in Gitea that the cron ran -> run count was 1. But the monitoring data shows no activity (RAM usage, CPU usage, disk usage) between 04:00 and 07:00 AM.
Then I started the cron 4-6 times in the Gitea admin web console. It ran for some seconds and I saw some activity.
But there is no date for the last run of the cron
After the 4-6 runs, sorting by last update time doesn't show any changes
All the projects with a Feb 04 date were created today. But I can't see the updated mirrors when I change the sort order.
Why is cron.update_mirrors not updating all my mirrors? I can't see any errors in the Gitea logs.
Screenshots
No response