Error on git-sync running as a CronJob #439
Comments
It looks like `git gc` was running and terminated early (e.g. your pod
died), and since you use persistent storage, the repo is left "corrupt". I am trying
to reproduce this so I can figure out what to do, but I'm not having any luck.
You can't send me a tarfile of your repo in this state, can you?
On Sun, Aug 8, 2021 at 11:53 PM michal-jagiello-tmpl wrote:
Hi,
I'm using git-sync v3.3.4 in a CronJob as an initContainer. Here is the
definition:
- name: clone-results-repo
  image: "{{ .Values.init.cloneRepo.image.repository }}:{{ .Values.init.cloneRepo.image.tag }}"
  imagePullPolicy: {{ .Values.init.cloneRepo.image.pullPolicy }}
  volumeMounts:
    - name: persistent-storage
      mountPath: /git
  env:
    - name: GIT_SYNC_REPO
      value: {{ .Values.git.url }}
    - name: GIT_SYNC_ONE_TIME
      value: "true"
    - name: GIT_SYNC_BRANCH
      value: my_awesome_branch
    - name: GIT_SYNC_DEPTH
      value: "1"
    - name: GIT_SYNC_USERNAME
      valueFrom:
        secretKeyRef:
          name: {{ include "my-awesome-app.fullname" . }}-git-credentials
          key: GIT_PULL_USERNAME
    - name: GIT_SYNC_PASSWORD
      valueFrom:
        secretKeyRef:
          name: {{ include "my-awesome-app.fullname" . }}-git-credentials
          key: GIT_PULL_PASSWORD
    - name: GIT_SYNC_ROOT
      value: /git
    - name: GIT_SYNC_TIMEOUT
      value: "99999"
There is another container which also mounts the persistent-storage volume
and consumes the data from the cloned repo. The issue is that after a few
successful executions I always get the same error:
INFO: detected pid 1, running init handler
I0809 06:29:23.648815 11 main.go:507] "level"=0 "msg"="starting up" "pid"=11 "args"=["/git-sync"]
I0809 06:29:24.077876 11 main.go:1003] "level"=0 "msg"="update required" "rev"="HEAD" "local"="1172cc4eeed3a3dd6d5e8fb65f3c15134adf9f32" "remote"="bfa07ea5354c25fa7e267dbcb6bbb305f2bd315f"
I0809 06:29:24.077969 11 main.go:690] "level"=0 "msg"="syncing git" "rev"="HEAD" "hash"="bfa07ea5354c25fa7e267dbcb6bbb305f2bd315f"
E0809 06:29:24.702409 11 main.go:172] "msg"="too many failures, aborting" "error"="Run(git gc --prune=all): exit status 128: { stdout: "", stderr: "fatal: gc is already running on machine 'my-awesome-app-1628460000-pr7v8' pid 49 (use --force if not)\n" }" "failCount"=0
and the pod my-awesome-app-1628460000-pr7v8 does not exist anymore.
The repository has ca. 7 GB of data (if it matters).
|
@thockin no, unfortunately not :| Does git gc run asynchronously? Can that process be interrupted somehow? |
I'm not really a git expert, so that's what I want to figure out :( |
I mean, I could just add a |
You could try Or we could catch this specific case ("already running") and not treat it as fatal. Without a repro, it's scary. |
I've run that cron with
env:
  ....
  - name: GIT_SYNC_GIT_CONFIG
    value: "gc.autoDetach:false"
but the doc <https://git-scm.com/docs/git-gc#Documentation/git-gc.txt-gcautoDetach> says:
gc.autoDetach
    Make git gc --auto return immediately and run in background if the system supports it. Default is true.
I see that you call git gc --prune=all here
<https://github.com/kubernetes/git-sync/blob/259f7d80007b0f2342756b359c3a7f3a80e99348/cmd/git-sync/main.go#L712>.
Maybe the solution could be to add a --disable-git-gc flag for users who are
absolutely sure what they are doing, so that I could take care of it myself. I would
run git gc myself once a day rather than every few hours? |
As far as I can see, that error only happens if a prune is running in
the background. That can happen, I think, if the repo hits some "dirty"
metric, which is possible on a shared volume. I am trying to figure out if
there is REALLY a prune running, or just some state left on disk.
Hence why a repro would be helpful. :) |
Over the past few days we have started to experience this issue as well! Logs:
+ playbook-slackbot-deployment-69756b6699-sc5c4 › playbook-sync
playbook-slackbot-deployment-69756b6699-sc5c4 playbook-sync 2021-10-19T13:38:03.158022829+02:00 INFO: detected pid 1, running init handler
playbook-slackbot-deployment-69756b6699-sc5c4 playbook-sync 2021-10-19T13:38:03.197820105+02:00 I1019 11:38:03.197508 11 main.go:507] "level"=0 "msg"="starting up" "pid"=11 "args"=["/git-sync"]
playbook-slackbot-deployment-69756b6699-sc5c4 playbook-sync 2021-10-19T13:38:03.197887375+02:00 I1019 11:38:03.197689 11 main.go:860] "level"=0 "msg"="cloning repo" "origin"="https://github.com/<org>/<repo>.git" "path"="/git"
playbook-slackbot-deployment-69756b6699-sc5c4 playbook-sync 2021-10-19T13:39:47.561229307+02:00 I1019 11:39:47.560916 11 main.go:690] "level"=0 "msg"="syncing git" "rev"="HEAD" "hash"="dd21cb48350c2354a4b36ad535173ff962e75fad"
playbook-slackbot-deployment-69756b6699-sc5c4 playbook-sync 2021-10-19T13:42:26.640201065+02:00 E1019 11:42:26.639933 11 main.go:172] "msg"="too many failures, aborting" "error"="Run(git gc --prune=all): context deadline exceeded: { stdout: "", stderr: "" }" "failCount"=0
- playbook-slackbot-deployment-69756b6699-sc5c4 › playbook-sync
Config:
- name: playbook-sync
  image: k8s.gcr.io/git-sync/git-sync:v3.3.4
  env:
    - name: GIT_SYNC_USERNAME
      value: "user"
    - name: GIT_SYNC_PASSWORD
      valueFrom:
        secretKeyRef:
          name: playbooks-bot-tokens
          key: github
    - name: GIT_SYNC_ROOT
      value: "/git"
    - name: GIT_SYNC_REPO
      value: "https://github.com/<org>/<repo>.git"
    - name: GIT_SYNC_BRANCH
      value: "master"
  volumeMounts:
    - name: playbooks-shared-data
      mountPath: /git |
Increasing the GIT_SYNC_TIMEOUT to 300 from the default 120 seems to have helped in resolving the issue. |
When I get some time I will try to force a repro case. |
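For readers who hit the "context deadline exceeded" variant of this error, the timeout increase reported above could look roughly like the fragment below. This is only a sketch built from the playbook-sync config posted earlier in the thread: the single added entry is GIT_SYNC_TIMEOUT, the value 300 is the one the commenter reported, and treating the unit as seconds is an assumption here. The credential entries are left out for brevity.

```yaml
# Sketch only: the playbook-sync container from the config above, with a
# longer sync timeout. Every field except GIT_SYNC_TIMEOUT is copied from
# that config; the username/password entries are omitted for brevity.
- name: playbook-sync
  image: k8s.gcr.io/git-sync/git-sync:v3.3.4
  env:
    - name: GIT_SYNC_REPO
      value: "https://github.com/<org>/<repo>.git"
    - name: GIT_SYNC_BRANCH
      value: "master"
    - name: GIT_SYNC_ROOT
      value: "/git"
    # Raised from the default of 120 mentioned in the thread.
    # Assumption: the value is interpreted as seconds.
    - name: GIT_SYNC_TIMEOUT
      value: "300"
  volumeMounts:
    - name: playbooks-shared-data
      mountPath: /git
```

A longer timeout gives `git gc --prune=all` more time to finish on a large repository. Whether it also helps with the "gc is already running" error from the original report is not established in this thread.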
I also just hit this - it's definitely a case where the garbage collection was interrupted while running on persistent storage. You can repro by kicking off a GC and then killing the pod before the GC finishes. Manually running |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
So there seem to be a few issues with GC
At least #3 I was able to force repro, and git seems smart enough to realize that the remembered PID is dead, so not an issue. To fix #2 we can set autoDetach to false. That converts #2 into #1. We probably want to use |
Also we should probably set pruneExpire to something other than "all" or "now" (e.g. 1.hour.ago) |
So, the v4 branch has a LOT of changes around GC. I can't figure out how to force this situation to happen now. I'm going to close this and if someone can make it happen again (once I cut a v4, that is) then we can re-examine. |