Skip to content

remote: add keep-alive and references to the repository #387

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 4, 2017

Conversation

carlosmn
Copy link
Member

@carlosmn carlosmn commented Jul 4, 2017

Especially in 1.8, the garbage collector can decide to finalize an object even
as we are in one of its methods. This means it can free a remote while we're in
one of its calls, as we're referencing the pointer inside the object, rather
than the Remote itself.

carlosmn added 2 commits July 4, 2017 12:53
Especially in 1.8, the garbage collector can decide to finalize an object even
as we are in one of its methods. This means it can free a remote while we're in
one of its calls, as we're referencing the pointer inside the object, rather
than the `Remote` itself.
We need to use `runtime.KeepAlive()` which only exists past Go 1.7. Furthermore,
Go 1.7 is the latest supported by the language team.
@carlosmn carlosmn merged commit c71c935 into master Jul 4, 2017
@carlosmn carlosmn deleted the cmn/remote-refs branch July 4, 2017 12:09
navytux added a commit to navytux/git-backup that referenced this pull request Feb 14, 2025
Alain reports that lab.nexedi.com backup restoration sometimes fails with error like

    ...
    # file gitlab/misc -> .../srv/backup/backup-gitlab.git/gitlab-backup.Pj0fpp/gitlab_backup/db/database.pgdump/7630.dat/7630.dat.ry main.cmd_restore: main.blob_to_file: write .../srv/backup/backup-gitlab.git/gitlab-backup.Pj0fpp/gitlab_backup/db/database.pgdump/7630.dat/7630.dat.ry: bad address

which means that write system call invoked by writefile at tail of blob_to_file returned EFAULT.

The blob_to_file function is organized approximately as this:

    blob_to_file(blob_sha1, path) {
        blob = ReadObject(blob_sha1, git.ObjectBlob)
        blob_content = blob.Data()
        writefile(path, blob_content)
    }

and getting EFAULT inside writefile means that blob_content points to
some unmapped memory.

How that could be?

The answer is that blob.Data(), as implemented by git2go, returns []byte
that points to Cgo memory owned by blob object, and the blob object has
finalizer that frees that memory, which sometimes leads to libc
allocator to also return freed region completely back to OS by doing
munmap:

    https://github.com/libgit2/git2go/blob/v31.7.9-0-gcbca5b8/odb.go#L345-L359
    https://github.com/libgit2/git2go/blob/v31.7.9-0-gcbca5b8/odb.go#L162-L177
    https://github.com/libgit2/git2go/blob/v31.7.9-0-gcbca5b8/odb.go#L322-L325

and if that happens we see the EFAULT, but if no munmap happens we can
be saving corrupt data to restored file.

The OdbObject.Data even has comment about that - that one needs to keep
the object alive until retrieved data is used:

    // Data returns a slice pointing to the unmanaged object memory. You must make
    // sure the object is referenced for at least as long as the slice is used.
    func (object *OdbObject) Data() (data []byte) {

but this comment was added in 2017 in libgit2/git2go@55a109614151
as part of libgit2/git2go#393 while doing
"KeepAlive all the things" to fix segmentation faults and other
misbehaviours.

I missed all that because we switched blob_to_file from `git cat-file`
to git2go in 2016 in fbd72c0 (Switch file_to_blob() and blob_to_file()
to work without spawning Git subprocesses) and we never actively worked
on that part of code anymore. For the reference the git2go introduction
to git-backup happened on that same day in 2016 in 624393d (Hook in
git2go  (cgo bindings to libgit2)).

The problem of memory corruption inside blob_to_file can be reliably
reproduced via injecting the following patch

    blob_to_file(blob_sha1, path) {
        blob = ReadObject(blob_sha1, git.ObjectBlob)
        blob_content = blob.Data()
   +    runtime.GC()
        writefile(path, blob_content)
    }

which leads to e.g. the following test failure at every test run:

    === RUN   TestPullRestore
    ...
    # file b1	-> /tmp/t-git-backup2575257088/1/symlink.file
        git-backup_test.go:109: git-backup_test.go:297: lab.nexedi.com/kirr/git-backup.cmd_restore: lab.nexedi.com/kirr/git-backup.blob_to_file: symlink ^D<80><8c>þ�^@^@^2h space + α /tmp/t-git-backup2575257088/1/symlink.file: invalid argument

and the memory corruption can be fixed reliably by adding proper
runtime.KeepAlive so that the blob object assuredly stays alive during
writefile call:

    blob_to_file(blob_sha1, path) {
        blob = ReadObject(blob_sha1, git.ObjectBlob)
        blob_content = blob.Data()
        writefile(path, blob_content)
   +    runtime.KeepAlive(blob)
    }

However going through git2go code it could be seen that it is full of
Go <-> C interactions and given that there is a track records of catching
many crashes due to not getting lifetime management right (see
e.g. libgit2/git2go#352, libgit2/git2go#334,
libgit2/git2go#553, libgit2/git2go#513,
libgit2/git2go#373, libgit2/git2go#387
and once again libgit2/git2go#393) there is no
guarantee that no any other similar issue is there anywhere else
besides OdbObject.Data().

With that we either need to put a lot of runtime.KeepAlive after every
interaction with git2go, and put it properly, switch back to `git
cat-file` and similar things reverting fbd72c0 and friends, or do
something else.

As fbd72c0 explains switching back to `git cat-file` will slowdown
files restoration by an order of magnitude. Putting runtime.KeepAlive is
also not practical because it is hard to see all the places where we
interact with git2go, even indirectly, and so it is easy to make mistakes.

-> Thus let's keep the code that interacts with git2go well localized
   (done by previous patch), and let's make a copy over every string or
   []byte object we receive from git2go with adding careful
   runtime.KeepAlive post after that.

This fixes the problem of blob_to_file data corruption and it should fix
all other potential memory corruption problems we might ever have with
git2go due to potentially improper usage on git-backup side.

The copy cost is smaller compared to the cost of either spawning e.g. `git
cat-file` for every object, or interacting with `git cat-file --batch`
server spawned once, but still spending context switches on every request
and still making the copy on socket or pipe transfer. But most of all the
copy cost is negligible to the cost of catching hard to reproduce crashes or
data corruptions in the production environment.

For the reference the time it takes to restore "files" part of
lab.nexedi.com backup was ~ 1m51s before this patch, and became ~ 1m55s
after this patch indicating ~ 3.5% slowdown for that part. Which could be
said as noticeable but not big, and since most of the time is spent
during git pack restoration, taking much more time than files, those
several seconds of slowdown become completely negligible.

/reported-by @alain.takoudjou, @tomo
/reported-at https://www.erp5.com/group_section/forum/Gitlab-backup-zDVMZqaMAK/view?list_start=15&reset=1#2074747282
/cc @jerome, @rafael
/reviewed-on https://lab.nexedi.com/kirr/git-backup/-/merge_requests/12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant