This issue was moved to a discussion.

Git clone hangs with large repo #969

Closed
Abbyyan opened this issue Dec 19, 2019 · 3 comments

Abbyyan commented Dec 19, 2019

I've used GitPython to clone some repos, but it hangs on one particular repo that is 17 GB in size. I created a Pool to run the clones with GitPython; each process clones one repo. One large repo needs more time than the others to clone. I use the Pool as follows:

    # p is the multiprocessing Pool; runfunc clones a single repo.
    multi_res = [
        p.apply_async(runfunc, args=(incl_info, project_root, skip_dirs))
        for incl_info in incl_infos
    ]
    LogInfo('Waiting for all subprocesses done...')
    # Poll each result in turn until its worker has finished.
    for res in multi_res:
        while not res.ready():
            LogInfo("Downloading now")
            time.sleep(5)
    p.close()
    p.join()

It works perfectly in most cases, but it often hangs on the largest repo. It's weird that when I clone that repo on its own, it works fine. So at first I wondered whether something in the Python multiprocessing Pool was blocking.
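One way to tell a slow clone from a hung one is to give each result a deadline instead of polling `ready()` indefinitely. The following is a minimal sketch, not the code from this report: `wait_all`, `slow_task` and the timeout values are hypothetical, and it uses `multiprocessing.pool.ThreadPool` (which exposes the same `apply_async`/`get` API as `Pool`) only to keep the example self-contained.

```python
import multiprocessing
import time
from multiprocessing.pool import ThreadPool


def wait_all(results, timeout_s):
    """Collect each AsyncResult, recording which ones miss the deadline.

    Returns (finished, timed_out): a dict of index -> value, and a list of
    indexes whose task did not finish within timeout_s seconds of being
    waited on. The deadline applies to each result in turn.
    """
    finished, timed_out = {}, []
    for i, res in enumerate(results):
        try:
            finished[i] = res.get(timeout=timeout_s)
        except multiprocessing.TimeoutError:
            timed_out.append(i)
    return finished, timed_out


def slow_task(seconds):
    """Hypothetical stand-in for a per-repo clone function."""
    time.sleep(seconds)
    return seconds


if __name__ == "__main__":
    with ThreadPool(2) as pool:
        pending = [pool.apply_async(slow_task, (s,)) for s in (0.1, 1.0)]
        finished, timed_out = wait_all(pending, timeout_s=0.3)
        print("finished:", finished, "timed out:", timed_out)
```

A timed-out entry only says the task missed its deadline; the worker itself keeps running, so deciding whether to terminate the pool is left to the caller.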

I ran strace on the hung git clone process. The git process output is as follows:

Process 27649 attached
read(6, 0x7ffc36dae050, 4)              = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL, si_value={int=2895997, ptr=0x2c307d}} ---
rt_sigreturn()                          = 0
read(6, 0x7ffc36dae050, 4)              = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL, si_value={int=2895997, ptr=0x2c307d}} ---
rt_sigreturn()                          = 0
read(6, 0x7ffc36dae050, 4)              = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL, si_value={int=2895997, ptr=0x2c307d}} ---
rt_sigreturn()                          = 0
read(6, 0x7ffc36dae050, 4)              = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL, si_value={int=2895997, ptr=0x2c307d}} ---
rt_sigreturn()   

The git-lfs process output is as follows:

Process 28006 attached
[ Process PID=28006 runs in 32 bit mode. ]
futex(0x88b982c, FUTEX_WAIT_PRIVATE, 0, NULL

But when I replace git.Repo.clone_from with a shell git clone run in a new subprocess, it works fine. So maybe something blocks inside git.Repo.clone_from, and I wonder whether this has been solved. Thanks a lot.
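The workaround described above, running the git command line in its own subprocess instead of calling clone_from, might look roughly like the sketch below. It is not the reporter's script: `build_clone_command` and `clone_with_cli` are hypothetical names, and the sketch assumes `git` is on the PATH. Output is left attached to the parent's terminal, so there is no pipe that could fill up.

```python
import subprocess


def build_clone_command(url, dest, extra_args=()):
    """Return the argv list for a plain `git clone`."""
    return ["git", "clone", *extra_args, "--", url, dest]


def clone_with_cli(url, dest, extra_args=()):
    """Clone url into dest by running the git CLI directly.

    stdout/stderr are inherited from the parent, so git writes its progress
    straight to the terminal and nothing has to drain a pipe.
    Raises subprocess.CalledProcessError if git exits with a non-zero status.
    """
    subprocess.run(build_clone_command(url, dest, extra_args), check=True)
```

A call such as `clone_with_cli("https://example.com/repo.git", "/tmp/repo", ("--depth", "1"))` (placeholder URL and path) would then replace the clone_from call inside each pool worker.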

Member

Byron commented Dec 29, 2019

Thanks for the detailed investigation! If memory serves, the way GitPython handles progress reporting on long-running clones can be prone to hanging. Even though it was thought to be fixed, apparently there is still a chance of it failing.

The workaround proposed here is certainly preferable to using GitPython at all, since it does what's needed much more directly. In the end, GitPython just spawns a git process itself and fails to properly handle its output for long-running processes.
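A common cause of this class of hang (not confirmed as the cause in this issue) is a child process writing more output than an OS pipe buffer can hold while the parent is not reading it. The sketch below is not GitPython's implementation; it only illustrates, with the standard library, one way to keep a child's output drained while it runs. `run_and_drain` and `on_line` are hypothetical names.

```python
import subprocess
import sys


def run_and_drain(cmd, on_line=None):
    """Run cmd, reading its combined stdout/stderr line by line until EOF.

    Reading continuously means the child never blocks on a full pipe.
    Returns (exit_code, number_of_lines_read).
    """
    count = 0
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so one reader drains both
        text=True,
    ) as proc:
        for line in proc.stdout:
            count += 1
            if on_line is not None:
                on_line(line.rstrip("\n"))
        exit_code = proc.wait()  # stdout hit EOF; collect the exit status
    return exit_code, count


if __name__ == "__main__":
    # Stand-in child that writes far more than a typical pipe buffer holds.
    child = [
        sys.executable,
        "-c",
        "import sys\nfor i in range(20000): sys.stderr.write('progress %d\\n' % i)",
    ]
    print(run_and_drain(child))
```

Merging stderr into stdout keeps the example to a single reader; keeping the two streams separate would need a second reader thread or `Popen.communicate()`.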

I am closing this issue as I don't think the underlying cause can clearly be determined or fixed, and due to the presence of a viable workaround. Please feel free to keep commenting here in case you would propose a different way of handling this - your opinion would be greatly appreciated.

Byron closed this as completed Dec 29, 2019
Author

Abbyyan commented Jan 7, 2020

Thanks a lot. I ran into the same problem today using the checkout function, and I wonder whether there is a timeout when GitPython starts a new subprocess?

Member

Byron commented Feb 8, 2020

Unfortunately no, GitPython is inherently synchronous and blocking. Everyone is invited to have a look at the code; communicating with a subprocess should never hang :/.
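Given that answer, one option is to enforce a deadline outside GitPython by running the git CLI through the standard library, whose `subprocess.run` accepts a `timeout` argument. This is a minimal sketch under that assumption; `run_with_timeout` is a hypothetical helper, not part of GitPython.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s, cwd=None):
    """Run cmd and return its exit code, or None if it exceeded timeout_s.

    On timeout, subprocess.run kills the direct child and waits for it
    before raising TimeoutExpired. Processes that the child itself spawned
    (for git, possibly git-lfs) are not killed by this.
    """
    try:
        completed = subprocess.run(cmd, cwd=cwd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode


if __name__ == "__main__":
    # Stand-in for a hung command: sleeps longer than the deadline allows.
    sleeper = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(sleeper, timeout_s=0.5))
```

For a checkout, the call might be `run_with_timeout(["git", "checkout", "main"], timeout_s=600, cwd="/path/to/repo")`, where the branch name, deadline and path are placeholders.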

Byron reopened this Feb 26, 2021
Byron closed this as completed Feb 26, 2021
gitpython-developers locked and limited conversation to collaborators Feb 26, 2021
