Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

git.exe parent process locks index file for child git.exe process #770

Closed
hzshlomi opened this issue Jun 2, 2016 · 34 comments
Closed

git.exe parent process locks index file for child git.exe process #770

hzshlomi opened this issue Jun 2, 2016 · 34 comments

Comments

@hzshlomi
Copy link

hzshlomi commented Jun 2, 2016

I did see similar closed issues

Setup

MINGW64
$ git --version
git version 2.7.4.windows.1
Win7 64 bit

  • What options did you set as part of the installation? Or did you choose the
    defaults?
    pull+rebase

Details

  • Which terminal/shell are you running Git from?
    Bash (MINGW64), no other GUI tool installed
  • What commands did you run to trigger this issue? If you can provide a
    git pull
  • What did you expect to occur after running these commands?
    to be updated against remote
  • What actually happened instead?
    it triggered "Auto packing" and failed (see output message below), while the process holding the lock was .. "git.exe":
    git.exe (pid=34040; "git pull")
    -> git.exe (pid=28552; "git fetch -update-head-ok")
    -> git.exe (pid=12732; "git gc -auto")
    -> git.exe (pid=12732; "git repack -d -l -A --unpack-unreachable=2.weeks.ago")

where 34040 holds "006efeeb01a6252adb97aafb1cbb04232f915794.idx",
and 12732 fails to unlink it.

The output:
Auto packing the repository in background for optimum performance.
See "git help gc" for manual housekeeping.
Counting objects: 309941, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (72671/72671), done.
Writing objects: 100% (309941/309941), done.
Total 309941 (delta 186940), reused 301892 (delta 181069)
Unlink of file '.git/objects/pack/pack-006efeeb01a6252adb97aafb1cbb04232f915794.idx' failed. Should I try again? (y/n) y

@dscho
Copy link
Member

dscho commented Jun 2, 2016

I did see similar closed issues

Which ones?

git version 2.7.4.windows.1

This version was released on March 18th, and there have been no less than four releases since then, each of which with its respective number of fixes.

The smart thing would be to verify that this is still an issue with Git for Windows v2.8.3.

What commands did you run to trigger this issue? If you can provide a
git pull

It is unlikely that this comes as a surprise to you: this does not reproduce the problem here.

@hzshlomi
Copy link
Author

hzshlomi commented Jun 7, 2016

Which ones?

#500, #446, #233
https://github.com/git-for-windows/git/issues?q=Unlink+of+file+failed+is%3Aclosed

What commands

I am really sorry to inform you that the sequence of command used to trigger the issue is just "git pull". I will try harder next time.

To get serious, I thought the info I provided does give a good lead to find the issue. Now I dont mind installing another version, providing logs, or whatever, but I do not feel that you actually followed the information I did provide.

Do you want me to install v2.8.3?
Thanks.

@dscho
Copy link
Member

dscho commented Jun 7, 2016

I am really sorry to inform you that the sequence of command used to trigger the issue is just "git pull"

You do not need to be sorry about this. All it means is that I cannot reproduce this issue here, and therefore I am unlikely to be able to help. So it is me who is sorry.

@bwijen
Copy link

bwijen commented Jun 8, 2016

where 34040 holds "006efeeb01a6252adb97aafb1cbb04232f915794.idx",
and 12732 fails to unlink it.

@dscho: This looks like the same cause as #755 (which I think also holds for the other issues)

On both linux and windows file handles are inherited by child processes,
however deleting an inherited file on linux works, on windows it does not.

Might we extend the solution of #755 to all open calls?
(Opening with O_CLOEXEC / O_NOINHERIT that is.)

That would of course mean, figuring out if we can make git-upload-pack et al. work with this...

@dscho
Copy link
Member

dscho commented Jun 8, 2016

@bwijen but that O_NOINHERIT flag would not close the handle when spawning fetch, would it?

I guess that we need to insert a close_all_packs(); before https://github.com/git-for-windows/git/blob/v2.8.4.windows.1/builtin/pull.c#L851, but we would need a real MCVE to prove that this solves the problem. An MCVE that works on every computer, preferably as a test in Git's own regression test suite.

Which would require some real work... And somehow I wonder whether @hzshlomi will do that.

@hzshlomi
Copy link
Author

hzshlomi commented Jun 8, 2016

@dscho you know, at this point, it actually gets interesting.
You need a "real MCVE"? from me?? you are the code maintainer -> this is a bug you should fix -> why won't you provide one? do you always expect users of your code to write code? in GIT's case that may work but I just dont understand that approach.

look here: http://stackoverflow.com/help/mcve.
""When asking a question about a problem caused by your code, you will get much better answers if you provide code people can use to reproduce the problem""
I think it's you that is expected to provide a "real MCVE", and yet if you really cant find the energy to do that, just define what you really need and I will.

@PhilipOakley
Copy link

@hzshlomi Just to say, 'be careful of what you wish for'.

Given that Git is not designed for Windows, and that without @dscho's continued perseverance, we would not even have what we have now.

Research does show that it's only by using tests first view of Test Driven Development (and in this case, particularly for tricky 'bug' cases) that you actually get any improvement in efficiency.

So this problem need a "Test", which is what an MVCE is. Without an easy way to poke the bug, we won't know if we've found it, or fixed it, nor let it come back via some regression in future change.

So anything you can do to even create a 30 (or however many it takes) line test script that will 'just work' in the SDK would be a start, so then it can be tightened and tightened like a noose around the bug. This may require a bit of google archaeology to bring together all the historic bits. Someone, anyone, need to provide some time into the problem, if they have it. For that is the limited resource. @dscho can do wonders when pointed in the right direction by a good MVCE, but he is a VOLUNTEER.

So if you can volunteer a short test script 'that just runs' easily, then that would be a start. I appreciate your time may be as limited as dscho's but if you need it fixed ...

@hzshlomi
Copy link
Author

hzshlomi commented Jun 8, 2016

@PhilipOakley
Well, that is a totally different tone altogether
Truth be told, I don't "need" this fix as in I cant live without it.
Now I had no clue @dscho is a volunteer, and I do take this chance to say this as explicit as it gets:

@dscho, thank you very much for the wonderful job you guys are doing for the community. I mean every word, no cynicism, not one bit. I really do appreciate every second of your time put into it.

Having that said, the reason I reported is that I know other people that faced the issue and had varying degrees of success walking around the problem (I just use Win, Linux interchangeably).

So, to a lesser degree, I was too giving a very modest contribution, not looking for a free service but donating an input.

While doing so, I do not expect reactions like:

"It is unlikely that this comes as a surprise to you: this does not reproduce the problem here."

That tone is not called for. If you decide to contribute, go ahead. Mocking is not a contribution.
Now, I really just did a "pull" so I wasn't just being lazy collecting information.

I also did and still do am willing to put some time into it, just like I wrote before.

If I knew the code, I would do it, but I don't.

@PhilipOakley Thank you too!

@bwijen
Copy link

bwijen commented Jun 9, 2016

@bwijen but that O_NOINHERIT flag would not close the handle when spawning fetch, would it?

Hmm, you are right...

I guess that we need to insert a close_all_packs(); before https://github.com/git-for-windows/git/blob/v2.8.4.windows.1/builtin/pull.c#L851

@dscho: If that is the case, this needs to be fixed upstream right?

I will try to create a mcve, to see if I'm right about the cause...

@dscho
Copy link
Member

dscho commented Jun 9, 2016

If that is the case, this needs to be fixed upstream right?

@bwijen eventually, yes. The way I prefer to work on Windows-related issues is to get them fixed in Git for Windows first, to be able to verify the fix, and then (leisurely) submit the fixes upstream.

I will try to create a mcve, to see if I'm right about the cause...

Awesome, thank you! (happy)

@hzshlomi
Copy link
Author

@dscho

$ cat mcve-770.sh
#!/bin/bash

git init origin
cd origin
touch test1.txt
git add test1.txt
git commit -m v1
originpath=pwd
cd ..
git clone file://$originpath clone
cd origin
cd ../clone
git config gc.autoPackLimit 1
git pull

$ ./mcve-770.sh
Initialized empty Git repository in C:/Program Files/Git/git-for-windows-issue-770/origin/.git/
[master (root-commit) e9c35dd] v1
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 test1.txt
Cloning into 'clone'...
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (3/3), done.
Checking connectivity... done.
Auto packing the repository in background for optimum performance.
See "git help gc" for manual housekeeping.
Counting objects: 3, done.
Writing objects: 100% (3/3), done.
Total 3 (delta 0), reused 3 (delta 0)
Unlink of file '.git/objects/pack/pack-7f065b66383f8af58ac120594ed27ba48fdb98ea.pack' failed. Should I try again? (y/n)

@hzshlomi
Copy link
Author

@dscho

$ git --version
git version 2.8.4.windows.1

$ ./mcve-770.sh
Initialized empty Git repository in C:/Program Files/Git/git-for-windows-issue-770/origin/.git/
[master (root-commit) d484a7a] v1
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 test1.txt
Cloning into 'clone'...
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (3/3), done.
Checking connectivity... done.
Auto packing the repository in background for optimum performance.
See "git help gc" for manual housekeeping.
Counting objects: 3, done.
Writing objects: 100% (3/3), done.
Total 3 (delta 0), reused 3 (delta 0)
Unlink of file '.git/objects/pack/pack-fee13c2e332a865c17c4e4204fcab00e7b292894.pack' failed. Should I try again? (y/n)

@hzshlomi
Copy link
Author

@PhilipOakley

MCVE - done.
verified on latest - done.

@bwijen
Copy link

bwijen commented Jun 12, 2016

I cannot reproduce on a Win7 32 bit VM.
I tried both 2.7.4 and 2.8.4:

Initialized empty Git repository in C:/git-sdk-32/test/origin/.git/
[master (root-commit) 532d198] v1
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 test1.txt
Cloning into 'clone'...
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (3/3), done.
Checking connectivity... done.
Auto packing the repository in background for optimum performance.
See "git help gc" for manual housekeeping.
Counting objects: 3, done.
Writing objects: 100% (3/3), done.
Total 3 (delta 0), reused 3 (delta 0)
Already up-to-date.

@hzshlomi
Copy link
Author

@bwijen
Win7 64 bit

@bwijen
Copy link

bwijen commented Jun 12, 2016

Win7 64 bit

I will retry tomorrow on a 64bit system

@hzshlomi
Copy link
Author

Tried it on another friend's computer, did NOT reproduce. Will see what's the difference.

@bwijen
Copy link

bwijen commented Jun 14, 2016

Also tried it on a Win7 64-bit system, could NOT reproduce...

@maoueh
Copy link

maoueh commented Jun 18, 2016

@hzshlomi I had an old 2.4.2.windows1 installed and was able to reproduce:

$ ./mcve-770.sh
Initialized empty Git repository in C:/Gnu/tmp/mcve-770/origin/.git/
[master (root-commit) acc2c0a] v1
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 test1.txt
Cloning into 'clone'...
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (3/3), done.
Checking connectivity... done.
Auto packing the repository in background for optimum performance.
See "git help gc" for manual housekeeping.
Counting objects: 3, done.
Writing objects: 100% (3/3), done.
Total 3 (delta 0), reused 3 (delta 0)
Unlink of file '.git/objects/pack/pack-961ca715ed599f232a47cbb8cb888aa2b60d1308.pack' failed. Should I try again? (y/n) n
Unlink of file '.git/objects/pack/pack-961ca715ed599f232a47cbb8cb888aa2b60d1308.idx' failed. Should I try again? (y/n) n
Already up-to-date.

Then, I upgraded to latest 2.9.0.windows version and it seems to be fixed in this version for me:

# Removed old clone and origin folder
$ ./mcve-770.sh
Initialized empty Git repository in C:/Gnu/tmp/mcve-770/origin/.git/
[master (root-commit) c512b7d] v1
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 test1.txt
Cloning into 'clone'...
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (3/3), done.
Checking connectivity... done.
Auto packing the repository in background for optimum performance.
See "git help gc" for manual housekeeping.
Counting objects: 3, done.
Writing objects: 100% (3/3), done.
Total 3 (delta 0), reused 3 (delta 0)
Already up-to-date.

@hzshlomi Use triple backticks when posting code snippet in comments, this will nicely format them in a code block like above. Furthermore, it will avoid losing the backticks `. Here the reformatted MCVE slighlty modified:

#!/bin/bash

pushd . > /dev/null
git init origin
cd origin
touch test1.txt
git add test1.txt
git commit -m v1
origin_path=`pwd`
popd > /dev/null

pushd . > /dev/null
git clone file://${origin_path} clone
cd clone
git config gc.autoPackLimit 1
git pull
popd > /dev/null

@larsxschneider
Copy link
Member

larsxschneider commented Aug 8, 2016

I just ran into the same issue and I have a reproducible test case here (branch based on Git for Windows 2.9.2):
https://github.com/larsxschneider/git/tree/protocol-filter/unlink-error

I use the latest Git for Windows SDK, I cd into the Git source and run:

make && cd t && ./t0021-conversion.sh -v

This is what I get:

Unlink of file 'test.r' failed. Should I try again? (y/n) warning: unable to unlink test.r: Invalid argument

My machine:

OS Name Microsoft Windows 8.1 Enterprise    
Version 6.3.9600 Build 9600 
System Type x64-based PC    
Processor   Intel(R) Xeon(R) CPU           X5650  @ 2.67GHz, 2661 Mhz, 6 Core(s), 12 Logical Processor(s)   
Processor   Intel(R) Xeon(R) CPU           X5650  @ 2.67GHz, 2661 Mhz, 6 Core(s), 12 Logical Processor(s)   
BIOS Version/Date   Hewlett-Packard 786G4 v03.19, 11.03.2011    
SMBIOS Version  2.6 
Embedded Controller Version 255.255 
BIOS Mode   Legacy  
Secure Boot State   Unsupported 
PCR7 Configuration  Binding Not Possible    
Windows Directory   C:\Windows  
System Directory    C:\Windows\system32 
Boot Device \Device\HarddiskVolume1 
Hardware Abstraction Layer  Version = "6.3.9600.17196"  
Page File   C:\pagefile.sys 
Hyper-V - VM Monitor Mode Extensions    Yes 
Hyper-V - Second Level Address Translation Extensions   Yes 
Hyper-V - Virtualization Enabled in Firmware    Yes 
Hyper-V - Data Execution Protection Yes 

Unfortunately my test is not minimal as it contains a new feature that I am working. However, the feature and the tests run clean on OS X and Linux. Plus my code does not open a the test.r file at all. There is some code that uses fd's but I added something to prove it is not executed:
https://github.com/larsxschneider/git/blob/protocol-filter/unlink-error/convert.c#L666

Can you give me a hint how to debug this further?

@larsxschneider
Copy link
Member

OK. I think I found a workaround. If I kill my child process right before unpack_trees() then it works. I still think there is a problem and I don't fully understand this solution but it works for now 😄

larsxschneider@29faa48

@larsxschneider
Copy link
Member

larsxschneider commented Aug 9, 2016

New theory:

  1. Git does some stuff, opens files etc.
  2. Git finds a file that requires filtering.
  3. Git starts the long running filter process. On Windows this process would inherit all file handles.
  4. Git parent process releases all open file handles but the filter process still keeps them.
  5. Git removes a file that it had opened earlier
  6. 💣

@whoisj
Copy link

whoisj commented Aug 9, 2016

Would something like

---
 compat/mingw.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index d41898e..8f4c4f9 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -1460,8 +1460,13 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **deltaen

        wenvblk = make_environment_block(deltaenv);

+       SECURITY_ATTRIBUTES sa;
+       sa.nLength = sizeof(sa);
+       sa.lpSecurityDescriptor = 0;
+       sa.bInheritHandle = 0;
+
        memset(&pi, 0, sizeof(pi));
-       ret = CreateProcessW(*wcmd ? wcmd : NULL, wargs, NULL, NULL, TRUE,
+       ret = CreateProcessW(*wcmd ? wcmd : NULL, wargs, &sa, &sa, FALSE,
                flags, wenvblk, dir ? wdir : NULL, &si, &pi);

        free(wenvblk);
--

resolve the issue?

@sinbad
Copy link

sinbad commented Aug 10, 2016

As I understand it the default behaviour for CreateProcess is that file handles are not inherited (https://msdn.microsoft.com/en-us/library/windows/desktop/ms683463(v=vs.85).aspx); but the above change is worth a shot just to rule that out.

@larsxschneider
Copy link
Member

Thanks @whoisj ! That is definitively a step into the right direction!

@sinbad: Right now all handles are inherited because bInheritHandles is set to TRUE:
ret = CreateProcessW(*wcmd ? wcmd : NULL, wargs, NULL, NULL,TRUE,...

Git Source
CreateProcessW docs

Unfortunately, the snippet above does not work because the process does not even inherit the stdin/stdout handles. I try to enable inheritances for stdin/stdout explicitly but no luck so far:
https://blogs.msdn.microsoft.com/oldnewthing/20111216-00/?p=8873

@larsxschneider
Copy link
Member

In my Git for Windows SDK _WIN32_WINNT is defined as 0x0502 (I think it is set here). Unfortunately, STARTUPINFOEX is only available if _WIN32_WINNT >= 0x0600 (see
@Alexpux's source) and STARTUPINFOEX is required to selectively disable inheritance for handles according to @oldnewthing's blog post above.

Does anyone see an easier approach to this problem?

@larsxschneider
Copy link
Member

larsxschneider commented Aug 10, 2016

OK, different approach. Instead of selectively disabling the inheritance on process creation, I disable inheritance when the file is opened in the parent process (similar to the solution here). This seems to work fine:

 read-cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index db27766..f481dee 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -159,7 +159,7 @@ void fill_stat_cache_info(struct cache_entry *ce, struct stat *st)
 static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
 {
    int match = -1;
-   int fd = open(ce->name, O_RDONLY);
+   int fd = open(ce->name, O_RDONLY | O_CLOEXEC);

    if (fd >= 0) {
        unsigned char sha1[20];

Do you think this would have undesired implications?

@whoisj
Copy link

whoisj commented Aug 10, 2016

I worry about the effects on "internal files", like the index and pack files.

I think this makes perfect sense for worktree files.

@larsxschneider
Copy link
Member

Thanks @whoisj !

Could you review, and if applicable, acknowledge the patch on the mailing list?
http://public-inbox.org/git/20160810130411.12419-16-larsxschneider%40gmail.com/

@whoisj
Copy link

whoisj commented Aug 11, 2016

Could you review, and if applicable, acknowledge the patch on the mailing list?
http://public-inbox.org/git/20160810130411.12419-16-larsxschneider%40gmail.com/

I'll review the patch manually ASAP, but I am not setup, currently, to do anything with the mailing list. (looks at Exchange with a bitterness)

@whoisj
Copy link

whoisj commented Aug 12, 2016

@larsxschneider I've sorry to report that I was not able to get to the review today, and I'm leaving on vacation in just a few hours. I highly doubt the family would be very appreciative of my spending time reviewing code on vacation, but I'll attempt to find some time.

At worse, I'll return in a week and find time them. Again, apologies for the delay.

@larsxschneider
Copy link
Member

@whoisj No worries at all! Enjoy your vacation and please don't look at the code before you return 😎

@dscho
Copy link
Member

dscho commented Oct 4, 2016

The fix was already included in the latest Git for Windows version. Closing.

@dscho dscho closed this as completed Oct 4, 2016
@larsxschneider
Copy link
Member

@dscho I just send out a fix to the Git mailing list: http://public-inbox.org/git/20161024180300.52359-1-larsxschneider@gmail.com/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants