Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use env as interpreter for all exe and scripts. #3869

Closed
wants to merge 1 commit into from

Conversation

Berrysoft
Copy link

@Berrysoft Berrysoft commented May 26, 2022

Git uses a lot scripts which requires bash, perl, and python. The interpreter is specified by shebang. However, Windows don't know about shebang, and CreateProcess could not spawn a script file.

Therefore, git chose to parse the shebang and find the interpreter itself. The interpreter is found in PATH. It is reasonable because git is a Windows native program, and doesn't know anything about POSIX path. However, the method is not that perfect.

Git requires a MSYS2 perl. However, MSYS2 also provides a MinGW perl. If both perl are installed (for example, texlive requires a MinGW perl), git will mistakenly find the MinGW perl, and scripts require perl (e.g. git-send-email) will raise errors. The shebang is right #!/usr/bin/perl, but actually git only recognizes perl and finds the first one in the PATH.

Luckily there's a program in MSYS2 called env. It is commonly used in *NIX systems to find the right environment. The MSYS2 env could not only parse the shebang, but also recognize all kinds of path: POSIX path, Windows path, and even mixed path (C:/msys64/usr/bin\less.exe). Technically, it could be the interpreter for all executables and scripts.

This pull request removes the naïve parser of shebang, and just finds env.exe as the interpreter. It should be in the PATH in MSYS2 environment. I don't open this one to the upstream git/git because it assumes MSYS2 env.

Git requires a MSYS2 perl. However, there are many distributions
of perl. If another perl is in the PATH in front of the MSYS2 one,
git will mistakenly find it. The shebang is right #!/usr/bin/perl,
but actually git only recognizes perl and finds the first one in
the PATH.

This pull request removes the naïve parser of shebang, and just finds
`env.exe` as the interpreter. It should be in the `PATH` in MSYS2.

Signed-off-by: Yuyi Wang <Strawberry_Str@hotmail.com>
@dscho
Copy link
Member

dscho commented May 26, 2022

Hmm.

While I understand the motivation, the performance penalty of adding yet another instance of the MSYS2 runtime will add to the slow performance we unfortunately already experience on Windows. Granted, this is only for scripts, but hooks are also run as scripts and running them through env will result in the MSYS2 runtime to be spun up twice: once for env, once for perl/bash. Likewise, aliases. And the few remaining scripted commands of Git. It might not sound like much, but it accumulates (remember that scripts usually call git multiple times).

Besides, it is only an implementation detail that MinGit includes a working env.exe. To be honest, I do not recall why we included it, the reason might have vanished and it will only be a matter of time when we exclude it altogether.

My favored solution to the problem would be to fix git-send-email.perl so that it works with regular Windows versions of Perl, and so that it stops pretending that all the world looks like Unix paths.

@Berrysoft
Copy link
Author

Yes, I agree that it is a performance regression. I've also considered a solution that, only pass to env when there is a shebang. However, most common commands of git are built-in. Many scripts uses, for example, git-status or git-show, and they won't execute other processes. Commands like git-send-email won't be executed many times in practical use cases. Some others may be, I have to admit.

I don't know why env.exe is in the distribution of MinGit either. But anyway it is there, and it's only ~50KB, while bash.exe is ~2.5MB. Maybe env.exe is there for some other scripts, for example, jeprof from mingw-w64-jemalloc in MinGit. As I've said, that's why I opened the PR here but not in upstream. If you don't like this solution, we could consider to fallback to the original method when env.exe is not in PATH.

I don't know well about perl and the scripts. Maybe it is practical to make them work with MinGW perl, but I don't know how. It seems to me that this pull request makes all things work successfully with a little change. It is a little radical, but I can add a fallback if you want. However, if you insist to modify the scripts rather than forwarding to env, it would also be great, and I'll wait for your good news:)

@rimrul
Copy link
Member

rimrul commented May 30, 2022

The shebang is right #!/usr/bin/perl, but actually git only recognizes perl and finds the first one in the PATH.

Why do we perform a PATH lookup when we have an absolute unix path here? Do we fail to recognize /usr/bin/perl.exe as /usr/bin/perl? Or do we not know how to convert /usr/bin/perl to a windows path?

I feel like we should recognize that as an absolute path and that the appropriate executable exists in that path, before trying to look for alternatives in PATH.

@Berrysoft
Copy link
Author

Why do we perform a PATH lookup when we have an absolute unix path here?

Because git-for-windows is a Windows native executable, and doesn't know anything about UNIX path. UNIX path should be handled by MSYS2 executables. env is an MSYS2 executable here.

I've just come across another solution: grap some part of MSYS2 code related to UNIX path handling and merge into (or static/dynamic link to) git. The root of MSYS2 could be hardcoded ../../ or specified by environmental variables.

@dscho
Copy link
Member

dscho commented May 30, 2022

The shebang is right #!/usr/bin/perl, but actually git only recognizes perl and finds the first one in the PATH.

Why do we perform a PATH lookup when we have an absolute unix path here?

Historical reasons, as Msys had its perl.exe in /bin.

It seems to me that this pull request makes all things work successfully with a little change.

Letting all Git commands exit with code 123 would also be a little change. Just because some PR introduces a little change does not make it correct. The performance regression this PR introduces makes it incorrect.

Maybe it is practical to make them work with MinGW perl, but I don't know how.

Maybe we can start by showing the symptoms how MINGW perl fails to execute the command?

@Berrysoft
Copy link
Author

The performance regression this PR introduces makes it incorrect.

Well, I agree it is a performance regression in some cases, but I don't agree that it is incorrect. It doesn't produce incorrect results, on the contrary, it finds correct interpreter and makes subcommands work correctly. Anyway, I'm now using git with this patch and I don't think the performance regression is remarkable.

Maybe we can start by showing the symptoms how MINGW perl fails to execute the command?

I need to have another check then. You can try yourself with git send-email and MinGW perl. I found it could not read the config correctly, thus could not send the email.

I'd like to remind that not only the perl script may fail when using different interpreter, but maybe the python script etc. as well. You cannot deal with all cases, and it will be a lot of patches. Some third-party scripts may also face problems and you cannot fix all of them.

If git can find the MinGW perl instead of MSYS2 one, it may also find the cygwin one, or the one in the Program Files. The same applies to python. It may find the MinGW one, cygwin one, official one, or the stub to the Microsoft Store (which will silently fail if it is not installed). All these complicated cases start from the day when git tried to parse shebang on Windows rather than forwarding them to a POSIX environment.

The calling of env.exe will of course brings a performance regression, if you continuously call non-builtin commands like git repack or git-send-email. But this solution can handle most of cases except cygwin. The env.exe may come from cygwin, and all interpreters will be the cygwin ones. Actually the cygwin env.exe works well in most cases without perl, because the cygwin perl cannot find the git perl libraries. If that happens, I think it's the user's duty to make PATH tidy.

@rimrul
Copy link
Member

rimrul commented May 30, 2022

Some third-party scripts may also face problems and you cannot fix all of them.

Neither can env

If git can find the MinGW perl instead of MSYS2 one, it may also find the cygwin one, or the one in the Program Files.

No. The MinGW one here is special in that It'll be on the PATH before the Msys2 one, when installing Git for Windows inside Msys2 proper. All the others can't.

The same applies to python. It may find the MinGW one, cygwin one, official one, or the stub to the Microsoft Store (which will silently fail if it is not installed).

All of those except the stub are equally fine to us.
(As long as we don't have a result of #3661)

All these complicated cases start from the day when git tried to parse shebang on Windows rather than forwarding them to a POSIX environment.

By extension of that logic, Git on Windows should just do the proper Windows thing and treat only file extensions in PATHEXT as executable. Not executing scripts with shebangs would also kinda solve all shebang issues.

The env.exe may come from cygwin, and all interpreters will be the cygwin ones.

At least currently it can't. It could potentially in the future with MinGit, but that would mean MinGit is basically broken if you don't have Msys2 or Cygwin installed. So that's probably not a good Idea.

@Berrysoft
Copy link
Author

Neither can env

env can do the right thing if the shebang is right. If the shebang specifies /usr/bin/env python3 and a MinGW python is not working, it will really be an issue env cannot solve. CMakeLists.txt of git simply replaces them to /usr/bin/python. It is not always right, though.

I agree that env cannot solve the performance regression, but haven't seen issues it introduces besides that.

The MinGW one here is special in that It'll be on the PATH before the Msys2 one

Other PATH entries may be before the MinGW one.

All of those except the stub are equally fine to us.

I'm not rather sure about that, because the scripts may not deal with paths and OS specific things well.

Git on Windows should just do the proper Windows thing and treat only file extensions in PATHEXT as executable.

Many subcommands are scripts and the filename doesn't end with an extension. Windows cannot know which interpreter is correct. If you add an extension, the subcommands will become git send-email.perl or git filter-repo.py. All these attempts needs much more (and maybe dirty) logic.

Not executing scripts with shebangs would also kinda solve all shebang issues.

Yes, not executing git would solve all git issues.

The env.exe may come from cygwin, and all interpreters will be the cygwin ones.

At least currently it can't. It could potentially in the future with MinGit, but that would mean MinGit is basically broken if you don't have Msys2 or Cygwin installed. So that's probably not a good Idea.

No, I'm not suggesting to include cygwin into MinGit instead of MSYS2. It's just talking about possible cases. I understand why there's MSYS2 in MinGit.

@Berrysoft
Copy link
Author

By extension of that logic...

Well, I actually didn't understand this extension. Of course a Windows native executable cannot handle POSIX path well. Let's forward things to professionals. Windows affair on Win32, and POSIX affair inside POSIX-compatible environment (MSYS2, or cygwin, or even WSL).

@rimrul
Copy link
Member

rimrul commented May 30, 2022

Neither can env

env can do the right thing if the shebang is right. If the shebang specifies /usr/bin/env python3 and a MinGW python is not working, it will really be an issue env cannot solve. CMakeLists.txt of git simply replaces them to /usr/bin/python. It is not always right, though.

Oh, yes. The current code would just use env (without arguments) as the parser for the script otherwise. I can see how that wouldn't be ideal.

The MinGW one here is special in that It'll be on the PATH before the Msys2 one

Other PATH entries may be before the MinGW one.

No, we already prevent that.

All of those except the stub are equally fine to us.

I'm not rather sure about that, because the scripts may not deal with paths and OS specific things well.

Sure, but we have no way to tell what a user script expects in that regard. All of them are equally as likely to be the correct python for this specific script.

Git on Windows should just do the proper Windows thing and treat only file extensions in PATHEXT as executable.

Many subcommands are scripts and the filename doesn't end with an extension. Windows cannot know which interpreter is correct. If you add an extension, the subcommands will become git send-email.perl or git filter-repo.py. All these attempts needs much more (and maybe dirty) logic.

I'm fully aware. I wasn't suggesting to do that. I was exaggerating this "native Windows applications shouldn't handle POSIX things" thought. The whole concept that shebangs have meanings (and files with a shebang need some kind of special treatment) is as foreign to Windows as Unix paths.

The env.exe may come from cygwin, and all interpreters will be the cygwin ones.

At least currently it can't. It could potentially in the future with MinGit, but that would mean MinGit is basically broken if you don't have Msys2 or Cygwin installed. So that's probably not a good Idea.

No, I'm not suggesting to include cygwin into MinGit instead of MSYS2. It's just talking about possible cases. I understand why there's MSYS2 in MinGit.

No, what I meant was, that we'll always find the Msys2 env.exe before we would find a Cygwin env.exe, unless we were to exclude it from MinGit as dscho alluded to above. And in that hypothetical case a MinGit with this patch, which could indeed find a Cygwin env.exe would be broken on its own.

@Berrysoft
Copy link
Author

No, we already prevent that.

Well, I've never noticed this tool...and never used it before. It seems a wrapper shipped with MinGit.

we'll always find the Msys2 env.exe before we would find a Cygwin env.exe

Yes, with git-wrapper. According to the wiki, it is called when using git-bash, cmd/git and bash. It is reasonable if the PATH is set to cmd and a user only uses the cmd/git.

Well, you convinced me that this PR is not a must for git-for-windows distributions, and even not a bug fix. However, I'm not using the specific distribution, but MSYS2 with a MinGW git. As I commented at first, I installed a MinGW perl in the same path as git. git-wrapper could not deal with this case, and I didn't even compiled a git-wrapper. I'd like to make git work correctly in this occasion. If the patch is not accepted, that's still OK. Anyway I'll patch the code for personal use.

@dscho
Copy link
Member

dscho commented May 30, 2022

I'm now using git with this patch and I don't think the performance regression is remarkable.

That's not a good measure of the performance impact. I doubt you're working on a project the size of, say, Microsoft Office, let alone with dozens of custom scripted commands or hooks.

"Works on my computer" is rightfully a frowned-upon reaction in our industry.

[the Git wrapper] seems a wrapper shipped with MinGit.

The Git wrapper is installed into /cmd/git.exe, and since /cmd/ is what gets added to the PATH, this is how you would call git.exe in CMD or PowerShell, i.e. when not in Git Bash.

But I fear that we're spending a lot of time on arguing about an approach that will not make it into Git for Windows, and that time may very well be better spent on diagnosing how the Git commands that are implemented in Perl fail, actually.

You may find that #1604 may have addressed at least some of these issues already, but in the end the progress on that PR stalled. Maybe you can revive it because you have a vested interest in getting this to work?

@Berrysoft
Copy link
Author

Berrysoft commented May 30, 2022

That's not a good measure of the performance impact.

I'd like to see more professional benchmarks, too.

we're spending a lot of time on arguing about an approach that will not make it into Git for Windows

I'm sorry about that. It would be nice to make git more workable with MSYS2 environment. I personally don't like duplicate runtimes in one system, for example, two MSYS2 runtimes, and many chromiums with electrons.

Maybe you can revive it because you have a vested interest in getting this to work?

Yes, maybe. I'm not good at perl though. I'll try looking into it, but may not succeed.

@Berrysoft Berrysoft closed this May 30, 2022
@dscho
Copy link
Member

dscho commented May 30, 2022

I'd like to see more professional benchmarks, too.

I don't have any current ones, and then only ones whose output I am not at liberty to share verbatim, just the conclusion.

We did have reports in the past (but I could not find them in a brief search) that git difftool, back when it was still a Perl script, took ages to start up.

I'm not good at perl though

Nobody is. But together we can address your issue, I am certain of it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants