Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

secret scan pre-commit crashing on Windows when doing big merges #1032

Closed
mherzberg opened this issue Dec 13, 2024 · 2 comments
Closed

secret scan pre-commit crashing on Windows when doing big merges #1032

mherzberg opened this issue Dec 13, 2024 · 2 comments
Labels
status:new This issue needs to be reviewed type:bug Something isn't working

Comments

@mherzberg
Copy link

Environment

  • ggshield version: 1.34.0
  • Operating system (Linux, macOS, Windows): Windows
  • Operating system version: Windows 11 Pro 23H2
  • Python version: 3.12

Describe the bug

When the user is about to commit a merge with many files and long file names, ggshield secret scan pre-commit crashes with [WinError 206] The filename or extension is too long. This is because ggshield is building and executing a git command that can be longer then the maximum path length on Windows, which is approximately 32,767 with "long paths" enabled.

Steps to reproduce:

You can use the following bash script in a Git shell on Windows to reproduce the error:

#!/usr/bin/env bash

cd $(mktemp -d)
git init

function commit_files() {
    for i in $(seq -f "%0200g" 200)
    do
        echo $1 > $i.txt
    done
    git add .
    git commit -m "msg" --no-verify
}

git checkout -b a
commit_files x

git checkout -b b
commit_files foo

git checkout a
commit_files bar

git merge b
git add .

ggshield secret scan pre-commit --verbose --debug

This script creates 200 files with file names that are a bit longer than 200 characters each, creates some changes for each of them across two branches, and finally merges the 200 changes from both branches. The kind of merge within this script is a bit artifical but demonstrates the issue that we observed during real merges.

Actual result:

Traceback (most recent call last):
  File "C:\Users\User\pipx\venvs\ggshield\Lib\site-packages\ggshield\cmd\utils\common_decorators.py", line 18, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\pipx\venvs\ggshield\Lib\site-packages\ggshield\cmd\secret\scan\precommit.py", line 85, in precommit_cmd
    commit = Commit.from_merge(ctx_obj.exclusion_regexes)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\pipx\venvs\ggshield\Lib\site-packages\ggshield\core\scan\commit.py", line 87, in from_merge
    shas_in_merge_branch = dict(
                           ^^^^^
  File "C:\Users\User\pipx\venvs\ggshield\Lib\site-packages\ggshield\core\scan\commit_utils.py", line 370, in get_file_sha_in_ref
    output = git(["ls-tree", "-z", ref] + files, cwd=cwd)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\pipx\venvs\ggshield\Lib\site-packages\ggshield\utils\git_shell.py", line 206, in git
    result = subprocess.run(
             ^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python312\Lib\subprocess.py", line 548, in run
    with Popen(*popenargs, **kwargs) as process:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Programs\Python\Python312\Lib\subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\User\AppData\Local\Programs\Python\Python312\Lib\subprocess.py", line 1538, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [WinError 206] The filename or extension is too long
                    20408:6608 [D] ggshield.__main__:56 scan exit_code=128

The full debug log can be found here: error.log

Expected result:

ggshield scanning for secrets and not crashing.

Possible solutions:

One could start breaking the single ls-tree command into multiple ones once the command grows too long (depending on platform and "long path" configuration). This might apply for other subprocess executions, too.

@mherzberg mherzberg added status:new This issue needs to be reviewed type:bug Something isn't working labels Dec 13, 2024
@mathieubellon
Copy link
Collaborator

Thanks for the report @mherzberg, we are looking into it

@mherzberg
Copy link
Author

Many thanks for working on this issue, it is much appreciated.

  1. I'm afraid on my Windows machine I still get the error as described in my original post. Only if I use the GG_GIT_CONFIG variable (see here), with a file that has core.longpaths = true set, the fix is fully working for me. I tested this both when using 1.36.0 installed via pipx and when using the PyInstaller variant. I wonder if core.longpaths should perhaps be enabled by default by ggshield? Is setting the GG_GIT_CONFIG variable the expected way of using ggshield?
  2. I noticed that you introduced a limit on the number of files per call but are not checking the length of the filenames - the file names in my proof of concept are only a bit over 200 characters long, but wouldn't the limit of 32,767 characters still be exceeded if the individual filepaths are longer than ~330 characters? Keeping in mind it is not only the name that counts but the full path relative to the Git repo root, which can get rather long depending on the programming language and frameworks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status:new This issue needs to be reviewed type:bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants