Corruption: git "stage selected ranges" on non-UTF8 #144469

Closed
shoffmeister opened this issue Mar 5, 2022 · 3 comments
Labels: bug, *duplicate, git

Comments

@shoffmeister

shoffmeister commented Mar 5, 2022

Issue Type: Bug

The "Stage selected ranges" feature of the built-in git extension corrupts the source code if the source code is in a non-UTF-8 text encoding.

This is specific to the vscode git extension; plain command-line git (git add -p) does not suffer from this defect.

The following script sets up a clean git repo, produces a text file in the cp1252 text encoding (Windows-1252) containing "Zürich rocks!" (which rocks the git extension's boat, ha!), and constructs a suitable vscode workspace for it. After that, we play with vscode to demonstrate the problem.

git init wombat
cd wombat

rm -rf ./myfile.txt
for l in {1..10}; do echo "${l}" >> myfile.txt; done;

echo "Zürich rocks!" >> myfile.txt

for l in {1..10}; do echo "${l}" >> myfile.txt; done;

UTF8COUNT=$(file myfile.txt | grep -c 'UTF-8')
if [[ ${UTF8COUNT} -ne 1 ]]; then
  echo "test setup error"
  exit 1
fi

iconv -f utf8 -t cp1252 -o myfile-cp1252.txt myfile.txt 

git add myfile*
git commit -m"baseline"

sed --in-place "s/5/x/g" myfile-cp1252.txt

cat > wombat.code-workspace <<EOF

{
        "folders": [
                {
                        "path": "."
                }
        ],
        "settings": {
                "files.encoding": "windows1252"
        }
}
EOF

code --disable-extensions ./wombat.code-workspace

At this stage, you have vscode running without any extensions; your workspace is enabled for cp1252, and myfile-cp1252.txt contains two changes.

In vscode, go to the git extension view, click on myfile-cp1252.txt, take note of the umlaut in Zürich being all fine and dandy.

Now highlight one of the changed lines, choose context menu -> "Stage selected ranges", and notice the result:

[screenshot of the diff view after staging]

IOW, the left-hand side has data corruption, because something in the implementation of the vscode git extension actively performs magic text processing without retaining the workspace encoding cp1252.
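To make the symptom concrete, here is a minimal sketch of what a cp1252 workspace would display if the staged copy had been written out as UTF-8. That the staged bytes are UTF-8 is an assumption at this point in the report, and the helper names `hex`, `on_disk`, and `restaged` are made up for illustration. Octal escapes are used so the snippet does not depend on the encoding of the script file itself.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Render a byte stream as a compact hex string.
hex() { od -An -tx1 | tr -d ' \n'; }

# "Zürich" as stored on disk in cp1252: the umlaut is the single byte 0xFC.
on_disk() { printf 'Z\374rich'; }
# The same text if it were written out as UTF-8 (assumed): the umlaut is 0xC3 0xBC.
restaged() { printf 'Z\303\274rich'; }

# Decode both byte streams the way a cp1252 workspace would. The results are
# emitted as UTF-8 so that a typical terminal could display them.
good_view_hex=$(on_disk | iconv -f CP1252 -t UTF-8 | hex)
bad_view_hex=$(restaged | iconv -f CP1252 -t UTF-8 | hex)

echo "intact view:    ${good_view_hex}"
echo "corrupted view: ${bad_view_hex}"
```

The intact view decodes to one character for the umlaut, while the corrupted view decodes to two unrelated characters, which is the kind of garbling a diff editor would show for the staged side.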

Commentary:

Using git add -p, you will notice that this works just fine - because git in itself is not aware, and does not care, about text encodings.

What's corrupted by "Stage selected ranges" is obviously also corrupted by "Revert selected ranges".

I do not think that this is a regression in 1.65.

VS Code version: Code 1.65.0 (b5205cc, 2022-03-02T11:12:36.248Z)
OS version: Linux x64 5.16.11-200.fc35.x86_64
Restricted Mode: No

@Lemmingh
Contributor

Lemmingh commented Mar 6, 2022

/duplicate #111915

When invoking "Git: Stage Selected Ranges" (git.stageSelectedRanges), VS Code converts the text in the editor to binary via a Writable stream, but unfortunately it always uses utf8 because the encoding information is unavailable (#824).
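The effect of a hard-coded output encoding can be sketched in shell. This is not VS Code's implementation: `encode_for_index` is a hypothetical stand-in for the serialization step, with `iconv` playing the role of the stream encoder, and the UTF-8 input stands in for the editor's in-memory text.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: serialize editor text (UTF-8 on stdin) into the bytes
# that would be staged. The target encoding defaults to UTF-8, mirroring the
# behavior described above when no encoding information is available.
encode_for_index() {
  local target_encoding="${1:-UTF-8}"
  iconv -f UTF-8 -t "${target_encoding}"
}

# Render a byte stream as a compact hex string.
hex() { od -An -tx1 | tr -d ' \n'; }

# Editor text "Zürich", written with octal escapes for the UTF-8 bytes.
editor_text() { printf 'Z\303\274rich'; }

on_disk_hex=$(printf 'Z\374rich' | hex)                    # file as stored in cp1252
default_hex=$(editor_text | encode_for_index | hex)        # encoding unknown: UTF-8
aware_hex=$(editor_text | encode_for_index CP1252 | hex)   # encoding passed through

echo "on disk:        ${on_disk_hex}"
echo "default (utf8): ${default_hex}"
echo "encoding-aware: ${aware_hex}"
```

Only the encoding-aware path reproduces the on-disk bytes, which suggests that the staging step needs access to the document's encoding to avoid rewriting the umlaut.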

@shoffmeister
Author

This indeed smells like a duplicate - and I am seriously surprised that a data corruption defect has survived for so long in vscode.

How can we trust an IDE if it destroys code? This is the built-in(!) git extension!

Whether or not encoding information is part of the extension system: if extensions do not get this information, I guess it's high time to provide it, given that the concept of "workspace encoding" is there - and every. single. source. file. in a non-Unicode encoding will be broken by the current behaviour.

@lszomoru added the git and bug labels Mar 9, 2022
@lszomoru
Member

@shoffmeister, sorry for not getting back to you on this until now. I can confirm that this is indeed a duplicate of #111915. I will be closing this issue as duplicate and I will be updating #111915 as I further investigate the issue.

@lszomoru added the *duplicate label Mar 23, 2022
@github-actions github-actions bot locked and limited conversation to collaborators May 8, 2022