Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle quotes and colon in filenames #173

Open
andreineculau opened this issue Oct 14, 2023 · 3 comments
Open

Handle quotes and colon in filenames #173

andreineculau opened this issue Oct 14, 2023 · 3 comments
Labels

Comments

@andreineculau
Copy link
Collaborator

andreineculau commented Oct 14, 2023

Somewhere we don't escape filenames properly, ending up with errors like

fatal: ambiguous argument ':"Workspace/ETL/Jobb/Generalist Marketplace/ismdou - Know the identifiers of the table \"FOO\".py"': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
fatal: path 'Workspace/ETL/Jobb/Generalist Marketplace/ismdou - Service Finder Analysis (stakeholder ' does not exist (neither on disk nor in the index)
fatal: path 'Workspace/Users/username@example.com/Accelerators/Wide&Deep Recommender/MR 01' does not exist (neither on disk nor in the index)
fatal: path 'Workspace/Users/username@example.com/Accelerators/Wide&Deep Recommender/MR 02' does not exist (neither on disk nor in the index)
fatal: path 'Workspace/Users/username@example.com/Accelerators/Wide&Deep Recommender/MR 03' does not exist (neither on disk nor in the index)

The first fatal is due to double quotes in the filename. The rest are due to colon in the filename (not visible, but right after the path in the error message; filename is truncated in the error message).

PS: The worst of it is that the commit goes through, so you end up with a commit that should have encrypted files, but instead files are in plain text.

@jmurty
Copy link
Collaborator

jmurty commented Oct 22, 2023

Hi, I have done some testing and the problem is triggered by the presence of quote characters (") in filenames, how these quotes are output by Git commands and then fed into follow-on commands within Transcrypt.

At multiple places in the Transcrypt script we use a command like this to find encrypted files (tracked files to which the crypt filter is applied):

git -c core.quotePath=false ls-files | git -c core.quotePath=false check-attr --stdin filter | awk 'BEGIN { FS = ":" }; /crypt$/{ print $1 }'

These commands try to avoid problems with special/unusual characters in filenames by disabling Git's core.quotePath config setting, but unfortunately per the documentation:

Double-quotes, backslash and control characters are always escaped regardless of the setting of this variable.

Running the above command on a repo containing files with quote characters produces output like this:

"ismdou - Know the identifiers of the table \"FOO\".py.secret"
sensitive_file

It is because the ismdou… file gets quoted and quote-escaped in this way that follow-up commands fail.

Specifically for your reported issue, the pre-commit hook is failing because it's passing the quoted file path/name to a command git show :"${secret_file}" to determine whether the file is properly encrypted, but this gets interpreted as the following which doesn't work: git show :'"ismdou - Know the identifiers of the table \"FOO\".py.secret"'

Although the pre-commit hook is the culprit it this case, the same problem manifests for the --list command (shows over-quoted output) and the --show-raw command (will not work if you name the file, and also fails if you use the wildcard --show-raw=* option):

./transcrypt --show-raw=*
==> "ismdou - Know the identifiers of the table \"FOO\".py.secret" <==
fatal: path '"ismdou - Know the identifiers of the table \"FOO\".py.secret"' does not exist in 'HEAD'

So I can tell what the problem is, but how to fix it is less clear. We need to do one of:

  • come up with an alternative to the piped commands I listed at the top, so we can identify and act on encrypted files from Git's file listing. I think this is the fix we should pursue, but it won't be easy (see below)
  • somehow remove the outer quotes and un-escape inner quotes for over-quoted filenames before we feed them to git show commands. I doubt there is any sensible way to do this in bash.

I have experimented with using the -z option instead of -c core.quotePath=false to identify encrypted files via ls-files and check-attr Git commands. The -z option use NUL to delimit filenames, instead of newlines, but importantly avoids quoting file names at all (emphasis mine):

Without the -z option, pathnames with "unusual" characters are quoted as explained for the configuration variable core.quotePath (see git-config[1]). Using -z the filename is output verbatim and the line is terminated by a NUL byte.

The trick will be figuring out how to string together the commands to work with NUL-delimited outputs, especially given that the command sequence relies on adding – then removing – the suffix : filter: crypt to the same line as the original filename of encrypted files.

So far I've gotten as far as the following, which avoids the unwanted filename quoting using -z with the first ls-files command, but then unfortunately adds it back in with the following check-attr command:

git ls-files -z | awk 'BEGIN { RS = "\0" }; { print $0 }' | git check-attr --stdin filter

I suspect a proper fix will require replacing the single-line piped commands with a function instead that will iterate over unquoted filenames (thanks to the -z option), run the check-attr command on each one individually to identify encrypted files, then add the unquoted filename of just the encrypted files to an output string.

jmurty added a commit that referenced this issue Oct 22, 2023
- Add `_list_encrypted_files()` function that returns get un-quoted
  filenames for encrypted files by using the `-z` option to Git's
  `ls-files` command, with follow-on special handling
- Use this function instead of long piped commands previously used
  to list encrypted filenames, which failed if these names contained
  double-quote characters due to the insufficient work-around of
  unsetting the `core.quotePath` config option.
@jmurty
Copy link
Collaborator

jmurty commented Oct 22, 2023

I've started working towards a fix for this in PR #174

In my manual testing the PR works for most situations, but tests are failing for a few use-cases that have been broken by my changes.

@jmurty
Copy link
Collaborator

jmurty commented Oct 25, 2023

The potential fix on branch 173-handle-quotes-in-filenames is now passing all unit tests and works for my manual testing of file names containing double-quotes. Can you try it and see if it works for you @andreineculau?

jmurty added a commit that referenced this issue Jan 4, 2025
* 173-handle-quotes-in-filenames:
  Fix formatting
  Avoid unusual file-quoting issues breaking check for encrypted files by not checking file name directly
  Fix listing of encrypted files to respect non-default contexts
  Update changelog
  Correct tests' expected values for ls-crypt commands
  Move uninstall work that relies on `ls-crypt` earlier to before removal of transcript from Git repo
  Fix handling of encrypted files with double-quotes in name. #173

# Conflicts:
#	CHANGELOG.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants