-
-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve handling of special characters in filenames #155
Conversation
dd1b72e
to
29922ba
Compare
In some cases, backslashes in a filenames were already escaped when emitting a #line directive, but sometimes no escaping happened. In any case, quotes in a filename should also be escaped. This commit introduces a helper function `QuoteCppString()` to handle quoting and escaping strings in a generic way. This new helper function is used in the code itself, but not in testcases yet. Note that ctags does not quite properly handle special characters and escaping in filenames. When parsing line markers, it just ignores escapes, so any escapes from the #line marker will be present in the ctags output as well. Before this commit, they would be copied unmodified into the #line directives generated from the ctags filenames. Now that these line directives have escaping applied, the filenames read from ctags need to be unescaped, to prevent double escapes. This unescaping already happened in CollectCTagsFromSketchFiles, but is now moved to parseTag where it is more appropriate and applies to all users. Since ctags does not handle escapes in line markers, it cuts the filename off at the first double quote. If a filename contains a double quote, the filename as output by ctags will be cut off before it, leaving the escaping \ at the end. Before this commit, this trailing \ would be output directly into the generated #line marker, which confused gcc and made the build fail. With this commit, this trailing \ is escaped itself, making the output syntactically valid, so the build now works with double quotes in a filename. However, due to this filename cut off, arduino-builder will no longer generate prototypes for functions in files that have a double quote in them (since it can no longer detect them as sketch functions), so double quotes (and probably tabs too) in filenames are still not completely supported yet. Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
In some testcases, the generated output is checked, which contains #line directives with filenames. Previously, this compared against the filename with just backslashes escaped, but this would be incorrect when there are quotes in the filename. Since the previous commit, the arduino-builder code itself uses a new `utils.QuoteCppString` function to apply escaping, so this applies that same escaping. Part of the changes modify the test code directly, part of the changes replace the `EscapeBackSlashes` "function" that was used in interpolating some text files with a call to QuoteCppString instead. Note that QuoteCppString already adds quotes around its result, so those are removed from the places where this function is called. Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
This allows properly parsing and unescaping a string literal. This function is not currently used (it was written to prepare for better parsing of #line directives, which might or might not be added later), but it might be relevant for other things in the future as well. Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
29922ba
to
e418963
Compare
✅ Build completed. ⬇️ Build URL: ℹ️ To test this build:
|
// and just cuts off the filename at the first double quote it | ||
// sees. This means any backslashes are still escaped, and need | ||
// to be unescape, and any quotes will just break the build. | ||
tag.Filename = strings.Replace(parts[1], "\\\\", "\\", -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use utils.ParseCppString
?
Having a special handling here looks weird, I'd use the full parsing even if ctags doesn't handle \"
. Or maybe there are corner cases I'm missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered this, but:
parseCppString
requires double quotes around the string, which should then be added here before passing the string toparseCppString
(not really a problem)- If a double quote is present in a filename, it is escaped by gcc, but then ctags chops the string at the escaped double quote, causing the filename to end in a backslash. Parsing this by
parseCppString
will fail, so this would need extra special casing.
Mostly regarding the latter, it seemed better to just handle backslashes here, in a way that also "fixes" any double quotes broken by ctags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a double quote is present in a filename, it is escaped by gcc, but then ctags chops the string at the escaped double quote, causing the filename to end in a backslash. Parsing this by parseCppString will fail, so this would need extra special casing.
I see, probably we can just ignore parseCppString
errors, and just use the filename as is (parseCppString
already return the input filename unchanged if the parsing fails).
In any case a file containing a double quote will not work, but having the full parsing in place may allows future versions of ctags to run properly (without the need to touch this part again).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is true, yes. However, if ctags would properly handle escaping, it would need to unescape quotes and backslashes when reading the line directives, but it would need to escape backslashes and tabs when generating its output, since its fields are currently tab-separated. Alternatively, it would need to add quotes around the filename and escape quotes instead of tabs, meaning that tabs inside quotes are implicitely "escaped" (i.e. are not seen as a field boundary). In either case, arduino-builder needs more changes than you're proposing: splitting the ctags output on tabs will no longer work, since escaped tabs inside the filename should be skipped.
One alternative that would work without major changes with your suggestion is that tabs (and possibly other characters as well) are escaped using octal escapes (e.g. \011
for tab). This is technically supported in a preprocessor string, though gcc line markers do not currently emit them, so utils.parseCppString
does not currently parse them either.
Hence I'm inclined to leave this as it is now, and only improve the parsing here if it ever becomes relevant due to ctags improvements (which might not ever happen if we replace ctags soon).
This lets arduino-builder output correct #line directives, even if there are backslashes or doublequotes in the filenames. Backslashes were already handled in most cases, but this makes sure doublequotes are also handled. However, due to limitations in ctags, this does not actually make doublequotes in filenames work completely (prototypes are not generated), at least the build does not completely break now when there is a doublequote in a filename.