Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling of special characters in filenames #155

Merged
merged 3 commits into from
May 26, 2016

Conversation

matthijskooijman
Copy link
Collaborator

This lets arduino-builder output correct #line directives, even if there are backslashes or doublequotes in the filenames. Backslashes were already handled in most cases, but this makes sure doublequotes are also handled. However, due to limitations in ctags, this does not actually make doublequotes in filenames work completely (prototypes are not generated), at least the build does not completely break now when there is a doublequote in a filename.

In some cases, backslashes in a filenames were already escaped when
emitting a #line directive, but sometimes no escaping happened. In any
case, quotes in a filename should also be escaped.

This commit introduces a helper function `QuoteCppString()` to handle
quoting and escaping strings in a generic way. This new helper function
is used in the code itself, but not in testcases yet.

Note that ctags does not quite properly handle special characters and
escaping in filenames. When parsing line markers, it just ignores
escapes, so any escapes from the #line marker will be present in the
ctags output as well. Before this commit, they would be copied
unmodified into the #line directives generated from the ctags filenames.  Now that these line directives have escaping applied, the filenames read from ctags need to be unescaped, to prevent double escapes. This
unescaping already happened in CollectCTagsFromSketchFiles, but is now
moved to parseTag where it is more appropriate and applies to all users.

Since ctags does not handle escapes in line markers, it cuts the
filename off at the first double quote. If a filename contains a double
quote, the filename as output by ctags will be cut off before it,
leaving the escaping \ at the end. Before this commit, this trailing \
would be output directly into the generated #line marker, which confused
gcc and made the build fail. With this commit, this trailing \ is
escaped itself, making the output syntactically valid, so the build now
works with double quotes in a filename. However, due to this filename
cut off, arduino-builder will no longer generate prototypes for
functions in files that have a double quote in them (since it can no
longer detect them as sketch functions), so double quotes (and probably
tabs too) in filenames are still not completely supported yet.

Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
In some testcases, the generated output is checked, which contains #line
directives with filenames. Previously, this compared against the
filename with just backslashes escaped, but this would be incorrect when
there are quotes in the filename. Since the previous commit, the
arduino-builder code itself uses a new `utils.QuoteCppString` function to
apply escaping, so this applies that same escaping.

Part of the changes modify the test code directly, part of the changes
replace the `EscapeBackSlashes` "function" that was used in
interpolating some text files with a call to QuoteCppString instead.

Note that QuoteCppString already adds quotes around its result, so those
are removed from the places where this function is called.

Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
This allows properly parsing and unescaping a string literal. This
function is not currently used (it was written to prepare for better
parsing of #line directives, which might or might not be added later),
but it might be relevant for other things in the future as well.

Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
@ArduinoBot
Copy link
Contributor

✅ Build completed.

⬇️ Build URL: http://downloads.arduino.cc/PR/arduino-builder/arduino-builder-155.zip

ℹ️ To test this build:

  1. Replace arduino-builder binary (you can find it where you installed the IDE) with the provided one

@facchinm facchinm added this to the nwrt milestone May 23, 2016
// and just cuts off the filename at the first double quote it
// sees. This means any backslashes are still escaped, and need
// to be unescape, and any quotes will just break the build.
tag.Filename = strings.Replace(parts[1], "\\\\", "\\", -1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use utils.ParseCppString?

Having a special handling here looks weird, I'd use the full parsing even if ctags doesn't handle \". Or maybe there are corner cases I'm missing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered this, but:

  • parseCppString requires double quotes around the string, which should then be added here before passing the string to parseCppString (not really a problem)
  • If a double quote is present in a filename, it is escaped by gcc, but then ctags chops the string at the escaped double quote, causing the filename to end in a backslash. Parsing this by parseCppString will fail, so this would need extra special casing.

Mostly regarding the latter, it seemed better to just handle backslashes here, in a way that also "fixes" any double quotes broken by ctags.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a double quote is present in a filename, it is escaped by gcc, but then ctags chops the string at the escaped double quote, causing the filename to end in a backslash. Parsing this by parseCppString will fail, so this would need extra special casing.

I see, probably we can just ignore parseCppString errors, and just use the filename as is (parseCppString already return the input filename unchanged if the parsing fails).
In any case a file containing a double quote will not work, but having the full parsing in place may allows future versions of ctags to run properly (without the need to touch this part again).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true, yes. However, if ctags would properly handle escaping, it would need to unescape quotes and backslashes when reading the line directives, but it would need to escape backslashes and tabs when generating its output, since its fields are currently tab-separated. Alternatively, it would need to add quotes around the filename and escape quotes instead of tabs, meaning that tabs inside quotes are implicitely "escaped" (i.e. are not seen as a field boundary). In either case, arduino-builder needs more changes than you're proposing: splitting the ctags output on tabs will no longer work, since escaped tabs inside the filename should be skipped.

One alternative that would work without major changes with your suggestion is that tabs (and possibly other characters as well) are escaped using octal escapes (e.g. \011 for tab). This is technically supported in a preprocessor string, though gcc line markers do not currently emit them, so utils.parseCppString does not currently parse them either.

Hence I'm inclined to leave this as it is now, and only improve the parsing here if it ever becomes relevant due to ctags improvements (which might not ever happen if we replace ctags soon).

@cmaglie cmaglie merged commit e418963 into arduino:master May 26, 2016
@cmaglie cmaglie modified the milestones: 1.3.19, nwrt May 26, 2016
@matthijskooijman matthijskooijman deleted the filename-quoting branch May 31, 2016 12:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants