globset: support backslash escaping #811
@bmalehorn Thanks for putting the work into this! #526 raised a number of issues with this, particularly with respect to Windows. Could you explain how those issues are addressed in this PR? I also see a test that is only run on Unix, but no comment explaining why. Could you elaborate on that as well? |
Aside from the Windows path issue, if i'm reading it right, this doesn't actually conform to the real-world behaviour of either (For comparison, here is where i provided my own attempt at fixing it, and as mentioned i think it behaves pretty well on UNIX, but it leaves the Windows issues unaddressed just like this one does.) |
@okdana That's not quite correct, However I'm realizing now what you two probably realized a while ago, that this is going to break I suppose the real problem stems from using the same glob code for both So I suppose the only ways to parse .gitignores 100% accurately are:
Both of these kind of suck so I can see why this issue hasn't been closed yet. Does that explanation make sense to everyone? Perhaps it's best to leave everything as it is: don't annoy Windows users, don't be inconsistent, but do parse a rare .gitignore pattern incorrectly. |
How did you come to that conclusion?
Or with Git:
I personally would like |
@okdana Sorry if I wasn't clear, but I was describing the behavior of my patch, not
I propose a compromise: on Windows, globset would behave as-is, but on Unix, we would correct it to behave like fnmatch (like in your patch). Windows users could still use |
Oh, sorry.
That's how PHP does it. Not sure if it's the best option or not, but it's an option... |
This sounds plausible, but we need to be fastidious in how we go about this. That is, I suspect this should be a configuration knob on the
fn forward_slash_separator(&mut self, yes: bool) -> &mut GlobBuilder<'a> {
We should document this to say that when a forward slash is a separator, then it will never be interpreted as the start of an escape sequence. But when disabled, I'm tempted to say that this should be disabled by default on all non-Windows platforms and enabled by default on Windows platforms, but that kind of conditional logic in a somewhat fundamental crate like this seems... unwise. Alas, we already have some conditional logic in that we always treat Line 730 in 597bf04
And looking at that more, explicitly calling out "forward slash" in the API seems wrong too, since we've broken the "path separator" abstraction by doing so. What about this instead?
/// When enabled, a forward slash (`\`) may be used to escape
/// special characters in a glob pattern. Additionally, this will
/// prevent `\` from being interpreted as a path separator on all
/// platforms.
///
/// This is enabled by default on platforms where `\` is not a
/// path separator and disabled by default on platforms where `\`
/// is a path separator.
fn forward_slash_escape(&mut self, yes: bool) -> &mut GlobBuilder<'a> {
Regrettably, there is no way around a breaking change here, which means this will require another semver bump. /sigh |
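For illustration, the platform-dependent default proposed above could be sketched roughly as follows. This is a std-only sketch under stated assumptions, not globset's actual code: `SketchBuilder` and `SketchOptions` are hypothetical names, and the knob is called `backslash_escape` here because the character under discussion is `\`. The only real API it relies on is `std::path::is_separator`.

```rust
use std::path::is_separator;

/// Hypothetical stand-in for a glob builder's options (not globset's real type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SketchOptions {
    /// When true, `\` escapes the next character instead of acting as a separator.
    pub backslash_escape: bool,
}

impl Default for SketchOptions {
    fn default() -> Self {
        // Enabled by default only where `\` is not a path separator.
        SketchOptions {
            backslash_escape: !is_separator('\\'),
        }
    }
}

/// Hypothetical builder exposing the knob with the signature shape discussed above.
pub struct SketchBuilder {
    opts: SketchOptions,
}

impl SketchBuilder {
    pub fn new() -> Self {
        SketchBuilder {
            opts: SketchOptions::default(),
        }
    }

    /// Explicitly override the platform default.
    pub fn backslash_escape(&mut self, yes: bool) -> &mut Self {
        self.opts.backslash_escape = yes;
        self
    }

    pub fn options(&self) -> SketchOptions {
        self.opts
    }
}
```

With this shape, a Windows caller who wants fnmatch-style escaping would opt in with `backslash_escape(true)`, while callers on platforms where `\` is not a separator get it without asking.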
@retep998 Do you have any opinions on how |
Nit-pick: An option like that sounds reasonable to me for whatever it's worth. The only concerns i can think of with the platform-specific method are (1) it might trick Windows users into thinking that Git supports I never use Windows, so i have zero personal stake in how well |
🤣 Thanks for that. Derp.
Yeah, I share your trepidation, definitely. I don't use Windows either, so I basically have no idea what the common or expected usage patterns are. Basically, I hear your argument, but I also think, "Shouldn't Windows users be able to use their native path separator? Maybe they are just annoyed that git doesn't allow it?" But yeah, hopefully @retep998 can shed some light. @roblourens might also have opinions! |
Any changes to make |
I understand this for the CLI, definitely. But is it also true for But that is a good point about the CLI. We may want to toggle this behavior differently depending on whether the globs are in .gitignore or the CLI. |
On Windows, in |
All right, I guess that's good enough for me. @bmalehorn I think my comment here still seems mostly relevant: #811 (comment) That is, we still need to make |
This all makes sense and sounds like the most reasonable solution. I'll update this PR tomorrow. |
From `man 7 glob`: One can remove the special meaning of '?', '*' and '[' by preceding them by a backslash, or, in case this is part of a shell command line, enclosing them in quotes. Conform to glob / fnmatch / git implementations by making `\` escape the following character - for example `\?` will match a literal `?`. However, only enable this by default on Unix platforms. Windows builds will continue to use `\` as a path separator, but can still get the new behavior by calling `globset.backslash_escape(true)`.
Adding tests for the `Globset::backslash_escape` option was a bit involved, since the default value of this option is platform-dependent. Extend the options framework to hold an `Option<T>` for each knob, where `None` means "default" and `Some(v)` means "override with `v`". This way we only have to specify the default values once in `GlobOptions::default()` rather than replicated in both code and tests. Finally write a few behavioral tests, and some tests to confirm it varies by platform.
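The `Option<T>`-per-knob idea in this commit message can be sketched as below. This is only an illustration of the pattern, assuming two knobs; `ResolvedOptions` and `OptionOverrides` are hypothetical names and do not reproduce the crate's real test framework.

```rust
use std::path::is_separator;

/// Fully resolved options. Hypothetical stand-in, not globset's real `GlobOptions`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ResolvedOptions {
    pub case_insensitive: bool,
    pub backslash_escape: bool,
}

impl Default for ResolvedOptions {
    fn default() -> Self {
        // The defaults live in exactly one place, including the
        // platform-dependent one.
        ResolvedOptions {
            case_insensitive: false,
            backslash_escape: !is_separator('\\'),
        }
    }
}

/// Per-test overrides: `None` means "default", `Some(v)` means "override with `v`".
#[derive(Clone, Copy, Debug, Default)]
pub struct OptionOverrides {
    pub case_insensitive: Option<bool>,
    pub backslash_escape: Option<bool>,
}

impl OptionOverrides {
    /// Fill every unset knob from `ResolvedOptions::default()`.
    pub fn resolve(self) -> ResolvedOptions {
        let defaults = ResolvedOptions::default();
        ResolvedOptions {
            case_insensitive: self.case_insensitive.unwrap_or(defaults.case_insensitive),
            backslash_escape: self.backslash_escape.unwrap_or(defaults.backslash_escape),
        }
    }
}
```

A test that does not care about escaping leaves `backslash_escape` as `None` and automatically follows the platform default, so the default never has to be restated in test code.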
ddc3030 to 50bdc19
Use the new `Globset::backslash_escape` knob to conform to git behavior: `\` will escape the following character. For example, the pattern `\*` will match a file literally named `*`. Also tweak a test in ripgrep that was relying on this incorrect behavior. Closes BurntSushi#526.
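To make the `\*` example concrete, here is a deliberately tiny matcher, offered as a sketch rather than as how globset compiles patterns. It assumes only `*`, `?` and backslash escaping, ignores path separators and character classes entirely, and every name in it (`Token`, `tokenize`, `glob_match`) is hypothetical.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum Token {
    Literal(char),
    AnyChar, // `?`
    AnyRun,  // `*`
}

fn tokenize(pattern: &str, backslash_escape: bool) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = pattern.chars();
    while let Some(c) = chars.next() {
        match c {
            // With escaping on, `\` makes the following character literal.
            // Assumption of this sketch: a trailing `\` stays a literal `\`.
            '\\' if backslash_escape => {
                tokens.push(Token::Literal(chars.next().unwrap_or('\\')));
            }
            '?' => tokens.push(Token::AnyChar),
            '*' => tokens.push(Token::AnyRun),
            other => tokens.push(Token::Literal(other)),
        }
    }
    tokens
}

fn matches(tokens: &[Token], text: &[char]) -> bool {
    match tokens.split_first() {
        None => text.is_empty(),
        Some((Token::Literal(c), rest)) => {
            text.first() == Some(c) && matches(rest, &text[1..])
        }
        Some((Token::AnyChar, rest)) => !text.is_empty() && matches(rest, &text[1..]),
        // Try every possible length for the run, including zero.
        Some((Token::AnyRun, rest)) => (0..=text.len()).any(|i| matches(rest, &text[i..])),
    }
}

/// Returns true if `name` matches `pattern` under the simplified rules above.
pub fn glob_match(pattern: &str, name: &str, backslash_escape: bool) -> bool {
    let tokens = tokenize(pattern, backslash_escape);
    let chars: Vec<char> = name.chars().collect();
    matches(&tokens, &chars)
}
```

With escaping enabled, `glob_match(r"\*", "*", true)` holds while `glob_match(r"\*", "abc", true)` does not. With escaping disabled, the same pattern is read as a literal `\` followed by a wildcard.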
50bdc19 to 862fd65
I probably should have commented when I updated the PR. @BurntSushi thoughts on the updated PR? |
From `man 7 glob`:
Conform to glob / fnmatch / git implementations by parsing `\?`, `\*` and `\[` as literal `?`, `*` and `[`. If a backslash is followed by anything else, parse it as before.
Closes #526
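The narrower rule in this description (only `\?`, `\*` and `\[` are escapes, any other backslash is left alone) could be sketched like this. It is an illustration only: `parse_selective` is a hypothetical helper, and the `(char, bool)` pairs it returns are an assumption of the sketch, with the flag marking characters that were escaped into literals.

```rust
/// Walk a pattern and mark which characters were escaped into literals.
/// Only `\?`, `\*` and `\[` are treated as escapes; a backslash followed by
/// anything else is passed through unchanged, along with that next character.
pub fn parse_selective(pattern: &str) -> Vec<(char, bool)> {
    let mut out = Vec::new();
    let mut chars = pattern.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\\' {
            if let Some(&next) = chars.peek() {
                if matches!(next, '?' | '*' | '[') {
                    // Consume the escaped character and record it as literal.
                    chars.next();
                    out.push((next, true));
                    continue;
                }
            }
        }
        // Ordinary character (including a backslash that escapes nothing).
        out.push((c, false));
    }
    out
}
```

Because a backslash before any other character is untouched, a Windows-style pattern such as `foo\bar` keeps its `\`, which is presumably what "parse it as before" was meant to preserve.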