Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the TODO test in ack-w.t #14

Closed
petdance opened this issue Dec 1, 2011 · 11 comments
Closed

Fix the TODO test in ack-w.t #14

petdance opened this issue Dec 1, 2011 · 11 comments

Comments

@petdance
Copy link
Collaborator

petdance commented Dec 1, 2011

No description provided.

@hoelzro
Copy link
Collaborator

hoelzro commented Mar 8, 2012

More specifically, this one:

local $TODO = q{I can't figure why the -w works from the command line, but not inside this test};

In my experience, this invocation doesn't on the command line either, from either ack 1 or 2. The command line in question is this:

ack mu. -w -h t/text

The -w flag throws on a boundary check (\b) on each side of the search term, but only if the corresponding side ends in an alphanumeric. What we really need is a regular expression that matches a non-space character on one side, and a space character on the other, similar to \b.

@gedge
Copy link

gedge commented Dec 7, 2013

The test does fail on the command-line, for me, too.

Why is \b put either side of the regexp only when it has \w each side? This breaks any (?:alternative|list) too.

Can't we just make \b...\b cuddling unconditional?

@epa
Copy link

epa commented Apr 9, 2014

The documentation says

       -w, --word-regexp
       Force PATTERN to match only whole words.  The PATTERN is wrapped with "\b" metacharacters.

So it does sound that the code should unconditionally add \b on each side.

@hoelzro
Copy link
Collaborator

hoelzro commented Apr 9, 2014

No matter what we do, the docs have to change. The two sentences you've pointed out are contradictory, as wrapping things with \b unconditionally will fail to watch whole words only (at least, for liberal definitions of 'word' here). I'm personally in favor of removing the second sentence, as it's an implementation detail, subject to change, and not every ack user is familiar with Perl's regular expressions and \b.

@petdance
Copy link
Collaborator Author

petdance commented Apr 9, 2014

The reason it says that is because this issue of "how should -w work" has been around since the dawn of time. I'm not saying we shouldn't change the \b-wrapping behavior, but we've been down the road of "This is what -w should do! No, this is what it should do!" before.

@epa
Copy link

epa commented Apr 9, 2014

I don't have strong views on the particular semantics of -w and I think the documented behaviour is as good as any. We just need to make sure the code and the test suite match the documentation!

@gedge
Copy link

gedge commented Apr 12, 2014

ack has three interpretations of -w.

Two documentation sentences specify different cases:

  1. "whole words" makes the assumption that the supplied pattern will match /^\w+$/ (great for simple documentation and simple use cases)
  2. the second sentence has much wider scope (e.g. ack -w 'a|an' will end up as \ba|an\b and will match the a in after and also the an in bran - fail) - the simplistic first sentence no longer applies

The code (perl) is a third interpretation on the word theme, implementing sentence 2 conditionally and (worse, IMO) partially (we might get zero, one or two \b affixed to the PATTERN). The recent commit merely resolved one of the alternation issues.

The question remains: What is the most desired interpretation (and how do we document it so that users understand it)?

For me, the answer to that is to unconditionally apply \b(?:PATTERN)\b - with the following documentation:

Word match (the PATTERN matches if it is bounded by word breaks). PATTERN is wrapped in \b(?:....)\b

In short, I want ack -w 'a|an|th(?:e|at|is)' to match only the words a, an, the, that and this.

@epa
Copy link

epa commented Apr 12, 2014

I agree 100% with gedge, and would add that peeking at the pattern (to see if it begins with a word character, or whatever) is generally a broken approach because it will never understand regexp syntax sufficiently.

@epa
Copy link

epa commented Jan 14, 2015

A possible fix for this is in issue #445 .

@petdance
Copy link
Collaborator Author

This will get fixed in ack3.

@petdance petdance added the ack3 label Mar 14, 2017
@petdance petdance removed this from the Sooner milestone Mar 18, 2017
@petdance
Copy link
Collaborator Author

This behavior is redone in ack3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants