Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gherkin: Commented Cucumber tags still being picked up in getTags() #721

Closed
mjordanBetBright opened this issue Oct 2, 2019 · 19 comments · Fixed by #880
Closed

gherkin: Commented Cucumber tags still being picked up in getTags() #721

mjordanBetBright opened this issue Oct 2, 2019 · 19 comments · Fixed by #880
Labels
🐛 bug Defect / Bug language: java library: gherkin 🧷 pinned Tells Stalebot not to close this issue

Comments

@mjordanBetBright
Copy link

Summary

Commenting out a single tag above a scenario (leaving the first tag uncommented). The Scenario object still picks up the commented tag using getTags()

Expected Behavior

The expected behaviour can be seen if you comment out the entire line of tags. In this case, none of the tags are picked up by getTags()

Current Behavior

Commenting a tag at the end of a list of tags on a scenario, and checking the getTags() values. The uncommented tag(s) are visible (expected), but the commented out tag(s) are also visible. It should be noted that where the comment starts, it is added to the corresponding tag value (e.g. instead of @exampleTag, it's now #@exampleTag)

image
image

Your Environment

  • Version used: cucumber-jvm 4.3.1
  • Operating System and version: Win 10
@stale
Copy link

stale bot commented Dec 15, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs.

@stale stale bot added the ⌛ stale Will soon be closed by stalebot unless there is activity label Dec 15, 2019
@mpkorstanje mpkorstanje added 🐛 bug Defect / Bug 🧷 pinned Tells Stalebot not to close this issue and removed ⌛ stale Will soon be closed by stalebot unless there is activity labels Dec 15, 2019
@mpkorstanje
Copy link
Contributor

Can confirm this bug is still present in Gherkin 8/9. Basically any line staringt with a @ is interpreted as a tag and then split by white space.

@mpkorstanje
Copy link
Contributor

mpkorstanje commented Dec 25, 2019

And that is exactly what happens.

FeatureHeader! := #Language? Tags? #FeatureLine DescriptionHelper

A feature header has a language definition, tags, feature line and description helper.

Tags! := #TagLine+

The tags are defined as a tag line.

    @Override
    public boolean match_TagLine(Token token) {
        if (token.line.startsWith(GherkinLanguageConstants.TAG_PREFIX)) {
            setTokenMatched(token, TokenType.TagLine, null, null, null, token.line.getTags());
            return true;
        }
        return false;
    }
    @Override
    public List<GherkinLineSpan> getTags() {
        return getSpans("\\s+");
    }

Which is then matched and split on whitespace.

@mpkorstanje
Copy link
Contributor

The go implementation is definitely different. Tokens that don't start with @ are melded into the previous tag.

func (m *matcher) MatchTagLine(line *Line) (ok bool, token *Token, err error) {
	if line.StartsWith(TAG_PREFIX) {
		var tags []*LineSpan
		var column = line.Indent()
		splits := strings.Split(line.TrimmedLineText, TAG_PREFIX)
		for i := range splits {
			txt := strings.Trim(splits[i], " ")
			if txt != "" {
				tags = append(tags, &LineSpan{column, TAG_PREFIX + txt})
			}
			column = column + utf8.RuneCountInString(splits[i]) + 1
		}

		token, ok = m.newTokenAtLocation(line.LineNumber, line.Indent()), true
		token.Type = TokenTypeTagLine
		token.Items = tags
	}
	return
}

@mpkorstanje
Copy link
Contributor

mpkorstanje commented Dec 25, 2019

Javascript does the same as go.

  public getTags() {
    let column = this.indent + 1
    const items = this.trimmedLineText.trim().split('@')
    items.shift()
    return items.map(function(item) {
      const length = item.length
      const span = { column, text: '@' + item.trim() }
      column += length + 1
      return span
    })
  }

@mpkorstanje
Copy link
Contributor

mpkorstanje commented Dec 25, 2019

    @property
    def tags(self):
        column = self.indent + 1
        items = self._trimmed_line_text.strip().split('@')
        tags = []
        for item in items[1:]:
            tags.append({'column': column, 'text': '@' + item.strip()})
            column += len(item) + 1
        return tags

Same for python.

So I'm confident in saying that the Java implementation of Gherkin is splitting the tags incorrectly.

However Gherkin also doesn't support comments anywhere but at the start of the line. It is not possible to comment tags inline. So the syntax highlighting in IDEA is incorrect.

The alternative would be to use one line per tag.

@Tag one
#@Commented tag two
Scenario: Some scenario

@luke-hill
Copy link
Contributor

I can take a look at this in a week or so Rien for Ruby unless you're comfortable fixing it yourself. Let me know.

@mpkorstanje
Copy link
Contributor

mpkorstanje commented Jan 30, 2020

I just got started but you're welcome to help!

In pseudo code I think the solution to should be

uncommented line := remove everything after the # from the line.
items := split the uncommented line on `@`. 

for each item in items:
   firstToken:= remove everything after the first whitespace space character
   create gherkin line span using the first token

And I'm not sure if this the right solution. We may as well throw a parse error if we encounter whitespace in a tag.

@aslakhellesoy
Copy link
Contributor

aslakhellesoy commented Jan 30, 2020

Sorry I am late to the party here.

I don't think we should allow spaces in tags.
All tags must start with a @

Gherkin only allows comments if they are preceded by zero or more whitespace, and I don't think we should make an exception for lines with tags.

So these lines should all cause parse errors:

@tag blah
@tag blah @other
@tag #blah
@tag #blah @other

This should be treated as a regular comment (everything on the line is ignores)

# @this @is @ignored
# @this is also @ignored

@mpkorstanje
Copy link
Contributor

Gherkin only allows comments if they are preceded by zero or more whitespace, and I don't think we should make an exception for lines with tags.

IDEA, Eclipse and this syntax highlighter on Github all disagree.

Feature: example
  @Tag #comment
  Scenario: example

Sensibly so (at least for java) because @tag #@commentedTag worked as expected. And I don't think we should break that now, it would add complexity for every one. I'd rather conform to what has become a defacto standard.

@mpkorstanje
Copy link
Contributor

Gherkin only allows comments if they are preceded by zero or more whitespace.

I think I found were this went wrong. The example below also works in IDEA, Eclipse and Github.

Feature: example
  @Tag#comment
  Scenario: example

People have read this as [\s]*<tag>, rather then ^[\s]*<tag>

@aslakhellesoy
Copy link
Contributor

I think we should throw a parse error if a # character is found after a @ character. I understand this will be a backwards incompatible change, but I still think we should do it. It's easy for people to fix.

We need a few more files in testdata/bad to illustrate this. I think this is sufficient:

@tag #notallowed
@tag#notallowedeither
@tag notallowed

The algorithm: If a line matches /^\s@/ it's potentially a tag line.
Split on /\s+/, trim spaces in each element, verify that:

  • It starts with @
  • Doesn't contain #

@mpkorstanje
Copy link
Contributor

mpkorstanje commented Jan 30, 2020

I understand this will be a backwards incompatible change, but I still think we should do it. It's easy for people to fix.

I'm not sure about that. Fixing individual feature files is easy sure.

However IDEs and syntax highlighters update on a much slower cycle. And most Cucumber users will depend on their IDE to tell them if their feature file is valid or not. This inconsistency will cause allot of confusion for a long time.

While IDEs are updating we'll also see them supporting different Gherkin versions. So depending on the editor a feature file may or may not look valid. Again this adds to the confusion.

It is also not a major new feature like rules so there is a good chance it won't be fixed pro-actively either if at all.

So overall this will be quite hurtful to the ecosystem. Implementation wise, both parsing and rejecting the line have comparable complexity. So I don't see a reason not to parse.

@aslakhellesoy
Copy link
Contributor

Fair points about IDEs. We can allow comments after the initial @. I still think we should disallow spaces in tag names though. Allowing spaces would also require allowing it in tag-expressions, and I don't want to open that can of worms.

@foo bar -> parse error
@foo bar @zap -> parse error
@foo bar @zap #comment -> parse error
@foo # bar @zap -> ['@foo']

WDYT?

@mpkorstanje
Copy link
Contributor

I still think we should disallow spaces in tag names though.

Agreed. They're functionally not usable and IDEs don't accept them either.

@aslakhellesoy
Copy link
Contributor

What about this though:

@foo @bar#zap

Does that become ['@foo,' '@bar#zap'] or ['@foo', '@bar']? I think the latter. No space required before the # - the comment starts as soon as it's seen.

@aslakhellesoy
Copy link
Contributor

Here is another one:

@foo@bar

Parse error?

@mpkorstanje
Copy link
Contributor

mpkorstanje commented Jan 30, 2020

This looks okay:

Feature: example
  @Tag#comment @Tag@Tag
  Scenario: example

If we use \s# as the comment delimiter and @ as the tag delimiter (ruby, js and go do this) we should be alright.

'@foo @bar#zap' => ['@foo', '@bar#zap']
'@foo@bar' =>  ['@foo', '@bar']

We don't need to worry about lines with only comments as they're handled by the parser already.

'#foo @bar'  => [] //Handled by the parser
'   #bar'  => []  //Handled by the parser

@mpkorstanje
Copy link
Contributor

Got a good solution for Java in #880. Also got rid of the scanner so it should be easier to port to other languages. There is some extra character counting going on but we can ignore that in other languages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐛 bug Defect / Bug language: java library: gherkin 🧷 pinned Tells Stalebot not to close this issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants