gherkin: Commented Cucumber tags still being picked up in getTags() #721

mjordanBetBright · 2019-10-02T17:01:44Z

Summary

Commenting out a single tag above a scenario (leaving the first tag uncommented). The Scenario object still picks up the commented tag using getTags()

Expected Behavior

The expected behaviour can be seen if you comment out the entire line of tags. In this case, none of the tags are picked up by getTags()

Current Behavior

Commenting a tag at the end of a list of tags on a scenario, and checking the getTags() values. The uncommented tag(s) are visible (expected), but the commented out tag(s) are also visible. It should be noted that where the comment starts, it is added to the corresponding tag value (e.g. instead of @exampleTag, it's now #@exampleTag)

Your Environment

Version used: cucumber-jvm 4.3.1
Operating System and version: Win 10

stale · 2019-12-15T11:54:07Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs.

mpkorstanje · 2019-12-25T14:39:09Z

Can confirm this bug is still present in Gherkin 8/9. Basically any line staringt with a @ is interpreted as a tag and then split by white space.

mpkorstanje · 2019-12-25T14:46:56Z

And that is exactly what happens.

FeatureHeader! := #Language? Tags? #FeatureLine DescriptionHelper

A feature header has a language definition, tags, feature line and description helper.

Tags! := #TagLine+

The tags are defined as a tag line.

    @Override
    public boolean match_TagLine(Token token) {
        if (token.line.startsWith(GherkinLanguageConstants.TAG_PREFIX)) {
            setTokenMatched(token, TokenType.TagLine, null, null, null, token.line.getTags());
            return true;
        }
        return false;
    }

    @Override
    public List<GherkinLineSpan> getTags() {
        return getSpans("\\s+");
    }

Which is then matched and split on whitespace.

mpkorstanje · 2019-12-25T14:56:51Z

The go implementation is definitely different. Tokens that don't start with @ are melded into the previous tag.

func (m *matcher) MatchTagLine(line *Line) (ok bool, token *Token, err error) {
	if line.StartsWith(TAG_PREFIX) {
		var tags []*LineSpan
		var column = line.Indent()
		splits := strings.Split(line.TrimmedLineText, TAG_PREFIX)
		for i := range splits {
			txt := strings.Trim(splits[i], " ")
			if txt != "" {
				tags = append(tags, &LineSpan{column, TAG_PREFIX + txt})
			}
			column = column + utf8.RuneCountInString(splits[i]) + 1
		}

		token, ok = m.newTokenAtLocation(line.LineNumber, line.Indent()), true
		token.Type = TokenTypeTagLine
		token.Items = tags
	}
	return
}

mpkorstanje · 2019-12-25T14:59:18Z

Javascript does the same as go.

  public getTags() {
    let column = this.indent + 1
    const items = this.trimmedLineText.trim().split('@')
    items.shift()
    return items.map(function(item) {
      const length = item.length
      const span = { column, text: '@' + item.trim() }
      column += length + 1
      return span
    })
  }

mpkorstanje · 2019-12-25T15:26:53Z

    @property
    def tags(self):
        column = self.indent + 1
        items = self._trimmed_line_text.strip().split('@')
        tags = []
        for item in items[1:]:
            tags.append({'column': column, 'text': '@' + item.strip()})
            column += len(item) + 1
        return tags

Same for python.

So I'm confident in saying that the Java implementation of Gherkin is splitting the tags incorrectly.

However Gherkin also doesn't support comments anywhere but at the start of the line. It is not possible to comment tags inline. So the syntax highlighting in IDEA is incorrect.

The alternative would be to use one line per tag.

@Tag one
#@Commented tag two
Scenario: Some scenario

luke-hill · 2020-01-30T12:59:47Z

I can take a look at this in a week or so Rien for Ruby unless you're comfortable fixing it yourself. Let me know.

mpkorstanje · 2020-01-30T13:04:58Z

I just got started but you're welcome to help!

In pseudo code I think the solution to should be

uncommented line := remove everything after the # from the line.
items := split the uncommented line on `@`. 

for each item in items:
   firstToken:= remove everything after the first whitespace space character
   create gherkin line span using the first token

And I'm not sure if this the right solution. We may as well throw a parse error if we encounter whitespace in a tag.

aslakhellesoy · 2020-01-30T15:15:35Z

Sorry I am late to the party here.

I don't think we should allow spaces in tags.
All tags must start with a @

Gherkin only allows comments if they are preceded by zero or more whitespace, and I don't think we should make an exception for lines with tags.

So these lines should all cause parse errors:

@tag blah
@tag blah @other
@tag #blah
@tag #blah @other

This should be treated as a regular comment (everything on the line is ignores)

# @this @is @ignored
# @this is also @ignored

mpkorstanje · 2020-01-30T15:22:33Z

Gherkin only allows comments if they are preceded by zero or more whitespace, and I don't think we should make an exception for lines with tags.

IDEA, Eclipse and this syntax highlighter on Github all disagree.

Feature: example
  @Tag #comment
  Scenario: example

Sensibly so (at least for java) because @tag #@commentedTag worked as expected. And I don't think we should break that now, it would add complexity for every one. I'd rather conform to what has become a defacto standard.

mpkorstanje · 2020-01-30T15:33:26Z

Gherkin only allows comments if they are preceded by zero or more whitespace.

I think I found were this went wrong. The example below also works in IDEA, Eclipse and Github.

Feature: example
  @Tag#comment
  Scenario: example

People have read this as [\s]*<tag>, rather then ^[\s]*<tag>

aslakhellesoy · 2020-01-30T15:47:39Z

I think we should throw a parse error if a # character is found after a @ character. I understand this will be a backwards incompatible change, but I still think we should do it. It's easy for people to fix.

We need a few more files in testdata/bad to illustrate this. I think this is sufficient:

@tag #notallowed

@tag#notallowedeither

@tag notallowed

The algorithm: If a line matches /^\s@/ it's potentially a tag line.
Split on /\s+/, trim spaces in each element, verify that:

It starts with @
Doesn't contain #

mpkorstanje · 2020-01-30T16:10:00Z

I understand this will be a backwards incompatible change, but I still think we should do it. It's easy for people to fix.

I'm not sure about that. Fixing individual feature files is easy sure.

However IDEs and syntax highlighters update on a much slower cycle. And most Cucumber users will depend on their IDE to tell them if their feature file is valid or not. This inconsistency will cause allot of confusion for a long time.

While IDEs are updating we'll also see them supporting different Gherkin versions. So depending on the editor a feature file may or may not look valid. Again this adds to the confusion.

It is also not a major new feature like rules so there is a good chance it won't be fixed pro-actively either if at all.

So overall this will be quite hurtful to the ecosystem. Implementation wise, both parsing and rejecting the line have comparable complexity. So I don't see a reason not to parse.

aslakhellesoy · 2020-01-30T16:16:44Z

Fair points about IDEs. We can allow comments after the initial @. I still think we should disallow spaces in tag names though. Allowing spaces would also require allowing it in tag-expressions, and I don't want to open that can of worms.

@foo bar -> parse error
@foo bar @zap -> parse error
@foo bar @zap #comment -> parse error
@foo # bar @zap -> ['@foo']

WDYT?

mpkorstanje · 2020-01-30T16:17:43Z

I still think we should disallow spaces in tag names though.

Agreed. They're functionally not usable and IDEs don't accept them either.

aslakhellesoy · 2020-01-30T16:20:56Z

What about this though:

@foo @bar#zap

Does that become ['@foo,' '@bar#zap'] or ['@foo', '@bar']? I think the latter. No space required before the # - the comment starts as soon as it's seen.

aslakhellesoy · 2020-01-30T16:21:34Z

Here is another one:

@foo@bar

Parse error?

mpkorstanje · 2020-01-30T16:28:12Z

This looks okay:

Feature: example
  @Tag#comment @Tag@Tag
  Scenario: example

If we use \s# as the comment delimiter and @ as the tag delimiter (ruby, js and go do this) we should be alright.

'@foo @bar#zap' => ['@foo', '@bar#zap']
'@foo@bar' =>  ['@foo', '@bar']

We don't need to worry about lines with only comments as they're handled by the parser already.

'#foo @bar'  => [] //Handled by the parser
'   #bar'  => []  //Handled by the parser

mpkorstanje · 2020-01-30T17:58:08Z

Got a good solution for Java in #880. Also got rid of the scanner so it should be easier to port to other languages. There is some extra character counting going on but we can ignore that in other languages.

aslakhellesoy added the library: gherkin label Oct 16, 2019

stale bot added the ⌛ stale Will soon be closed by stalebot unless there is activity label Dec 15, 2019

mpkorstanje added 🐛 bug Defect / Bug 🧷 pinned Tells Stalebot not to close this issue and removed ⌛ stale Will soon be closed by stalebot unless there is activity labels Dec 15, 2019

mpkorstanje added the language: java label Jan 19, 2020

mpkorstanje mentioned this issue Jan 30, 2020

gherkin: Fix parsing of commented tags #880

Merged

11 tasks

mpkorstanje closed this as completed in #880 Feb 6, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gherkin: Commented Cucumber tags still being picked up in getTags() #721

gherkin: Commented Cucumber tags still being picked up in getTags() #721

mjordanBetBright commented Oct 2, 2019

stale bot commented Dec 15, 2019

mpkorstanje commented Dec 25, 2019

mpkorstanje commented Dec 25, 2019 •

edited

Loading

mpkorstanje commented Dec 25, 2019

mpkorstanje commented Dec 25, 2019 •

edited

Loading

mpkorstanje commented Dec 25, 2019 •

edited

Loading

luke-hill commented Jan 30, 2020

mpkorstanje commented Jan 30, 2020 •

edited

Loading

aslakhellesoy commented Jan 30, 2020 •

edited

Loading

mpkorstanje commented Jan 30, 2020

mpkorstanje commented Jan 30, 2020

aslakhellesoy commented Jan 30, 2020

mpkorstanje commented Jan 30, 2020 •

edited

Loading

aslakhellesoy commented Jan 30, 2020

mpkorstanje commented Jan 30, 2020

aslakhellesoy commented Jan 30, 2020

aslakhellesoy commented Jan 30, 2020

mpkorstanje commented Jan 30, 2020 •

edited

Loading

mpkorstanje commented Jan 30, 2020

gherkin: Commented Cucumber tags still being picked up in getTags() #721

gherkin: Commented Cucumber tags still being picked up in getTags() #721

Comments

mjordanBetBright commented Oct 2, 2019

Summary

Expected Behavior

Current Behavior

Your Environment

stale bot commented Dec 15, 2019

mpkorstanje commented Dec 25, 2019

mpkorstanje commented Dec 25, 2019 • edited Loading

mpkorstanje commented Dec 25, 2019

mpkorstanje commented Dec 25, 2019 • edited Loading

mpkorstanje commented Dec 25, 2019 • edited Loading

luke-hill commented Jan 30, 2020

mpkorstanje commented Jan 30, 2020 • edited Loading

aslakhellesoy commented Jan 30, 2020 • edited Loading

mpkorstanje commented Jan 30, 2020

mpkorstanje commented Jan 30, 2020

aslakhellesoy commented Jan 30, 2020

mpkorstanje commented Jan 30, 2020 • edited Loading

aslakhellesoy commented Jan 30, 2020

mpkorstanje commented Jan 30, 2020

aslakhellesoy commented Jan 30, 2020

aslakhellesoy commented Jan 30, 2020

mpkorstanje commented Jan 30, 2020 • edited Loading

mpkorstanje commented Jan 30, 2020

mpkorstanje commented Dec 25, 2019 •

edited

Loading

mpkorstanje commented Dec 25, 2019 •

edited

Loading

mpkorstanje commented Dec 25, 2019 •

edited

Loading

mpkorstanje commented Jan 30, 2020 •

edited

Loading

aslakhellesoy commented Jan 30, 2020 •

edited

Loading

mpkorstanje commented Jan 30, 2020 •

edited

Loading

mpkorstanje commented Jan 30, 2020 •

edited

Loading