Add named groups for python #316

jsa34 · 2024-12-06T08:02:44Z

🤔 What's changed?

Expression matching now returns a tuple: the value as before and an optional name if the expression has a name or the regex uses a named capture group.

The format for the Cucumber Expression when specifying name AND type is:

"There are {step_count:int} steps."

The part before the colon is the name of the argument, and the part after it is the parameter type, as currently used in Cucumber Expressions.

The change to the return type is currently a breaking change. It will probably need to sit behind some feature flag, with the default returning the old, expected single value again and the new tuple returned only when the flag is enabled. I have not done this yet because I wanted to check the breadth of test cases with the feature fully enabled for the PoC before implementing it. I'm also not sure how best to feature flag it!
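One way to keep the old behaviour as the default while the tuple form is opt-in is a keyword flag that strips the names unless the caller asks for them. This is a minimal sketch only: `unwrap_match` and `with_names` are hypothetical names, not part of the cucumber_expressions API, and it assumes the new matching yields a list of (value, name) pairs.

```python
from typing import Any, List, Optional, Tuple, Union

# Assumed shape of the new match result: a list of (value, name) pairs
NamedValue = Tuple[Any, Optional[str]]


def unwrap_match(
    named_values: Optional[List[NamedValue]], with_names: bool = False
) -> Optional[Union[List[Any], List[NamedValue]]]:
    """Return plain values by default; return (value, name) pairs only when opted in."""
    if named_values is None:
        # No match: keep returning None, as before
        return None
    if with_names:
        return named_values
    # Default: drop the names so existing callers see the old return type
    return [value for value, _name in named_values]


print(unwrap_match([(5, "step_count")]))  # [5]
print(unwrap_match([(5, "step_count")], with_names=True))  # [(5, 'step_count')]
```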

To resolve #206

⚡️ What's your motivation?

Python (in particular pytest-bdd) uses other args in the step definitions, such as fixtures and reserved args for "datatable" and "docstring", so just mapping step arg values to step args in the expressions is not reliable or user-friendly. It is a blocker currently for adopting Cucumber Expressions into the pytest-bdd framework.
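To illustrate the motivation: with names attached, a framework can bind matched values to step function parameters by keyword and leave fixtures and reserved arguments alone, instead of relying on argument order. This is a sketch assuming the (value, name) pairs described above; `step_kwargs` is a hypothetical helper, not pytest-bdd's actual implementation.

```python
import inspect
from typing import Any, Callable, Dict, List, Optional, Tuple


def step_kwargs(
    func: Callable[..., Any], named_values: List[Tuple[Any, Optional[str]]]
) -> Dict[str, Any]:
    """Map matched (value, name) pairs onto a step function's parameters by name."""
    params = inspect.signature(func).parameters
    kwargs: Dict[str, Any] = {}
    for value, name in named_values:
        if name is None:
            raise ValueError("cannot map an unnamed argument by name")
        if name not in params:
            raise ValueError(f"step function has no parameter named {name!r}")
        kwargs[name] = value
    return kwargs


# `browser` and `datatable` stand in for a fixture and a reserved arg the framework supplies itself
def there_are_steps(browser, datatable, step_count: int) -> None:
    pass


print(step_kwargs(there_are_steps, [(5, "step_count")]))  # {'step_count': 5}
```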

🏷️ What kind of change is this?

💥 Breaking change (incompatible changes to the API)

♻️ Anything particular you want feedback on?

Approach, public API changes, format of the named args, whether it's acceptable in general!

📋 Checklist:

- I agree to respect and uphold the Cucumber Community Code of Conduct
- I've changed the behaviour of the code
  - I have added/updated tests to cover my changes.
- My change requires a change to the documentation.
  - I have updated the documentation accordingly.
- Users should know about my change
  - I have added an entry to the "Unreleased" section of the CHANGELOG, linking to this pull request.


jsa34 · 2024-12-06T08:04:10Z

This is just my first draft where I made sure that everything "works".

If/once the approach is agreed, I will still need to do tests, documentation, etc., but I thought it more prudent to share the current approach I am suggesting before doing these things!

Feedback most welcome!

I also committed quite a bit of refactoring unrelated to this PR, which I will undo and raise as separate PRs.

luke-hill · 2025-01-20T19:06:45Z

I think the last time we made a big change like this was the abolition of the Transformer proc and the introduction of this library (cucumber-expressions) "proper" (but I could be wrong here). So I'd prefer that we release this all simultaneously.

In terms of feature flags, I'm happy for it to sit behind feature flags, but I'd rather they be simple and purely for development, so we avoid needing to review a leviathan PR. In other words, by all means make it technically easier to work in whatever way is best, but come our next full release I'd prefer it to be entirely enabled. (The major release that includes this does not need to be the next one, currently v19.)

I'm only one of the main contributors though, so it's not entirely my decision. But when cucumber-expressions were released, they were released simultaneously (admittedly this was a long time ago, with many different people at the helm).

TL;DR - I'm pro this change, and anti it "sitting behind feature flags when released" (But during development go for it).

davidjgoss · 2025-01-20T21:16:19Z

I don’t have a particular view on the change itself, but I do agree with @luke-hill re flags: I don’t see a lot of value in flags here vs just making it a semver major change and calling attention to it in the release notes. Cucumber implementations that use this library will be pinned to at least a minor range, and other consumers should similarly assume breaking changes in a major.

mpkorstanje

Mostly looks good to me, but I do have some remarks, nothing major though. Apologies if they're a bit scattered, I'm throwing these out as I go through the PR.

Also please do add an entry to the CHANGELOG.

mpkorstanje · 2025-01-23T13:52:15Z

python/cucumber_expressions/argument.py

+        for item in parameter_types_and_names:
+            if not isinstance(item, tuple) or len(item) != 2:
+                raise CucumberExpressionError(
+                    f"Expected a tuple of (ParameterType, Optional[str]), but got {type(item)}: {item}"


This error is very technical. What should a user do if/when they encounter this error?

Also do most users know what a tuple is?

Good point - I'll have a think - this was mainly done for my benefit when debugging!

Tuple is a common type used in Python - the type itself should make sense, but I'll review for clarity

Syntax: {param_name:param_type}
How do you represent the Optional[str] when param_type is not present? With None?
For simplicity's sake, could it assume the string type if param_type is omitted?
I think this error is a bit redundant, because the parser should already handle these errors. If the code is working properly, this error will never be raised.

mpkorstanje · 2025-01-23T13:54:27Z

python/cucumber_expressions/expression.py

+    ) -> Tuple[Optional[str], Optional[ParameterType]]:
+        """Helper function to parse the parameter name and return group_name and parameter_type."""
+        if ":" in name:
+            group_name, parameter_type_name = [part.strip() for part in name.split(":")]


This allows for the empty group name, which is distinct from the None group name. Probably not what we want.

It might also be worthwhile to push this into the parser.
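For reference, `":int".split(":")` gives `['', 'int']`, so `{:int}` would yield the empty string as a group name. Below is a sketch of a stricter split that treats a missing colon as "no name" and rejects an empty name or a second colon. `split_parameter` is hypothetical, and whitespace trimming is deliberately left out because that is still being discussed further down the thread.

```python
from typing import Optional, Tuple


def split_parameter(text: str) -> Tuple[Optional[str], str]:
    """Split the text between { and } into (argument name, parameter type name)."""
    if ":" not in text:
        # {int}: no argument name, same as today
        return None, text
    name, _sep, type_name = text.partition(":")
    if not name:
        # {:int} would otherwise produce "" as the name, distinct from None
        raise ValueError(f"Empty argument name in {{{text}}}")
    if ":" in type_name:
        # Only one delimiter is allowed, so {a:b:c} is rejected
        raise ValueError(f"More than one ':' in {{{text}}}")
    # The type name itself is left to the existing type lookup (not validated here)
    return name, type_name


print(":int".split(":"))  # ['', 'int']
print(split_parameter("step_count:int"))  # ('step_count', 'int')
```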

mpkorstanje · 2025-01-23T13:59:12Z

python/cucumber_expressions/expression_factory.py

+    def _extract_text_in_curly_brackets(string: str) -> list:
+        return CURLY_BRACKET_PATTERN.findall(string)
+
+    def is_cucumber_expression(self, expression_string: str):


I don't think this check is simple enough. The primary constraint is explaining to people what is and is not a cucumber expression. For Java I eventually settled on requiring that all regular expressions start with ^ or end with $ and that everything else is a Cucumber expression. This is both simple and unambiguous.

This helps avoid a situation where a user makes a mistake in a Cucumber expression, causing Cucumber to treat it as a regular expression, which then fails because it isn't a valid regular expression either, resulting in a very cryptic error message.

Is it worth standardising this check then across all flavours? I have no idea what we do in ruby as I've not dug into this stuff since the initial release some 4/5 years ago

The problem is that I cannot think of an easy way to distinguish a regex from a normal string in Python: the ^/$ anchors aren't commonly used, and regexes are generally just plain strings.

Hence, I thought to try and identify the other way around - seeing if it's a Cucumber Expression. I realise now looking at it that just looking for curly bracket pairs as a discriminator for Cucumber Expressions doesn't fly, so this will need to be fixed.

I was a bit stuck here as I couldn't think of a reliable deterministic manner to distinguish between the two types, so input very welcome!

I think that you can still look for the ^/$ indicators.
This is a nice rule that deterministically distinguishes Cucumber expressions from regular expressions.

the ^$ syntax isn't used, and are generally just strings.

When no specific parser is specified (the default), Pytest-BDD and other Python BDD frameworks should check whether ^/$ are present and, if so, treat the step as a regular expression; otherwise, treat it as a Cucumber expression.
This type of regexp step definition is different from using parsers.re, but I think it could replace it altogether.
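The rule described in this thread (a regex if anchored with ^ or $, a Cucumber expression otherwise) fits in one line. A sketch; `looks_like_regular_expression` is a hypothetical name, not an existing function in this library.

```python
def looks_like_regular_expression(expression: str) -> bool:
    """Anchored with ^ or $ means a regular expression; anything else is a Cucumber expression."""
    return expression.startswith("^") or expression.endswith("$")


steps = [r"^I have (\d+) cukes$", r"I have (\d+) cukes$", "I have {int} cukes", r"I have (\d+) cukes"]
for step in steps:
    kind = "regular expression" if looks_like_regular_expression(step) else "Cucumber expression"
    print(f"{step!r}: {kind}")
```

The trade-off is explicit: an unanchored pattern such as `I have (\d+) cukes` is treated as a Cucumber expression, so step authors must anchor their regexes. In exchange, a typo in a Cucumber expression can never silently turn it into a regex.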

mpkorstanje · 2025-01-23T14:02:18Z

python/cucumber_expressions/tree_regexp.py

        if source[index + 1] != "?":
-            # (X)


Please put these comments back. It's really helpful to have a reference here.

mpkorstanje · 2025-01-23T14:08:43Z

python/cucumber_expressions/tree_regexp.py

-            # (?>X)
-            return True
-        # (?<=X) or (?<!X) else (?<name>X)
-        return source[index + 3] in ["=", "!"]


For consistency between implementations it would be good to keep this similar too. It really helps if all the implementations are similar enough that you can reference a language implementation you do know.

Though the syntax for a named group is Python-specific, so it would be good to add a separate case for it, with a comment.
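A sketch of how the existing checks could stay in the same order as the other implementations, with one extra, commented case for Python's (?P<name>X) syntax. It assumes `index` points at the opening ( of a group in a valid regex, as in the current helper; the function is shown standalone and its name is illustrative.

```python
def is_non_capturing(source: str, index: int) -> bool:
    """Return True if the group whose "(" is at source[index] does not capture."""
    # Regex is assumed valid, so no bounds check is needed for the single-char lookups.
    if source[index + 1] != "?":
        # (X)
        return False
    if source[index + 2 : index + 4] == "P<":
        # (?P<name>X): Python (and Go) named group syntax. It captures.
        return False
    if source[index + 2] != "<":
        # (?:X)
        # (?idmsux-idmsux:X)
        # (?=X)
        # (?!X)
        # (?>X)
        # (?P=name): Python backreference, not a capturing group
        return True
    # (?<=X) or (?<!X) else (?<name>X), the named group syntax of Java, JavaScript and Ruby
    return source[index + 3] in ["=", "!"]


for pattern in ["(a)", "(?:a)", "(?P<n>a)", "(?<=a)b", "(?<n>a)"]:
    print(pattern, is_non_capturing(pattern, 0))
```

Without the P< case, (?P<name>X) falls into the third branch (source[index + 2] is "P", not "<") and is classed as non-capturing. Comparing a two-character slice also cannot raise IndexError near the end of the string, unlike indexing.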

mpkorstanje · 2025-01-23T14:21:49Z

python/cucumber_expressions/regular_expression.py


 from cucumber_expressions.argument import Argument
 from cucumber_expressions.parameter_type import ParameterType
 from cucumber_expressions.parameter_type_registry import ParameterTypeRegistry
 from cucumber_expressions.tree_regexp import TreeRegexp

+NAMED_CAPTURE_GROUP_REGEX = re.compile(r"\?P<([^>]+)>")


This should probably be non-greedy.

Is it OK to allow whitespace in the name of the capture group?
E.g.: ?P<one two> matches the above regexp.
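An alternative to scanning the pattern text with NAMED_CAPTURE_GROUP_REGEX is to let `re` do it: a compiled pattern exposes `groups` (the number of capturing groups) and `groupindex` (a mapping from group name to group number). Python also rejects invalid group names such as `one two` at compile time, so whitespace in names cannot get through. On the greediness point, `[^>]+` can never consume a `>`, so a non-greedy quantifier would not change what it matches. The helper below is a hypothetical sketch, not existing library code.

```python
import re
from typing import Dict, Optional


def group_names_by_index(pattern: str) -> Dict[int, Optional[str]]:
    """Map each capturing group number to its name, or None if it is unnamed."""
    compiled = re.compile(pattern)
    names: Dict[int, Optional[str]] = {i: None for i in range(1, compiled.groups + 1)}
    for name, number in compiled.groupindex.items():
        names[number] = name
    return names


print(group_names_by_index(r"I have (?P<count>\d+) cukes? in my (\w+)"))
# {1: 'count', 2: None}

try:
    re.compile(r"(?P<one two>x)")
except re.error as err:
    print("rejected:", err)
```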

mpkorstanje · 2025-01-23T14:28:12Z

♻️ Anything particular you want feedback on?
Approach, public API changes, format of the named args, whether it's acceptable in general!

I wasn't aware that named capture groups in Python differ from other languages. Java, JavaScript and Ruby use (?<name>.*), whereas Go and Python use (?P<name>.*). It would be good to make that clear in the code.

I'm also missing some error handling around the : character, which is now a reserved character for parameter names.

Adding cases to the shared test set, even if they fail, would be good too.

luke-hill · 2025-01-23T14:31:33Z

I have one request so far (Pre-review).

Should we have a major release where we change cucumber-expressions to ban the : character, to at least make the upgrade path a little less restrictive? To me that feels like a nice "small" major release we could do (I'm happy to do the work on this).

Obviously this only holds if the agreed path for naming is as specified here, which I think most of us are happy with.

mpkorstanje · 2025-01-23T14:35:40Z

I'm not sure about the release strategy yet. I don't quite have time to sponsor a Java implementation; I'm currently working on using the message format everywhere, and on the technical debt that this is pulling to the surface.

It does make me favor feature toggles though.

luke-hill

One thing I wanted to ask is whether you'd want to get both named parameter types and named regex groups out simultaneously, or whether you'd want / consider doing them separately.

Purely thinking about the polyglot implementation (Unless you're volunteering to write a bunch of other flavours?)

luke-hill · 2025-01-25T02:57:14Z

python/cucumber_expressions/argument.py

+        for item in parameter_types_and_names:
+            if not isinstance(item, tuple) or len(item) != 2:
+                raise CucumberExpressionError(
+                    f"Expected a tuple of (ParameterType, Optional[str]), but got {type(item)}: {item}"


Also do most users know what a tuple is?

luke-hill · 2025-01-25T02:58:44Z

python/cucumber_expressions/argument.py

-        tree_regexp: TreeRegexp, text: str, parameter_types: List
+        tree_regexp: TreeRegexp,
+        text: str,
+        parameter_types_and_names: List[Tuple[ParameterType, Optional[str]]],


Should this maybe be parameter_types_with_names? That would make sense, because the name could often be nil (which feels "right").

Good point!

luke-hill · 2025-01-25T02:59:29Z

python/cucumber_expressions/argument.py

            raise CucumberExpressionError(
-                f"Group has {len(arg_groups)} capture groups, but there were {len(parameter_types)} parameter types"
+                f"Group has {len(arg_groups)} capture groups, but there were {param_count} parameter types/names"


I think the ending of this error shouldn't be amended: the issue is still that there was an incorrect number of parameter types (whether names are present or not is irrelevant to the length issue).

luke-hill · 2025-01-25T03:02:24Z

python/cucumber_expressions/expression_factory.py

+    def _extract_text_in_curly_brackets(string: str) -> list:
+        return CURLY_BRACKET_PATTERN.findall(string)
+
+    def is_cucumber_expression(self, expression_string: str):


Is it worth standardising this check then across all flavours? I have no idea what we do in ruby as I've not dug into this stuff since the initial release some 4/5 years ago

luke-hill · 2025-01-25T03:07:16Z

python/cucumber_expressions/tree_regexp.py

+        """
+        group_name_start = index + 3
+        group_name_end = source.find(">", group_name_start)
+        return source[group_name_start:group_name_end]


My Python is basically non-existent, but above we're using a : b and here we're using a:b. Do they mean different things? If not, maybe keep things standard?

They mean the same thing. In the previous expression the spaces around : can be removed to keep the code style consistent.
In Python, a list (or a string) can be sliced with the [start_pos:end_pos] operator; spaces around start_pos/end_pos make no difference.
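A quick check of the slicing point, plus a sketch of extracting a named group's name. Assuming `index` is the position of the opening ( of (?P<name>...), the name starts four characters later, after `(?P<`. Using `str.index` rather than `str.find` means a missing > raises ValueError instead of find's -1 silently producing a wrong slice. PEP 8 treats the slice colon like a binary operator, and formatters such as Black put spaces around it when the bounds are expressions (`source[index + 3 : end]`), which is likely where the two styles come from. `named_group_name` is a hypothetical helper.

```python
def named_group_name(source: str, index: int) -> str:
    """Return the name of a (?P<name>...) group whose "(" is at source[index]."""
    start = index + 4  # skip past "(?P<"
    end = source.index(">", start)  # raises ValueError if ">" is missing
    return source[start:end]


text = "abcdef"
# Spaces around ":" in a slice are purely stylistic
assert text[1 : 3] == text[1:3] == "bc"

print(named_group_name(r"I have (?P<count>\d+) cukes", 7))  # count
```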

luke-hill · 2025-01-25T03:08:28Z

python/cucumber_expressions/tree_regexp.py

-    def group_builder(self):
-        return self._group_builder
+        # If it's a named group (e.g., (?P<name>...)), it's still a capturing group
+        if source[index + 2] == "P" and source[index + 3] == "<":


Earlier we use a substring over a range, and here we're using two separate single-character matches.

In my head we should probably be using a range in all situations.

I'll take a look 👍

luke-hill · 2025-01-25T03:10:17Z

python/tests/test_expression.py

+
+    def test_documents_match_arguments_with_names_and_spaces(self):
+        values = match(
+            "I have {  cuke_count : int  } cuke(s) and {gherkin_count: int} gherkin(s)",


Is this a Python-specific interpretation? I'm 90% sure we don't permit spaced-out arguments inside the braces, but again I'd need to triple-check.

Good question.

I can see that removing the space between the name and type could be enforced (i.e. only allow "This is a { number:int } number." and not "This is a { number: int } number.").

I was considering these like f-strings in python (which are a very similar concept).

See the discussion from ruff (linting and formatting tool) discussing this: astral-sh/ruff#9785 (comment)

The left and right padding whitespace is suggested as good practice for readability by one of the Python maintainers who responded to the standards the formatting should adhere to. Happy to not allow any whitespace, though!

I agree we shouldn't permit spaces on either side of the colon, so the "middle bit" should always be name:capture-expr. However, I genuinely don't know about whitespace padding inside braces. I feel we should leave this comment open for people to dig into as and when they have time.

Python f-strings can use complex expressions inside { }, effectively being able to execute code logic.

If Cucumber/Gherkin syntax is not strict about this, I think using these extra spaces is a matter of personal preference.
I personally dislike the extra spaces, because my main focus is usually avoiding line breaks, and these spaces can add up.
Making the syntax strict and disallowing spaces could make the code easier to maintain, but at the same time I imagine it's not difficult to optionally allow whitespace after { and before }.
For me, disallowing spaces before and after : is also a bit arbitrary: the readability argument still stands, and it's feasible to support.
With age and experience, I tend to prefer opinionated and strict syntaxes that give you one clear way to do things right, rather than allowing multiple styles and formats.
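To make the two options concrete, each can be written as a single pattern over a {...} parameter. This is a sketch only: `STRICT` and `LENIENT` are hypothetical names, and the patterns assume argument names are Python-style identifiers and type names contain no whitespace, braces or colons. Both reject spaces around the colon, in line with the suggestion above; they differ only in padding inside the braces.

```python
import re

# Strict: no whitespace anywhere inside the braces, e.g. {cuke_count:int} or {int}
STRICT = re.compile(r"\{(?:([A-Za-z_]\w*):)?([^{}:\s]*)\}")

# Lenient: optional padding after { and before }, but still none around the colon
LENIENT = re.compile(r"\{\s*(?:([A-Za-z_]\w*):)?([^{}:\s]*)\s*\}")

for candidate in ["{cuke_count:int}", "{ cuke_count:int }", "{cuke_count : int}", "{int}"]:
    strict = STRICT.fullmatch(candidate)
    lenient = LENIENT.fullmatch(candidate)
    print(
        candidate,
        strict.groups() if strict else None,
        lenient.groups() if lenient else None,
    )
```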

jsa34 · 2025-01-28T15:00:44Z

♻️ Anything particular you want feedback on?
Approach, public API changes, format of the named args, whether it's acceptable in general!

I wasn't aware that named capture groups in Python differ from other languages. Where Java, Javascript and Ruby us (?<name>.*) Go and Python use (?P<name>.*). It would be good to have that clear in the code.

I'm also missing some error handling around the : character, which is now a reserved character for parameter names.

And adding to the shared test set, even if failing would be good too.

Great points 👍. I'll update to reflect these

jsa34 · 2025-01-28T15:12:43Z

One thing I wanted to ask is whether you'd want to get both named for param types and regex out simultaneously or whether you'd want / consider doing them separately.

Purely thinking about the polyglot implementation (Unless you're volunteering to write a bunch of other flavours?)

Good question! Honest answer: I have no idea. I was just going for consistency but open to suggestions.

luke-hill · 2025-01-28T15:56:05Z

One thing I wanted to ask is whether you'd want to get both named for param types and regex out simultaneously or whether you'd want / consider doing them separately.
Purely thinking about the polyglot implementation (Unless you're volunteering to write a bunch of other flavours?)

Good question! Honest answer: I have no idea. I was just going for consistency but open to suggestions.

I think making : an invalid character would be a good, small, incremental breaking change, and I don't feel that would be a lot of work. It would also be good because we haven't done big sweeping changes, so it gets people prepared for "bigger" changes etc.

I'm also 99% sure it would be almost no barrier for people (I can't think of anyone using : inside a parameter type name, but obviously I can't be 100% certain).

neskk · 2025-01-31T16:46:13Z

I'm also 99% sure it would be almost no barrier for people (I can't think of people using : inside a param type name, but I can't be 100% certain obviously.

I don't know any programming language that allows : in variable/method/class names.
Even allowing spaces in the param type name seems weird to me.

I'm coming from the cucumber-vscode-extension language-service repository because I wanted to make it support pytest-bdd parsers syntax.
I actually managed to get it to recognize the syntax, but now it can't distinguish between {xpto} being a param type or a named param.
Adding the {param_name:param_type} syntax to cucumber expressions, might make it easier for all these tools to integrate and work well in Python.

luke-hill · 2025-01-31T17:44:18Z

Remember that parameter type names come from a string assignment. The Ruby example here takes an input set of parameters as keywords, then assigns them to the various properties of the parameter type:

        type = options[:type] || Object
        use_for_snippets = if_nil(options[:use_for_snippets], true)
        prefer_for_regexp_match = if_nil(options[:prefer_for_regexp_match], false)

        parameter_type = CucumberExpressions::ParameterType.new(
          options[:name], # HERE
          options[:regexp],
          type,
          options[:transformer],
          use_for_snippets,
          prefer_for_regexp_match
        )

neskk · 2025-02-03T15:24:29Z

I forgot Ruby uses a : prefix to identify symbols. I'm not sure whether it collides with the syntax {param_name:param_type}.

Remember that parameter type name constructs come from a string assignment. The ruby example here takes an input set of parameters as keywords then assigns them to the various properties of the parameter type

I'm not sure I understand your point. I'm thinking param_type_name is the name of the type of the parameter, and param_name is the name of the argument that will be populated on the receiver function. E.g.:

@when("bla bla bla {arg1:param_type_1} on {arg2:param_type_1}")
def bla_bla_bla(arg1: param_type_1, arg2: param_type_1):
    # do something with arg1, arg2
    pass

luke-hill · 2025-02-03T16:17:09Z

If your parameter name was called foobar or bazbar there is no issue.

If your parameter type name was i_am_a_colon:colon_colon:colon and you wanted to name the argument input, the new syntax would be {input:i_am_a_colon:colon_colon:colon}, which would likely confuse anything parsing it: which : is the delimiter? We need to ensure only one : can ever exist, so we need to ban : as a valid character in the names of custom parameter types. This will be the breaking change we make before doing this work.

neskk · 2025-02-03T17:11:42Z

I think this is only an issue with Ruby, and even then the : must be the first character of a name; it can never be in the middle of the name.
Anyway, I hope we can make progress on this, because the Cucumber linking and autocomplete plugins for python are broken and unusable with anything other than plain text step-defs (no support for parameters).

luke-hill · 2025-02-03T17:32:42Z

As I just mentioned, this isn't only an issue with Ruby; it's a global one. So to start with, we'll release an update that prevents : in the names. Below is how it would be written in Ruby, and its implementation in the code.

# support/parameter_types.rb

ParameterType(
  name: "colon_colon:colon:colon",
  regexp: /(anything|regexy)/,
  transformer: ->(word) { word.to_sym }
)

# steps_steps.rb
Given('I am {colon_colon:colon:colon} in as {string}') do |arg1, user|
  # not relevant
end

This code currently works and executes. It will likely execute in many other flavours also.
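A sketch of the proposed first step: reject : when a parameter type is defined, so {name:type} always has exactly one delimiter. The helper name and message are hypothetical, not the library's existing validation. The partition call shows the ambiguity with the colon_colon:colon:colon example: splitting at the first colon would misread that existing, unnamed type as an argument called colon_colon with type colon:colon.

```python
def validate_parameter_type_name(name: str) -> None:
    """Reject ':' in parameter type names; it is reserved as the name/type delimiter."""
    if ":" in name:
        raise ValueError(
            f"Illegal character ':' in parameter type name {name!r}; "
            "':' is reserved to separate an argument name from its type"
        )


# What a "{name:type}" parser would make of a type name that is legal today
print("colon_colon:colon:colon".partition(":"))  # ('colon_colon', ':', 'colon:colon')

validate_parameter_type_name("int")  # accepted, returns None
try:
    validate_parameter_type_name("colon_colon:colon:colon")
except ValueError as err:
    print(err)
```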

First commit: add named groups for python

abdb16a

jsa34 added 5 commits December 6, 2024 08:22

Fix flake8

f9a7afb

Remove removing support for py 3.8

ea7cef3

Remove removing support for py 3.8

0efab69

Remove removing support for py 3.8

51a3c34

Fix type hint for Generator

8272b12

mpkorstanje requested review from kieran-ryan and mpkorstanje December 6, 2024 14:44

Implement trimming whitespace for named args in cucumber expression

a51f9de

kieran-ryan added the ⚡ enhancement Request for new functionality label Dec 31, 2024

mpkorstanje marked this pull request as ready for review January 17, 2025 18:31

luke-hill self-requested a review January 20, 2025 19:06

mpkorstanje requested changes Jan 23, 2025


mpkorstanje marked this pull request as draft January 23, 2025 13:49

mpkorstanje reviewed Jan 23, 2025


luke-hill reviewed Jan 25, 2025

