Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regex match vs search #254

Closed
scop opened this issue Dec 23, 2021 · 5 comments
Closed

Regex match vs search #254

scop opened this issue Dec 23, 2021 · 5 comments
Assignees
Labels
enhancement User-facing feature enhancements
Milestone

Comments

@scop
Copy link
Contributor

scop commented Dec 23, 2021

Some rules' regular expressions are used with .search, some with .match, and the only source which one it is appears to be the gitlint source.

search: title-match-regex, body-match-regex
match: author-valid-email, ignore-by-title, ignore-by-body, ignore-body-lines, ignore-by-author-name

I think it would be good to standardize on just one of these for all regex matching. I suggest going with .search, because .match is a Python oddity not present in any other language I know of. (It's kind of ironic and unfortunate that the rules that have match in their name are the ones for which we do not invoke match, but search.)

Then again, standardizing on one would be a breaking change. If that's not acceptable or will take some time, I think we should document which method is used with each rule. I'm available to help with that or making the search/match change (if we're going with search ;)), let me know.

@sigmavirus24
Copy link
Collaborator

I think we can alternatively be smart for backwards compat until we can break it and issue deprecation log warnings until that expected release while giving folks an additional option to allow them to try the new behaviour

@jorisroovers
Copy link
Owner

Good catch, I agree with the suggestion to standardize on search.

Wrt deprecation strategy:

  • I like the idea of a global feature flag to enable the new regex behavior (e.g. enable_regex_style_search=true, let me know if you have a better suggestion)
  • We don't have any case today where gitlint logs warnings. Should be easy enough though. To state perhaps the obvious, this should only be logged for users who are actually using these rules and the warning should disappear when the enable_regex_style_search flag is set.

@jorisroovers jorisroovers added the enhancement User-facing feature enhancements label Feb 10, 2022
@jreybert

This comment was marked as off-topic.

@jorisroovers jorisroovers added this to the 0.18.0 milestone Sep 5, 2022
@jorisroovers jorisroovers moved this to Todo in gitlint Sep 5, 2022
@jorisroovers jorisroovers moved this from Todo to In Progress in gitlint Sep 30, 2022
jorisroovers added a commit that referenced this issue Sep 30, 2022
This general option will allow users to choose between regex match (=default)
and search semantics.

This commit only implements the config option, not the actual regex
semantics switching within rules.

Relates to #254
jorisroovers added a commit that referenced this issue Oct 12, 2022
- Adds a new Deprecration class that allows for a global approach to various
  deprecation scenarios.

- The new Deprecation.get_regex_method function deals specifically with the
  general.regex-style-search option and behavior.

- Updated rules that use Deprecation.get_regex_method: IgnoreByTitle,
  IgnoreByBody, IgnoreBodyLines, IgnoreByAuthorName, AuthorValidEmail

Relates to #254
@jorisroovers jorisroovers self-assigned this Oct 12, 2022
jorisroovers added a commit that referenced this issue Oct 24, 2022
This general option will allow users to choose between regex match (=default)
and search semantics.

This commit only implements the config option, not the actual regex
semantics switching within rules.

Relates to #254
jorisroovers added a commit that referenced this issue Oct 24, 2022
- Adds a new Deprecration class that allows for a global approach to various
  deprecation scenarios.

- The new Deprecation.get_regex_method function deals specifically with the
  general.regex-style-search option and behavior.

- Updated rules that use Deprecation.get_regex_method: IgnoreByTitle,
  IgnoreByBody, IgnoreBodyLines, IgnoreByAuthorName, AuthorValidEmail

Relates to #254
@jorisroovers
Copy link
Owner

jorisroovers commented Oct 24, 2022

This was implemented in #345 which will go out with 0.18.0 soonish.

Implemented behavior:

  • Rules this applies to: author-valid-email, ignore-by-title, ignore-by-body, ignore-body-lines, ignore-by-author-name

  • If users haven’t explicitly set a custom regex for these rules, no regex matching is done and no warning is logged.

  • The only exception is author-valid-email which by default verifies the validity of the specified email address. It now uses re.search when the default regex is used, and re.match when a custom regex is used.

  • If users have set a custom regex, we will warn once for each rule that uses a custom regex and still use re.match semantics. Notice that this means that if users lint multiple commits at once using --commits we will still only print the warning for each rule only once.

  • If users specify general.regex-style-search=true, we switch to using re.search semantics and no longer print a warning.

    [general]
    regex-style-search=true
    
  • This is the printed warning:

    M1 - author-valid-email: gitlint will be switching from using Python regex 'match' (match beginning) to 'search' (match anywhere) semantics. Please review your author-valid-email.regex option accordingly. To remove this warning, set general.regex-style-search=True. More details: https://jorisroovers.github.io/gitlint/configuration/#regex-style-search
  • Docs have been updated accordingly (will be published with the release)

The general.regex-style-search flag will be enabled by default in a future gitlint release.

@jorisroovers jorisroovers moved this from In Progress to Done in gitlint Oct 24, 2022
jorisroovers added a commit that referenced this issue Nov 16, 2022
- Python 3.11 support
- Last release to support Python 3.6
- Behavior Change: In a future release, gitlint will be switching to use
  `re.search` instead of `re.match` semantics for all rules. (#254)
- gitlint no longer uses the `sh` library by default in an attempt to reduce
  external dependencies.
- `--commits` now also accepts a comma-separated list of commit hashes,
  making it possible to lint a list of non-contiguous commits without invoking
  gitlint multiple times (#283)
- Improved handling of branches that have no commits (#188)
- Support for `GITLINT_CONFIG` env variable (#189)
- Added a new `gitlint-ci` pre-commit hook, making it easier to run gitlint
  through pre-commit in CI (#191)
- Contrib Rules:
  - New `contrib-disallow-cleanup-commits` rule (#312)
  - New `contrib-allowed-authors` rule (#358)
- User Defined rules:
  - Gitlint now recognizes `fixup=amend` commits, available as
    `commit.is_fixup_amend_commit=True`
  - Gitlint now parses diff **stat** information, available in
    `commit.changed_files_stats` (#314)
- Bugfixes:
  - Use correct encoding when using `--msg-filename` parameter (#310)
  - Various documentation fixes (#244) (#263) (#266) (#294) (#295) (#347) (#364)
- Under-the-hood:
  - Dependencies updated
  - Moved to blacked for formatting
  - Fixed nasty CI issue (#298)
  - Unit tests fix (#256)
  - Vagrant box removed in favor of github dev containers (#348)
  - Removed a few lingering references to the `master` branch in favor of `main`
  - Moved roadmap and project planning to github projects

Full Release details in CHANGELOG.md.
@jorisroovers
Copy link
Owner

Released as part of 0.18.0, closing!

@jorisroovers jorisroovers moved this from Ready for Release to Released in gitlint Nov 17, 2022
rettichschnidi added a commit to husqvarnagroup/wakaama that referenced this issue Mar 1, 2023
The ignore-body-lines regex also works for re.search(). Resolving the
warning by opt-in to the new behavior.

Relevant links:
 - https://jorisroovers.com/gitlint/configuration/#regex-style-search
 - jorisroovers/gitlint#254
rettichschnidi added a commit to husqvarnagroup/wakaama that referenced this issue Mar 1, 2023
The ignore-body-lines regex also works for re.search(). Resolving the
warning by opting-in to the new behavior.

Relevant links:
 - https://jorisroovers.com/gitlint/configuration/#regex-style-search
 - jorisroovers/gitlint#254
mlasch pushed a commit to eclipse-wakaama/wakaama that referenced this issue Mar 3, 2023
The ignore-body-lines regex also works for re.search(). Resolving the
warning by opting-in to the new behavior.

Relevant links:
 - https://jorisroovers.com/gitlint/configuration/#regex-style-search
 - jorisroovers/gitlint#254
jorisroovers added a commit that referenced this issue Jul 4, 2023
This will make all rules use re.search() by default.
It's still possible to manually reverse this behavior by setting:
general.regex-style-search=False

Relates to #254
jorisroovers added a commit that referenced this issue Jul 5, 2023
This will make all rules use re.search() by default.
It's still possible to manually reverse this behavior by setting:
general.regex-style-search=False

Relates to #254
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement User-facing feature enhancements
Projects
Status: 0.18.0
Development

No branches or pull requests

4 participants