feat: backreferences, named capture groups, named backreferences #71
changed babel.config.js to support node version 10.0 and greater (for named capture support).
@PaulJPhilp There is already an open PR #66 (of mine) about named captures and backreferences. It's nearly finished in terms of features; I was planning to gather some critique of the API to perfect it before merging. Let's pick the best parts of both and unify them. Please take a look at it, and I will take a careful look at your PR, and let's discuss it in the following days.
Sounds good to me.
(Email reply to Maciej Jastrzebski's comment above, dated Tue, Mar 12, 2024 at 4:38 PM.)
@@ -74,6 +74,41 @@ Captures, also known as capturing groups, extract and store parts of the matched
> [!NOTE]
> TS Regex Builder does not have a construct for non-capturing groups. Such groups are implicitly added when required. E.g., `zeroOrMore(["abc"])` is encoded as `(?:abc)*`.

### `backreference()`
Using ordinal backreferences accurately might be problematic with more complex expressions, nesting, etc. Therefore, I think we can drop them without losing any functionality for a user trying to build maintainable regexes.
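To illustrate the concern, here is a minimal sketch that uses plain `RegExp` rather than the builder API. It shows how an ordinal backreference can silently change meaning when another group is added in front of it, while a named one keeps pointing at the same group. The patterns and the group name `letter` are made up for this illustration and do not come from either PR.

```typescript
// Plain RegExp only; the patterns are illustrative and not taken from the PR.

// \1 refers to the first group, so this matches a doubled letter ("aa", "bb").
const doubled = new RegExp('^([ab])\\1$');

// Adding an optional group in front shifts the numbering: \1 now points at
// (x)?, so it no longer means "the same letter again".
const shifted = new RegExp('^(x)?([ab])\\1$');

// A named backreference keeps pointing at the intended group.
const named = new RegExp('^(x)?(?<letter>[ab])\\k<letter>$');

console.log(doubled.test('aa')); // true
console.log(shifted.test('aa')); // false
console.log(named.test('aa'));   // true
```

In the shifted pattern the unmatched group behaves like an empty string when referenced, so the failure is quiet rather than a syntax error, which is what makes this kind of mistake hard to spot.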

A backreference is a way to match the same text as previously matched by a capturing group.

### `namedCapture()`
I've considered the option of having a separate `namedCapture()` construct in addition to the basic `capture()`. After some prototyping and consulting, I've found `capture(..., { name: 'aaa' })` to be better, since it improves discoverability and follows the JS convention of "config" or "options" objects.
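A rough sketch of what the options-object shape could look like is given here. It is not the implementation from #66 or from this PR: the `CaptureOptions` interface is hypothetical, and a "pattern" is simplified to a regex source string so the example stays self-contained.

```typescript
// Hypothetical, simplified stand-ins for the builder's constructs:
// a "pattern" here is just a regex source string.
interface CaptureOptions {
  /** Optional group name; when omitted, a plain numbered group is produced. */
  name?: string;
}

function capture(pattern: string, options: CaptureOptions = {}): string {
  return options.name !== undefined
    ? `(?<${options.name}>${pattern})`
    : `(${pattern})`;
}

const plainGroup = capture('a');                   // "(a)"
const namedGroup = capture('a', { name: 'abba' }); // "(?<abba>a)"

console.log(plainGroup, namedGroup);
```

With this shape, the call site labels the name explicitly, and further options can be added later without changing the positional arguments.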

A named capturing group is a capturing group that has a name. The group's matching result can later be identified by this name.

### `namedBackreference()`
Swift Regex Builder uses the following convention for named captures/references:

    let kind = Reference(Substring.self)
    let regex = Capture(as: kind) {
      ChoiceOf {
        "CREDIT"
        "DEBIT"
      }
    }

It has the nice feature of connecting the reference directly to the capturing group, instead of forcing the user to repeat the name twice: once in `capture` and then in `backreference`.
However, in that case dropping the "back" prefix seems beneficial, as a reference becomes a "backreference" only when it is added to a regular expression. Until it is applied to a previous part of the expression ("back"), it is more of a plain reference.
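One way the Swift-style convention might translate to TypeScript is sketched here. Every name in it (`Reference`, `reference()`, `sequence()`, the `as` option) is a hypothetical stand-in rather than API from either PR, and patterns are again simplified to regex source strings.

```typescript
// Hypothetical sketch: none of these names are taken from either PR.
interface Reference {
  readonly name: string;
}

function reference(groupName: string): Reference {
  return { name: groupName };
}

type Part = string | Reference;

// Joins the parts; a Reference placed in the sequence is encoded as a
// named backreference to the group it was attached to.
function sequence(parts: Part[]): string {
  return parts
    .map((part) => (typeof part === 'string' ? part : `\\k<${part.name}>`))
    .join('');
}

// The same Reference object names the capturing group.
function capture(pattern: string, options: { as: Reference }): string {
  return `(?<${options.as.name}>${pattern})`;
}

const quote = reference('quote');
const patternSource = sequence([capture('["\']', { as: quote }), '[^"\']*', quote]);
const quoted = new RegExp(`^${patternSource}$`);

console.log(quoted.test('"hi"')); // true
console.log(quoted.test('"hi\'')); // false: the closing quote differs
```

The group name is written once, when the reference is created, and the same object is then used both to label the capture and to refer back to it.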
import { buildRegExp, digit, endOfString, namedCapture, repeat, startOfString } from '..';

// Example: dateRegex
const dateRegex = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/i;
Nice one, I'll add this example.

const usernameRegex = buildRegExp([startOfString, username, endOfString]);

test('Matching the Username component.', () => {
I'll merge this with the existing email example, as they're quite similar.
Note: I am planning to add some frequently used patterns (URL, email, maybe hashtags, etc.) so that each user does not have to define them by hand. I will spec out this feature soon. I invite you to join in if you have the capacity to work on that.

describe('namedCapture RegEx matching', () => {
  test('`named-capture` pattern', () => {
    expect(namedCapture('a', 'abba')).toEqualRegex(/(?<abba>a)/);
When seeing `namedCapture('a', 'abba')`, it's hard to tell which part is the matched pattern and which one is the name of the capturing group.
Agreed. Your API is an improvement.
(Email reply to Maciej Jastrzebski's review comment of Wed, Mar 13, 2024 at 5:36 PM on src/constructs/__tests__/named-capture.test.tsx, which quoted the following hunk:)
+    expect(match).not.toBeNull();
+ expect(match?.groups?.group1).toBe('ab');
+ expect(match?.groups?.group2).toBe('b');
+ });
+
+ it('should handle nested named capture groups', () => {
+ const regex = buildRegExp(namedCapture(['a', namedCapture('b', 'group2')], 'group1'));
+ const match = regex.exec('ab');
+ expect(match?.groups?.group1).toBe('ab');
+ expect(match?.groups?.group2).toBe('b');
+ });
+});
+
+describe('namedCapture RegEx matching', () => {
+ test('`named-capture` pattern', () => {
+ expect(namedCapture('a', 'abba')).toEqualRegex(/(?<abba>a)/);
  name: string;
}

export function namedBackreference(groupName: string): NamedBackreference {
In #66 I proposed an option to skip the `name`, in which case Regex Builder would autogenerate a brief name for the user (`ref-1`, etc.). That matches Swift Regex Builder, which defines `Reference` without a name parameter at all. Not sure if this is worth it. wdyt?
I don't see a use case for it. If I don't specify the name, I can use the group number (\1).
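For context, the auto-naming idea under discussion could look roughly like the sketch here. The counter, the `reference()` helper, and the generated name format are all assumptions for illustration and are not taken from #66. One detail worth noting: JavaScript group names must be valid identifiers, so a generated name containing a hyphen would be rejected by the regex engine, and this sketch uses `ref1`, `ref2`, and so on instead.

```typescript
// Hypothetical sketch of auto-generated reference names.
interface Reference {
  readonly name: string;
}

let nextRefId = 1;

function reference(groupName?: string): Reference {
  // Fall back to a generated identifier-safe name when none is given.
  return { name: groupName !== undefined ? groupName : `ref${nextRefId++}` };
}

const firstRef = reference();          // name: "ref1"
const secondRef = reference();         // name: "ref2"
const explicitRef = reference('year'); // name: "year"

console.log(firstRef.name, secondRef.name, explicitRef.name);
```

Whether this is worth having compared with simply using the group number is the open question in the thread; the sketch only shows that the mechanism itself is small.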
(Email reply to Maciej Jastrzebski's review comment of Wed, Mar 13, 2024 at 5:38 PM on src/constructs/named-backreference.ts, which quoted the following hunk:)
@@ -0,0 +1,24 @@
+//import { encodeSequence } from '../encoder/encoder';
+import type { EncodeResult } from '../encoder/types';
+//import { ensureArray } from '../utils/elements';
+import type { RegexConstruct } from '../types';
+
+export interface NamedBackreference extends RegexConstruct {
+ type: 'named-backreference';
+ name: string;
+}
+
+export function namedBackreference(groupName: string): NamedBackreference {
Summary
Added support for named-capture, backreference, and named-backreference. All three functions are as defined in the Swift RegEx Builder.
I needed to make a change to babel.config.js because the default config didn't support named capture groups. This was the toughest bug I've tracked down in quite a while. I'm not confident that my solution is optimal and would appreciate a second set of eyes.
Test plan
Added unit tests for each of the new functions.
Added a couple of new examples showing how to use the new functions in typical use cases.