Commit e9b764b

feat: wordBoundary anchor (#62)
1 parent aa585b1 commit e9b764b

19 files changed

+274
-107
lines changed

README.md

+4-6
Original file line number | Diff line number | Diff line change
@@ -21,11 +21,7 @@ This library allows users to create regular expressions in a structured way, mak
2121
const hexColor = /^#?([a-fA-F0-9]{6}|[a-fA-F0-9]{3})$/;
2222

2323
// TS Regex Builder DSL
24-
const hexDigit = charClass(
25-
charRange('a', 'f'),
26-
charRange('A', 'F'),
27-
charRange('0', '9'),
28-
);
24+
const hexDigit = charClass(charRange('a', 'f'), charRange('A', 'F'), charRange('0', '9'));
2925

3026
const hexColor = buildRegExp([
3127
startOfString,
@@ -66,13 +62,15 @@ const regex = buildRegExp(['Hello ', capture(oneOrMore(word))]);
6662
TS Regex Builder allows you to build complex regular expressions using domain-specific language.
6763

6864
Terminology:
65+
6966
- regex construct (`RegexConstruct`) - common name for all regex constructs like character classes, quantifiers, and anchors.
7067
- regex element (`RegexElement`) - a fundamental building block of a regular expression, defined as either a regex construct, a string, or `RegExp` literal (`/.../`).
7168
- regex sequence (`RegexSequence`) - a sequence of regex elements forming a regular expression. For developer convenience, it also accepts a single element instead of an array.
7269

7370
Most of the regex constructs accept a regex sequence as their argument.
7471

7572
Examples of sequences:
73+
7674
- single element (construct): `capture('Hello')`
7775
- single element (string): `'Hello'`
7876
- single element (`RegExp` literal): `/Hello/`
@@ -152,6 +150,7 @@ See [Character Classes API doc](./docs/API.md##character-classes) for more info.
152150
| --------------- | ------------ | ------------------------------------------------------------------------ |
153151
| `startOfString` | `^` | Match the start of the string (or the start of a line in multiline mode) |
154152
| `endOfString` | `$` | Match the end of the string (or the end of a line in multiline mode) |
153+
| `wordBoundary` | `\b` | Match the start or end of a word without consuming characters |
155154

156155
See [Anchors API doc](./docs/API.md#anchors) for more info.
157156

@@ -182,7 +181,6 @@ TS Regex Builder is inspired by [Swift Regex Builder API](https://developer.appl
182181
- [Swift Regex Builder API docs](https://developer.apple.com/documentation/regexbuilder)
183182
- [Swift Evolution 351: Regex Builder DSL](https://github.com/apple/swift-evolution/blob/main/proposals/0351-regex-builder.md)
184183

185-
186184
---
187185

188186
Made with [create-react-native-library](https://github.com/callstack/react-native-builder-bob)

docs/API.md

+42-25
@@ -14,8 +14,7 @@ Fundamental building blocks of a regular expression, defined as either a regex c
1414

1515
The common type for all regex constructs like character classes, quantifiers, and anchors. You should not need to use this type directly, it is returned by all regex construct functions.
1616

17-
Note: the shape of the `RegexConstruct` is considered private and may change in a breaking way without a major release. We will focus on maintaining the compatibility of regexes built with
18-
17+
Note: the shape of the `RegexConstruct` is considered private and may change in a breaking way without a major release. We will focus on maintaining the compatibility of regexes built with
1918

2019
## Builder
2120

@@ -133,14 +132,15 @@ Quantifiers in regex define the number of occurrences to match for a pattern.
133132
function zeroOrMore(
134133
sequence: RegexSequence,
135134
options?: {
136-
greedy?: boolean, // default=true
137-
}
138-
): ZeroOrMore
135+
greedy?: boolean; // default=true
136+
},
137+
): ZeroOrMore;
139138
```
140139

141140
Regex syntax:
142-
* `x*` for default greedy behavior (match as many characters as possible)
143-
* `x*?` for non-greedy behavior (match as few characters as possible)
141+
142+
- `x*` for default greedy behavior (match as many characters as possible)
143+
- `x*?` for non-greedy behavior (match as few characters as possible)
144144

145145
The `zeroOrMore` quantifier matches zero or more occurrences of a given pattern, allowing a flexible number of repetitions of that element.
146146

@@ -150,14 +150,15 @@ The `zeroOrMore` quantifier matches zero or more occurrences of a given pattern,
150150
function oneOrMore(
151151
sequence: RegexSequence,
152152
options?: {
153-
greedy?: boolean, // default=true
154-
}
155-
): OneOrMore
153+
greedy?: boolean; // default=true
154+
},
155+
): OneOrMore;
156156
```
157157

158158
Regex syntax:
159-
* `x+` for default greedy behavior (match as many characters as possible)
160-
* `x+?` for non-greedy behavior (match as few characters as possible)
159+
160+
- `x+` for default greedy behavior (match as many characters as possible)
161+
- `x+?` for non-greedy behavior (match as few characters as possible)
161162

162163
The `oneOrMore` quantifier matches one or more occurrences of a given pattern, allowing a flexible number of repetitions of that element.
163164

@@ -167,14 +168,15 @@ The `oneOrMore` quantifier matches one or more occurrences of a given pattern, a
167168
function optional(
168169
sequence: RegexSequence,
169170
options?: {
170-
greedy?: boolean, // default=true
171-
}
172-
): Optionally
171+
greedy?: boolean; // default=true
172+
},
173+
): Optionally;
173174
```
174175

175176
Regex syntax:
176-
* `x?` for default greedy behavior (match as many characters as possible)
177-
* `x??` for non-greedy behavior (match as few characters as possible)
177+
178+
- `x?` for default greedy behavior (match as many characters as possible)
179+
- `x??` for non-greedy behavior (match as few characters as possible)
178180

179181
The `optional` quantifier matches zero or one occurrence of a given pattern, making it optional.
180182

@@ -183,17 +185,20 @@ The `optional` quantifier matches zero or one occurrence of a given pattern, mak
183185
```ts
184186
function repeat(
185187
sequence: RegexSequence,
186-
options: number | {
187-
min: number;
188-
max?: number;
189-
greedy?: boolean; // default=true
190-
},
191-
): Repeat
188+
options:
189+
| number
190+
| {
191+
min: number;
192+
max?: number;
193+
greedy?: boolean; // default=true
194+
},
195+
): Repeat;
192196
```
193197

194198
Regex syntax:
195-
* `x{n}`, `x{min,}`, `x{min, max}` for default greedy behavior (match as many characters as possible)
196-
* `x{min,}?`, `x{min, max}?` for non-greedy behavior (match as few characters as possible)
199+
200+
- `x{n}`, `x{min,}`, `x{min, max}` for default greedy behavior (match as many characters as possible)
201+
- `x{min,}?`, `x{min, max}?` for non-greedy behavior (match as few characters as possible)
197202

198203
The `repeat` quantifier in regex matches either exactly `count` times or between `min` and `max` times. If only `min` is provided, it matches at least `min` times.
199204

@@ -301,3 +306,15 @@ const endOfString: Anchor;
301306

302307
- `startOfString` anchor matches the start of a string (or line, if multiline mode is enabled). Regex syntax: `^`.
303308
- `endOfString` anchor matches the end of a string (or line, if multiline mode is enabled). Regex syntax: `$`.
309+
310+
### Word boundary
311+
312+
```ts
313+
const wordBoundary: Anchor;
314+
const notWordBoundary: Anchor;
315+
```
316+
317+
- `wordBoundary` matches the positions where a word character is not followed or preceded by another word character, effectively indicating the start or end of a word. Regex syntax: `\b`.
318+
- `notWordBoundary` matches the positions where a word character is followed or preceded by another word character, indicating that it is not at the start or end of a word. Regex syntax: `\B`.
319+
320+
Note: word characters are letters, digits, and underscore (`_`). Other special characters like `#`, `$`, etc are not considered word characters.
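
Reviewer note: the boundary rules described above can be checked with plain JavaScript regexes, without importing the library. The sketch below is illustrative only; the sample strings and the `boundaryResults` name are assumptions and are not part of this commit.

```ts
// Native regex equivalents of the two anchors (illustrative, not library code).
const atBoundary = /\bcat\b/; // "cat" with a word boundary on both sides
const insideWord = /\Bcat\B/; // "cat" with word characters on both sides

const boundaryResults = {
  // Spaces are not word characters, so both boundaries exist.
  wholeWord: atBoundary.test('a cat sat'),
  // "cat" sits between "n" and "e", so there is no boundary on either side.
  embedded: atBoundary.test('concatenate'),
  // The same position satisfies \B on both sides.
  insideEmbedded: insideWord.test('concatenate'),
  // A standalone "cat" is at a boundary, so \B fails.
  insideWhole: insideWord.test('a cat sat'),
  // "_" counts as a word character, so there is no boundary before "cat".
  afterUnderscore: atBoundary.test('my_cat'),
  // "#" and "$" are not word characters, so both boundaries exist.
  betweenSymbols: atBoundary.test('#cat$'),
};

console.log(boundaryResults);
```

Note that `my_cat` does not count as containing the whole word `cat`, because the underscore is a word character.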

docs/Examples.md

+62-13
@@ -40,7 +40,7 @@ const regex = buildRegExp(
4040
{ ignoreCase: true },
4141
);
4242

43-
const isValid = regex.test("#ffffff");
43+
const isValid = regex.test('#ffffff');
4444
```
4545

4646
Encoded regex: `/^#?(?:[a-f\d]{6}|[a-f\d]{3})$/i`.
@@ -70,7 +70,7 @@ const regex = buildRegExp([
7070
endOfString,
7171
]);
7272

73-
const isValid = regex.test("https://hello.github.com");
73+
const isValid = regex.test('https://hello.github.com');
7474
```
7575

7676
Encoded regex: `/^(?:(?:http|https):\/\/)?(?:(?:[a-z\d]|[a-z\d][a-z\d-]*[a-z\d])\.)+[a-z][a-z\d]+$/`.
@@ -100,7 +100,7 @@ const regex = buildRegExp(
100100
{ ignoreCase: true },
101101
);
102102

103-
const isValid = regex.test("user@example.com");
103+
const isValid = regex.test('user@example.com');
104104
```
105105

106106
Encoded regex: `/^[a-z\d._%+-]+@[a-z\d.-]+\.[a-z]{2,}$/i`.
@@ -126,7 +126,7 @@ const regex = buildRegExp([
126126
endOfString,
127127
]);
128128

129-
const isValid = regex.test("1.0e+27");
129+
const isValid = regex.test('1.0e+27');
130130
```
131131

132132
Encoded regex: `/^[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?$/`.
@@ -190,6 +190,7 @@ See tests: [example-regexp.ts](../src/__tests__/example-regexp.ts).
190190
## Simple password validation
191191
192192
This regex corresponds to following password policy:
193+
193194
- at least one uppercase letter
194195
- at least one lowercase letter
195196
- at least one digit
@@ -205,16 +206,16 @@ const atLeastEightChars = /.{8,}/;
205206

206207
// Match
207208
const validPassword = buildRegExp([
208-
startOfString,
209-
atLeastOneUppercase,
210-
atLeastOneLowercase,
211-
atLeastOneDigit,
212-
atLeastOneSpecialChar,
213-
atLeastEightChars,
214-
endOfString
209+
startOfString,
210+
atLeastOneUppercase,
211+
atLeastOneLowercase,
212+
atLeastOneDigit,
213+
atLeastOneSpecialChar,
214+
atLeastEightChars,
215+
endOfString,
215216
]);
216217

217-
const isValid = regex.test("Aa$123456");
218+
const isValid = regex.test('Aa$123456');
218219
```
219220
220221
Encoded regex: `/^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[^A-Za-z0-9\s])(?:.{8,})$/`.
@@ -243,9 +244,57 @@ const currencyRegex = buildRegExp([
243244
endOfString,
244245
]);
245246

246-
const isValid = regex.test("£1,000");
247+
const isValid = regex.test('£1,000');
247248
```
248249
249250
Encoded regex: `/(?<=[$€£¥R₿])\s?\d{1,3}(?:,?\d{3})*(?:\.\d{2})?$/`.
250251
251252
See tests: [example-currency.ts](../src/__tests__/example-currency.ts).
253+
254+
## Finding specific whole words
255+
256+
Ignoring cases where given word is part of a bigger word.
257+
258+
```ts
259+
const wordsToFind = ['word', 'date'];
260+
261+
const regex = buildRegExp([
262+
wordBoundary, // match whole words only
263+
choiceOf(...wordsToFind),
264+
wordBoundary,
265+
]);
266+
267+
expect(regex).toMatchString('word');
268+
expect(regex).toMatchString('date');
269+
270+
expect(regex).not.toMatchString('sword');
271+
expect(regex).not.toMatchString('update');
272+
```
273+
274+
Encoded regex: `/\b(?:word|date)\b/`.
275+
276+
See tests: [example-find-words.ts](../src/__tests__/example-find-words.ts).
277+
278+
## Finding specific suffixes
279+
280+
Ignoring cases where given word is part of a bigger word.
281+
282+
```ts
283+
const suffixesToFind = ['acy', 'ism'];
284+
285+
const regex = buildRegExp([
286+
notWordBoundary, // match suffixes only
287+
choiceOf(...suffixesToFind),
288+
wordBoundary,
289+
]);
290+
291+
expect(regex).toMatchString('privacy ');
292+
expect(regex).toMatchString('democracy');
293+
294+
expect(regex).not.toMatchString('acy');
295+
expect(regex).not.toMatchString('ism');
296+
```
297+
298+
Encoded regex: `/\B(?:acy|ism)\b/`.
299+
300+
See tests: [example-find-suffixes.ts](../src/__tests__/example-find-suffixes.ts).
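
Reviewer note: to see how the two encoded regexes from the examples above behave on running text, here is a minimal sketch using the native patterns directly. The global (`g`) flag, the sample sentences, and the variable names are assumptions added for illustration; the commit itself only asserts single matches through `toMatchString`.

```ts
// Encoded regexes from the two examples, with the `g` flag added (an assumption)
// so that String.prototype.match returns every occurrence.
const wholeWords = /\b(?:word|date)\b/g;
const suffixes = /\B(?:acy|ism)\b/g;

const wordText = 'A word about the update: keywords and date formats.';
const suffixText = 'privacy and realism, but not acy or isms';

// "update" and "keywords" are skipped: the match would start or end inside a larger word.
const foundWords: string[] = wordText.match(wholeWords) ?? [];

// Standalone "acy" is skipped (a boundary precedes it); "isms" is skipped (no boundary after "ism").
const foundSuffixes: string[] = suffixText.match(suffixes) ?? [];

console.log(foundWords, foundSuffixes);
```

The suffix pattern matches only the suffix itself, not the whole word that contains it.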
src/__tests__/example-find-suffixes.ts

+24
@@ -0,0 +1,24 @@
1+
import { buildRegExp, choiceOf, notWordBoundary, wordBoundary } from '..';
2+
3+
test('example: find words with suffix', () => {
4+
const suffixesToFind = ['acy', 'ism'];
5+
6+
const regex = buildRegExp([
7+
notWordBoundary, // match suffixes only
8+
choiceOf(...suffixesToFind),
9+
wordBoundary,
10+
]);
11+
12+
expect(regex).toMatchString('democracy');
13+
expect(regex).toMatchString('Bureaucracy');
14+
expect(regex).toMatchString('abc privacy ');
15+
expect(regex).toMatchString('abc dynamism');
16+
expect(regex).toMatchString('realism abc');
17+
expect(regex).toMatchString('abc modernism abc');
18+
19+
expect(regex).not.toMatchString('abc acy');
20+
expect(regex).not.toMatchString('ism abc');
21+
expect(regex).not.toMatchString('dynamisms');
22+
23+
expect(regex).toEqualRegex(/\B(?:acy|ism)\b/);
24+
});

src/__tests__/example-find-words.ts

+23
@@ -0,0 +1,23 @@
1+
import { buildRegExp, choiceOf, wordBoundary } from '..';
2+
3+
test('example: find specific words', () => {
4+
const wordsToFind = ['word', 'date'];
5+
6+
const regex = buildRegExp([
7+
wordBoundary, // match whole words only
8+
choiceOf(...wordsToFind),
9+
wordBoundary,
10+
]);
11+
12+
expect(regex).toMatchString('word');
13+
expect(regex).toMatchString('some date');
14+
expect(regex).toMatchString('date and word');
15+
16+
expect(regex).not.toMatchString('sword');
17+
expect(regex).not.toMatchString('keywords');
18+
expect(regex).not.toMatchString('words');
19+
expect(regex).not.toMatchString('update');
20+
expect(regex).not.toMatchString('dates');
21+
22+
expect(regex).toEqualRegex(/\b(?:word|date)\b/);
23+
});
