refactor: further renaming #45

Merged
merged 9 commits into from
Jan 19, 2024
2 changes: 1 addition & 1 deletion GUIDELINES.md
@@ -8,4 +8,4 @@

## Implementation guidelines

1. When the user passes the text to any regex component, it should be treated as an exact string to match and not as a regex string. We might provide an escape hatch for providing raw regex string though, but the user should use it explicitly.
1. When the user passes the text to any regex construct, it should be treated as an exact string to match and not as a regex string. We might provide an escape hatch for providing raw regex string though, but the user should use it explicitly.
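
A minimal sketch of what "treated as an exact string" implies, using only the native `RegExp`. The helper name `escapeText` is hypothetical and is not taken from the library; it only illustrates escaping metacharacters so that user text matches literally.

```ts
// Hypothetical helper for illustration; not taken from the library's source.
function escapeText(text: string): string {
  // Prefix each regex metacharacter with a backslash so it matches literally.
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Treated as an exact string: "+" must appear literally.
const exact = new RegExp(escapeText('1+1=2'));

// Treated as a raw regex string: "1+" means "one or more 1s".
const raw = new RegExp('1+1=2');

const exactMatchesLiteral = exact.test('1+1=2'); // true
const exactMatchesOther = exact.test('11=2'); // false
const rawMatchesOther = raw.test('11=2'); // true
```

The last line shows why the guideline matters: without escaping, the same text silently matches input the user never intended.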
47 changes: 25 additions & 22 deletions README.md
@@ -19,7 +19,7 @@ const hexDigit = charClass(
charRange('0', '9'),
);

const hexColor = buildRegex(
const hexColor = buildRegExp(
startOfString,
optionally('#'),
capture(
@@ -47,32 +47,35 @@ yarn add ts-regex-builder
## Basic usage

```js
import { buildRegex, capture, oneOrMore } from 'ts-regex-builder';
import { buildRegExp, capture, oneOrMore } from 'ts-regex-builder';

// /Hello (\w+)/
const regex = buildRegex(['Hello ', capture(oneOrMore(word))]);
const regex = buildRegExp(['Hello ', capture(oneOrMore(word))]);
```

## Regex domain-specific language

TS Regex Builder allows you to build complex regular expressions using domain-specific language of regex components.
TS Regex Builder allows you to build complex regular expressions using a domain-specific language.

Terminology:

- regex component (e.g., `capture()`, `oneOrMore()`, `word`) - function or object representing a regex construct
- regex element (`RegexElement`) - object returned by regex components
- regex sequence (`RegexSequence`) - single regex element or string (`RegexElement | string`) or array of such elements and strings (`Array<RegexElement | string>`)
- regex construct (`RegexConstruct`) - common name for all regex constructs like character classes, quantifiers, and anchors.

Most of the regex components accept a regex sequence. Examples of sequences:
- regex element (`RegexElement`) - fundamental building block of a regular expression, defined as either a regex construct or a string.

- single string: `'Hello World'` (note: all characters will be automatically escaped in the resulting regex)
- single element: `capture('abc')`
- array of elements and strings: `['$', oneOrMore(digit)]`
- regex sequence (`RegexSequence`) - a sequence of regex elements forming a regular expression. For developer convenience it also accepts a single element instead of array.

Regex components can be composed into a complex tree:
Most of the regex constructs accept a regex sequence as their argument.

Examples of sequences:
- array of elements: `['USD', oneOrMore(digit)]`
- single construct: `capture('abc')`
- single string: `'Hello'`
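
The "single element instead of array" convenience can be sketched as a small normalization step. The types and the function name below are hypothetical stand-ins, not the library's own definitions; the diff only shows that `builders.ts` calls a helper named `ensureArray`, whose implementation is not visible here.

```ts
// Hypothetical stand-ins for the library's types, for illustration only.
type SketchConstruct = { kind: string };
type SketchElement = SketchConstruct | string;
type SketchSequence = SketchElement | SketchElement[];

// Wrap a single element in an array; leave arrays as they are.
function toElementArray(sequence: SketchSequence): SketchElement[] {
  return Array.isArray(sequence) ? sequence : [sequence];
}

const fromString = toElementArray('Hello');
const fromConstruct = toElementArray({ kind: 'capture' });
const fromArray = toElementArray(['USD', { kind: 'oneOrMore' }]);
```

Normalizing once at the boundary means downstream code only ever has to handle the array form.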

Regex constructs can be composed into a tree:

```ts
const currencyAmount = buildRegex([
const currencyAmount = buildRegExp([
choiceOf(
'$',
'€',
@@ -87,14 +90,14 @@ const currencyAmount = buildRegex([

### Regex Builders

| Regex Component | Regex Pattern | Description |
| --------------------------------------- | ------------- | ----------------------------------- |
| `buildRegex(...)` | `/.../` | Create `RegExp` instance |
| `buildRegex(..., { ignoreCase: true })` | `/.../i` | Create `RegExp` instance with flags |
| Builder | Regex Pattern | Description |
| ---------------------------------------- | ------------- | ----------------------------------- |
| `buildRegExp(...)` | `/.../` | Create `RegExp` instance |
| `buildRegExp(..., { ignoreCase: true })` | `/.../i` | Create `RegExp` instance with flags |

### Components
### Regex Constructs

| Regex Component | Regex Pattern | Notes |
| Regex Construct | Regex Pattern | Notes |
| ------------------- | ------------- | ------------------------------- |
| `capture(...)` | `(...)` | Create a capture group |
| `choiceOf(x, y, z)` | `x\|y\|z` | Match one of provided sequences |
@@ -106,7 +109,7 @@ Notes:

### Quantifiers

| Regex Component | Regex Pattern | Description |
| Regex Construct | Regex Pattern | Description |
| -------------------------------- | ------------- | ------------------------------------------------- |
| `zeroOrMore(x)` | `x*` | Zero or more occurrences of a pattern |
| `oneOrMore(x)` | `x+` | One or more occurrences of a pattern |
@@ -119,7 +122,7 @@ All quantifiers accept sequence of elements

### Character classes

| Regex Component | Regex Pattern | Description |
| Regex Construct | Regex Pattern | Description |
| --------------------- | ------------- | ------------------------------------------- |
| `any` | `.` | Any character |
| `word` | `\w` | Word characters |
@@ -140,7 +143,7 @@ Notes:

### Anchors

| Regex Component | Regex Pattern | Description |
| Regex Construct | Regex Pattern | Description |
| --------------- | ------------- | ---------------------------------------------------------------- |
| `startOfString` | `^` | Match start of the string (or start of a line in multiline mode) |
| `endOfString` | `$` | Match end of the string (or end of a line in multiline mode) |
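
The "multiline mode" remark in the table can be checked with native regular expressions alone. This is a small sketch that does not use the library; it only shows how `^` and `$` behave with and without the `m` flag.

```ts
const twoLines = 'first\nsecond';

// Without the multiline flag, ^ and $ only match at the ends of the whole string.
const wholeString = /^second$/.test(twoLines); // false

// With the multiline flag, they also match at line boundaries.
const perLine = /^second$/m.test(twoLines); // true
```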
34 changes: 17 additions & 17 deletions docs/API.md
@@ -2,26 +2,26 @@

## Builder

### `buildRegex()` function
### `buildRegExp()` function

```ts
function buildRegex(sequence: RegexSequence): RegExp;
function buildRegExp(sequence: RegexSequence): RegExp;

function buildRegex(
function buildRegExp(
sequence: RegexSequence,
flags: {
global?: boolean;
ignoreCase?: boolean;
multiline?: boolean;
hasIndices?: boolean;
sticky?: boolean;
},
sequence: RegexSequence
): RegExp;
```

## Components
## Constructs

### `capture()` component
### `capture()`

Captures, also known as capturing groups, are used to extract and store parts of the matched string for later use.

@@ -31,7 +31,7 @@ function capture(
): Capture
```

### `choiceOf()` component
### `choiceOf()`

```ts
function choiceOf(
@@ -45,31 +45,31 @@ Example: `choiceOf("color", "colour")` matches either `color` or `colour` patter

## Quantifiers

### `zeroOrMore()` component
### `zeroOrMore()`

```ts
function zeroOrMore(
sequence: RegexSequence,
): ZeroOrMore
```

### `oneOrMore()` component
### `oneOrMore()`

```ts
function oneOrMore(
sequence: RegexSequence,
): OneOrMore
```

### `optionally()` component
### `optionally()`

```ts
function optionally(
sequence: RegexSequence,
): Optionally
```

### `repeat()` component
### `repeat()`

```ts
function repeat(
@@ -96,7 +96,7 @@ const whitespace: CharacterClass;
* `digit` matches any digit.
* `whitespace` matches any whitespace character (spaces, tabs, line breaks).

### `anyOf()` component
### `anyOf()`

```ts
function anyOf(
@@ -108,7 +108,7 @@ The `anyOf` class matches any character present in the `character` string.

Example: `anyOf('aeiou')` will match either `a`, `e`, `i`, `o` or `u` characters.

### `characterRange()` component
### `characterRange()`

```ts
function characterRange(
@@ -124,29 +124,29 @@ Examples:
* `characterRange('A', 'Z')` will match all uppercase characters from `A` to `Z`.
* `characterRange('0', '9')` will match all digit characters from `0` to `9`.

### `characterClass()` component
### `characterClass()`

```ts
function characterClass(
...elements: CharacterClass[],
): CharacterClass
```

The `characterClass` component creates a new character class that includes all passed character classes.
The `characterClass` construct creates a new character class that includes all passed character classes.

Examples:
* `characterClass(characterRange('a', 'f'), digit)` will match all lowercase hex digits (`0` to `9` and `a` to `f`).
* `characterClass(characterRange('a', 'z'), digit, anyOf("._-"))` will match any digit, lowercase Latin letter from `a` to `z`, and either of `.`, `_`, and `-` characters.
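
For comparison, the two examples above correspond roughly to the following hand-written native patterns. These are assumed equivalents written for illustration; the exact pattern text the library emits may differ.

```ts
// Hand-written patterns assumed to be equivalent; the library's actual output may differ.
const lowercaseHexDigit = /^[a-f\d]$/;
const identifierChar = /^[a-z\d._-]$/;

const hexAccepts = lowercaseHexDigit.test('c') && lowercaseHexDigit.test('7'); // true
const hexRejects = lowercaseHexDigit.test('g') || lowercaseHexDigit.test('F'); // false

const idAccepts = identifierChar.test('_') && identifierChar.test('-'); // true
const idRejects = identifierChar.test('A'); // false
```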

### `inverted()` component
### `inverted()`

```ts
function inverted(
element: CharacterClass,
): CharacterClass
```

The `inverted` component creates a new character class that matches any character that is not present in the passed character class.
The `inverted` construct creates a new character class that matches any character that is not present in the passed character class.

Examples:
* `inverted(digit)` matches any character that is not a digit
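
In native syntax, inverting a character class corresponds to the `[^...]` form. A brief sketch, assuming `inverted(digit)` behaves like the hand-written pattern below (the library's emitted pattern is not shown in this diff):

```ts
// Assumed native equivalent of an inverted digit class.
const nonDigit = /^[^\d]$/;

const acceptsLetter = nonDigit.test('a'); // true
const acceptsDigit = nonDigit.test('5'); // false
```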
2 changes: 1 addition & 1 deletion docs/Examples.md
@@ -13,7 +13,7 @@ const octet = choiceOf(
);

// Match
const regex = buildRegex([
const regex = buildRegExp([
startOfString, //
repeat([octet, '.'], { count: 3 }),
octet,
2 changes: 1 addition & 1 deletion package.json
@@ -121,7 +121,7 @@
"quoteProps": "consistent",
"singleQuote": true,
"tabWidth": 2,
"trailingComma": "es5",
"trailingComma": "all",
"useTabs": false
}
],
30 changes: 15 additions & 15 deletions src/__tests__/builder.test.ts
@@ -1,29 +1,29 @@
import { buildRegex } from '../builders';
import { buildRegExp } from '../builders';

test('`regexBuilder` flags', () => {
expect(buildRegex('a').flags).toBe('');
expect(buildRegex('a', {}).flags).toBe('');
expect(buildRegExp('a').flags).toBe('');
expect(buildRegExp('a', {}).flags).toBe('');

expect(buildRegex('a', { global: true }).flags).toBe('g');
expect(buildRegex('a', { global: false }).flags).toBe('');
expect(buildRegExp('a', { global: true }).flags).toBe('g');
expect(buildRegExp('a', { global: false }).flags).toBe('');

expect(buildRegex('a', { ignoreCase: true }).flags).toBe('i');
expect(buildRegex('a', { ignoreCase: false }).flags).toBe('');
expect(buildRegExp('a', { ignoreCase: true }).flags).toBe('i');
expect(buildRegExp('a', { ignoreCase: false }).flags).toBe('');

expect(buildRegex('a', { multiline: true }).flags).toBe('m');
expect(buildRegex('a', { multiline: false }).flags).toBe('');
expect(buildRegExp('a', { multiline: true }).flags).toBe('m');
expect(buildRegExp('a', { multiline: false }).flags).toBe('');

expect(buildRegex('a', { hasIndices: true }).flags).toBe('d');
expect(buildRegex('a', { hasIndices: false }).flags).toBe('');
expect(buildRegExp('a', { hasIndices: true }).flags).toBe('d');
expect(buildRegExp('a', { hasIndices: false }).flags).toBe('');

expect(buildRegex('a', { sticky: true }).flags).toBe('y');
expect(buildRegex('a', { sticky: false }).flags).toBe('');
expect(buildRegExp('a', { sticky: true }).flags).toBe('y');
expect(buildRegExp('a', { sticky: false }).flags).toBe('');

expect(
buildRegex('a', {
buildRegExp('a', {
global: true, //
ignoreCase: true,
multiline: false,
}).flags
}).flags,
).toBe('gi');
});
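
The flag mapping exercised by the test above can be sketched as follows. The interface and function names are hypothetical, and the mapping is inferred from the test expectations only; the body of the library's own `encodeFlags` is not visible in this diff.

```ts
// Hypothetical shape mirroring the flag names used in the test above.
interface SketchFlags {
  global?: boolean;
  ignoreCase?: boolean;
  multiline?: boolean;
  hasIndices?: boolean;
  sticky?: boolean;
}

// Assumed mapping, inferred from the test expectations; not the library's code.
function flagsToString(flags: SketchFlags): string {
  let result = '';
  if (flags.hasIndices) result += 'd';
  if (flags.global) result += 'g';
  if (flags.ignoreCase) result += 'i';
  if (flags.multiline) result += 'm';
  if (flags.sticky) result += 'y';
  return result;
}

const combined = flagsToString({ global: true, ignoreCase: true, multiline: false }); // 'gi'
const regex = new RegExp('a', combined);
```

Flags set to `false` or left out contribute nothing, which is why `{ global: false }` yields an empty flags string in the test.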
8 changes: 4 additions & 4 deletions src/__tests__/examples.test.ts
@@ -1,5 +1,5 @@
import {
buildRegex,
buildRegExp,
charRange,
choiceOf,
digit,
@@ -14,10 +14,10 @@ test('example: IPv4 address validator', () => {
[charRange('1', '9'), digit],
['1', repeat(digit, { count: 2 })],
['2', charRange('0', '4'), digit],
['25', charRange('0', '5')]
['25', charRange('0', '5')],
);

const regex = buildRegex([
const regex = buildRegExp([
startOfString, //
repeat([octet, '.'], { count: 3 }),
octet,
@@ -38,6 +38,6 @@ test('example: IPv4 address validator', () => {
expect(regex).not.toMatchString('255.255.255.256');

expect(regex).toHavePattern(
/^(?:(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])\.){3}(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])$/
/^(?:(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])\.){3}(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])$/,
);
});
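
The pattern asserted by `toHavePattern` above can also be tried directly as a native regular expression literal. This sketch does not use the library or its test matchers; the sample addresses other than `255.255.255.256` are chosen here for illustration.

```ts
// The same pattern as in the test above, written as a native RegExp literal.
const ipv4 = /^(?:(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])\.){3}(?:\d|[1-9]\d|1\d{2}|2[0-4]\d|25[0-5])$/;

const acceptsZero = ipv4.test('0.0.0.0'); // true
const acceptsPrivate = ipv4.test('192.168.0.1'); // true
const acceptsMax = ipv4.test('255.255.255.255'); // true

const rejectsOverflow = !ipv4.test('255.255.255.256'); // true: 256 is out of range
const rejectsShort = !ipv4.test('1.2.3'); // true: only three octets
const rejectsLeadingZero = !ipv4.test('01.2.3.4'); // true: leading zero is not allowed
```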
27 changes: 5 additions & 22 deletions src/builders.ts
@@ -1,23 +1,6 @@
import type { RegexSequence } from './types';
import type { RegexFlags, RegexSequence } from './types';
import { encodeSequence } from './encoder/encoder';
import { asNodeArray } from './utils/nodes';

export interface RegexFlags {
/** Global search. */
global?: boolean;

/** Case-insensitive search. */
ignoreCase?: boolean;

/** Allows ^ and $ to match newline characters. */
multiline?: boolean;

/** Generate indices for substring matches. */
hasIndices?: boolean;

/** Perform a "sticky" search that matches starting at the current position in the target string. */
sticky?: boolean;
}
import { ensureArray } from './utils/elements';

/**
* Generate RegExp object from elements with optional flags.
@@ -26,8 +9,8 @@ export interface RegexFlags {
* @param flags RegExp flags object
* @returns RegExp object
*/
export function buildRegex(sequence: RegexSequence, flags?: RegexFlags): RegExp {
const pattern = encodeSequence(asNodeArray(sequence)).pattern;
export function buildRegExp(sequence: RegexSequence, flags?: RegexFlags): RegExp {
const pattern = encodeSequence(ensureArray(sequence)).pattern;
const flagsString = encodeFlags(flags ?? {});
return new RegExp(pattern, flagsString);
}
@@ -38,7 +21,7 @@ export function buildRegex(sequence: RegexSequence, flags?: RegexFlags): RegExp
* @returns regex pattern string
*/
export function buildPattern(sequence: RegexSequence): string {
return encodeSequence(asNodeArray(sequence)).pattern;
return encodeSequence(ensureArray(sequence)).pattern;
}

function encodeFlags(flags: RegexFlags): string {