Regex schemas #317

Merged 84 commits on Jan 15, 2021
Commits
b720f6f
wip
ikitommi Nov 13, 2020
355277b
wip
ikitommi Nov 13, 2020
619f7a1
Initial NFA code drop (from my Seqexp 'perf' branch).
Nov 24, 2020
ccc1fd6
Reimplement regex `validator`.
Nov 24, 2020
05eb30d
Seq regex parsing, first draft.
Nov 25, 2020
b4e1e5c
Make regex parser behave passably and add `malli.regex/parse(r)`.
Nov 26, 2020
99766f4
re/fn -> re/is
Nov 26, 2020
f71beba
Disable some broken stuff.
Nov 26, 2020
ebba0e0
Add some regex `explainer` sketches.
Nov 26, 2020
95fa6e7
Push `path` and `in` into ExplanatoryVM.
Nov 27, 2020
5de3895
Make regex schemas work with regular `validator` and `explainer`.
Nov 27, 2020
a18f729
Add `explain` instruction.
Nov 30, 2020
873c350
Disallow trailing seq via `end` instruction.
Nov 30, 2020
7c5a0e7
Fix ::end-of-input and ::input-remaining schema args.
Nov 30, 2020
9390282
Add missing regex validation `end` clause.
Nov 30, 2020
8d6876c
Extract `regex-validator` and `regex-explainer`.
Nov 30, 2020
dfc13b5
Improve regex LensSchema impls.
Nov 30, 2020
3a935e1
Add regex `validate` tests (imitating Seqexp tests).
Nov 30, 2020
680e669
Move bool coercion inside `exec-recognizer`.
Nov 30, 2020
ed62e6b
Add seqexp generators.
Dec 1, 2020
cd2a84b
Remove fixed FIXME.
Dec 1, 2020
75a1cf1
Move regex macros to separate namespace.
Dec 1, 2020
d392ea5
Move regex compiler to separate namespace.
Dec 1, 2020
8b32557
Make everything compile on cljs.
Dec 1, 2020
cdf92d8
Fix regex VM on cljs.
Dec 1, 2020
046169d
Add :nested schema for preventing regex schema 'inlining'.
Dec 1, 2020
7168669
Use list for regex parse stack.
Dec 2, 2020
9c329d4
Add seqexp transformers.
Dec 2, 2020
2efd016
Fix regex-transformer self-enter/leave.
Dec 3, 2020
aae9cb7
Unify encoder-regex and decoder-regex into transformer-regex.
Dec 3, 2020
c900902
opt: use ^:const
Dec 3, 2020
32d700c
Optimize regex decoder space (and time) usage.
Dec 3, 2020
ca2fb29
Add backtracking validators.
Dec 7, 2020
02db7e8
Add backtracking explainers.
Dec 7, 2020
37090c3
Add FIXME.
Dec 8, 2020
40daf68
Fix [:cat [:* ...] ...] with trampolined CPS craziness.
Dec 8, 2020
afa82c2
Reimplement regex `explainer` on trampoline.
Dec 9, 2020
9eaf3f8
Reimplement regex `transformer` on trampoline.
Dec 9, 2020
4f6127f
Regularize (hehe) regex validator, explainer and transformer generation.
Dec 10, 2020
a035f22
Memoize backtracker (TODO: `repeat`).
Dec 10, 2020
c43797a
Right-associate `cat` and `alt`.
Dec 10, 2020
33619b1
Fix TCO regression.
Dec 10, 2020
cd90d41
Update and re-enable `repeat` seqex.
Dec 11, 2020
fb613b0
Make some outdated tests and benchmarks loadable again.
Dec 11, 2020
0115ca4
Remove double obsolete impls.
Dec 11, 2020
384cb07
Share some Driver logic.
Dec 11, 2020
f523171
Fix :repeat.
Dec 11, 2020
03363c8
Add regex parsers.
Dec 11, 2020
8ee5e0d
Stop returning maps from :cat and :alt parsing.
Dec 14, 2020
5b03a19
Remove unused PikeVM code.
Dec 14, 2020
1d7a319
Restore cljs support in malli.regex.
Dec 14, 2020
28221b1
malli.regex -> malli.impl.regex
Dec 15, 2020
bf9b9b9
Add malli.impl.error to avoid dependency injections to malli.impl.regex.
Dec 15, 2020
aefc660
Add seqex validation tests to core-test. Validate ?/*/+/repeat child …
Dec 15, 2020
fede818
Check that :alt(*) has 1+ children.
Dec 15, 2020
32e0b6f
Add seqex explanation tests.
Dec 15, 2020
30beb04
Add tests for past roadblocks.
Dec 15, 2020
5bd3ad3
Remove Seqexpy tests.
Dec 15, 2020
54d66dc
Add seqex transform tests and fix consequent bugfind in :repeat trans…
Dec 15, 2020
27d20b0
Add impl.regex ns docstring.
Dec 15, 2020
b0dad37
Add seqex generator tests.
Dec 16, 2020
8ba9166
Minor seqex generator cleanups.
Dec 16, 2020
5f09f86
Fix e.g. `(validate [:repeat {:min 2, :max 2} [:* int?]] []) ;=> false`.
Dec 16, 2020
c548463
First look up pos, then fn.
Jan 7, 2021
82f52f0
Flatten cache to regain decent cljs perf. Also a simplification.
Jan 7, 2021
9d347e3
Remove outdated regex experiments.
Jan 7, 2021
7b20d92
Switch to bespoke hash set impl, hopefully improves cljs perf.
Jan 7, 2021
47bc1b6
Cache set cleanups.
Jan 8, 2021
3fd8eb7
Add missing `-parent` impls.
Jan 8, 2021
c65f91b
Sequence schema children reflection fixes.
Jan 8, 2021
b6dc8d5
Remove more outdated seqex schema experiments.
Jan 8, 2021
14a3351
Add malli.impl.util and /-tagged.
Jan 8, 2021
8a3d317
Remove unneccessary FIXME.
Jan 8, 2021
e77ff18
Commenting.
Jan 8, 2021
1c1eb94
Add seqex documentation to README
Jan 8, 2021
4e4ae75
Fix -sequence-entry-schema type name.
Jan 11, 2021
1c87a24
Remove :nested, already had :schema.
Jan 11, 2021
911709a
Actually use quadratic probing (how embarrassing).
Jan 11, 2021
76b3503
Add missing :cat* and :alt* transform tests.
Jan 14, 2021
6c48077
Read through (nonrecursive) RefSchemas in seqex validator etc. constr…
Jan 14, 2021
1cad6df
Preent recursive seqexen more conservatively.
Jan 15, 2021
0c51c3e
RegexSchema cleanups.
Jan 15, 2021
9732f87
Fix seqex generators.
Jan 15, 2021
674710d
Add int tree seqex test.
Jan 15, 2021
94 changes: 93 additions & 1 deletion README.md
@@ -158,7 +158,99 @@ You can use `:sequential` for any homogeneous Clojure sequence, `:vector` for ve
;; => false
```

Support for Heterogeneous/Regex sequences is [WIP](https://github.com/metosin/malli/issues/180).
Malli also supports sequence regexes, as found in [Seqexp](https://github.com/cgrand/seqexp) and Spec.
The supported operators are `:cat` & `:cat*` for concatenation / sequencing

```clj
(m/validate [:cat string? int?] ["foo" 0]) ; => true

(m/validate [:cat* [:s string?] [:n int?]] ["foo" 0]) ; => true
```

`:alt` & `:alt*` for alternatives

```clj
(m/validate [:alt keyword? string?] ["foo"]) ; => true

(m/validate [:alt* [:kw keyword?] [:s string?]] ["foo"]) ; => true
```

and `:?`, `:*`, `:+` & `:repeat` for repetition:

```clj
(m/validate [:? int?] []) ; => true
(m/validate [:? int?] [1]) ; => true
(m/validate [:? int?] [1 2]) ; => false

(m/validate [:* int?] []) ; => true
(m/validate [:* int?] [1 2 3]) ; => true

(m/validate [:+ int?] []) ; => false
(m/validate [:+ int?] [1]) ; => true
(m/validate [:+ int?] [1 2 3]) ; => true

(m/validate [:repeat {:min 2, :max 4} int?] [1]) ; => false
(m/validate [:repeat {:min 2, :max 4} int?] [1 2]) ; => true
(m/validate [:repeat {:min 2, :max 4} int?] [1 2 3 4]) ; => true (:max is inclusive, as elsewhere in Malli)
(m/validate [:repeat {:min 2, :max 4} int?] [1 2 3 4 5]) ; => false
```
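The operators above can also be nested inside one another. The following is a small sketch rather than part of the original README: `args-schema` is a made-up name, and `malli.core` is assumed to be required as `m`, as in the other examples.

```clj
(require '[malli.core :as m])

;; Hypothetical schema: a keyword tag, then zero or more ints,
;; then at most one trailing string.
(def args-schema [:cat keyword? [:* int?] [:? string?]])

(m/validate args-schema [:sum 1 2 3])      ; => true
(m/validate args-schema [:sum 1 2 "note"]) ; => true
(m/validate args-schema [:sum "a" "b"])    ; => false (only one trailing string allowed)
```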

`:cat*` and `:alt*` allow naming the subsequences / alternatives

```clj
(m/explain [:* [:cat* [:prop string?] [:val [:alt* [:s string?] [:b boolean?]]]]]
           ["-server" "foo" "-verbose" 11 "-user" "joe"])
;; => {:schema [:* [:cat* [:prop string?] [:val [:alt* [:s string?] [:b boolean?]]]]],
;;     :value ["-server" "foo" "-verbose" 11 "-user" "joe"],
;;     :errors (#Error{:path [0 :val :s], :in [3], :schema string?, :value 11}
;;              #Error{:path [0 :val :b], :in [3], :schema boolean?, :value 11})}
```

while `:cat` and `:alt` just use numeric indices for paths:

```clj
(m/explain [:* [:cat string? [:alt string? boolean?]]]
           ["-server" "foo" "-verbose" 11 "-user" "joe"])
;; => {:schema [:* [:cat string? [:alt string? boolean?]]],
;;     :value ["-server" "foo" "-verbose" 11 "-user" "joe"],
;;     :errors (#Error{:path [0 1 0], :in [3], :schema string?, :value 11}
;;              #Error{:path [0 1 1], :in [3], :schema boolean?, :value 11})}
```

As all these examples show, the "seqex" operators take any non-seqex child schema to
mean a sequence of one element that matches that schema. To force the same behaviour for
a child that is itself a seqex, wrap it in `:schema`:

```clj
(m/validate [:cat [:= :names] [:schema [:* string?]]
                  [:= :nums] [:schema [:* number?]]]
            [:names ["a" "b"] :nums [1 2 3]]) ; => true

;; whereas
(m/validate [:cat [:= :names] [:* string?] [:= :nums] [:* number?]]
            [:names "a" "b" :nums 1 2 3]) ; => true
```
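The two forms are not interchangeable, so feeding each one the other's input shape should fail. This is a minimal sketch under the same assumptions as the README examples (`malli.core` required as `m`); the names `nested` and `inlined` are only illustrative.

```clj
(require '[malli.core :as m])

;; Hypothetical names for the two variants discussed above.
(def nested  [:cat [:= :names] [:schema [:* string?]]]) ; expects one nested collection
(def inlined [:cat [:= :names] [:* string?]])           ; expects the strings spliced in

(m/validate nested  [:names ["a" "b"]]) ; => true
(m/validate nested  [:names "a" "b"])   ; => false ("a" is not a sequence of strings)
(m/validate inlined [:names "a" "b"])   ; => true
(m/validate inlined [:names ["a" "b"]]) ; => false (the vector is not a string)
```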

Although a lot of effort has gone into making the seqex implementation fast

```clj
(require '[clojure.spec.alpha :as s])
(require '[criterium.core :as cc])

;; m/validator takes only the schema; the input is passed to the returned fn
(let [valid? (m/validator [:* int?])]
  (cc/quick-bench (valid? (range 1000)))) ; Execution time mean : 189,953863 µs
(let [valid? (partial s/valid? (s/* int?))]
  (cc/quick-bench (valid? (range 1000)))) ; Execution time mean : 2,576905 ms
(let [valid? (partial s/valid? (s/coll-of int?))]
  (cc/quick-bench (valid? (range 1000)))) ; Execution time mean : 136,599310 µs
```

it is always better to use less general tools whenever possible:

```clj
(let [valid? (m/validator [:sequential int?])]
  (cc/quick-bench (valid? (range 1000)))) ; Execution time mean : 2,863314 µs
```

## String schemas
