Skip to content

Commit

Permalink
feat: update to v4.9.0-rc2
Browse files Browse the repository at this point in the history
  • Loading branch information
bergmannjg committed Jun 22, 2024
1 parent 3ade00b commit 5151406
Show file tree
Hide file tree
Showing 11 changed files with 64 additions and 86 deletions.
76 changes: 35 additions & 41 deletions Regex.lean
Original file line number Diff line number Diff line change
Expand Up @@ -51,14 +51,12 @@ This library implements part of “Basic Unicode Support” (Level 1) as specifi
Hex Notation refers to the ability to specify a Unicode code point in a regular expression
via its hexadecimal code point representation. These forms of hexadecimal notation are supported:
<pre>
\x7F hex character code (exactly two digits)
\x{10FFFF} any hex character code corresponding to a Unicode code point
\u007F hex character code (exactly four digits)
\u{7F} any hex character code corresponding to a Unicode code point
\U0000007F hex character code (exactly eight digits)
\U{7F} any hex character code corresponding to a Unicode code point
</pre>
- \x7F hex character code (exactly two digits)
- \x{10FFFF} any hex character code corresponding to a Unicode code point
- \u007F hex character code (exactly four digits)
- \u{7F} any hex character code corresponding to a Unicode code point
- \U0000007F hex character code (exactly eight digits)
- \U{7F} any hex character code corresponding to a Unicode code point
### RL1.2 Properties
Expand All @@ -69,13 +67,11 @@ and a type (`Unicode.PropertyType`) and may have values.
These forms of property notation are supported:
<pre>
\pX Unicode character class identified by a one-letter name
\p{Greek} Unicode character class given by property value
\p{Script=Greek} Unicode character class given by property name and value
\PX Negated Unicode character class identified by a one-letter name
\P{Greek} negated Unicode character class (general category or script)
</pre>
- \pX Unicode character class identified by a one-letter name
- \p{Greek} Unicode character class given by property value
- \p{Script=Greek} Unicode character class given by property name and value
- \PX Negated Unicode character class identified by a one-letter name
- \P{Greek} negated Unicode character class (general category or script)
The type `Unicode.PropertyName` contains all supported property names.
Expand All @@ -98,19 +94,19 @@ Main api for Regex library
Get captures of "∀ (n : Nat), 0 ≤ n" :
<pre>
```
def Main : IO Unit := do
let re := regex% r"^\p{Math}\s*.(?<=\\()([a-z])[^,]+,\s*(\p{Nd})\s*(\p{Math})\s*\1$"
let captures := Regex.captures "∀ (n : Nat), 0 ≤ n" re
IO.println s!"{captures}"
</pre>
```
Output is
<pre>
```
fullMatch: '∀ (n : Nat), 0 ≤ n', 0, 22
groups: #[(some ('n', 5, 6)), (some ('0', 15, 16)), (some ('≤', 17, 20))]
</pre>
```
Components of regular expression:
Expand All @@ -131,13 +127,13 @@ The performance is tested for the regular expression **a?<sup>n</sup>a<sup>n</su
the haystack a<sup>n</sup> where n means repetition i.e. a?a?aa for n = 2.
To find a match for this expression the BoundedBacktracker has to traverse the entire search space.
<table border="1" align="center">
<tr><th>n</th><th>seconds</th><th>visited values</th><th>backtracks</></tr>
<tr align="right"><td>100</td><td>0.005</td><td>20404</td><td>5050</td></tr>
<tr align="right"><td>500</td><td>0.150</td><td>502004</td><td>125250</td></tr>
<tr align="right"><td>1000</td><td>0.600</td><td>2004004</td><td>500500</td></tr>
<tr align="right"><td>2000</td><td>2.200</td><td>8008004</td><td>2001000</td></tr>
</table>
|n|seconds|visited values|backtracks|
|--|--|--|--|
|100|0.005|20404|5050|
|500|0.150|502004|125250|
|1000|0.600|2004004|500500|
|2000|2.200|8008004|2001000|
The visited values are the encoded (StateID, HaystackOffset) pairs that have been visited
by the backtracker within a single search (`BoundedBacktracker.SearchState`).
Expand All @@ -147,12 +143,11 @@ by the backtracker within a single search (`BoundedBacktracker.SearchState`).
The performance is tested for the regular expression **(?i)Xyz** and
a haystack which contains **xyz** as last element.
<table border="1" align="center">
<tr align="right"><th>size of haystack</th><th>seconds</th><th>visited values</th></tr>
<tr align="right"><td>1 kb</td><td>0.170</td><td>5010</td></tr>
<tr align="right"><td>100 kb</td><td>0.340</td><td>500010</td></tr>
<tr align="right"><td>1000 kb</td><td>1.720</td><td>5000010</td></tr>
</table>
|size of haystack|seconds|visited values|
|--|--|--|
|1 kb|0.170|5010|
|100 kb|0.340|500010|
|1000 kb|1.720|5000010|
## Inspect tool
Expand All @@ -166,23 +161,23 @@ The inspect tool command line options:
Output of *inspect ast 'a|b*
<pre>
```
Alternation
0: Literal (a|b,0,1) Verbatim 'a'
1: Literal (a|b,2,3) Verbatim 'b'
</pre>
```
Output of *inspect hir 'a|b*
<pre>
```
Alternation
0: Literal 'a'
1: Literal 'b'
</pre>
```
Output of *inspect compile 'a|b*
<pre>
```
0: UnionReverse [ 2 3 ]
1: Empty => 0
2: SparseTransitions [ 0x00-0xd7ff => 1 0xe000-0x10ffff => 1 ]
Expand All @@ -193,13 +188,12 @@ Output of *inspect compile 'a|b*
7: Empty => 8
8: capture(pid=0, group=0, slot=1) => 9
9: Match(0)
</pre>
```
Output of *inspect captures 'a|b' a*
<pre>
```
fullMatch: 'a', 0, 1
groups: #[]
</pre>
```
-/
6 changes: 3 additions & 3 deletions Regex/Compiler.lean
Original file line number Diff line number Diff line change
Expand Up @@ -153,7 +153,7 @@ private def c_concat (hirs : Array Hir) : CompilerM ThompsonRef := do

have : sizeOf tail < sizeOf hirs := Array.sizeOf_head?_of_tail h
let hir ← tail.attach.foldlM (init := «end»)
(fun s h => do
(fun s (h : { x // x ∈ tail}) => do
have : sizeOf h.val < sizeOf tail := Array.sizeOf_lt_of_mem h.property
let hir ← c h.val
patch s hir.start
Expand Down Expand Up @@ -183,7 +183,7 @@ private def c_alt_iter (hirs : Array Hir) : CompilerM ThompsonRef := do

have : sizeOf tail' < sizeOf tail := Array.sizeOf_head?_of_tail hTail
tail'.attach.forM
(fun h => do
(fun (h : { x // x ∈ tail'}) => do
have : sizeOf h.val < sizeOf tail' := Array.sizeOf_lt_of_mem h.property
let compiled ← c h.val
patch union compiled.start
Expand Down Expand Up @@ -243,7 +243,7 @@ termination_by sizeOf hir + sizeOf n + 1

/-- Compile the given expression such that it matches at least `min` times,
but no more than `max` times.-/
private def c_bounded (hir : Hir) (min max : Nat) (greedy : Bool) (h : min ≤ max)
private def c_bounded (hir : Hir) (min max : Nat) (greedy : Bool) (_ : min ≤ max)
: CompilerM ThompsonRef := do
if h : 0 < max then
let «prefix» ← c_exactly hir min
Expand Down
6 changes: 0 additions & 6 deletions Regex/Data/Char/Basic.lean
Original file line number Diff line number Diff line change
Expand Up @@ -30,10 +30,6 @@ theorem succ_lt {c1 c2 : Char} (h : (c2.val.toNat - c1.val.toNat) = 1) : c1 < c2
have hx : c1.val.toNat < c2.val.toNat := by simp_all [Nat.lt_of_sub_eq_succ h]
exact hx

theorem le_trans {c1 c2 c3 : Char} (h₁ : c1 ≤ c2) (h₂ : c2 ≤ c3) : c1 ≤ c3 := by
have h : c1.val.toNat ≤ c3.val.toNat := by exact Nat.le_trans h₁ h₂
exact h

theorem min_le (c : Char) : Char.min ≤ c := by
have h : 0 ≤ c.val.toNat := by simp_all
exact h
Expand All @@ -50,8 +46,6 @@ theorem le_max (c : Char) : c ≤ Char.max := by
have hy : c.val.toNat ≤ 0x10FFFF := Nat.le_of_lt_succ hx
exact hy

theorem lt_def {a b : Char} : a < b ↔ a.val < b.val := .rfl

theorem one_le_of_lt {c1 c2 : Char} (h : c1 < c2) : 1 ≤ c2.val :=
have h : c1.val < c2.val := Char.lt_def.mp h
UInt32.one_le_of_lt h
Expand Down
22 changes: 6 additions & 16 deletions Regex/Data/UInt/Basic.lean
Original file line number Diff line number Diff line change
Expand Up @@ -5,22 +5,12 @@ import Regex.Data.Nat.Basic

namespace UInt32

theorem lt_def {a b : UInt32} : a < b ↔ a.val < b.val := .rfl
theorem add_def' {a b : UInt32} : UInt32.add a b = a + b := rfl

theorem le_def {a b : UInt32} : a ≤ b ↔ a.val ≤ b.val := .rfl

theorem add_def {a b : UInt32} : UInt32.add a b = a + b := rfl

theorem sub_def {a b : UInt32} : UInt32.sub a b = a - b := rfl

theorem eq_of_val_eq : ∀ {i j : UInt32}, Eq i.val j.val → Eq i j
| ⟨_, _⟩, ⟨_, _⟩, rfl => rfl

theorem val_eq_of_eq {i j : UInt32} (h : Eq i j) : Eq i.val j.val :=
h ▸ rfl
theorem sub_def' {a b : UInt32} : UInt32.sub a b = a - b := rfl

theorem val_ne_of_ne {c d : UInt32} (h : Not (Eq c d)) : Not (Eq c.val d.val) :=
fun h' => absurd (eq_of_val_eq h') h
fun h' => absurd (UInt32.eq_of_val_eq h') h

theorem one_le_of_lt {c1 c2 : UInt32} (h : c1 < c2) : 1 ≤ c2 :=
Nat.zero_lt_of_lt h
Expand All @@ -38,7 +28,7 @@ theorem ofNat_eq (n : Nat) (h : n < UInt32.size) : (UInt32.ofNat n).val = ⟨n,

theorem ofNat_add_ofNat (n m : Nat) (hn : n < UInt32.size) (hm : m < UInt32.size)
(hnm : n + m < UInt32.size) : (UInt32.ofNat n) + (UInt32.ofNat m) = UInt32.ofNat (n + m) := by
rw [← UInt32.add_def]
rw [← UInt32.add_def']
unfold UInt32.add
have hx : (UInt32.ofNat n).val = ⟨n, hn⟩ := UInt32.ofNat_eq n hn
have hy : (UInt32.ofNat m).val = ⟨m, hm⟩ := UInt32.ofNat_eq m hm
Expand All @@ -56,7 +46,7 @@ theorem ofNat_add_ofNat (n m : Nat) (hn : n < UInt32.size) (hm : m < UInt32.size

theorem toNat_add_toNat (n m : UInt32) (hnm : n.val + m.val < UInt32.size)
: n.toNat + m.toNat = (n + m).toNat := by
rw [← UInt32.add_def]
rw [← UInt32.add_def']
unfold UInt32.add
unfold UInt32.toNat
rw [Fin.add_def]
Expand All @@ -67,7 +57,7 @@ theorem toNat_sub_toNat {n m : UInt32} (hmn : m.val ≤ n.val) (h2 : n.val < UIn
: n.toNat - m.toNat = (n - m).toNat := by
have h1 : n.val - m.val < UInt32.size := Nat.le_lt_sub_lt hmn h2
have h2 : m.val < UInt32.size := Nat.lt_of_le_of_lt hmn h2
rw [← UInt32.sub_def]
rw [← UInt32.sub_def']
unfold UInt32.sub
unfold UInt32.toNat
rw [Fin.sub_def]
Expand Down
4 changes: 2 additions & 2 deletions Regex/Syntax/Ast/Parse.lean
Original file line number Diff line number Diff line change
Expand Up @@ -902,15 +902,15 @@ private def get_fixed_width (pattern : String) (ast : Ast) : Except String Nat :
| .Concat concat =>
match concat with
| ⟨_, asts⟩ =>
let width ← asts.attach.foldlM (init := 0) (fun s ast => do
let width ← asts.attach.foldlM (init := 0) (fun s (ast : { x // x ∈ asts}) => do
have : sizeOf ast.val < sizeOf asts := Array.sizeOf_lt_of_mem ast.property
let width ← get_fixed_width pattern ast.val
pure (s + width))
pure width
| .Alternation alt =>
match alt with
| ⟨_, asts⟩ =>
let widths ← asts.attach.mapM (fun ast => do
let widths ← asts.attach.mapM (fun (ast : { x // x ∈ asts}) => do
have : sizeOf ast.val < sizeOf asts := Array.sizeOf_lt_of_mem ast.property
let width ← get_fixed_width pattern ast.val
pure width)
Expand Down
8 changes: 4 additions & 4 deletions Regex/Syntax/Ast/Visitor.lean
Original file line number Diff line number Diff line change
Expand Up @@ -62,7 +62,7 @@ private def visit_class_item_loop' {α σ: Type} [Inhabited α]
match item with
| .Bracketed ⟨_ , _, kind⟩ => visit_class_loop' β kind
| .Union ⟨_, items⟩ =>
items.attach.forM (fun item => do
items.attach.forM (fun (item : { x // x ∈ items}) => do
have : sizeOf item.val < sizeOf items := Array.sizeOf_lt_of_mem item.property
visit_class_item_loop' β item.val)
| _ => pure ()
Expand All @@ -76,7 +76,7 @@ private def visit_class_loop' {α σ: Type} [Inhabited α]
match ast with
| .Item (ClassSetItem.Bracketed ⟨_, _, kind⟩ ) => visit_class_loop' β kind
| .Item (ClassSetItem.Union ⟨_, items⟩) =>
items.attach.forM (fun item => do
items.attach.forM (fun (item : { x // x ∈ items}) => do
have : sizeOf item.val < sizeOf items := Array.sizeOf_lt_of_mem item.property
visit_class_item_loop' β item.val)
| .BinaryOp op =>
Expand Down Expand Up @@ -113,7 +113,7 @@ def visit_loop {α σ: Type} [Inhabited α] (β : Visitor α σ) (ast : Ast) : M
| .Alternation alt =>
match alt with
| .mk _ items =>
items.attach.forM (fun ast => do
items.attach.forM (fun (ast : { x // x ∈ items}) => do
have : sizeOf ast.val < sizeOf items := Array.sizeOf_lt_of_mem ast.property
visit_loop β ast.val)
| .Group g =>
Expand All @@ -122,7 +122,7 @@ def visit_loop {α σ: Type} [Inhabited α] (β : Visitor α σ) (ast : Ast) : M
visit_loop β ast'
| .Concat concat => do
have : sizeOf concat.asts < sizeOf concat := Concat.sizeOfAstsOfConcat concat
concat.asts.attach.forM (fun ast => do
concat.asts.attach.forM (fun (ast : { x // x ∈ concat.asts}) => do
have : sizeOf ast.val < sizeOf concat.asts := Array.sizeOf_lt_of_mem ast.property
visit_loop β ast.val)

Expand Down
4 changes: 2 additions & 2 deletions Regex/Syntax/Hir.lean
Original file line number Diff line number Diff line change
Expand Up @@ -221,11 +221,11 @@ def fold [Inhabited α] (hir : Hir) (f : α -> Hir -> α) (init : α): α :=
| .Repetition ⟨_, _, _, sub⟩ => fold sub f init
| .Capture _ => f init hir
| .Concat hirs => hirs.attach |> Array.foldl (init := init)
(fun a h =>
(fun a (h : { x // x ∈ hirs}) =>
have : sizeOf h.val < sizeOf hirs := Array.sizeOf_lt_of_mem h.property
fold h.val f a)
| .Alternation hirs => hirs.attach |> Array.foldl (init := init)
(fun a h =>
(fun a (h : { x // x ∈ hirs}) =>
have : sizeOf h.val < sizeOf hirs := Array.sizeOf_lt_of_mem h.property
fold h.val f a)
termination_by sizeOf hir
Expand Down
14 changes: 7 additions & 7 deletions lake-manifest.json
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
{"version": 7,
{"version": "1.0.0",
"packagesDir": ".lake/packages",
"packages":
[{"url": "https://github.com/fgdorais/lean4-unicode-basic.git",
Expand All @@ -13,7 +13,7 @@
{"url": "https://github.com/leanprover-community/batteries",
"type": "git",
"subDir": null,
"rev": "551ff2d7dffd7af914cdbd01abbd449fe3e3d428",
"rev": "47e4cc5c5800c07d9bf232173c9941fa5bf68589",
"name": "batteries",
"manifestFile": "lake-manifest.json",
"inputRev": "main",
Expand All @@ -22,7 +22,7 @@
{"url": "https://github.com/leanprover-community/quote4",
"type": "git",
"subDir": null,
"rev": "53156671405fbbd5402ed17a79bd129b961bd8d6",
"rev": "a7bfa63f5dddbcab2d4e0569c4cac74b2585e2c6",
"name": "Qq",
"manifestFile": "lake-manifest.json",
"inputRev": "master",
Expand All @@ -31,7 +31,7 @@
{"url": "https://github.com/leanprover-community/aesop",
"type": "git",
"subDir": null,
"rev": "53ba96ad7666d4a2515292974631629b5ea5dfee",
"rev": "882561b77bd2aaa98bd8665a56821062bdb3034c",
"name": "aesop",
"manifestFile": "lake-manifest.json",
"inputRev": "master",
Expand All @@ -58,7 +58,7 @@
{"url": "https://github.com/leanprover-community/import-graph.git",
"type": "git",
"subDir": null,
"rev": "77e081815b30b0d30707e1c5b0c6a6761f7a2404",
"rev": "1588be870b9c76fe62286e8f42f0b4dafa154c96",
"name": "importGraph",
"manifestFile": "lake-manifest.json",
"inputRev": "main",
Expand All @@ -67,10 +67,10 @@
{"url": "https://github.com/leanprover-community/mathlib4",
"type": "git",
"subDir": null,
"rev": "b5eba595428809e96f3ed113bc7ba776c5f801ac",
"rev": "ab0de47a15970636867ab3bea74476a1264ffbb2",
"name": "mathlib",
"manifestFile": "lake-manifest.json",
"inputRev": "v4.8.0",
"inputRev": "ab0de47a15970636867ab3bea74476a1264ffbb2",
"inherited": false,
"configFile": "lakefile.lean"}],
"name": "Regex",
Expand Down
5 changes: 3 additions & 2 deletions lakefile.lean
Original file line number Diff line number Diff line change
Expand Up @@ -3,12 +3,13 @@ open Lake DSL

meta if get_config? env = some "dev" then
require «doc-gen4» from git "https://github.com/leanprover/doc-gen4"
@ "c7f4ac84b973b6efd8f24ba2b006cad1b32c9c53"
@ "1f51169609a3a7c448558c3d3eb353fb355c7025"

require UnicodeBasic from git "https://github.com/fgdorais/lean4-unicode-basic.git"
@ "main"

require mathlib from git "https://github.com/leanprover-community/mathlib4" @ "v4.8.0"
require mathlib from git "https://github.com/leanprover-community/mathlib4"
@ "ab0de47a15970636867ab3bea74476a1264ffbb2"

package «Regex» {
}
Expand Down
2 changes: 1 addition & 1 deletion lean-toolchain
Original file line number Diff line number Diff line change
@@ -1 +1 @@
leanprover/lean4:v4.8.0
leanprover/lean4:v4.9.0-rc2
Loading

0 comments on commit 5151406

Please sign in to comment.