Optimize the E-Graph Pattern Matcher #6

0x0f0f0f · 2021-01-29T15:44:28Z

The current pattern matcher is an inefficient implementation of the pattern matcher described in
https://www.hpl.hp.com/techreports/2003/HPL-2003-148.pdf
adapted from https://github.com/philzook58/EGraphs.jl/
For now, the pattern matcher uses channels as generators.
This architecture should be reconsidered to allow efficient parallelization.
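To make the cost of "channels as generators" concrete, here is a minimal sketch. It is not Metatheory's code: `matches_channel` and `matches_buffer!` are hypothetical names, and plain integers stand in for the e-class ids a pattern might match. With an unbuffered `Channel`, every `put!` blocks until the consumer task takes the value, so a tight matching loop pays for a task switch per result. Pushing into a caller-owned `Vector` avoids that, and a buffer per worker is also simpler to reason about when parallelizing.

```julia
# Hypothetical sketch: both functions produce the candidates that satisfy `pred`.

# Generator style: a producer task feeds an unbuffered Channel;
# each put! blocks until the consumer takes the value (a task switch).
function matches_channel(candidates::Vector{Int}, pred)
    return Channel{Int}() do ch
        for c in candidates
            pred(c) && put!(ch, c)
        end
    end  # the channel is closed automatically when the producer returns
end

# Buffer style: results are pushed into a caller-owned vector,
# which can be emptied and reused across calls.
function matches_buffer!(buf::Vector{Int}, candidates::Vector{Int}, pred)
    for c in candidates
        pred(c) && push!(buf, c)
    end
    return buf
end

collect(matches_channel([1, 2, 3, 4], iseven))  # [2, 4]
matches_buffer!(Int[], [1, 2, 3, 4], iseven)    # [2, 4]
```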

Another pattern matcher architecture, based on a small virtual machine, is described in
http://leodemoura.github.io/files/ematching.pdf
If this approach is chosen, the abstract virtual machine could be implemented at as low a level as possible.
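A heavily simplified sketch of such an abstract machine, loosely modeled on the bind/compare/yield instructions described in the paper. It is a toy under explicit assumptions, not the paper's full instruction set and not Metatheory code: the e-graph is a plain `Dict` from e-class ids to e-nodes, the pattern is compiled into instructions by hand, and a match is just the register contents (e-class ids). The names `ENode`, `Bind`, `Compare`, `Yield` and `exec!` are hypothetical.

```julia
# Toy e-graph: e-class id => e-nodes, each an operator plus child e-class ids.
struct ENode
    op::Symbol
    args::Vector{Int}
end
const ToyEGraph = Dict{Int,Vector{ENode}}

abstract type Instr end
# For each e-node in class regs[i] with operator `op` and `arity` children,
# load the children into regs[o], regs[o+1], ... and continue.
struct Bind <: Instr
    i::Int
    op::Symbol
    arity::Int
    o::Int
end
# Continue only if regs[i] == regs[j] (a repeated pattern variable).
struct Compare <: Instr
    i::Int
    j::Int
end
# Emit the e-classes bound to the pattern variables.
struct Yield <: Instr
    regs::Vector{Int}
end

# Interpret `prog` from instruction `pc`. Backtracking happens by returning
# into the enclosing Bind loop, which then tries the next e-node.
function exec!(out::Vector{Vector{Int}}, g::ToyEGraph, prog::Vector{Instr}, pc::Int, regs::Vector{Int})
    ins = prog[pc]
    if ins isa Bind
        for n in get(g, regs[ins.i], ENode[])
            (n.op == ins.op && length(n.args) == ins.arity) || continue
            regs[ins.o:ins.o+ins.arity-1] .= n.args
            exec!(out, g, prog, pc + 1, regs)
        end
    elseif ins isa Compare
        regs[ins.i] == regs[ins.j] && exec!(out, g, prog, pc + 1, regs)
    elseif ins isa Yield
        push!(out, regs[ins.regs])
    end
    return out
end

# Pattern f(x, g(x)), with the root e-class in regs[1] and x in regs[2]:
prog = Instr[Bind(1, :f, 2, 2), Bind(3, :g, 1, 4), Compare(2, 4), Yield([2])]
g = ToyEGraph(1 => [ENode(:a, Int[])], 2 => [ENode(:g, [1])],
              3 => [ENode(:f, [1, 2])], 4 => [ENode(:f, [2, 2])])
exec!(Vector{Int}[], g, prog, 1, [3, 0, 0, 0])  # one match: x = e-class 1
exec!(Vector{Int}[], g, prog, 1, [4, 0, 0, 0])  # no match: x would be both 2 and 1
```

Compiling a pattern once into a flat instruction list is what lets such a matcher avoid re-walking the pattern tree on every match; a real implementation would also need a pattern-to-instruction compiler and a richer notion of substitution.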

0x0f0f0f · 2021-03-20T12:51:52Z

The pattern matcher now uses shared buffers between recursive calls.

0x0f0f0f · 2021-04-05T15:50:41Z

An update on the issue: the pattern matcher works fairly well, but it still has some pitfalls that cause bottlenecks and need to be solved. E-matching is still the main bottleneck in the equality saturation process.

The main bottleneck in the pattern matcher is the `ematchlist` function. Allocation profiling shows that most bytes are allocated inside this function, where a fresh buffer of substitutions is created every time the system tries to match against a composite pattern of the form `f(subpat_a, subpat_b, ...)`.
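A minimal sketch of avoiding the per-sub-pattern buffer allocation, under simplifying assumptions: substitutions are plain `Dict{Symbol,Int}` maps (Metatheory's are richer), and `ematchlist!` and `merge_sub` are hypothetical names, not the library's functions. One list of candidate substitutions per sub-pattern is joined into consistent joint substitutions using two buffers that the caller allocates once and that are swapped between sub-patterns.

```julia
const Sub = Dict{Symbol,Int}  # pattern variable => e-class id (simplified stand-in)

# Merge `b` into a copy of `a`; return `nothing` if they bind a variable differently.
function merge_sub(a::Sub, b::Sub)
    for (k, v) in b
        get(a, k, v) == v || return nothing
    end
    return merge(a, b)
end

# Join one list of candidate substitutions per sub-pattern.
# `cur` and `nxt` are caller-owned buffers that are emptied and swapped,
# so no new buffer is allocated per sub-pattern.
function ematchlist!(cur::Vector{Sub}, nxt::Vector{Sub}, per_arg::Vector{Vector{Sub}})
    empty!(cur)
    push!(cur, Sub())
    for choices in per_arg
        empty!(nxt)
        for s in cur, t in choices
            m = merge_sub(s, t)
            m === nothing || push!(nxt, m)
        end
        cur, nxt = nxt, cur
    end
    return cur  # aliases one of the two buffers
end

bufA, bufB = Sub[], Sub[]
# x => 2 conflicts with the second argument, so one joint substitution remains: x => 1, y => 5
ematchlist!(bufA, bufB, [[Sub(:x => 1), Sub(:x => 2)], [Sub(:x => 1, :y => 5)]])
```

This removes only the buffer allocations: each successful merge still allocates a new substitution, and the returned vector is overwritten by the next call that reuses the same two buffers.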

One can look at this flamegraph to confirm this.

This flamegraph was obtained by running this line with the propositional logic system of rewrite rules (here).

Performance degrades badly as the size of the patterns grows. Consider this example:

```julia
using Metatheory
@metatheory_init ()
using Metatheory.EGraphs
using Metatheory.Library
using Metatheory.Util
using Metatheory.EGraphs.Schedulers

# Enable verbose output
Metatheory.options.printiter = true
Metatheory.options.verbose = true

# Build the left-nested product op(op(x, x), x)... of n copies of x
function rep(x, op, n::Int)
    foldl((acc, y) -> :(($op)($acc, $y)), repeat([x], n))
end

rep(:a, :*, 3)  # quick check: builds (a * a) * a

Mid = @theory begin
    a * :ε => :ε
    :ε * a => :ε
end

# Associativity of *, in both directions
Massoc = @theory begin
    a * (b * c) => (a * b) * c
    (a * b) * c => a * (b * c)
end

# Relations of the form w^n => ε; the patterns built with rep grow quickly
T = [
    @rule :b*:B => :ε
    RewriteRule(Pattern(rep(:(:a), :*, 2)), Pattern(:(:ε)))
    RewriteRule(Pattern(rep(:(:b), :*, 3)), Pattern(:(:ε)))
    RewriteRule(Pattern(rep(:(:a*:b), :*, 7)), Pattern(:(:ε)))
    RewriteRule(Pattern(rep(:(:a*:b*:a*:B), :*, 5)), Pattern(:(:ε)))
]

G = Mid ∪ Massoc ∪ T
expr = :(a*b*a*a*a*b*b*b*a*B*B*B*B*a)

g = EGraph(expr)
params = SaturationParams(timeout=5)  # stop after 5 iterations
saturate!(g, G, params)
ex = extract!(g, astsize)             # extract the smallest expression by AST size
rewrite(ex, Mid)
```

Even though the e-graph grows to only 37 e-classes and 202 e-nodes, matching against it takes about 125 seconds in total!
This is because some of the patterns are quite big:
`Pattern(rep(:(:a*:b*:a*:B), :*, 5))` is the pattern `:a * :b * :a * :B` multiplied by itself 5 times.
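To get a feeling for that size, the pattern expression can be inspected with plain Julia. The helper `count_nodes` is hypothetical (not part of Metatheory) and counts function applications and leaves of a quoted expression. Note that Julia parses `:a*:b*:a*:B` as a single 4-argument call to `*`, and `rep` then nests five copies with binary `*` calls, giving 9 applications of `*` and 20 literal leaves. How Metatheory's `Pattern` represents this internally is not assumed here.

```julia
# Same helper as in the example above: left-fold n copies of x with op.
rep(x, op, n::Int) = foldl((acc, y) -> :(($op)($acc, $y)), repeat([x], n))

# Hypothetical helper: a call counts as one node plus its arguments; anything else is a leaf.
count_nodes(x) = 1
count_nodes(ex::Expr) =
    ex.head == :call ? 1 + sum(count_nodes, ex.args[2:end]; init = 0) : 1

word = :(:a * :b * :a * :B)    # one call *(:a, :b, :a, :B)
count_nodes(word)              # 5 nodes
count_nodes(rep(word, :*, 5))  # 5 * 5 + 4 = 29 nodes
```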

See the equality saturation report from this test run:

```
┌ Info: Equality Saturation Report
│ =================
│       Stop Reason: Iteration Timeout
│       Iterations: 5
│       EGraph Size: 37 eclasses, 202 nodes
│       Total Time: (time = 125.61055936500001, bytes = 36250695656, gctime = 11.394093429)
│       Search Time: (time = 125.42570223800001, bytes = 36246605964, gctime = 11.394093429)
│       Apply Time: (time = 0.18337512299999997, bytes = 3743452, gctime = 0.0)
└       Rebuild Time: (time = 0.001482004, bytes = 346240, gctime = 0.0)
```

How to attack this problem?

I've tried attacking this problem in many ways.
Implementing a virtual-machine-based e-matcher like Leo de Moura's is harder than expected: since MT supports additional features such as type assertions and custom term types, MT's substitutions are much more complicated than simple pattern_var -> eclass_id maps.

Some ideas:

- It is very important to avoid matching again against sub-patterns that have already been matched, when it is not necessary.
- One could apply a dynamic programming technique to memoize the substitutions produced by sub-patterns.
- We could borrow the concept of an explicit backtracking stack for `ematchlist`.
- Sub(stitutions) are mutable structures, but they get copied every time a substitution is yielded to a substitution buffer. Couldn't we avoid some of these copies?
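The memoization idea can be sketched on a toy e-graph. This is a minimal illustration under stated assumptions (a plain `Dict` e-graph, `Dict{Symbol,Int}` substitutions, and the hypothetical names `PVar`, `PTerm`, `merge_sub` and `ematch`), not Metatheory's matcher. Results are cached per (sub-pattern, e-class) pair in `memo`, so a sub-pattern that appears under many candidate e-nodes is matched against a given e-class only once. The cache is only valid within a single search phase, since applying rewrites and rebuilding changes the e-graph.

```julia
struct ENode
    op::Symbol
    args::Vector{Int}
end
struct PVar    # pattern variable
    name::Symbol
end
struct PTerm   # operator applied to sub-patterns
    op::Symbol
    args::Vector{Any}
end
const Sub = Dict{Symbol,Int}

# Merge `b` into a copy of `a`; return `nothing` on conflicting bindings.
function merge_sub(a::Sub, b::Sub)
    for (k, v) in b
        get(a, k, v) == v || return nothing
    end
    return merge(a, b)
end

# All substitutions under which pattern `p` matches e-class `c`,
# memoized per (sub-pattern, e-class) pair.
function ematch(g::Dict{Int,Vector{ENode}}, p, c::Int, memo::Dict{Tuple{Any,Int},Vector{Sub}})
    return get!(memo, (p, c)) do
        p isa PVar && return [Sub(p.name => c)]
        out = Sub[]
        for n in get(g, c, ENode[])
            (n.op == p.op && length(n.args) == length(p.args)) || continue
            subs = [Sub()]
            for (sp, child) in zip(p.args, n.args)
                joined = Sub[]
                for s in subs, t in ematch(g, sp, child, memo)
                    merged = merge_sub(s, t)
                    merged === nothing || push!(joined, merged)
                end
                subs = joined
            end
            append!(out, subs)
        end
        return out
    end
end

g = Dict{Int,Vector{ENode}}(1 => [ENode(:a, Int[])], 2 => [ENode(:g, [1])],
                            3 => [ENode(:f, [1, 2])], 4 => [ENode(:f, [2, 2])])
x = PVar(:x)
pat = PTerm(:f, Any[x, PTerm(:g, Any[x])])  # f(x, g(x))
memo = Dict{Tuple{Any,Int},Vector{Sub}}()
ematch(g, pat, 3, memo)  # one substitution: x => 1
ematch(g, pat, 4, memo)  # none; g(x) at e-class 2 comes from the cache
```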

0x0f0f0f · 2021-04-11T14:51:19Z

Work on a new pattern matcher architecture has started in the newematch2 branch.
Thanks to @philzook58 for https://www.philipzucker.com/staging-patterns/
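The idea behind staging can be sketched as follows. This is an assumption about the general technique, not the code from the linked post or from the newematch2 branch: a pattern is compiled once into a specialized Julia function made of nested loops, so no pattern tree is interpreted while matching. The toy e-graph and the names `gen` and `compile_matcher` are hypothetical.

```julia
struct ENode
    op::Symbol
    args::Vector{Int}
end
struct PVar
    name::Symbol
end
struct PTerm
    op::Symbol
    args::Vector{Any}
end

# Generate matching code for the pending (pattern, e-class variable) pairs in `todo`;
# `bound` maps pattern variables to the generated variables holding their e-class.
function gen(todo::Vector{Tuple{Any,Symbol}}, bound::Dict{Symbol,Symbol})
    if isempty(todo)  # everything matched: record the substitution
        kvs = [:($(QuoteNode(k)) => $v) for (k, v) in bound]
        return :(push!(out, Dict{Symbol,Int}($(kvs...))))
    end
    (p, cls), rest = todo[1], todo[2:end]
    if p isa PVar
        if haskey(bound, p.name)  # repeated variable: both occurrences must be the same e-class
            return :(if $(bound[p.name]) == $cls; $(gen(rest, bound)); end)
        end
        return gen(rest, merge(bound, Dict(p.name => cls)))
    end
    n = gensym(:n)
    kids = [gensym(:c) for _ in p.args]
    binds = [:($(kids[i]) = $n.args[$i]) for i in eachindex(kids)]
    body = gen(vcat(collect(zip(p.args, kids)), rest), bound)
    return quote
        for $n in get(g, $cls, ENode[])
            ($n.op == $(QuoteNode(p.op)) && length($n.args) == $(length(p.args))) || continue
            $(binds...)
            $body
        end
    end
end

# Compile a pattern once into a function (g, root) -> vector of substitutions.
function compile_matcher(p)
    body = gen(Tuple{Any,Symbol}[(p, :root)], Dict{Symbol,Symbol}())
    f = eval(:(function (g, root)
        out = Dict{Symbol,Int}[]
        $body
        return out
    end))
    # invokelatest: `f` was created by eval at run time (avoids world-age errors)
    return (g, root) -> Base.invokelatest(f, g, root)
end

g = Dict{Int,Vector{ENode}}(1 => [ENode(:a, Int[])], 2 => [ENode(:g, [1])],
                            3 => [ENode(:f, [1, 2])], 4 => [ENode(:f, [2, 2])])
x = PVar(:x)
m = compile_matcher(PTerm(:f, Any[x, PTerm(:g, Any[x])]))  # pattern f(x, g(x))
m(g, 3)  # one substitution: x => 1
m(g, 4)  # empty: x would need to be both e-class 2 and e-class 1
```

The `invokelatest` indirection keeps the sketch simple; a production version would generate the matchers ahead of use so they can be called directly.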

0x0f0f0f · 2021-04-20T11:05:53Z

Merged into master

0x0f0f0f added the enhancement New feature or request label Feb 25, 2021

0x0f0f0f closed this as completed Apr 20, 2021

utensil mentioned this issue May 22, 2023

Evaluate the feasibility to reimplement galgebra in Julia pygae/GAlgebra.jl#10

Open
