-
Notifications
You must be signed in to change notification settings - Fork 63
Design Wiki
These are design notes roughly in the order that we would like to deal with the challenges.
Patterns can compile themselves into the required test.
(LiteralPattern. true) ; -> (= y true)
(SeqPattern. foo) ; -> (seq? x)
(VectorPattern. foo) ; -> (vector? x)
(PredicatePattern. even?) ; -> (even? x)
Quoted symbols are simply symbols, unquoted symbols are scoped.
(let [y "a"
x "a"]
(match [y]
[x] 1)) ;;; x is scoped
;=> 1
(let [y 'x]
(match [y]
['x] 1)) ;; 'x is 'x
;=> 1
(defn match-seq [x]
(match [x]
[()] 1
[(1)] 2
[(1 _)] 3
[(1 & _)] 4))
(match-seq [])
;=> 1
(match-seq [1])
;=> 2
(match-seq [1 2])
;=> 3
(match-seq [1 2 3])
;=> 4
(def matrix (build-matrix [x y z]
[(1 2) 3 4] 1))
(specialize matrix (seq-pattern '(1 2)))
;; should produce a new matrix with a row that looks like
;; [1 2 <crash> 3 4] :occurrences [x0 x1 xr y z]
This new matrix should have a constraint that enforces left right order until the seq occurrences are cleared. Simple way, we could put metadata on the seq occurence symbols ^:seq-occurrence, if the first occurrence has this metadata we always specialize the first column.
Perhaps we should also add metadata on the occurrence so we know how to bind this occurrence in the switch node.
^{:bind-expr '(if (nil? x)
(fail-node)
(let [x0 (first x)
x (rest x)]))}
Vectors support random access, there's no reason to not use the full power of lazy pattern matching semantics on them.
(defn match-seq [x]
(match [x]
[ [] ] 1
[ [1] ] 2
[ [1 _] ] 3
[ [1 & _] ] 4))
Note that seq patterns and vector pattern can work with either datatype, but using the correct datatype will deliver considerable performance advantages.
Exploring how to implement rest pattern has shown that we should probably re-assess our current approach to seq pattern matching. It will actually make things considerably simpler. The basic problem is that we were considering seq a multi-arity constructor instead of a always of the form (cons x seq)
.
This change means that we should carefully consider what rest pattern means when used with vectors. It seems like the constructor here is (concat vector seq)
. We can pattern match using lazy evaluation semantics for everything up to the first rest pattern. We can use subvec.
[1 2 3 4 5 & rest]
[1 2 3 4 6 & rest]
[1 2 & rest]
Then the pattern matrix might expand like this:
[ [1 2] [3 4 5 & rest] ]
[ [1 2] [3 4 6 & rest ] ]
[ [1 2] rest]
;; =>
[ [1 2] [3 4 5] rest]
[ [1 2] [3 4 5] rest]
[ [1 2] rest]
This approach seems interesting.
(defmatch simple-binding [x]
[b] b))
(simple-binding 1)
;=> 1
(defmatch seq-binding [x]
[ [1 _ b] ] b))
[ [b] ] b))
(seq-binding [1 2 3])
;=> 3
(seq-binding [4])
;=> 4
You should be allowed to write the above pattern match and similar patterns like so:
(defmatch seq-binding [x]
[1 _ b] b))
[b] b))
(defmatch map-match [x]
[{:a 'foo :only [:a]}] :a0
[{:a 'foo}] :a1
[{:a 'foo :b 'bar}] :a2
[{:a 'bar :b 'foo}] :a3)
(map-match {:a 'foo} 1)
;=> :a0
(map-match {:a 'foo :bar 'wob})
;=> :a1
(map-match {:a 'foo :b 'bar})
;=> :a2
(map-match {:a 'bar :b 'foo})
;=> :a3
Say we have the following match expression:
(match x
{1 :a _ :b} :a0
{_ :a 1 :c} :a1)
This should expand out into a pattern matrix that looks like the following:
k0 k1 k2
[1 _ _ ] :a0
[_ _ 1 ] :a1
The occurrences will have the appropriate binding expression attached to them via metadata.
Consider the case of only:
(match x
{1 :a _ :b :only [:a :b]} :a0
{_ :a 1 :c :only [:a :c]} :a1)
Then we have the following pattern matrix where % represents the crash pattern:
k0 k1 k2
[1 _ % ] :a0
[1 % _ ] :a1
The crash pattern will hold the constraint. crash-pattern should in general cause a column to be tested only after columns without crash-patterns have been tested first.
My opinion about the importance of maximal sharing is beginning to change. In Maranget's paper 54 typical pattern matches were used. Decisions trees w/o maximal sharing were never larger than 1.5-2X over the code size with maximal sharing.
The best way to control code size is with heuristics for selecting optimal columns to test. For example or patterns should probably be handled last, columns with high branch factors should be scored low for selection, etc.
The only concern might be code duplication of the actual actions. We'll track how this plays out in practice. Feedback from users during alpha release will be helpful in this regard.
(defn match-seq [x]
(match [x]
[[]] 1
[[1]] 2
[[1 _]] 3
[[1 & rest]] 4))
(match-seq [])
;=> 1
(match-seq [1])
;=> 2 or 4
(match-seq [1 2])
;=> 3 or 4
(match-seq [1 2 3])
;=> 4
Logical disjunction is defined by separating patterns with a pipe.
(match [x]
[ [10] ] :a0 ;; normal row
[ [(1 | 2 | 3 | 4)] ] :a1)
These aren't patterns, rather they are a way to bind a name for a pattern that matches. For example:
(match [x]
[[1 ({_ :a} :as y)]] :a0
...)
In the code branch (represented here by :a0) we'd like to use the value bound to y, which matched. This seems like something best handled at emit-pattern time. We can just put the :as binding information on the pattern itself. This means that pattern types should always take a metadata field?
Each pattern can have one or more optional guards.
(match [x y z]
[(a :when even?) _ _] :a0
[_ (b :when [odd?, prime?]) _] :a1)
- http://adam.chlipala.net/mlcomp/
- http://learnyousomeerlang.com/syntax-in-functions
- http://learnyouahaskell.com/syntax-in-functions
Users can just extend Java classes to, say, match.core.IPLookup. Then map matching can be used on that type. So map matching + guards.
See above.
Nothing needs to be done excepts guards the actual type.
This is really the static versus dynamic problem. Currently everything we're doing is very much static in nature - compile time. In order for match to be open we will need to convert the decision tree to a DAG which can be traversed at runtime (we could use eval, but we want to explore avoiding that)
If we do have a runtime DAG we'll need to consider how many of the pattern matching concepts translate. In particular new bindings. This seems to mean that we'll need to push an environment around that has very efficient lookup times.
Seems like the DAG and the binding environment should be implemented in terms of Java Object arrays. MethodImplCache is probably a good place to look for inspiration.