-
Hey. I want to parse a string into a sequence of symbols. (m/rewrite (vec "(Fn [(Fn [a, b] c), d] e)")
[] []
[(m/or \() & (m/cata ?sexp) \) & ?rest] ('l ?sexp (m/cata ?rest))
[(m/or \[) & (m/cata ?sexp) \] & ?rest] ('v ?sexp (m/cata ?rest))
[(m/and (m/not (m/or \( \) \[ \])) ?c) & ?rest] ['c ?c (m/cata ?rest)]
?e ('failed ?e)) (l [c \F [c \n [c \space (v (failed [\( \F \n \space \[ \a \, \space \b]) [c \space [c \c []]])]]] [c \, [c \space [c \d (failed [\] \space \e \)])]]]) Here, I'm trying to match an open paren, something else that matches the pattern, a closing paren, and then either something else that matches the pattern or nothing at all. I think it'll try to put as much as possible in |
Beta Was this translation helpful? Give feedback.
Replies: 2 comments 2 replies
-
There are many ways to solve this problem and one of the ways — when performance is not important — is to use a step-wise approach in the spirit of an abstract machine. With this approach we utilize a State object and Step function. The Step function is used to map a State, input, to another State, output. The State object contains a string to be parsed s; an instruction, k, for what to do when s alone cannot be mapped to an output state; and a stack of values v which hold all of the parsed values. If the output of Step function is equal to its input, then the output state is terminal. Step = State → State ImplementationTo get started we, of course, need the necessary (require '[meander.epsilon :as m])
(require '[meander.strategy.epsilon :as m*) For this implementation we'll be using a fixed point operator which is available in the First, we'll need a function for creating a brand new State from a string. (defn make-state [s]
{:pre [(string? s)]}
[s [:void] []]) In the literature about abstract machines, this function is sometimes called "inject". Next is our Step function. (defn step [state]
(m/rewrite state
;; Error.
[_ [:error & _] _ :as ?state]
?state
;; Atom.
[(m/re #"(?s)([^\s[\[\]\(\)],]+)(.*)" [_ ?text ?rest]) ?k ?stack]
[?rest ?k [& ?stack (m/app read-string ?text)]]
;; -----------------^^^^^^^^^^^^^^^^^^^^^^^^^ Push the atom.
;; Whitespace.
[(m/re #"(?s)(?:\s+|,)(.*)" [_ ?rest]) ?k ?stack]
[?rest ?k ?stack]
;; List rules
;; ----------
;; List Start
[(m/re #"(?s)\((.*)" [_ ?rest]) ?k ?stack]
[?rest [:make-list ?k ?stack] []]
;; List End
[(m/re #"(?s)\)(.*)" [_ ?rest]) [:make-list ?k ?old-stack] ?stack]
[?rest ?k [& ?old-stack (& ?stack)]]
;; ---------------------^^^^^^^^^^ Push the list.
;; Optional integrity check.
;;[(m/re #"(?s)\)(.*)" [?s ?rest]) ?k ?stack]
;;[?s [:error ?k "Invalid state"] ?stack]
["" [:make-list & _ :as ?k] ?stack]
["" [:error ?k "Unclosed list"] ?stack]
;; Vector rules
;; ------------
;; Vector Start
[(m/re #"(?s)\[(.*)" [_ ?rest]) ?k ?stack]
[?rest [:make-vector ?k ?stack] []]
;; Vector End
[(m/re #"(?s)\](.*)" [_ ?rest]) [:make-vector ?k ?old-stack] ?stack]
[?rest ?k [& ?old-stack ?stack]]
;; ---------------------^^^^^^ Push the vector.
;; Optional integrity check.
;;[(m/re #"(?s)\](.*)" [?s ?rest]) ?k ?stack]
;;[?s [:error ?k "Invalid state"] ?stack]
["" [:make-vector & _ :as ?k] ?stack]
["" [:error ?k "Unclosed vector"] ?stack]
?state
?state)) The first and final rules are our terminal states. If we're ever in situation where our k value of our input state is an error, return the original input e.g. it is a terminal state. If we're in a state with no definition, it is also terminal. The structure of the second rule is repeated throughout the many of the rules which follow. We use To understand the remaining rules, I'll explain the "Atom", "List Start", and "List End" rules. By understanding these rules the others should become clear since they are very similar. The "Atom" RuleThe left side of the "Atom" rule says if s begins with a non-zero length sequence of characters that are not white space, not a vector or list delimiter, and not a special character, then capture that portion of s as Example(step (make-state "foo bar"))
;; =>
[" bar" [:void] [foo]] The "List Start" RuleThe left side of the "List Start" rule says if s begins with Example(-> (make-state "(foo bar)")
;; [S K V ]
step ;; => ["foo bar)" [:make-list [:void] []] [] ]
step ;; => [" bar)" [:make-list [:void] []] [foo] ]
step ;; => ["bar)" [:make-list [:void] []] [foo] ]
step) ;; => [")" [:make-list [:void] []] [foo bar]] The "List End" RuleThe left side of the "List End" rule matches when s begins with Example(-> (make-state "(foo bar)")
;; [S K V ]
step ;; => ["foo bar)" [:make-list [:void] []] [] ]
step ;; => [" bar)" [:make-list [:void] []] [foo] ]
step ;; => ["bar)" [:make-list [:void] []] [foo] ]
step ;; => [")" [:make-list [:void] []] [foo bar]]
step) ;; => ["" [:void] [(foo bar)]] Final touchesLets define a function called (def execute
(m*/fix #'step)) I'm using Clojure's var notation here brevity. (execute (make-state "(foo (bar baz) quux)"))
;; =>
["" [:void] [(foo (bar baz) quux)]] Finally, we can tie it all together with a (defn string->sexp
[s]
{:pre [(string? s)]}
(m/match (execute (make-state s))
[_ [:error _ ?message] _]
(throw (ex-info {} ?message))
[_ _ ?stack]
?stack)) And try it out. (let [input-string "(Fn [(Fn [a, b] c), d] e)
(bar baz quux)
[1 2 3 4]
(defn execute-string \"foo\"
[s]
(execute (make-state s)))
"]
(string->sexp input-string))
;;=>
[(Fn [(Fn [a b] c) d] e)
(bar baz quux)
[1 2 3 4]
(defn execute-string "foo" [s] (execute (make-state s)))] I hope this is helpful or, at the very least, interesting. I find this approach and variations of it useful whenever I start on a problem like this. If you have encountered descriptions of abstract machines like CESK or SECD, this should look familiar. If not, I think its worthwhile to know a bit about them and the implementation and description of this abstract machine should be instructive. Meander was partly built to make writing abstract machines easy after all. Among the benefits of this approach, in my opinion, is its adaptability. For example, if we combine (let [input-string "(foo bar baz)"]
(take 10 (iterate step (make-state input-string))))
;;=>
(["(foo bar baz)" [:void] []]
["foo bar baz)" [:make-list [:void] []] []]
[" bar baz)" [:make-list [:void] []] [foo]]
["bar baz)" [:make-list [:void] []] [foo]]
[" baz)" [:make-list [:void] []] [foo bar]]
["baz)" [:make-list [:void] []] [foo bar]]
[")" [:make-list [:void] []] [foo bar baz]]
["" [:void] [(foo bar baz)]]
["" [:void] [(foo bar baz)]]
["" [:void] [(foo bar baz)]]) Anyway, try toying around with this and making it into something you like. Ask questions if you have them. |
Beta Was this translation helpful? Give feedback.
-
Just for fun I re-implemented this with a state machine approach, with clj-statecharts. The idea is 99% the same with the approach described by @noprompt above , just with another way to express the same logic under the hood. The main logic is expressed in the state machine as described in the (ns sexp-parser
(:require [statecharts.core :as fsm :refer [assign]]
[clojure.string :as str]
[meander.epsilon :as m]))
(defn append
"Add an element to the coll, whether it's a list or a vector."
[coll x]
(if (vector? coll)
(conj coll x)
(concat coll [x])))
;; Each stack entry is a two-tuple [<prev-state>, <prev-value>]
(defn pop-stack
[{:keys [stack value]
:as state} _]
(assoc state
:stack (drop-last stack)
:value (-> stack
last
second
(append value))))
(defn push-stack
[{:keys [value stack _state _prev-state]
:as state} {:keys [type]}]
(assoc state
:stack (append stack [_prev-state value])
:value (if (= type :vec-start)
[]
'())))
(defn prev-is-list?
[{:keys [stack]} _]
(-> stack
last
first
(= :list)))
(defn prev-is-vec?
[{:keys [stack]} _]
(-> stack
last
first
(= :vec)))
(defn prev-is-done?
[{:keys [stack]} _]
(empty? stack))
(defn init-value
[x]
(assign (fn [state _]
(assoc state :value x))))
;; Turn "(Fn [(Fn [a, b] c), d] e)" into '[Fn [[Fn [a b] c] d] e]
;;
(def parser-machine
(fsm/machine
{:id ::parser
:initial :idle
:context {:stack nil
:value nil}
:states
{:idle {:on {:list-start {:target :list
:actions (init-value '())}
:vec-start {:target :vec
:actions (init-value [])}
:white-space {}}}
:error {}
:done {}
:list {:on {:atom {:actions (assign (fn [state {:keys [arg]}]
(update state :value concat [arg])))}
:white-space {}
:list-start {:target :list
:actions (assign push-stack)}
:vec-start {:target :vec
:actions (assign push-stack)}
:list-end :check-stack}}
:vec {:on {:atom {:actions (assign (fn [state {:keys [arg]}]
(update state :value append arg)))}
:white-space {}
:list-start {:target :list
:actions (assign push-stack)}
:vec-start {:target :vec
:actions (assign push-stack)}
:vec-end :check-stack}}
:check-stack {:always [{:guard prev-is-list?
:target :list
:actions (assign pop-stack)}
{:guard prev-is-vec?
:target :vec
:actions (assign pop-stack)}
{:guard prev-is-done?
:target :done}
{:target :error}]}
}}))
(defn parse-token
"Get the next token with regexp."
[s]
(m/match s
(m/re #"(?s)([^\s[\[\]\(\)],]+)(.*)" [_ ?text ?rest])
[:atom (read-string ?text) ?rest]
;; Whitespace.
(m/re #"(?s)(?:\s+|,)(.*)" [_ ?rest])
[:white-space nil ?rest]
;; list start
(m/re #"(?s)\((.*)" [_ ?rest])
[:list-start nil ?rest]
;; list end
(m/re #"(?s)\)(.*)" [_ ?rest])
[:list-end nil ?rest]
;; vec start
(m/re #"(?s)\[(.*)" [_ ?rest])
[:vec-start nil ?rest]
;; vec End
(m/re #"(?s)\](.*)" [_ ?rest])
[:vec-end nil ?rest]
_
[:no-match s]))
(defn step
"Take [current-state, string], return [new-state, remaining-string]"
[[state s]]
(if (= (:_state state) :done)
[state s]
(do
(assert (not (str/blank? s)))
(let [[event-type event-arg remaining] (parse-token s)
new-state (fsm/transition parser-machine
state
{:type event-type
:arg event-arg})]
[new-state remaining]))))
(defn get-init-state
[]
(fsm/initialize parser-machine))
(comment
(->> (iterate step [(get-init-state) "(Fn [(Fn [a, b] c), d] e)"])
(take 30))
()) Obviously it's more verbose than @noprompt 's code above, mostly because the state transitions in meander are all coded with pattern matching, and events parsing and state transitions happens at the same time. But being more explicit about the events and state transitions could be helpful for reasoning about the logic, at least for me. |
Beta Was this translation helpful? Give feedback.
There are many ways to solve this problem and one of the ways — when performance is not important — is to use a step-wise approach in the spirit of an abstract machine. With this approach we utilize a State object and Step function. The Step function is used to map a State, input, to another State, output. The State object contains a string to be parsed s; an instruction, k, for what to do when s alone cannot be mapped to an output state; and a stack of values v which hold all of the parsed values. If the output of Step function is equal to its input, then the output state is terminal.
Step = State → State
State ::= [s, k, v]
s ∈ String
k ∈ { [
:void
], [:error
, k, s], [:make-list
, k, v], [:…