-
-
Notifications
You must be signed in to change notification settings - Fork 398
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
fix(parsing): fix red parse failures in block contents and refactor p…
…arser (#88) * fix(parsing): fix red parse failures in block contents and refactor parser The parse failures were because lines beginning with punctuation such as `{` were neither word characters (`\w`) or space characters (`\s`). I fixed that with the `any-char` rule. The `any-char` rule matches `\w\W` instead of `.` to ensure it also matches newlines, no matter how Clojure and ClojureScript’s regexes work. This is desired in normal text, but probably not in links and hash-tags. That’s one reason for my comment about planning to move away from the `any-chars` rule. I could have fixed it by just changing the `c` regexp to `#'(\\w|\\W)+'`, but that would lead to other problems down the line. See https://github.com/Engelberg/instaparse#regular-expressions-a-word-of-warning. The downside of doing it repetition in the parser is that I need to combine the individually-matched characters in the transformer later, with `join` and `combine-adjacent-strings`. Other changes: - Use clearer names for the parts of the parser - Use `defparser` to precompile the parser for ClojureScript (following the advice in https://github.com/Engelberg/instaparse#defparser) - Add classes to output elements to aid in future styling and debugging - Move the block-ref parsing rule closer to the block-link parsing rule - Link to the Instaparse docs to make contributing to this file easier * feat: parse simple bold formatting in blocks This is Bardia’s code from PR #87 redone to work after my refactoring. I say “simple” bold formatting because it doesn’t handle other formatting nested in bold. Roam supports italics but not every type of formatting in bold. * refactor(parsing): make the parser testable and add a few tests The tests run with Clojure, not ClojureScript, so the code to test had to be moved to a `.cljc` file to be available in both environments. Only the parsing code, not the transforming/rendering code, was moved into the new file. That’s because the rendering code uses the re-frame and reitit libraries to return its value, so it can’t be tested without mocking them somehow. I’m not sure if testing rendered HTML is even desirable (though testing other aspects of the transformation might be nice). * refactor(parsing): render bold text with a `class` for consistency and make the :any-chars transformer easier to read by adding whitespace
- Loading branch information
Showing
8 changed files
with
143 additions
and
59 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
(ns athens.parse-transform-helper) | ||
|
||
|
||
(defn combine-adjacent-strings | ||
"In a sequence of strings mixed with other values, returns the same sequence with adjacent strings concatenated. | ||
(If the sequence contains only strings, use clojure.string/join instead.)" | ||
[coll] | ||
(reduce | ||
(fn [elements-so-far elmt] | ||
(if (and (string? elmt) (string? (peek elements-so-far))) | ||
(let [previous-elements (pop elements-so-far) | ||
combined-last-string (str (peek elements-so-far) elmt)] | ||
(conj previous-elements combined-last-string)) | ||
(conj elements-so-far elmt))) | ||
[] | ||
coll)) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
(ns athens.parser | ||
(:require | ||
#?(:cljs [instaparse.core :as insta :refer-macros [defparser]] | ||
:clj [instaparse.core :as insta :refer [defparser]]))) | ||
|
||
|
||
(declare block-parser) | ||
|
||
|
||
;; Instaparse docs: https://github.com/Engelberg/instaparse#readme | ||
|
||
(defparser block-parser | ||
"(* This first rule is the top-level one. *) | ||
block = ( syntax-in-block / any-char )* | ||
(* `/` ordered alternation is used to, for example, try to interpret a string beginning with '[[' as a block-link before interpreting it as raw characters. *) | ||
<syntax-in-block> = (block-link | block-ref | hashtag | bold) | ||
block-link = <'[['> any-chars <']]'> | ||
block-ref = <'(('> any-chars <'))'> | ||
hashtag = <'#'> any-chars | <'#'> <'[['> any-chars <']]'> | ||
bold = <'**'> any-chars <'**'> | ||
(* It’s useful to extract this rule because its transform joins the individual characters everywhere it’s used. *) | ||
(* However, I think in many cases a more specific rule can be used. So we will migrate away from uses of this rule. *) | ||
any-chars = any-char+ | ||
<any-char> = #'\\w|\\W' | ||
") |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,60 @@ | ||
(ns athens.parse-renderer | ||
(:require | ||
[athens.parse-transform-helper :refer [combine-adjacent-strings]] | ||
[athens.parser :as parser] | ||
[instaparse.core :as insta] | ||
[re-frame.core :refer [subscribe]] | ||
[reitit.frontend.easy :as rfee])) | ||
|
||
|
||
(declare parse-and-render) | ||
|
||
|
||
;; Instaparse transforming docs: https://github.com/Engelberg/instaparse#transforming-the-tree | ||
|
||
(defn transform | ||
"Transforms Instaparse output to Hiccup." | ||
[tree] | ||
(insta/transform | ||
{:block (fn [& raw-contents] | ||
;; use combine-adjacent-strings to collapse individual characters from any-char into one string | ||
(let [collapsed-contents (combine-adjacent-strings raw-contents)] | ||
(concat [:span {:class "block"}] collapsed-contents))) | ||
:any-chars (fn [& chars] | ||
(clojure.string/join chars)) | ||
:block-link (fn [title] | ||
(let [id (subscribe [:block/uid [:node/title title]])] | ||
[:span {:class "block-link"} | ||
[:span {:style {:color "gray"}} "[["] | ||
[:a {:href (rfee/href :page {:id (:block/uid @id)}) | ||
:style {:text-decoration "none" :color "dodgerblue"}} title] | ||
[:span {:style {:color "gray"}} "]]"]])) | ||
:block-ref (fn [id] | ||
(let [string (subscribe [:block/string [:block/uid id]])] | ||
[:span {:class "block-ref" | ||
:style {:font-size "0.9em" :border-bottom "1px solid gray"}} | ||
[:a {:href (rfee/href :page {:id id})} (parse-and-render (:block/string @string))]])) | ||
:hashtag (fn [tag-name] | ||
(let [id (subscribe [:block/uid [:node/title tag-name]])] | ||
[:a {:class "hashtag" | ||
:style {:color "gray" :text-decoration "none" :font-weight "bold"} | ||
:href (rfee/href :page {:id (:block/uid @id)})} | ||
(str "#" tag-name)])) | ||
:bold (fn [text] | ||
[:strong {:class "bold"} text])} | ||
tree)) | ||
|
||
|
||
(defn parse-and-render | ||
"Converts a string of block syntax to Hiccup, with fallback formatting if it can’t be parsed." | ||
[string] | ||
(let [result (parser/block-parser string)] | ||
(if (insta/failure? result) | ||
[:span | ||
{:content-editable true | ||
:title (pr-str (insta/get-failure result)) | ||
:style {:color "red"}} | ||
string] | ||
[:span | ||
{:content-editable true} | ||
(vec (transform result))]))) |
This file was deleted.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
(ns athens.parse-transform-helper-test | ||
(:require | ||
[athens.parse-transform-helper :refer [combine-adjacent-strings]] | ||
[clojure.test :refer [deftest is are]])) | ||
|
||
|
||
(deftest combine-adjacent-strings-tests | ||
(are [x y] (= x (combine-adjacent-strings y)) | ||
[] | ||
, [] | ||
["some text"] | ||
, ["some" " " "text"] | ||
["some text" [:link] "around a link"] | ||
, ["some" " " "text" [:link] "around " "a link"] | ||
[{:something nil} "more text" [:link] "between elements" 39] | ||
, [{:something nil} "more" " " "text" [:link] "between" " " "elements" 39] | ||
[{:a 1 :b 2} 3 ["leave" "intact"]] | ||
, [{:a 1 :b 2} 3 ["leave" "intact"]])) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
(ns athens.parser-test | ||
(:require | ||
[athens.parser :refer [block-parser]] | ||
[clojure.test :refer [deftest is]])) | ||
|
||
|
||
(deftest block-parser-tests | ||
(is (= [:block] (block-parser ""))) | ||
(is (= [:block "O" "K" "?" " " "Y" "e" "s" "."] (block-parser "OK? Yes."))) | ||
(is (= [:block [:block-link [:any-chars "l" "i" "n" "k"]]] (block-parser "[[link]]"))) | ||
(is (= [:block "[" "[" "t" "e" "x" "t"] (block-parser "[[text"))) | ||
;; Not including tests for every type of syntax because I expect the trees they are parsed to to change soon. | ||
;; For now, additional tests would probably be more annoying than useful. | ||
) |