-
-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: customizable HTML renderer #4
Conversation
Seems like a good idea to me. By the way, in cmark we handled the "change tag while keeping default rendering of the children" by making giving 'enter' and 'exit' fields to the CMARK_CUSTOM_BLOCK and CMARK_CUSTOM_INLINE nodes. So, in a filter you could add the new open tag to 'enter' and the close tag to 'exit'. |
Yeah, I am actually now having second thoughts here... At least for HTML, it seems like it is possible to more-or-less embed full HTML into djot. Basically, if we add just a single condition to the renderer like "a div with a |
462c948
to
e4b0f2a
Compare
@@ -141,6 +145,14 @@ class HTMLRenderer { | |||
} | |||
|
|||
renderAstNode(node: AstNode): void { | |||
const override = this.options.overrides?.[node.tag]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amusingly, this even type-checks that I didn't miss any cases in the visitor. This also unearthed a bunch of inconsistentcies: blockqote, hardbreak, softbreak, symbol
I think I am still 80% sure that this is a good idea, pushed something which should be good to go! |
To state the obvious: this makes HTMLRenderer a part of the public API. I think we are not at the stage where we worry too much about stability of interfaces, so I don't think this is a too big of a problem. |
I'm kind of a novice at typescript. Is there a way to designate some methods in the class as public and others as public? That could be a good idea if this becomes part of the API. |
Yeah, there's EDIT: and stuff is public by default. |
I think your Div idea above is probably too special-case. But if we wanted to pursue something like that, we could change the RawBlock and RawInline types as follows: { tag: "RawBlock",
format: string,
enter: string,
exit: string,
children: Block[] }
{ tag: "RawInline",
format: string,
enter: string,
exit: string,
children: Inline[] } The renderer would (if |
Hm, would it actually be more general than what we have today? I think the above can be simulated with one raw node for enter, one raw node for exit, and the children being siblings between the two bracketing raw nodes? |
Tried to use the API in matklad/matklad.github.io@89651fd?diff=split#diff-3de00bf2df40cd18ccb9a5c819912060859e62b1b2450dd6fa50224afd6e9b35 for my blog. It definitely gets the job done, but HTMLRenderer API feels a bit to verbose as is for the task. In particular, I think as of now the net line diff for the relevant file is 0, and that is despite the fact that before I specified everything, and now only the overriden bit. The |
Well, it's more convenient, because it's actually hard to insert multiple nodes with the current filter interface. (That should probably change, though.) |
So at this point you don't think this should be merged? |
I lean towards merging, and thinking in background about how we can make this better. Without something like this feature, I’d have to write all html myself, but I would like to re-use djot defaults as a matter of principle. |
Ok, so I am recalling an important fact: generating HTML is actually something JS/TS people tend to do fairly frequently, and they have the whole JSX thing to do that even: https://www.typescriptlang.org/docs/handbook/jsx.html I wonder if this might be the right approach here... |
Are you suggesting we should use jsx in the html module instead of just generating the HTML tags programatically? What would be the motivation? |
Not yet suggesting, but yeah, thinking about it :) Basically, I see the whole "supply user-defined template to override rendering of specific elements" as a super-important feature. Couple of years ago I used that approach for creating slide decks out of asciidoctor (source), and since then believe that that's the best UX for customizing light markup. As I think this might end up being one of the primary interfaces for people to consume djot, I want too make sure that specifying custom tempplates this way reads nice. Usually I don't really care how few characters to type and such, but in this case I do. And something like const visitor = {
image(image: Image, ctx): Node {
if (image.attributes?.["class"]?.includes("video")) {
return <video src={image.destination}></video>
}
return ctx.default(image)
}
} Does look significantly more appealing than either manual string concatenation or the So yeah, that's my primary motivation here -- to make the UX for consumers really slick. Though, we might actually end up with significantly more readale html.tsx as well. |
Agreed. What would making this work require? |
Still looking into this: first time using JSX, so no idea really. But it seems that fundamentally that's just a syncatict transformation, so you don't have to import react and such, and can just define your own functions for constructing nodes. Eg, the following self-contained thing seems to work for me (I am using deno, rather than node, so some specific spells might be different) https://gist.github.com/matklad/a84c4857782a66589c6c0f6ec0f8af99 The big question is whether we can load a jsx template from a file and eval it: JSX is a transpilation-level thing. I think that'd actually work with deno (as that I think can load TypesScript and JSX directly), but it's also important to make it work with node. While deno to node is what Djot is to Markdown in terms of overall soundless, the same relation holds for the amount of usage today as well (up to including my blog being powered by two less conventional choices, lol). |
Ok, so I think for cases where the user is using djot as a library from type script, this'll basically just work, as TypeScript supports JSX syntax natively. For using djot as a CLI utility which loads template.js file and evals it, I think we can make it work by including a JS parser. https://github.com/acornjs/acorn-jsx seem like a dependable thing (by https://github.com/marijnh of CodeMirror fame). Shipping JS parser is kinda big thing An interesting case is using djot.js as a library from JavaScript (eg, some random script tag on a web page). In this case, the user doesn't have transpilation pipeline, and we don't want to eval random stuff. I've looked at what (p)react suggests to do here, and the answer is basically " So, yeah, I think it makes a lot of sense to make the customization interface to be based on JSX APIs, and to provide JSX (transpilationnn) and htm (runtime) interface to it. A nice side benefit would be that we'll render |
OK, I think I understand how this works now. I think we can separate out two issues:
I distinguish these issues because in principle we could do 1 without 2. But obviously 2 makes sense only if 1 does. |
I think technically we might also do 2 without 1: as far as understand, there are no requirements for jsx to maintain tree shape, it can return a string directly. Though, because each individual DOM node have to evaluate to something, I think we still won’t be able to fully amortize allocations. OTOH, constructing a virtual DOM tree is the use-case JS was heavily optimized for, so I am not even sure if the impact of extra allocations would be that big.
Super fuzzy on this, but I recall hearing that it’s the opposite: asking the browser to construct a DOM tree by parsing a string is faster than constructing it manually object-by-object, which makes some sense. In object-by-object case, we continuously cross the border between JS and C++, the parsing case is just C++. |
Another considiration is our old discussion of stack overflows. The "natural" API for exensibility where a user-supplied callback returns an HTML-DOM node for a given djot node would be prone to stack overflow. OTOH, the current style I think would not be too horrible to rewrite in non-recursive way. But exposing non-recursive rendering to the user is going to be hard! We can do that with filters, because we translate |
Ah, I see that the filter's implementation is actually recursive: Lines 122 to 124 in e1138ed
|
I guess in addition to 1 and 2 it makes sense to ponder null hypothesis: rendering each djot node to HTML string. That would be the minimal API and, given that JS strings internally are already string[], might not be as slow as it would seem. |
I'm a bit lost about the "null hypothesis" -- do you mean what the current code is doing? (It is pretty much just rendering djot nodes to strings, which it puts on a buffer to be concatenated later.) Good point about the natural interface being recursive. This could be a reason just to stick with filters for this kind of customization. (I've added an issue to make the filter logic non-recursive.) Perhaps we could think about how to make this common case for constructing a filter (just to change the default HTML rendering) more ergonomic. What if you could do something like: filters: [
{ image: node => if (isVideo(node.destination) {
return <video src={node.destination}><children/></video>
} }
] The desugaring process could split this into an "enter" part {before [EDIT: I guess if we wanted to use jsx, we have to stick with what can be done with |
Not exactly. The current code is class HTMLRenderer {
buffer: string[];
renderAstNode(node: AstNode): void { }
} the alternative I am talking about is class HTMLRenderer {
renderAstNode(node: AstNode): string { }
} this allows the user to render a child node first, and then paste it into some larger context.
Uhu. I think the only thing I really don't like at the moment is stuffing html into Another idea here: what if we allow the filter to set |
I see. We'd have to measure performance effects of dropping the buffer and just having renderAstNode put out a string. (renderChildren could still use a buffer internally, returning a concatenated string at the end, so maybe it wouldn't be too bad.) |
I don't really like the idea of allowing you to set |
Uhu
This one I'd disagree this: in this case, you switch |
Yes, it's true you could add new attributes to the AST. But this approach still seems far too special-purpose. It's tailor-made for this one case -- rendering an I'm not sure what the right approach is here, but a version of this PR + exploring the "null hypothesis" (if the performance impact is acceptable) still seems best to me. |
#14 if my eyes are not deceiving me, just rendering to string seems to be faster. Could you double-check the finding? If it replicates, this makes "just return a string" API significatnly more attractive. |
Updated in light of #14. I think I now like this, but let me check how it fairs for my blog use-case |
Will come back tomorrow, but I think we can do even better here: instead of |
I like that suggestion. |
Yeah, sadly I think it doesn’t work, as we still have pass some sort of a callback to render children. |
Ok, the new API seems to work quite OK! I still want to try the |
Yeah, that turns out to be uglier: coding html in djot is just too indirect, it's too hard to tell from the filter's code what the end end result would be. |
As of now, djot.js provides extensibility via filters -- code to modify djot's own AST. This is I think of limited usability: djot is the source format, not the destination format, so most interesting logic should happen during the output. For example, if I want to render `{.video}` as `<video>`, rather than `<img>`, with filters I'd have to introduce a verbatim node, which is rather ugly. An even harder case is when I want to replace some tag while keeping default rendering for children. I think a nice way to solve that would be to allow customizing HTML rendering itself, by essentially letting the user to supply a bunch of overrides for various nodes. This PR implements a proof of concept, with the call-site looking like this: ```ts const ast = djot.parse(` # Example {.video}  `); const html = djot.renderHTML(ast, { overrides: { image: (node, renderer) => { if ((node.attributes?.["class"] ?? "").includes("video")) { return renderer.renderTag("video", node, { src: node.destination }) } return renderer.renderAstNodeDefault(node) } } }) ```
I like it. Ready to merge, you think? |
Yes, this is ready! |
I made one small change in a subsequent commit. |
@matklad if you'd be able to replace "TODO document HTMLRenderOptions and overrides" in the README with a small, self-contained example of using the overrides feature, that would be great. |
I added a really simple example, but if you think more would be good, feel free. |
I’ve realized that if we do jgm/djot#146 (dedicated support for role attribute), we’d get “embed html in djot” for free (presumably, html renderer would interpret role exactly as a tag name). |
As of now, djot.js provides extensibility via filters -- code to modify djot's own AST. This is I think of limited usability: djot is the source format, not the destination format, so most interesting logic should happen during the output.
For example, if I want to render
{.video}
as<video>
, rather than<img>
, with filters I'd have to introduce a verbatim node, which is rather ugly. An even harder case is when I want to replace some tag while keeping default rendering for children.I think a nice way to solve that would be to allow customizing HTML rendering itself, by essentially letting the user to supply a bunch of overrides for various nodes.
This PR implements a proof of concept, with the call-site looking like this: