From 262f3270dca55b3ec01cc32496b93a7558e4a863 Mon Sep 17 00:00:00 2001 From: Damian Avila Date: Wed, 25 Aug 2021 22:57:31 -0300 Subject: [PATCH 1/7] Add first iteration on 2nd draft --- posts/draft2.md | 252 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 252 insertions(+) create mode 100644 posts/draft2.md diff --git a/posts/draft2.md b/posts/draft2.md new file mode 100644 index 00000000..9623b911 --- /dev/null +++ b/posts/draft2.md @@ -0,0 +1,252 @@ +# A deep dive into MyST, Part 2: TBD + +> *This is a series of blog posts inviting you to join me in a little journey I have +experienced in the last few weeks at the time to figure it out a nice story for **MyST** +inside the **Nikola** world.* + +In the previous blog [post](https://damianavila.github.io/blog/posts/a-deep-dive-into-myst-part-1-the-myst-parser-python-api-usage-in-nikola.html), +we have identified some limitations in the MyST-Parser Python API and we started to +understand why roles and directives were not supported by the API as we expected. + +In this post, we will explore the machinery underneath the MyST-Parser with the goal to +deepen our understanding and being able to propose some alternatives to provide the +expected support. + +## Markdown-it-py and its plugins + +Coming back to the MyST-Parser `default_parser` implementation, you can see the Parser +as a [collection of `markdown-it-py` plugins](https://mdit-py-plugins.readthedocs.io/en/latest/): + +```python +    md = ( +        MarkdownIt("commonmark", renderer_cls=renderer_cls) +        .enable("table") +        .use(front_matter_plugin) +        .use(myst_block_plugin) +        .use(myst_role_plugin) +        .use(footnote_plugin) +        .use(wordcount_plugin, per_minute=config.words_per_minute) +        .disable("footnote_inline") +        .disable("footnote_tail") +    ) +``` + +The `MarkdownIt` class is instantiated with some parsing configuration options, dictating +the syntax rules and additional options for the parser and renderer. In addition, +several plugins are activated to load a collection of additional syntax rules and render +methods into the parser. + +When the input data is parsed via a nested chains of rules, it generates a list (stream) +of tokens, that will be eventually passed to the renderer to generate a Docutils object. + +In the previous post, we highlighted some [MyST specific markdown-it-py plugins](https://mdit-py-plugins.readthedocs.io/en/latest/#myst-plugins), +such as the `myst_block_plugin` and the `myst_role_plugin`. Let's take a brief look at +the latest one for the sake of simplicity: + +```python +def myst_role_plugin(md: MarkdownIt): + """Parse ``{role-name}`content```""" + md.inline.ruler.before("backticks", "myst_role", myst_role) + md.add_render_rule("myst_role", render_myst_role) +``` + +The `myst_role_plugin` is essentially adding a new syntax rule to the parser that now is +able to the parse roles from the input and a new method for the renderer to render +the now role-associated tokens: + +```python +def render_myst_role(self, tokens, idx, options, env): + token = tokens[idx] + name = token.meta.get("name", "unknown") + return ( + '' + f"{{{name}}}[{escapeHtml(token.content)}]" + "" + ) +``` + +We now understand the output the MyST-Parser Python API is giving us at the time to +parse and render a simple role input: + +```ipython +In [1]: from myst_parser.main import to_html + +In [2]: text = """ + ...: {emphasis}`content` + ...: """ + +In [3]: to_html(text) +Out[3]: '

{emphasis}[content]

\n' +``` + +We previously showed the `to_html` function using the `default_parser` configured with +the `html` renderer (provided by the `markdown-it-py` RendererHTML class). +But, what happens when we use other renderers provided by the MyST-Parser Python API? + +## The Docutils and Sphinx renderers + +Let's first explore the `to_docutils` function: + +```ipython +In [1]: from myst_parser.main import to_docutils + +In [2]: text = """ + ...: {emphasis}`content` + ...: + ...: ```{admonition} This is my admonition + ...: This is my note + ...: ``` + ...: """ + +In [3]: print(to_docutils(text).pformat()) + + + + content + + + This is my admonition + <paragraph> + This is my note +``` + +The [DocutilsRenderer](https://github.com/executablebooks/MyST-Parser/blob/master/myst_parser/docutils_renderer.py#L72) +converts a token directly to the docutils.document representation of the document, +converting roles and directives to a docutils.nodes if a converter can be found for the +given name. + +The [SphinxRenderer](https://github.com/executablebooks/MyST-Parser/blob/master/myst_parser/sphinx_renderer.py#L31) +builds on the DocutilsRenderer to add sphinx specific nodes, e.g. for cross-referencing +between documents. + +```ipython +In [1]: from myst_parser.main import to_docutils + +In [2]: text = """ + ...: {ref}`target` + ...: """ + +In [3]: print(to_docutils(text, in_sphinx_env=True).pformat()) +<document source="notset"> + <paragraph> + <pending_xref refdoc="mock_docname" refdomain="std" refexplicit="False" reftarget="target" reftype="ref" refwarn="True"> + <inline classes="xref std std-ref"> + target +``` + +Notice the sphinx-specific roles (and directives) needs the `in_sphinx_env` option enabled. + +## The MystParser class + +Previously, we presented the `MystParser` as a Sphinx parser: + +```python +from sphinx.parsers import Parser as SphinxParser +... +class MystParser(SphinxParser): + """Sphinx parser for Markedly Structured Text (MyST).""" +``` + +using a set of general and some specific `markdown-it-py` plugins (notice the +`default_parser` configured - by default - with the "sphinx" renderer): + +```python + def parse(self, inputstring: str, document: nodes.document) -> None: + """Parse source text. + :param inputstring: The source string to parse + :param document: The root docutils node to add AST elements to + """ + config = document.settings.env.myst_config + parser = default_parser(config) + parser.options["document"] = document + env = AttrDict() +``` + +to parse the input text into a token stream and then rendering those (via the +SphinxRenderer which is a subclass of the DocutilsRenderer): + +```python + tokens = parser.parse(inputstring, env) + if not tokens or tokens[0].type != "front_matter": + # we always add front matter, so that we can merge it with global keys, + # specified in the sphinx configuration + tokens = [Token("front_matter", "", 0, content="{}", map=[0, 0])] + tokens + parser.renderer.render(tokens, parser.options, env) +``` + +into a Docutils' document representation: + +```python +def parse(app: Sphinx, text: str, docname: str = "index") -> nodes.document: + """Parse a string as MystMarkdown with Sphinx application.""" + app.env.temp_data["docname"] = docname + app.env.all_docs[docname] = time.time() + reader = SphinxStandaloneReader() + reader.setup(app) + parser = MystParser() + parser.set_application(app) + with sphinx_domains(app.env): + return publish_doctree( + text, + path.join(app.srcdir, docname + ".md"), + reader=reader, + parser=parser, + parser_name="markdown", + settings_overrides={"env": app.env, "gettext_compact": True}, + ) +``` + +Finally, those objects are passed to the built-in Sphinx machinery to write the html output. + +## The big question + +This is great! We now understand why we need Sphinx to generate the expected HTML output +for roles and directives. Several questions now arise: + +* Do we want to have Sphinx-free support for rendering roles and directives in the MyST-Parser + +Nikola (among other projects) already have their own machinery (based in Docutils) to +build the final HTML output. Getting the docutils object from the Python API would be +a super interesting way to easily expose and provide that object to the downstream +projects! + +One caveat with this approach would be missing some Sphinx features (ie. cross-linking +between documents) based on custom roles and directives that we may need to re-implement. + +2. Do we want to have docutils-free support for roles and directives in the MyST Python API? + +Docutils actually introduces the roles and directives concept (that Sphinx extend) so if +we want to go docutils-free, then we will need to re-implement those concepts. + +3. Does it makes sense to create a docutils alternative in Python? At what extend? + +There is currently a nice example about an alternative implementation from the +Executable Books community, but in the Javascript world: https://github.com/executablebooks/markdown-it-docutils ;-). +Since we do not have Docutils there, it actually makes a lot of sense to write something +new. But, what is the value/need/place for an alternative implementation in Python? +Maybe, we do not need the whole Docutils stuff... but we maybe need some core +functionality? + +If we decide to write some minimal support, what pieces are we interested to bring first? +Where those pieces should end up? The `markdown-it-docutils` package I referenced above is +actually a `markdown-it` (JS) plugin. If we follow that pattern, we should create a new +`markdown-it-py-docutils` plugin and we are not longer in the MyST-Parser territory. +But the MyST-Parser has, in fact, some [parsing directive functions](https://github.com/executablebooks/MyST-Parser/blob/master/myst_parser/parse_directives.py). +We may need to move that toward `markdown-it-py` as the [JS plugin does](https://github.com/executablebooks/markdown-it-docutils/blob/main/src/directives/main.ts). +That sounds nice, but... is there any other suitable (simpler) alternatives besides the +one I proposed above? + +Finally,in the MyST community, there are some ongoing discussions about developing a +MyST specification that represent what we understand as the MyST language. Having one +specification could be interesting and super useful because different actors interested +in writing a MyST parser for their own space can check their stuff against that +specification. Those actors could be different programming languages such as JS or Python +or even more, it could be different flavors in the same programming language, such as +Docutils, Sphinx, or a theoretical `markdown-it-py-docutils` ;-) + +## Show time... not really ;-) + +OK, again, this is long enough! In the next post I will try to start answering some of +these questions showcasing alternative approaches for the Nikola use case. + +I hope you enjoyed the ride and I will see you soon with the third part! \ No newline at end of file From ebc90de9fef485181bf9803bd0370116f2694a11 Mon Sep 17 00:00:00 2001 From: Damian Avila <damianavila82@yahoo.com.ar> Date: Mon, 30 Aug 2021 11:24:54 -0300 Subject: [PATCH 2/7] Update posts/draft2.md Simplify phrase Co-authored-by: Chris Holdgraf <choldgraf@gmail.com> --- posts/draft2.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/posts/draft2.md b/posts/draft2.md index 9623b911..49384715 100644 --- a/posts/draft2.md +++ b/posts/draft2.md @@ -1,7 +1,7 @@ # A deep dive into MyST, Part 2: TBD > *This is a series of blog posts inviting you to join me in a little journey I have -experienced in the last few weeks at the time to figure it out a nice story for **MyST** +experienced in the last few weeks to figure it out a nice story for **MyST** inside the **Nikola** world.* In the previous blog [post](https://damianavila.github.io/blog/posts/a-deep-dive-into-myst-part-1-the-myst-parser-python-api-usage-in-nikola.html), From d82b38884482d55fab851981cf5ba51969914dd2 Mon Sep 17 00:00:00 2001 From: Damian Avila <damianavila82@yahoo.com.ar> Date: Mon, 30 Aug 2021 11:26:32 -0300 Subject: [PATCH 3/7] Update posts/draft2.md Linkify two words instead of just one Co-authored-by: Chris Holdgraf <choldgraf@gmail.com> --- posts/draft2.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/posts/draft2.md b/posts/draft2.md index 49384715..f8c26fee 100644 --- a/posts/draft2.md +++ b/posts/draft2.md @@ -4,7 +4,7 @@ experienced in the last few weeks to figure it out a nice story for **MyST** inside the **Nikola** world.* -In the previous blog [post](https://damianavila.github.io/blog/posts/a-deep-dive-into-myst-part-1-the-myst-parser-python-api-usage-in-nikola.html), +In the previous [blog post](https://damianavila.github.io/blog/posts/a-deep-dive-into-myst-part-1-the-myst-parser-python-api-usage-in-nikola.html), we have identified some limitations in the MyST-Parser Python API and we started to understand why roles and directives were not supported by the API as we expected. From a222d8ff695efd66a3a1831c7c2df5526028f1b0 Mon Sep 17 00:00:00 2001 From: Damian Avila <damianavila82@yahoo.com.ar> Date: Mon, 30 Aug 2021 11:30:44 -0300 Subject: [PATCH 4/7] Update posts/draft2.md Remove additional article Co-authored-by: Chris Holdgraf <choldgraf@gmail.com> --- posts/draft2.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/posts/draft2.md b/posts/draft2.md index f8c26fee..5f9ec706 100644 --- a/posts/draft2.md +++ b/posts/draft2.md @@ -36,7 +36,7 @@ the syntax rules and additional options for the parser and renderer. In addition several plugins are activated to load a collection of additional syntax rules and render methods into the parser. -When the input data is parsed via a nested chains of rules, it generates a list (stream) +When the input data is parsed via nested chains of rules, it generates a list (stream) of tokens, that will be eventually passed to the renderer to generate a Docutils object. In the previous post, we highlighted some [MyST specific markdown-it-py plugins](https://mdit-py-plugins.readthedocs.io/en/latest/#myst-plugins), From 0a92a6d1a13171c4135ca033f3b4e8bdf38919e8 Mon Sep 17 00:00:00 2001 From: Damian Avila <damianavila82@yahoo.com.ar> Date: Mon, 30 Aug 2021 11:38:53 -0300 Subject: [PATCH 5/7] Indent paragraphs after numbered list of questions --- posts/draft2.md | 46 +++++++++++++++++++++++----------------------- 1 file changed, 23 insertions(+), 23 deletions(-) diff --git a/posts/draft2.md b/posts/draft2.md index 5f9ec706..9bcf9e6b 100644 --- a/posts/draft2.md +++ b/posts/draft2.md @@ -205,36 +205,36 @@ for roles and directives. Several questions now arise: * Do we want to have Sphinx-free support for rendering roles and directives in the MyST-Parser -Nikola (among other projects) already have their own machinery (based in Docutils) to -build the final HTML output. Getting the docutils object from the Python API would be -a super interesting way to easily expose and provide that object to the downstream -projects! + Nikola (among other projects) already have their own machinery (based in Docutils) to + build the final HTML output. Getting the docutils object from the Python API would be + a super interesting way to easily expose and provide that object to the downstream + projects! -One caveat with this approach would be missing some Sphinx features (ie. cross-linking -between documents) based on custom roles and directives that we may need to re-implement. + One caveat with this approach would be missing some Sphinx features (ie. cross-linking + between documents) based on custom roles and directives that we may need to re-implement. 2. Do we want to have docutils-free support for roles and directives in the MyST Python API? -Docutils actually introduces the roles and directives concept (that Sphinx extend) so if -we want to go docutils-free, then we will need to re-implement those concepts. + Docutils actually introduces the roles and directives concept (that Sphinx extend) so if + we want to go docutils-free, then we will need to re-implement those concepts. 3. Does it makes sense to create a docutils alternative in Python? At what extend? -There is currently a nice example about an alternative implementation from the -Executable Books community, but in the Javascript world: https://github.com/executablebooks/markdown-it-docutils ;-). -Since we do not have Docutils there, it actually makes a lot of sense to write something -new. But, what is the value/need/place for an alternative implementation in Python? -Maybe, we do not need the whole Docutils stuff... but we maybe need some core -functionality? - -If we decide to write some minimal support, what pieces are we interested to bring first? -Where those pieces should end up? The `markdown-it-docutils` package I referenced above is -actually a `markdown-it` (JS) plugin. If we follow that pattern, we should create a new -`markdown-it-py-docutils` plugin and we are not longer in the MyST-Parser territory. -But the MyST-Parser has, in fact, some [parsing directive functions](https://github.com/executablebooks/MyST-Parser/blob/master/myst_parser/parse_directives.py). -We may need to move that toward `markdown-it-py` as the [JS plugin does](https://github.com/executablebooks/markdown-it-docutils/blob/main/src/directives/main.ts). -That sounds nice, but... is there any other suitable (simpler) alternatives besides the -one I proposed above? + There is currently a nice example about an alternative implementation from the + Executable Books community, but in the Javascript world: https://github.com/executablebooks/markdown-it-docutils ;-). + Since we do not have Docutils there, it actually makes a lot of sense to write something + new. But, what is the value/need/place for an alternative implementation in Python? + Maybe, we do not need the whole Docutils stuff... but we maybe need some core + functionality? + + If we decide to write some minimal support, what pieces are we interested to bring first? + Where those pieces should end up? The `markdown-it-docutils` package I referenced above is + actually a `markdown-it` (JS) plugin. If we follow that pattern, we should create a new + `markdown-it-py-docutils` plugin and we are not longer in the MyST-Parser territory. + But the MyST-Parser has, in fact, some [parsing directive functions](https://github.com/executablebooks/MyST-Parser/blob/master/myst_parser/parse_directives.py). + We may need to move that toward `markdown-it-py` as the [JS plugin does](https://github.com/executablebooks/markdown-it-docutils/blob/main/src/directives/main.ts). + That sounds nice, but... is there any other suitable (simpler) alternatives besides the + one I proposed above? Finally,in the MyST community, there are some ongoing discussions about developing a MyST specification that represent what we understand as the MyST language. Having one From 667e1f7c5d30ed7efbdacd16bc56e764b0e5ed6f Mon Sep 17 00:00:00 2001 From: Damian Avila <damianavila82@yahoo.com.ar> Date: Mon, 30 Aug 2021 11:43:18 -0300 Subject: [PATCH 6/7] Fix negative word --- posts/draft2.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/posts/draft2.md b/posts/draft2.md index 9bcf9e6b..f52e12db 100644 --- a/posts/draft2.md +++ b/posts/draft2.md @@ -230,7 +230,7 @@ for roles and directives. Several questions now arise: If we decide to write some minimal support, what pieces are we interested to bring first? Where those pieces should end up? The `markdown-it-docutils` package I referenced above is actually a `markdown-it` (JS) plugin. If we follow that pattern, we should create a new - `markdown-it-py-docutils` plugin and we are not longer in the MyST-Parser territory. + `markdown-it-py-docutils` plugin and we are no longer in the MyST-Parser territory. But the MyST-Parser has, in fact, some [parsing directive functions](https://github.com/executablebooks/MyST-Parser/blob/master/myst_parser/parse_directives.py). We may need to move that toward `markdown-it-py` as the [JS plugin does](https://github.com/executablebooks/markdown-it-docutils/blob/main/src/directives/main.ts). That sounds nice, but... is there any other suitable (simpler) alternatives besides the From 99fe9d38d0b27d450cec24310cd32d24d025ba51 Mon Sep 17 00:00:00 2001 From: Damian Avila <damianavila82@yahoo.com.ar> Date: Mon, 30 Aug 2021 15:05:19 -0300 Subject: [PATCH 7/7] Clarify the question I was asking about Co-authored-by: Chris Holdgraf <choldgraf@gmail.com> --- posts/draft2.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/posts/draft2.md b/posts/draft2.md index f52e12db..a156817b 100644 --- a/posts/draft2.md +++ b/posts/draft2.md @@ -218,7 +218,7 @@ for roles and directives. Several questions now arise: Docutils actually introduces the roles and directives concept (that Sphinx extend) so if we want to go docutils-free, then we will need to re-implement those concepts. -3. Does it makes sense to create a docutils alternative in Python? At what extend? +3. Does it makes sense to create a docutils alternative in Python? How much of its functionality would need to be replicated? How should it be extended or enhanced? There is currently a nice example about an alternative implementation from the Executable Books community, but in the Javascript world: https://github.com/executablebooks/markdown-it-docutils ;-).