Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiline math #717

Merged
merged 6 commits into from
Dec 12, 2017
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
49 changes: 49 additions & 0 deletions nbconvert/filters/markdown_mistune.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,15 +21,58 @@
from nbconvert.filters.strings import add_anchor


class MathBlockGrammar(mistune.BlockGrammar):
"""This defines a single regex comprised of the different patterns that
identify math content spanning multiple lines. These are used by the
MathBlockLexer.
"""
multi_math_str = "|".join([r"(^\$\$.*?\$\$)",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know about the re.VERBOSE option? It lets you split a regex over several lines and have comments, without needing to assemble the string yourself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you give an example in this case?

Because I genuinely prefer reading individual regexes joined with a pipe to indicate alternations to one long regex that is individually difficult to parse.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this:

multiline_math = re.compile(r"""(^\$\$.*?\$\$)|
                                (^\\\\\[.*?\\\\\])|
                                (^\\begin\{([a-z]*\*?)\}(.*?)\\end\{\4\})""",
                            re.DOTALL | re.VERBOSE)

Personally I prefer that to the string-joining approach, but you're the one writing it, so you get to decide on code style.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, in this case I find them equally easy to read. it's more in cases like this in mistune https://github.com/lepture/mistune/blob/92b7f32664bad1a4b3740ee81eda47e5246e780f/mistune.py#L120-L134 where individual chunks need to split multiple lines, and accordingly detecting that an alternation is happening is a lot harder.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I find these to be equally readable, I'm going to go with your style in this case. But I also figured out a way to simplify it further (since we don't really need the groups for grabbing the content, as you pointed out).

Copy link
Member Author

@mpacer mpacer Dec 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can I ask your rationale for disliking the string joining approach? I'm just curious as to why it would be dispreferred so that I can update my preferences accordingly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok… actually I'm going to keep using the join method, but with simpler component strings.

I ended up not finding the pipe's use for alternations and the bitwise OR to be equally readable as not needing the second flag. I had skimmed over that detail at a first glance.

r"(^\\\\\[.*?\\\\\])",
r"(^\\begin\{([a-z]*\*?)\}(.*?)\\end\{\4\})"])
multiline_math = re.compile(multi_math_str, re.DOTALL)


class MathInlineGrammar(mistune.InlineGrammar):
"""This defines different ways of declaring math objects that should be
passed through to mathjax unaffected. These are used by the MathInlineLexer.
"""
inline_math = re.compile(r"^\$(.+?)\$|^\\\\\((.+?)\\\\\)", re.DOTALL)
block_math = re.compile(r"^\$\$(.*?)\$\$|^\\\\\[(.*?)\\\\\]", re.DOTALL)
latex_environment = re.compile(r"^\\begin\{([a-z]*\*?)\}(.*?)\\end\{\1\}",
re.DOTALL)
text = re.compile(r'^[\s\S]+?(?=[\\<!\[_*`~$]|https?://| {2,}\n|$)')


class MathBlockLexer(mistune.BlockLexer):
""" This acts as a pass-through to the MathInlineLexer. It is needed in
order to avoid other block level rules splitting math sections apart.
"""

default_rules = (['multiline_math']
+ mistune.BlockLexer.default_rules)

def __init__(self, rules=None, **kwargs):
if rules is None:
rules = MathBlockGrammar()
super(MathBlockLexer, self).__init__(rules, **kwargs)

def parse_multiline_math(self, m):
"""Add token to pass through mutiline math."""
self.tokens.append({
"type": "multiline_math",
"text": m.group(1) or m.group(2) or m.group(3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the three groups are each the whole of their alternative, this is equivalent to m.group(0), right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Much cleaner, grazie.

})


class MathInlineLexer(mistune.InlineLexer):
"""This interprets the content of LaTeX style math objects using the rules
defined by the MathInlineGrammar.

In particular this grabs ``$$...$$``, ``\\[...\\]``, ``\\(...\\)``, ``$...$``,
and ``\begin{foo}...\end{foo}`` styles for declaring mathematics. It strips
delimiters from all these varieties, and extracts the type of environment
in the last case (``foo`` in this example).
"""
default_rules = (['block_math', 'inline_math', 'latex_environment']
+ mistune.InlineLexer.default_rules)

Expand All @@ -53,8 +96,14 @@ class MarkdownWithMath(mistune.Markdown):
def __init__(self, renderer, **kwargs):
if 'inline' not in kwargs:
kwargs['inline'] = MathInlineLexer
if 'block' not in kwargs:
kwargs['block'] = MathBlockLexer
super(MarkdownWithMath, self).__init__(renderer, **kwargs)


def output_multiline_math(self):
return self.inline(self.token["text"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean we're parsing it again with the inline parser? Can we skip that, given that we just want to put it in the output unmodified?

Copy link
Member Author

@mpacer mpacer Dec 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're parsing it semantically for the first time in the inline parser. We actually aren't putting it in the output unmodified, because we need to escape the individual parts of it for the purposes of putting > and < in html. That occurs in the rendering step.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, thanks.



class IPythonRenderer(mistune.Renderer):
def block_code(self, code, lang):
Expand Down
12 changes: 12 additions & 0 deletions nbconvert/filters/tests/test_markdown.py
Original file line number Diff line number Diff line change
Expand Up @@ -137,6 +137,18 @@ def test_markdown2html_math(self):
"$$a<b&b<lt$$",
"$$a<b&lt;b>a;a-b<0$$",
"$$<k'>$$",
"""$$x
=
2$$""",
r"""$$
b_{l_1,l_2,l_3}^{\phi\phi\omega} = -8\, l_1\times l_2 \,l_1\cdot l_2 \int_0^{\chi_*} d\chi \frac{W(\chi,\chi_*)^2}{\chi^2} \int_0^{\chi}d\chi' \frac{W(\chi',\chi)W(\chi',\chi_*)}{{\chi'}^2} \left[
P_\psi\left(\frac{l_1}{\chi},z(\chi)\right) P_\Psi\left(\frac{l_2}{\chi'},z(\chi')\right)
- (l_1\leftrightarrow l_2)
\right]
$$""",
r"""\begin{equation*}
x = 2 *55* 7
\end{equation*}""",
"""$
\\begin{tabular}{ l c r }
1 & 2 & 3 \\
Expand Down