Sublist: Pandoc expect 4 spaces while Markdown can have 2 #2575

Closed
nbigaouette opened this issue Dec 4, 2015 · 5 comments

Comments

@nbigaouette

I have a markdown file that I'm converting to LaTeX and then PDF.

There is an issue with how sublists are generated in the final LaTeX/PDF.

The pandoc README states:

List items may include other lists. In this case the preceding blank line is optional. The nested list must be indented four spaces or one tab:

But this is not required by the original Markdown: I couldn't find that convention here.

Also, my Markdown is converted through pandoc using:

--from=markdown_github+yaml_metadata_block+table_captions+implicit_figures

and as such I would expect the same behavior as GitHub Flavored Markdown. As a simple test, the following list (using 2 spaces):

- First item
- Second item
  - First subitem
  - Second subitem
- Third item

gets rendered to:

  • First item
  • Second item
    • First subitem
    • Second subitem
  • Third item

and this one (using 4 spaces):

- First item
- Second item
    - First subitem
    - Second subitem
- Third item

gets rendered to:

  • First item
  • Second item
    • First subitem
    • Second subitem
  • Third item

which is exactly the same: the subitems have the proper nesting level.

Unfortunately, pandoc converts the two-spaces nested list as:

\begin{itemize}
\item
  First item
\item
  Second item
\item
  First subitem
\item
  Second subitem
\item
  Third item
\end{itemize}

instead of the expected

\begin{itemize}
\item
  First item
\item
  Second item
\begin{itemize}
\item
  First subitem
\item
  Second subitem
\end{itemize}
\item
  Third item
\end{itemize}

I don't mind using a pandoc extension to get the proper nesting, but couldn't find any option to get this.

I also don't mind four spaces for the sublist, but I use the (great) Atom editor with the tidy-markdown extension, which automatically formats (tidies) the Markdown. This extension follows these conventions for the formatting, which call for a "two-space nested list". This means that every time I save my Markdown file, sublists get changed to use two spaces, which completely breaks the final LaTeX file.
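
As a stopgap, the two-space indentation could be widened to four spaces before the file reaches pandoc. The following is only a minimal sketch under stated assumptions: `widen_list_indent` is a hypothetical helper (not part of pandoc or tidy-markdown), it assumes the file consistently uses two spaces per level, and it only touches lines that start a list item, so continuation paragraphs and code blocks inside lists are not handled.

```python
import re

# Matches a bullet or ordered list item and captures its leading spaces.
LIST_ITEM = re.compile(r"^( *)([-*+]|\d+[.)]) ")


def widen_list_indent(markdown: str, old: int = 2, new: int = 4) -> str:
    """Re-indent list items from `old` spaces per level to `new` spaces per level.

    Hypothetical helper: only lines that begin a list item are rewritten.
    """
    out = []
    for line in markdown.splitlines():
        match = LIST_ITEM.match(line)
        if match:
            leading = match.group(1)
            level = len(leading) // old          # nesting level in the old scheme
            line = " " * (level * new) + line[len(leading):]
        out.append(line)
    return "\n".join(out)


two_space = (
    "- First item\n"
    "- Second item\n"
    "  - First subitem\n"
    "  - Second subitem\n"
    "- Third item"
)
print(widen_list_indent(two_space))
```

Running it twice would widen the indentation again, so it is meant as a one-shot step just before calling pandoc, not as something to write back to the source file.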

@jgm
Owner

jgm commented Dec 4, 2015

See #2367, #2210, #744.

Please see especially the full discussion at http://spec.commonmark.org/0.22/#motivation

pandoc doesn't currently follow the CommonMark spec, but the simpler four-space rule; however, the section linked in the CommonMark spec explains why the issue is complex, why a "two-space rule" isn't the solution, and why Gruber's syntax specification at least strongly suggests the four-space rule. Eventually I hope to update pandoc to use the CommonMark rules (or you can get this now with -f commonmark, though you lose most of the useful pandoc extensions).

Note that github markdown does not consistently follow a two-space rule!

Here one space is enough:

- foo
 - bar

  • foo
    • bar

But now let's go another level, and we get bizarre results:

- foo
 - bar
  - baz

  • foo
    • bar
    • baz

At least with pandoc, a consistent rule is followed.
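
To make the "consistent rule" concrete, here is a toy model of the four-space rule. It is a deliberate simplification and not pandoc's parser: the function name `four_space_depth` is made up for illustration, tabs are ignored, and nesting depth is taken to be the number of complete four-space steps before the list marker.

```python
def four_space_depth(line: str) -> int:
    """Nesting depth of a list-item line under a simplified four-space rule."""
    indent = len(line) - len(line.lstrip(" "))
    return indent // 4  # fewer than four spaces never opens a new level


# One, two, or three spaces of indentation all stay at the top level.
for text in ["- foo", " - bar", "  - baz", "    - qux", "        - deep"]:
    print(four_space_depth(text), repr(text))
```

Under this model the one-space and two-space examples above are flat lists, which matches the LaTeX output reported at the top of the thread.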

Things are even worse when we consider that the list markers can be indented. Look what github does here:

  - foo
 - bar

  • foo
    • bar

What about code blocks under lists? Gruber says they need to be indented eight spaces, and github respects that:

- foo

      bar

        bar

  • foo

    bar

    bar
    

But that gives the unpleasant result that if you take the list marker out and deindent everything two spaces, you get something different than you had in the list:

foo

    bar

      bar

foo

bar

  bar

I could go on and on, but I hope this helps.

In principle it would be nice to parse lists exactly the same as github with -f markdown_github, but (a) there's no clear rule, and (b) because of the interaction with outer indentation and indented code blocks, it would be a very complex change.

@nbigaouette
Author

Hi John, thanks for the information and clarifications. I would be happy to use commonmark; I love markdown and I wish it were better standardized! Hopefully commonmark will achieve this.

Meanwhile, when you say that using commonmark would lose pandoc's useful extensions, couldn't they be explicitly enabled, using -f commonmark+yaml_metadata_block for example? Or is there something else?

Thanks!

@nbigaouette
Author

I'm playing around with -f commonmark to see if it could solve my issue. Unfortunately, my YAML metablock is ignored, even if I use +yaml_metadata_block. For example, the $title$ variable in my LaTeX template is not expanded to the YAML value.

I even tried using --variable=title:TESTTITLE but this does not work. I could make it work by using --metadata=title:TESTTITLE though. So it really seems pandoc ignores the YAML metablock.

Shouldn't the +yaml_metadata_block extension be respected even with -f commonmark?

@jgm
Owner

jgm commented Dec 7, 2015

At the moment pandoc just uses the libcmark C library to
parse CommonMark, so it has very little control over
extensions. We need a Haskell parser for CommonMark; a
couple people were working on these, but nothing has
happened as far as I know. I could write one, based on
the algorithm in libcmark, but I don't have time at the
moment.


@nbigaouette
Author

Ok, thanks for your help.

I'll bypass my issues by using pandoc markdown and get rid of tidy-markdown.

Hopefully one day CM will end this fragmentation. Thanks for your work on all this!

jgm added a commit that referenced this issue Aug 19, 2017
Closes #3511.

Previously pandoc used the four-space rule: continuation paragraphs,
sublists, and other block level content had to be indented 4
spaces.  Now the indentation required is determined by the
first line of the list item:  to be included in the list item,
blocks must be indented to the level of the first non-space
content after the list marker. Exception: if there are 5 or more spaces
after the list marker, then the content is interpreted as an
indented code block, and continuation paragraphs must be indented
two spaces beyond the end of the list marker.  See the CommonMark
spec for more details and examples.

Documents that adhere to the four-space rule should, in most cases,
be parsed the same way by the new rules.  Here are some examples
of texts that will be parsed differently:

    - a
      - b

will be parsed as a list item with a sublist; under the four-space
rule, it would be a list with two items.

    - a

          code

Here we have an indented code block under the list item, even though it
is only indented six spaces from the margin, because it is four spaces
past the point where a continuation paragraph could begin.  With the
four-space rule, this would be a regular paragraph rather than a code
block.

    - a

            code

Here the code block will start with two spaces, whereas under
the four-space rule, it would start with `code`.  With the four-space
rule, indented code under a list item always must be indented eight
spaces from the margin, while the new rules require only that it
be indented four spaces from the beginning of the first non-space
text after the list marker (here, `a`).

This change was motivated by a slew of bug reports from people
who expected lists to work differently (#3125, #2367, #2575, #2210,
 #1990, #1137, #744, #172, #137, #128) and by the growing prevalence
of CommonMark (now used by GitHub, for example).

Users who want to use the old rules can select the `four_space_rule`
extension.

* Added `four_space_rule` extension.
* Added `Ext_four_space_rule` to `Extensions`.
* `Parsing` now exports `gobbleAtMostSpaces`, and the type
  of `gobbleSpaces` has been changed so that a `ReaderOptions`
  parameter is not needed.
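
The main rule described in this commit message can be sketched as follows. This is an illustrative model only, not the code added in the commit: the names `content_column` and `classify` are made up, tabs are ignored, and the exception for 5 or more spaces after the list marker is deliberately left out (such items raise `ValueError`).

```python
import re

# A bullet or ordered list marker followed by one to four spaces and then content.
ITEM = re.compile(r"^( *)([-*+]|\d+[.)])( {1,4})(?=\S)")


def content_column(item_line: str) -> int:
    """Column of the first non-space content after the list marker."""
    match = ITEM.match(item_line)
    if match is None:
        raise ValueError("not a list item this sketch understands")
    return len(match.group(0))  # indentation + marker + spaces after it


def classify(item_line: str, following_line: str) -> str:
    """Say how `following_line` relates to the item opened by `item_line`."""
    indent = len(following_line) - len(following_line.lstrip(" "))
    column = content_column(item_line)
    if indent < column:
        return "outside item"   # e.g. a sibling item or a new paragraph
    if indent >= column + 4:
        return "code block"     # four or more spaces past the content column
    return "inside item"        # sublist or continuation paragraph


print(classify("- a", "  - b"))         # the sublist example
print(classify("- a", "      code"))    # six spaces from the margin
print(classify("- a", "        code"))  # eight spaces from the margin
print(classify("- a", "- b"))           # a sibling item
```

For the three examples in the message, the sketch reports "inside item", "code block", and "code block"; in the last case the code text is whatever follows column `content_column + 4`, which is why it keeps two leading spaces.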
codingepaduli added a commit to codingepaduli/codingepaduli that referenced this issue Nov 21, 2020