Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

List item content wrapped in paragraphs #226

Closed
ndreynolds opened this issue Mar 31, 2019 · 12 comments
Closed

List item content wrapped in paragraphs #226

ndreynolds opened this issue Mar 31, 2019 · 12 comments
Assignees

Comments

@ndreynolds
Copy link

I noticed an interesting discrepancy between Earmark and other markdown parsers I've tried. Probably related to #93, but it seems a bit different. Take the following Markdown example:

* Groceries
  * Produce
    * Apples
    * Carrots

Pandoc, CommonMark, Python Markdown, and redcarpet all produce the following HTML:

<ul>
  <li>
    Groceries
    <ul>
      <li>
        Produce
        <ul>
          <li>Apples</li>
          <li>Carrots</li>
        </ul>
      </li>
    </ul>
  </li>
</ul>

whereas Earmark produces this:

<ul>
  <li>
    <p>Groceries</p>
    <ul>
      <li>
        <p>Produce</p>
        <ul>
          <li>Apples</li>
          <li>Carrots</li>
        </ul>
      </li>
    </ul>
  </li>
</ul>

You can see that Earkmark adds paragraph tags whenever the list item contains a nested list. Using this comparison tool, I only see one other implementation (Gambas) with this behavior:

http://johnmacfarlane.net/babelmark2/?text=*+Groceries%0A++*+Produce%0A++++*+Apples%0A++++*+Carrots

I noticed it because it leads to inconsistent vertical spacing in the lists in my library's documentation (using ExDoc), since p elements have a 1em top and bottom margin by default in most browsers. Here's a visual example from my project:

Earmark via ExDoc GitHub Markdown
ex_doc github

In comparison to the GitHub version, the Earmark version looks very disconnected to me.

Is this something you'd be open to changing? If so, I'd be happy to try creating a PR that adjusts the output to match the other implementations.

@RobertDober
Copy link
Collaborator

Thank you for putting this to our attention.
I guess the <p>s around are indeed not the best way of rendering and I would be very glad to accept PRs in general.
However there might be a major issue with backwards compatibility, especially with ex_doc.
Maybe we should discuss it there firstly?

@RobertDober
Copy link
Collaborator

@ndreynolds on a personal note

Babelmark 2 has been a most valuable tool (and I had an Earmark dingus running for a couple of years)
now it seems that it is outdated and I usually use Babelmark III, which does confirm your findings too BTW 😁.
Sadly I do not have the time to setup a dingus right now, would be nice exposure of Elixir, but time is such a sparse resource 😢

@ndreynolds
Copy link
Author

Sure, then I'll open a similar issue in ExDoc to start the discussion, and we can go from there.

Sadly I do not have the time to setup a dingus right now, would be nice exposure of Elixir, but time is such a sparse resource cry

I can set one up. I saw Gigalixir has a free tier, which seems like a good fit for this. It would certainly be helpful to have when testing this. Will give it a go later.

@RobertDober
Copy link
Collaborator

RobertDober commented Mar 31, 2019 via email

@RobertDober
Copy link
Collaborator

Backward incompatibility accepted here by @josevalim.
@ndreynolds shall I assign myself or do you want to prepare a PR. That will not be an easy one I am afraid.

@ndreynolds
Copy link
Author

@RobertDober I'd be happy to take on the challenge if you don't mind. I have some time this week.

It seems like this is where I ought to start looking:

https://github.com/pragdave/earmark/blob/master/lib/earmark/html_renderer.ex#L129

I'd play with it some today and try to understand what would need to be changed.

@RobertDober
Copy link
Collaborator

RobertDober commented Apr 1, 2019

@ndreynolds looks about the right spot. I appreciate how you handled this so far and do not mind at all if you take this on.

@ndreynolds
Copy link
Author

I dug into this a bit more and found this in the CommonMark spec:

A list is loose if any of its constituent list items are separated by blank lines, or if any of its constituent list items directly contain two block-level elements with a blank line between them. Otherwise a list is tight.
(The difference in HTML output is that paragraphs in a loose list are wrapped in <p> tags, while paragraphs in a tight list are not.)

https://spec.commonmark.org/0.28/#loose

So it's a bit more nuanced than I thought, but this is pretty clear from an implementation standpoint. I think it means we need to identify whether the list is loose or tight and then apply (or not apply) the <p> tags accordingly. I'll keep working on it this week, hope to have a PR soon.

@RobertDober
Copy link
Collaborator

Great
Maybe we should start with adapting the acceptance tests to the behavior we want?
Shall I try to find some time to do that, or can you start your PR with it, so that we agree on it before you start implementation?

@RobertDober
Copy link
Collaborator

Please make your PR against https://github.com/pragdave/earmark/tree/226-loose-tight-lists
so that I can merge with the tests failing.

ndreynolds added a commit to ndreynolds/earmark that referenced this issue Apr 3, 2019
This is related to issue pragdave#226.

The CommonMark spec says the following about lists:

> A list is loose if any of its constituent list items are separated by
> blank lines, or if any of its constituent list items directly contain
> two block-level elements with a blank line between them. Otherwise a
> list is tight. (The difference in HTML output is that paragraphs in a
> loose list are wrapped in <p> tags, while paragraphs in a tight list
> are not.)

In Earmark, many lists without blank line separators are rendered as if
they were loose. For example, given a nested list, the outer list is
always treated as loose even when there are no blank lines. This appears
to be an issue with any item that has multiple content blocks.

Another issue to consider with nested lists: we need to determine
whether or not a list is loose or tight at each level independently
based on their direct constituents (as described above). So it's
possible to have nested lists with many levels that each have varying
looseness/tightness.

This commit adds tests derived from the CommonMark spec to better cover
these edge cases. Some are currently failing because of the problems
above.
@ndreynolds
Copy link
Author

@RobertDober done, I created #229 with failing tests, so we can discuss further there if you like.

ndreynolds added a commit to ndreynolds/earmark that referenced this issue Apr 6, 2019
Updates the logic for how lists are parsed and rendered to HTML to match
the following specification from CommonMark:

> A list is loose if any of its constituent list items are separated by
> blank lines, or if any of its constituent list items directly contain
> two block-level elements with a blank line between them. Otherwise a
> list is tight. (The difference in HTML output is that paragraphs in a
> loose list are wrapped in <p> tags, while paragraphs in a tight list
> are not.)

* Adds `tight` (`true`/`false`) to `%Block.List{}`. A tight list is not
  loose. Based on the CommonMark semantics found in the spec.
* Adds `blanks` (`true`/`false`) to `%Block.ListItem{}`, which is used
  along with existing `spaced` field to compute `tight`.
* Instead of removing `<p>` tags with a regex, adds `within_tight_list`
  to the `%Context{}`. This is updated whenever entering a list, and
  then used to decide whether or not to output the tags when a
  `%Block.Para{}` is rendered.
* Updates parser tests with the `tight` field.
* Updates test expectations in cases where a tight list has `<p>` tags.

Fixes pragdave#226
@RobertDober RobertDober modified the milestones: 1.3.3, 1.3.4 Jul 8, 2019
@RobertDober RobertDober modified the milestones: 1.3.4, 1.3.5 Jul 29, 2019
@RobertDober
Copy link
Collaborator

Redundant to #249

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants