Fix list tightness #479

Open: wants to merge 5 commits into `master`

@taku0 commented Aug 17, 2023

This is a port of commonmark/commonmark.js#269.

  • Set the end position precisely.
  • Check list tightness by comparing line numbers.
  • Remove LAST_LINE_BLANK and LAST_LINE_CHECKED flags.
  • Defer resolution of link reference definitions until list tightness is checked.

Comments for each commit (feel free to squash them):

  • Defer resolution of link reference definitions:
    We must not remove link reference definitions until we have checked list tightness.
    This commit defers resolution of link reference definitions until the document is finalized. We still need to eagerly remove link reference definitions in setext headings to determine whether it is a setext heading or a thematic break.
    This commit therefore provides slightly different versions of the functions that resolve link reference definitions and check whether a line is blank, for `cmark_strbuf` and for `cmark_chunk`.

  • Remove `CMARK_NODE__LAST_LINE_CHECKED` flag:
    This flag was introduced by #284 ("Parsing ‘* * * * * * … a’ takes quadratic time"), but we will no longer need it once `S_ends_with_blank_line` is updated in the next commit to not use recursion.

  • Fix list tightness:
    This commit changes the list tightness checking algorithm from one based on the `LAST_LINE_BLANK` flag to one based on line numbers. It also sets end positions precisely.

    Classification of end positions:

    • The end of the current line:

      • Thematic breaks
      • ATX headings
      • Setext headings
      • Fenced code blocks closed explicitly
      • HTML blocks (pre, comments, and others)
    • The end of the previous line:

      • Fenced code blocks closed by the end of the parent or EOF
      • HTML blocks (div and others)
      • HTML blocks closed by the end of the parent or EOF
      • Paragraphs
      • Block quotes
      • Empty list items
    • The end position of the last child:

      • Non-empty list items
      • Lists
    • The end position of the last non-blank line:

      • Indented code blocks

    The first two cases are handled by `finalize` and the `closed_explicitly` flag.

    Non-empty list items and lists are handled in `switch` statements in `finalize`.

    Indented code blocks are handled by setting the end position every time a non-blank line is added to the block.

Benchmark:

  • master branch:
    mean = 0.1560, median = 0.1550, stdev = 0.0070

  • this branch:
    mean = 0.1610, median = 0.1600, stdev = 0.0032
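Taken as ratios (the timing units are not stated, so only the relative change is meaningful), the reported numbers put this branch roughly 3% behind master on both the mean and the median:

$$
\frac{0.1610 - 0.1560}{0.1560} \approx 0.032, \qquad \frac{0.1600 - 0.1550}{0.1550} \approx 0.032
$$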

@taku0 (Author) commented Aug 17, 2023

Actually, this branch sets end columns more accurately than commonmark/commonmark.js#269 does in corner cases. I will update commonmark.js once this PR is merged.

@taku0 (Author) commented Aug 17, 2023

`make leakcheck` fails, but it also fails on the master branch with the same errors.

@jgm (Member) commented Aug 18, 2023

Excellent! What errors is `make leakcheck` failing with?
We routinely run it as part of CI and it doesn't fail there.
EDIT: Ah, I see the failures in CI. But the last commit from master succeeds on that same check...

@taku0 (Author) commented Aug 19, 2023

Sorry, I forgot to run `make` after switching branches. I'll investigate the leak.

@taku0 force-pushed the `tighten-list-blockquote-list` branch from 34997d4 to ff0d224 on August 19, 2023 00:45
@taku0 (Author) commented Aug 19, 2023

I have fixed the leak and the CI is now all green.
