Improve markdown + HTML parsing #3

nvlang · 2024-07-17T14:56:07Z

This PR significantly improves SvelTeX's parsing of content mixing markdown and HTML syntax. Among other things, it includes the following changes:

- Use `sanitize-html` to ensure that the HTML generated by the Markdown processor is valid.
- Refine the whitespace adjustment performed before passing markup to the Markdown processor.
- Remove `<p>` tags within HTML elements or Svelte components that cannot contain paragraphs (e.g., `*text*` becomes `text` now, ignoring insignificant whitespace).
- Add a `markdown.components` option to the SvelTeX configuration to specify how SvelTeX treats each Svelte component with regard to whitespace adjustments.
- Auto-import components "registered" in the `markdown.components` array from the SvelTeX configuration if they are used in the markup and not already imported in the file's `<script>` tag.

  Note: a component is "registered" in the `markdown.components` array iff there exists an object `obj` in the array such that all of the following hold:
  - `obj.name` equals the name of the component (case-sensitive).
  - `obj.importPath` is not `undefined`.
- Add tests for all of the above features and fixes.
- Add `markdown.remarkRehypeOptions` and `markdown.rehypeStringifyOptions` to the SvelTeX configuration, for use when the `unified` Markdown backend is used.
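The registration rule in the note above can be sketched as a small predicate. This is only an illustration of the stated condition, not SvelTeX's actual implementation: the `ComponentEntry` interface, the `isRegistered` helper, and the example component names and import path are all hypothetical, and only the `name` and `importPath` properties come from this description.

```typescript
// Hypothetical shape of one entry in `markdown.components`. Only `name` and
// `importPath` are taken from the PR description; the real type may differ.
interface ComponentEntry {
    name: string;
    importPath?: string;
}

// A component is "registered" iff some entry matches its name exactly
// (case-sensitive) and has an `importPath` that is not `undefined`.
function isRegistered(
    components: ComponentEntry[],
    componentName: string,
): boolean {
    return components.some(
        (obj) => obj.name === componentName && obj.importPath !== undefined,
    );
}

// Example entries (names and path are made up for illustration).
const components: ComponentEntry[] = [
    { name: 'Callout', importPath: '$lib/components/Callout.svelte' },
    { name: 'Tabs' }, // no importPath, so never auto-imported
];
```

Under this reading, `Callout` would qualify for auto-import when used in the markup and not already imported, while `callout` (wrong case) and `Tabs` (no `importPath`) would not.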

Fixes #2.


nvlang added 6 commits July 13, 2024 18:53

test: add tests for bad html+markdown parsing (#2)

0b18fd0

test: use micromark for tests, and narrow focus

2b2bfd0

micromark is CommonMark compliant by default, whereas marked isn't. This makes it a more reliable reference parser for our purposes.

wip

cada862

test(e2e): update deps

a4f2244

nvlang added the bug and enhancement labels Jul 17, 2024

nvlang self-assigned this Jul 17, 2024

nvlang linked an issue Jul 17, 2024 that may be closed by this pull request

Bad markdown parsing #2

Closed

nvlang merged commit e4018bb into main Jul 17, 2024
6 checks passed

nvlang deleted the 2-bad-markdown-parsing branch July 17, 2024 14:57
