Skip to content

Commit

Permalink
Fix to not drop brs in phrasing
Browse files Browse the repository at this point in the history
Closes GH-83.
  • Loading branch information
wooorm committed Jan 28, 2025
1 parent f8a2bf0 commit d6cab31
Show file tree
Hide file tree
Showing 7 changed files with 48 additions and 19 deletions.
6 changes: 5 additions & 1 deletion lib/handlers/heading.js
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,8 @@
* @import {Heading, PhrasingContent} from 'mdast'
*/

import {dropSurroundingBreaks} from '../util/drop-surrounding-breaks.js'

/**
* @param {State} state
* State.
Expand All @@ -17,7 +19,9 @@ export function heading(state, node) {
/* c8 ignore next */
Number(node.tagName.charAt(1)) || 1
)
const children = /** @type {Array<PhrasingContent>} */ (state.all(node))
const children = dropSurroundingBreaks(
/** @type {Array<PhrasingContent>} */ (state.all(node))
)

/** @type {Heading} */
const result = {type: 'heading', depth, children}
Expand Down
10 changes: 7 additions & 3 deletions lib/handlers/p.js
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,8 @@
* @import {Paragraph, PhrasingContent} from 'mdast'
*/

import {dropSurroundingBreaks} from '../util/drop-surrounding-breaks.js'

/**
* @param {State} state
* State.
Expand All @@ -13,9 +15,11 @@
* mdast node.
*/
export function p(state, node) {
// Allow potentially “invalid” nodes, they might be unknown.
// We also support straddling later.
const children = /** @type {Array<PhrasingContent>} */ (state.all(node))
const children = dropSurroundingBreaks(
// Allow potentially “invalid” nodes, they might be unknown.
// We also support straddling later.
/** @type {Array<PhrasingContent>} */ (state.all(node))
)

if (children.length > 0) {
/** @type {Paragraph} */
Expand Down
15 changes: 1 addition & 14 deletions lib/state.js
Original file line number Diff line number Diff line change
Expand Up @@ -227,20 +227,7 @@ function all(parent) {
}
}

let start = 0
let end = results.length

while (start < end && results[start].type === 'break') {
start++
}

while (end > start && results[end - 1].type === 'break') {
end--
}

return start === 0 && end === results.length
? results
: results.slice(start, end)
return results
}

/**
Expand Down
23 changes: 23 additions & 0 deletions lib/util/drop-surrounding-breaks.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
/**
* @import {Nodes} from 'mdast'
*/

/**
* Drop trailing initial and final `br`s.
*
* @template {Nodes} Node
* Node type.
* @param {Array<Node>} nodes
* List of nodes.
* @returns {Array<Node>}
* List of nodes w/o `break`s.
*/
export function dropSurroundingBreaks(nodes) {
let start = 0
let end = nodes.length

while (start < end && nodes[start].type === 'break') start++
while (end > start && nodes[end - 1].type === 'break') end--

return start === 0 && end === nodes.length ? nodes : nodes.slice(start, end)
}
3 changes: 2 additions & 1 deletion lib/util/wrap.js
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ import structuredClone from '@ungap/structured-clone'
import {phrasing as hastPhrasing} from 'hast-util-phrasing'
import {whitespace} from 'hast-util-whitespace'
import {phrasing as mdastPhrasing} from 'mdast-util-phrasing'
import {dropSurroundingBreaks} from './drop-surrounding-breaks.js'

/**
* Check if there are phrasing mdast nodes.
Expand Down Expand Up @@ -63,7 +64,7 @@ export function wrap(nodes) {
return d.type === 'text' ? whitespace(d.value) : false
})
? []
: [{type: 'paragraph', children: nodes}]
: [{type: 'paragraph', children: dropSurroundingBreaks(nodes)}]
}
}

Expand Down
3 changes: 3 additions & 0 deletions test/fixtures/br/index.html
Original file line number Diff line number Diff line change
Expand Up @@ -10,3 +10,6 @@
<h1><br>kilo</h1>
<h1>lima<br></h1>
<p>mike</p>november<br><p></p>
<p>oscar<span><br></span>papa</p>
<p><span><br></span>quebec</p>
<p>romeo<span><br></span></p>
7 changes: 7 additions & 0 deletions test/fixtures/br/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,3 +25,10 @@ juliett
mike

november

oscar\
papa

quebec

romeo

6 comments on commit d6cab31

@tripodsan
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wooorm this breaks our roundtrip tests that convert html->hast->mdast->md->mdast->hast->html.

it removes the <br> at the beginning of a paragraph.

AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:
+ actual - expected
.
.
+   "            <p><code>const foo = 'bar';</code><br>Sub text</p>\n" +
-   "            <p><br><code>const foo = 'bar';</code><br>Sub text</p>\n" +

@wooorm
Copy link
Member Author

@wooorm wooorm commented on d6cab31 Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Tobias! That is intentional, see the test fixture in this commit (<h1><br>kilo</h1>), it wasn’t caught before due to the p starting with a code. See also GH-83.
If you have lots of fixtures, then every patch will be a breaking change: I’d recommend a package lock if you worry about that.

@tripodsan
Copy link

@tripodsan tripodsan commented on d6cab31 Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, but why is <p><br>kilo</p> not transformed into

\
kilo

?

there is no way for us to preserve the break node.

@wooorm
Copy link
Member Author

@wooorm wooorm commented on d6cab31 Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many things that are not preserved — the question behind this example is: why would someone write that? It is not semantic: “br elements must be used only for line breaks that are actually part of the content, as in poems or addresses.”

@tripodsan
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would someone write that?

this is a good question :-) authors tend to misuse space/enter for formatting, like for indentation and enforcing page breaks.

@wooorm
Copy link
Member Author

@wooorm wooorm commented on d6cab31 Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right ;) And, why should that misuse be persisted? I see it exactly otherwise: the goal here is to make readable and clean markdown, to drop that abuse!

Please sign in to comment.