HTML needs a mechanism for extending the parsing algorithm #8114

plinss · 2022-07-18T16:41:40Z

The HTML parsing algorithm is supposed to produce a consistent DOM from any given HTML input. However, as new elements are added, the parsing algorithm is changed from time to time, e.g. when adding new elements as flow content.

While this is ultimately convenient for authors, it results in a different DOM structure in older clients.
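To make the DOM difference concrete, here is a toy Python model. It is not the spec's tree-construction algorithm. It assumes pre-tokenized input and uses small, illustrative (hypothetical) sets of start tags that implicitly close an open `<p>`. The only difference between the "old" and "new" rule sets is whether `<search>` closes an open `<p>`, assuming an up-to-date parser treats `<search>` like other block-level start tags in that respect.

```python
from dataclasses import dataclass, field

# Illustrative subsets only (assumption): start tags that implicitly close an
# open <p>. The "new" set adds <search>, mimicking a parser update.
OLD_P_CLOSERS = {"div", "p", "section"}
NEW_P_CLOSERS = OLD_P_CLOSERS | {"search"}


@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)


def parse(tokens, p_closers):
    """Toy tree builder: only models the 'close a p element' rule."""
    root = Node("body")
    stack = [root]
    for kind, value in tokens:
        if kind == "text":
            stack[-1].children.append(value)
        elif kind == "start":
            if value in p_closers and any(n.tag == "p" for n in stack):
                # Simplified: pop open elements until the <p> is closed
                # (the spec checks button scope; this toy does not).
                while stack.pop().tag != "p":
                    pass
            node = Node(value)
            stack[-1].children.append(node)
            stack.append(node)
        elif kind == "end" and any(n.tag == value for n in stack[1:]):
            # Close up to the matching element; stray end tags are ignored.
            while stack.pop().tag != value:
                pass
    return root


def dump(node):
    """Compact, comparable rendering of the tree."""
    if isinstance(node, str):
        return repr(node)
    return f"{node.tag}({', '.join(dump(c) for c in node.children)})"


# <p>Intro<search>Find</search>  (the </p> end tag is omitted)
TOKENS = [("start", "p"), ("text", "Intro"), ("start", "search"),
          ("text", "Find"), ("end", "search")]

print(dump(parse(TOKENS, OLD_P_CLOSERS)))  # body(p('Intro', search('Find')))
print(dump(parse(TOKENS, NEW_P_CLOSERS)))  # body(p('Intro'), search('Find'))
```

The same markup yields `<search>` nested inside `<p>` under the old rules and as a sibling of `<p>` under the new ones, which is the kind of divergence between older and newer clients described above.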

HTML should have a mechanism (ideally declarative) for expressing parsing behavior so that older clients can produce the correct DOM when handling new content. This would also allow web component authors to opt-in to the same kinds of authoring improvements.

domenic · 2022-07-18T22:49:09Z

This is just XHTML, right?

plinss · 2022-07-18T23:27:47Z

In theory, linking to a formal schema document could satisfy this, but this needn't require such a heavy solution.

One possibility could be a meta tag that describes a single element's parsing behavior; another could be a micro-syntax within the element's start tag (such as a sigil just before the >).

While XHTML had its issues, it did offer some flexibility that we lost. We traded that flexibility for authoring simplicity and a parsing algorithm that was supposed to be invariant. That invariance has been broken several times, and likely will be again. Let's try to find a better solution that allows the flexibility to innovate without breaking code.

annevk · 2022-08-29T11:05:18Z

While in theory this seems interesting, in practice I haven't seen a proposal for this that maintains all the good qualities of HTML syntax. Meta-syntax is just not very ergonomic (or internally consistent, at this point) and also introduces its own set of risks.

hsivonen · 2022-09-13T08:39:00Z

Having site-supplied declarations that affect parsing would cause parsing actions at a distance that would be hard to connect in a sensible way to all entry points to parsing. (It seems unlikely that a meta would travel into fragment parsing invocations, for example.)

Moreover, a new solution introduced now would only work prospectively, the next time we come across this problem. It wouldn't solve the issue at hand for implementations of the current parsing algorithm that are already deployed.

However, if a site is willing to take extra steps to accommodate already-deployed implementations, we already have syntax for that: using explicit end tags, i.e. not omitting any end tags (</p> in particular) that the spec says are permissible to omit. This can even be automated on the server side by parsing (with an up-to-date implementation) and immediately reserializing.
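As a toy sketch of that server-side step, the following assumes pre-tokenized input and a hypothetical, illustrative subset of `<p>`-closing start tags (not the spec's full list), and ignores scope rules, active formatting elements, and void elements. It applies the up-to-date rules and reserializes with every implied end tag written out explicitly. A real pipeline would use a conforming HTML parser and serializer instead.

```python
import html

# Illustrative subset (assumption): start tags that implicitly close an open
# <p> under an up-to-date parser, including the newer <search> element.
NEW_P_CLOSERS = {"div", "p", "section", "search"}


def make_end_tags_explicit(tokens, p_closers=NEW_P_CLOSERS):
    """Return a token list where every end tag the new rules imply is explicit."""
    open_tags, out = [], []
    for kind, value in tokens:
        if kind == "start":
            if value in p_closers and "p" in open_tags:
                # Simplified "close a p element": close everything up to the <p>.
                while (tag := open_tags.pop()) != "p":
                    out.append(("end", tag))
                out.append(("end", "p"))
            open_tags.append(value)
        elif kind == "end" and value in open_tags:
            # Close any intermediate elements explicitly before the matching one.
            while (tag := open_tags.pop()) != value:
                out.append(("end", tag))
        # Stray end tags (no matching open element) are passed through as-is.
        out.append((kind, value))
    # Close anything still open at end of input instead of relying on omission.
    out.extend(("end", tag) for tag in reversed(open_tags))
    return out


def serialize(tokens):
    """Write tokens back out as markup, escaping text content."""
    parts = []
    for kind, value in tokens:
        if kind == "start":
            parts.append(f"<{value}>")
        elif kind == "end":
            parts.append(f"</{value}>")
        else:
            parts.append(html.escape(value, quote=False))
    return "".join(parts)


# <p>Intro<search>Find</search>  (the </p> end tag is omitted)
TOKENS = [("start", "p"), ("text", "Intro"), ("start", "search"),
          ("text", "Find"), ("end", "search")]

print(serialize(make_end_tags_explicit(TOKENS)))  # <p>Intro</p><search>Find</search>
```

Because the output carries an explicit `</p>` before `<search>`, a parser that does not yet treat `<search>` as closing `<p>` builds the same tree as an up-to-date one.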

zcorpan · 2024-11-08T10:03:37Z

Closing as wontfix per the above comments.

plinss mentioned this issue Jul 18, 2022 in <search> HTML element w3ctag/design-reviews#714 (closed)

annevk added the topic: parser label Aug 29, 2022

annevk added the addition/proposal New features or enhancements label Aug 29, 2022

annevk mentioned this issue Nov 28, 2022 in Customized built-in elements WebKit/standards-positions#97 (closed)

zcorpan closed this as not planned Nov 8, 2024
