Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML API: Add support for list elements. #5539

Closed

Conversation

dmsnell
Copy link
Member

@dmsnell dmsnell commented Oct 20, 2023

Trac ticket: Core-60215

Adds support to the HTML Processor for the following elements, enabling the handling of list elements in HTML:

  • UL, OL, LI
  • DT, DL, DD

Previously, the HTML Processor would bail when encountering any of these tags. Now it will properly handle them and their complicated rules.

  • Add comment at top of HTML Processor indicating new tag support.
  • Needs a comprehensive suite of tests due to the complicated nature of list rules.
  • Add docblocks to new tests.

@dmsnell dmsnell force-pushed the html-api/add-support-for-lists branch 2 times, most recently from 9a7f64e to 53e3df3 Compare December 13, 2023 18:42
@dmsnell dmsnell marked this pull request as ready for review December 13, 2023 18:44
@dmsnell dmsnell changed the title WIP: HTML API: Add support for list elements. HTML API: Add support for list elements. Dec 13, 2023
Copy link
Member

@sirreal sirreal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the suite from #5794 against this PR and this case fails:

#data
<ul><li><ul></li><li>a</li></ul></li></ul>
#errors
(1,4): expected-doctype-but-got-start-tag
(1,17): unexpected-end-tag
#document
| <html>
|   <head>
|   <body>
|     <ul>
|       <li>
|         <ul>
|           <li>
|             "a"

Here's the failure:

 ✘ External html5lib with tests1/case104·-·line·1611
   ┐
   ├ HTML <ul><li><ul></li><li>a</li></ul></li></ul>
   ├  was not processed correctly.
   ├ Failed asserting that two strings are equal.
   ┊ ---·Expected
   ┊ +++·Actual
   ┊ @@ @@
   ┊      <ul>\n
   ┊        <li>\n
   ┊          <ul>\n
   ┊ -··········<li>\n
   ┊ +······<li>\n
   ┊  '

@sirreal
Copy link
Member

sirreal commented Dec 22, 2023

I'll see if I can figure out the test failure. My previous comment was wrong (the expect/actual were inverted), I've updated it.

@sirreal sirreal force-pushed the html-api/add-support-for-lists branch from 8ecc099 to c73d272 Compare December 22, 2023 18:19
@sirreal
Copy link
Member

sirreal commented Dec 22, 2023

I've rebased this.

I pushed a test and a fix for the issue the html5lib tests found. I believe has_element_in_specific_scope had a flipped boolean when it finds a tag in the termination list. Just like if it reaches HTML, the search ends without having found a the tag in scope and the search ends in failure.

This fixes the tests.

Copy link
Member

@sirreal sirreal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this, I think it's ready to ship. We've given it a good amount of testing.

Thanks!

Copy link

Test using WordPress Playground

The changes in this pull request can previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • The Plugin and Theme Directories cannot be accessed within Playground.
  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

@dmsnell dmsnell force-pushed the html-api/add-support-for-lists branch 3 times, most recently from b25c823 to 9e9085e Compare January 9, 2024 15:10
Copy link
Contributor

@audrasjb audrasjb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. I added a few comments for small enhancements.

$this->is_special( $node->node_name )
) {
/*
* > If node is in the special category, but is not an address, div,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The > character should probably be removed here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the > is included to indicate it's a quote. These are quoting the HTML standard directly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirming what @sirreal wrote. These are linked to the HTML5 living spec by words to make it easier to search, and the > is attempting to indicate that it's a quote. It also corresponds to every block within the step_in_body() function.

I'm happy to entertain suggestions to improve the linking, but I don't want to make it more difficult for people to cross-reference the spec by removing these quotes. Unfortunately the spec doesn't add anchors to each of these sections so there's no clear link (which is one of the original reasons I added the quotes, beyond the obvious that it puts the intention of the spec at the front in the code).

/**
* Verifies that H1 through H6 elements close an open P element.
*
* @ticket {TICKET_NUMBER}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ticket number is 60215

@dmsnell dmsnell force-pushed the html-api/add-support-for-lists branch from 9e9085e to 38cb836 Compare January 9, 2024 21:58
From https://github.com/html5lib/html5lib-tests/blob/a9f44960a9fedf265093d22b2aa3c7ca123727b9/tree-construction/webkit01.dat#L468-L482

Co-authored-by: Jon Surrell <sirreal@users.noreply.github.com>

Add docblocks to tests, expand comments in class docblock.
@dmsnell dmsnell force-pushed the html-api/add-support-for-lists branch from 7537300 to 5adee8a Compare January 10, 2024 13:29
@dmsnell
Copy link
Member Author

dmsnell commented Jan 10, 2024

Merged in [57264]

@dmsnell dmsnell closed this Jan 10, 2024
@dmsnell dmsnell deleted the html-api/add-support-for-lists branch January 10, 2024 14:05
dmsnell added a commit to WordPress/gutenberg that referenced this pull request Jan 13, 2024
Updates from WordPress/wordpress-develop at 54a09a7ec99c59b8e640bd5aacebfdbf03bb02cc

 - WordPress/wordpress-develop#5535
   Adds support for H1 - H6 elements in the HTML Processor.

 - WordPress/wordpress-develop#5725
   Pause the Tag Processor when reaching the end of the document
   inside an incomplete syntax element.

 - Incorporates linting changes from "TODO:" to "todo".

 - WordPress/wordpress-develop#5762
   Handles the "all other tags" rules in IN BODY insertion mode.

 - WordPress/wordpress-develop#5539
   Adds support for list elements in the HTML Processor.

This patch adds the blanket exclusion from the HTML API compatability
layer after they were removed and started blocking updates from Core.

The PHP files in the compatability layer are merged and maintained in
the Core repo and all changes or updates need to happen first in Core
and then be brought over to Gutenberg as built files. From Gutenberg's
perspective they are no different than NPM packages.

Co-authored-by: Anton Vlasenko <43744263+anton-vlasenko@users.noreply.github.com>
Co-authored-by: Bernie Reiter <96308+ockham@users.noreply.github.com>
@@ -105,7 +105,7 @@
* - Formatting elements: B, BIG, CODE, EM, FONT, I, SMALL, STRIKE, STRONG, TT, U.
* - Heading elements: H1, H2, H3, H4, H5, H6, HGROUP.
* - Links: A.
* - Lists: DL.
* - Lists: DD, DL, DT, LI, OL, LI.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it should this be UL.

- *  - Lists: DD, DL, DT, LI, OL, LI.
+ *  - Lists: DD, DL, DT, LI, OL, UL.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Will fix this.

dmsnell added a commit to WordPress/gutenberg that referenced this pull request Jan 17, 2024
Updates from WordPress/wordpress-develop at 54a09a7ec99c59b8e640bd5aacebfdbf03bb02cc

 - WordPress/wordpress-develop#5535
   Adds support for H1 - H6 elements in the HTML Processor.

 - WordPress/wordpress-develop#5725
   Pause the Tag Processor when reaching the end of the document
   inside an incomplete syntax element.

 - Incorporates linting changes from "TODO:" to "todo".

 - WordPress/wordpress-develop#5762
   Handles the "all other tags" rules in IN BODY insertion mode.

 - WordPress/wordpress-develop#5539
   Adds support for list elements in the HTML Processor.

This patch adds the blanket exclusion from the HTML API compatability
layer after they were removed and started blocking updates from Core.

The PHP files in the compatability layer are merged and maintained in
the Core repo and all changes or updates need to happen first in Core
and then be brought over to Gutenberg as built files. From Gutenberg's
perspective they are no different than NPM packages.

Co-authored-by: Anton Vlasenko <43744263+anton-vlasenko@users.noreply.github.com>
Co-authored-by: Bernie Reiter <96308+ockham@users.noreply.github.com>
dmsnell added a commit to WordPress/gutenberg that referenced this pull request Jan 23, 2024
Updates from WordPress/wordpress-develop at f4dda54df785d0a6957dedda3648f7fab58b829f

 - Coding style changes.

 - WordPress/wordpress-develop#5762
   Adds support for the "any other tag" sections in the HTML Processor.

 - WordPress/wordpress-develop#5539
   Adds support for list elements in the HTML Processor.

 - WordPress/wordpress-develop#5897
   Adds support for HR elements in the HTML Processor.

 - WordPress/wordpress-develop#5895
   Adds support for the AREA, BR, EMBED, KEYGEN, and WBR elements
   in the HTML Processor.

 - WordPress/wordpress-develop#5903
   Adds support for the PRE and LISTING elements in the HTML Processor.

 - WordPress/wordpress-develop#5913
   Updates "all other tags" support in HTML Processor and updates list
   of void elements.

 - WordPress/wordpress-develop#5906
   Adds support for the PARAM, SOURCE, and TRACK elements in the HTML Processor.

 - WordPress/wordpress-develop#5683
   Provides mechanism to scan all tokens in an HTML document in the Tag Processor.

The PHP files in the compatability layer are merged and maintained in
the Core repo and all changes or updates need to happen first in Core
and then be brought over to Gutenberg as built files.

Co-authored-by: Sergey Biryukov <sergeybiryukov.ru@gmail.com>
Co-authored-by: Jon Surrell <sirreal@users.noreply.github.com>
sirreal added a commit to sirreal/wordpress-develop that referenced this pull request Jan 23, 2024
dmsnell added a commit to WordPress/gutenberg that referenced this pull request Jan 25, 2024
Updates from WordPress/wordpress-develop at f4dda54df785d0a6957dedda3648f7fab58b829f

 - Coding style changes.

 - WordPress/wordpress-develop#5762
   Adds support for the "any other tag" sections in the HTML Processor.

 - WordPress/wordpress-develop#5539
   Adds support for list elements in the HTML Processor.

 - WordPress/wordpress-develop#5897
   Adds support for HR elements in the HTML Processor.

 - WordPress/wordpress-develop#5895
   Adds support for the AREA, BR, EMBED, KEYGEN, and WBR elements
   in the HTML Processor.

 - WordPress/wordpress-develop#5903
   Adds support for the PRE and LISTING elements in the HTML Processor.

 - WordPress/wordpress-develop#5913
   Updates "all other tags" support in HTML Processor and updates list
   of void elements.

 - WordPress/wordpress-develop#5906
   Adds support for the PARAM, SOURCE, and TRACK elements in the HTML Processor.

 - WordPress/wordpress-develop#5683
   Provides mechanism to scan all tokens in an HTML document in the Tag Processor.

The PHP files in the compatability layer are merged and maintained in
the Core repo and all changes or updates need to happen first in Core
and then be brought over to Gutenberg as built files.

Co-authored-by: Sergey Biryukov <sergeybiryukov.ru@gmail.com>
Co-authored-by: Jon Surrell <sirreal@users.noreply.github.com>
dmsnell added a commit to WordPress/gutenberg that referenced this pull request Jan 29, 2024
Updates from WordPress/wordpress-develop at f4dda54df785d0a6957dedda3648f7fab58b829f

 - Coding style changes.

 - WordPress/wordpress-develop#5762
   Adds support for the "any other tag" sections in the HTML Processor.

 - WordPress/wordpress-develop#5539
   Adds support for list elements in the HTML Processor.

 - WordPress/wordpress-develop#5897
   Adds support for HR elements in the HTML Processor.

 - WordPress/wordpress-develop#5895
   Adds support for the AREA, BR, EMBED, KEYGEN, and WBR elements
   in the HTML Processor.

 - WordPress/wordpress-develop#5903
   Adds support for the PRE and LISTING elements in the HTML Processor.

 - WordPress/wordpress-develop#5913
   Updates "all other tags" support in HTML Processor and updates list
   of void elements.

 - WordPress/wordpress-develop#5906
   Adds support for the PARAM, SOURCE, and TRACK elements in the HTML Processor.

 - WordPress/wordpress-develop#5683
   Provides mechanism to scan all tokens in an HTML document in the Tag Processor.

 - WordPress/wordpress-develop#5907
   Adds support for the INPUT element in the HTML Processor

The PHP files in the compatability layer are merged and maintained in
the Core repo and all changes or updates need to happen first in Core
and then be brought over to Gutenberg as built files.

Co-authored-by: Sergey Biryukov <sergeybiryukov.ru@gmail.com>
Co-authored-by: Jon Surrell <sirreal@users.noreply.github.com>
dmsnell added a commit to WordPress/gutenberg that referenced this pull request Feb 6, 2024
Updates from WordPress/wordpress-develop:
 - From: WordPress/wordpress-develop@54a09a7
 - To: WordPress/wordpress-develop@7a71339

 - Coding style changes.

 - WordPress/wordpress-develop#5762
   Adds support for the "any other tag" sections in the HTML Processor.

 - WordPress/wordpress-develop#5539
   Adds support for list elements in the HTML Processor.

 - WordPress/wordpress-develop#5897
   Adds support for HR elements in the HTML Processor.

 - WordPress/wordpress-develop#5895
   Adds support for the AREA, BR, EMBED, KEYGEN, and WBR elements
   in the HTML Processor.

 - WordPress/wordpress-develop#5903
   Adds support for the PRE and LISTING elements in the HTML Processor.

 - WordPress/wordpress-develop#5913
   Updates "all other tags" support in HTML Processor and updates list
   of void elements.

 - WordPress/wordpress-develop#5906
   Adds support for the PARAM, SOURCE, and TRACK elements in the HTML Processor.

 - WordPress/wordpress-develop#5907
   Adds support for the INPUT element in the HTML Processor

 - WordPress/wordpress-develop#5683
   Provides mechanism to scan all tokens in an HTML document in the Tag Processor.

 - WordPress/wordpress-develop#5976
   Avoids splitting text nodes on "<" character.

 - WordPress/wordpress-develop#5992
   Only recognize true CDATA-lookalike nodes.

 - WordPress/wordpress-develop#5975
   Prevent void tag nesting when calling `next_token()`

 - WordPress/wordpress-develop#6021
   Reset parser state after seeking.

 - https://core.trac.wordpress.org/changeset/57528
   Fix typo in setting token flag.

 - WordPress/wordpress-develop#6041
   Ensure consecutive text is all joined into one text node.

The PHP files in the compatability layer are merged and maintained in
the Core repo and all changes or updates need to happen first in Core
and then be brought over to Gutenberg as built files.

Co-authored-by: sergeybiryukov <sergeybiryukov@git.wordpress.org>
Co-authored-by: sirreal <jonsurrell@git.wordpress.org>
Co-authored-by: dmsnell <dmsnell@git.wordpress.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants