Html api/stop at funky comments #7

dmsnell · 2023-04-05T23:43:11Z

Trac ticket:

This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

The HTML API should provide a way to generate excerpts from HTML documents up to a given maximum length. This patch explores adding text and HTML chunks that can be extracted during processing to support exactly that. Text chunks are similar to `.textContent` in the DOM, while HTML chunks contain raw, unprocessed HTML.

These functions should likely remain low-level in the Tag Processor and be exposed through the HTML Processor, so that the proper semantics are respected when extracting this information. For example, a `PRE` element ignores a leading newline in its content, and the content of `SCRIPT` and `STYLE` elements isn't wanted in the output of something like `strip_tags()`.

This work shows once again that the Tag Processor ought to expose a way to visit every token, and that non-tag tokens should be classified. This has already been explored in #7.
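To make the text-chunk semantics above concrete, here is a minimal sketch in Python. It uses the standard library's `html.parser` rather than the WordPress HTML API. The `text_excerpt()` function and its truncation behavior are hypothetical. The sketch applies only the two rules mentioned: it drops a newline that immediately follows a `PRE` start tag, and it skips `SCRIPT`/`STYLE` content. It does not produce HTML chunks.

```python
from html.parser import HTMLParser


class _TextExcerptParser(HTMLParser):
    """Collects text content, roughly like `.textContent`, up to a length limit.

    Hypothetical helper for illustration; not part of the WordPress HTML API.
    """

    # Elements whose content should not appear in a plain-text excerpt.
    SKIPPED = {"script", "style"}

    def __init__(self, max_length: int) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.max_length = max_length
        self.parts: list[str] = []
        self.length = 0
        self.skipping = False
        self.just_opened_pre = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skipping = True
        # Only the token directly after <pre> may lose its leading newline.
        self.just_opened_pre = tag == "pre"

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self.skipping = False
        self.just_opened_pre = False

    def handle_comment(self, data):
        # A comment between <pre> and the text means the newline is not next.
        self.just_opened_pre = False

    def handle_data(self, data):
        if self.just_opened_pre:
            self.just_opened_pre = False
            if data.startswith("\n"):
                data = data[1:]
        if self.skipping or self.length >= self.max_length:
            return
        # Keep only as much text as still fits within the limit.
        piece = data[: self.max_length - self.length]
        self.parts.append(piece)
        self.length += len(piece)


def text_excerpt(html: str, max_length: int) -> str:
    """Return at most `max_length` characters of the document's text."""
    parser = _TextExcerptParser(max_length)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

For example, `text_excerpt("<p>Hello <b>world</b></p><script>var x = 1;</script><pre>\ncode</pre>", 100)` returns `"Hello worldcode"`, and with a limit of 8 it returns `"Hello wo"`.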

dmsnell force-pushed the html-api/stop-at-funky-comments branch 3 times, most recently from a5d3f64 to 61b15f1 on April 6, 2023 01:01

dmsnell added 11 commits April 12, 2023 17:18

- HTML API: Support extra non-normative comment constructions (02e9e52)
- Fix newly introduced and broken tests (5900185)
- Fix implementation detail, move by one not two dashes (46af172)
- Move pointer increment into loop condition to avoid infinite loops (2a1e1be)
  This is just a safety precaution to mitigate an accidental removal of the pointer increment. No actual problems motivated this change.
- Add references to Trac ticket, annotate tests (f724140)
- Linting issues (5f170ee)
- Correct unchecked indexing, feedback (ebeb870)
  Props to @costdev for noting the unchecked indices.
- Catch another indexing operation (1c7ee80)
- WIP (ccc9e3b)
- WIP: HTML API: Stop at funky comments (93339ff)
- WIP (d36fb25)
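The branch name and commit 93339ff refer to "funky comments." Assume the term means an end tag whose name starts with a character that cannot begin a tag name, such as `</%dolly>`, which the HTML tokenizer turns into a bogus comment ending at the next `>`. Under that assumption, the Python sketch below shows how such a construct could be recognized. `find_funky_comment()` is a hypothetical helper, not the WordPress implementation. It also reflects, in simplified form, the concerns behind 2a1e1be and ebeb870. Every index is checked against the string length before it is read, and the search for `>` uses `str.find()`, which always terminates.

```python
from typing import Optional, Tuple


def find_funky_comment(html: str, at: int) -> Optional[Tuple[str, int]]:
    """If a funky comment starts at index `at`, return (text, end).

    `text` is the comment content between "</" and ">", and `end` is the index
    just past the closing ">" (or len(html) if the document ends first).
    Returns None for anything else, including real end tags like "</div>".
    Hypothetical helper for illustration only.
    """
    n = len(html)
    # Bounds check before indexing: we need "<", "/", and one more character.
    # (Per the HTML spec, a bare "</" at the end of input is plain text.)
    if at < 0 or at + 2 >= n or html[at] != "<" or html[at + 1] != "/":
        return None

    first = html[at + 2]
    # "</x..." with an ASCII letter is a normal end tag; "</>" is dropped entirely.
    if (first.isascii() and first.isalpha()) or first == ">":
        return None

    # Bogus comment: everything up to the next ">", or to the end of input.
    close = html.find(">", at + 2)
    if close == -1:
        return html[at + 2 :], n
    return html[at + 2 : close], close + 1
```

For example, `find_funky_comment("</%dolly>", 0)` returns `("%dolly", 9)`, while `find_funky_comment("</div>", 0)` returns `None`.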

dmsnell force-pushed the html-api/stop-at-funky-comments branch from 61b15f1 to d36fb25 on April 12, 2023 15:19

dmsnell mentioned this pull request Sep 14, 2023

WIP: HTML API: Extract previous text and HTML chunks while processing. WordPress/wordpress-develop#5208 (Draft)

dmsnell mentioned this pull request Nov 29, 2023

HTML API: Provide mechanism to scan all tokens in an HTML document, not only the tags. WordPress/wordpress-develop#5683 (Closed)
