-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trim footnote anchors from excerpts #52518
Conversation
Flaky tests detected in bb26cac. 🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/5522773682
|
Tested
All 🟢 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this one!
Hook into the the_content filter, but only when this is part of a wider get_the_excerpt filter.
Question: Is this the best approach to fix the issue?
Great question! I wasn't too familiar with how all this is hooked together, so took a look through the chain of calls and filters. I think on balance this is a pretty good approach so far, based on the following:
- In core, the default excerpt is grabbed via a default filter on
get_the_excerpt
which callswp_trim_excerpt
. wp_trim_excerpt
is responsible for getting the post content, then strips shortcodes and strips blocks that shouldn't be present in the output (excerpt_remove_blocks
), before applyingthe_content
filter.- By the time the
wp_trim_excerpt
filter is run,wp_trim_words
is already called, which strips<sup>
but without removing its contents, resulting in stray1
and2
(etc) characters being retained in the post excerpt.
So, currently, it seems the only place to hook into removal is either within the_content
or render_block
callbacks, and the_content
means the check will happen fewer times than if we check while rendering blocks 👍. (Technically, it might be possible to do it via a filter on wp_trim_words
using the $original_text
param, but that would involve re-doing the trimming behaviour, so would be a fair bit hackier IMO)
One potential alternative in core could be to still use the function introduced in this PR, but instead of applying it via the_content
filter, add it as a direct call within wp_trim_excerpt
. That could potentially make things slightly neater, but with the disadvantage that if anyone wanted to switch off this callback, they couldn't call remove_filter
to remove the call to gutenberg_trim_footnoes
.
So, all in all, on balance, I think you've found a good approach here. At first I was wondering if a CSS solution could work to set it to display: none
, but the stripping of the HTML tags in wp_trim_words
renders that impossible.
✅ Tests nicely, stripping the sup
tags in excerpts
✅ Rendering is unaffected outside of rendering excerpts
✅ Regex looks solid to me, and is specific enough that it's not going to accidentally remove other markup that shouldn't be removed
When backporting to core, it might be worth adding a couple of unit tests to confirm the removal works as expected? Other than that, this LGTM! ✨
return $content; | ||
} | ||
|
||
static $footnote_pattern = '_<sup data-fn="[^"]+" class="[^"]+">\s*<a href="[^"]+" id="[^"]+">\d+</a>\s*</sup>_'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is working pretty well, thanks! I'm not sure what else we could do here.
Looks like it handles most of the inline changes we can make to the footnote sup
I only think with regexes like these, especially for new functionality, it would be great to have unit tests. That way when breaking changes come down the line we can react immediately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use the HTML tag processor here? Cc @dmsnell
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also interesting to know this.
When doing the backport I had a quick look at whether it could remove tags like it could with attributes/classes and I couldn't find anything.
The other option would be a DOM parser to strip the <sup />
and its contents but not sure if that would be a faster way to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any other possible tag inside of here besides <sup>
and <a>
?
The Tag Processor cannot easily remove this pattern, but the HTML Processor would be able to if I added <sup>
support and the ability to remove a tag.
This could all plausibly happen for the WordPress 6.4 release, no promises.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also there's an "ugly" way to do it with the Tag Processor by subclassing and accessing internals, but the HTML Processor is working well enough I'm not experimenting with that approach anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use the HTML tag processor here?
I don't believe so because WP_HTML_Tag_Processor
doesn't have a remove_tag
method. If it did, you could do something like this:
$p = new WP_HTML_Tag_Processor( $html );
while ( $p->next_tag( array( 'tag_name' => 'SUP' ) ) ) {
if (
$p->get_attribute( 'data-fn' ) &&
$p->get_attribute( 'class' )
) {
$p->set_bookmark( 'footer_sup' );
if (
$p->next_tag() &&
'A' === $p->get_tag() &&
$p->get_attribute( 'href' ) &&
$p->get_attribute( 'id' )
) {
$p->seek( 'footer_sup' );
$p->remove_tag(); // 👈 This method doesn't exist.
}
$p->release_bookmark( 'footer_sup' );
}
}
But @dmsnell would know better 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Oops race condition)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just flagging this with @dmsnell as another place we're processing HTML 😄
+1 Also, testing this PR made me notice that footnotes are not restored with previous revisions. |
I'll create an issue 👍🏻 |
Should this be merged in time, it would be great to have in WordPress 6.3 and also in GB 16.2. |
Thanks for the thorough reviewing, everyone, and thanks for the follow-up items, @ramonjd!
Yeah, I agree that a few such solutions are on the table for Core itself, though I'm glad you're validating the current approach!
Testing concurrent formats is something I completely missed and it was a great call of yours. I was worried for a second about such situations leading to certain markup that the regular expression would then miss, but I believe that in practice this can't happen, because:
|
* Trim footnote anchors from excerpts * Add comments, fix spacing, appease linter
I just cherry-picked this PR to the release/16.2 branch to get it included in the next release: 0c1a9c6 |
…dd/defer-script-loading-strategy * 'trunk' of https://github.com/WordPress/gutenberg: Update Changelog for 16.2.0 Adding support for defined IDs in `TextControl` component (#52028) Bump plugin version to 16.2.0 Revert "Bump plugin version to 16.2.0" Bump plugin version to 16.2.0 Add maxLength to LinkControl search items (#52523) [RNMobile] Update Editor block inserter button styles and default text input placeholder/selection styles (#52269) Site Editor: Reset device preview type when exiting the editing mode (#52566) Trim footnote anchors from excerpts (#52518) Add back old Navigation and File blocks JavaScript implementation when Gutenberg is not installed (#52553) Block Editor: Ensure synced patterns are accounted for in 'getAllowedBlocks' (#52546) Fix md5 class messed up with new block key (#52557) Fix entity cache misses for single posts due to string as recordKey (#52338) Make "My patterns" permanently visible (#52531) Hide site hub when resizing frame upwards to avoid overlap (#52180) Fix "Manage all patterns" link appearance (#52532) Update navigation menu title size & weight in detail panels (#52477) Site Editor Patterns: Ensure sidebar does not shrink when long pattern titles are used (#52547) Site Editor: Restore quick inserter 'Browse all' button (#52529) Patterns: update the title of Pattern block in the block inspector card (#52010)
* Trim footnote anchors from excerpts * Add comments, fix spacing, appease linter
I just cherry-picked this PR to the update/first-batch-pre-RC1 branch to get it included in the next release: 1591f04 |
* Site Editor: Restore quick inserter 'Browse all' button (#52529) * Site Editor: Restore quick inserter 'Browse all' button * Remove leftover comment * Patterns: update the title of Pattern block in the block inspector card (#52010) * Site Editor Pages: load the appropriate template if posts page set (#52266) * This commit: - links the posts page to the homepage template when a post page is set - abstracts logic to get page item props * The Posts Page resolves to display the Home or Index template only. Adding a check to skip the Front Page * Showing homepage settings for posts pages that are set as the post page in reading settings * Post pages that have been set to display posts will redirect to first the home template, then the index template. The fallback is the post id of the page. * Reverted refactor of packages/edit-site/src/components/sidebar-navigation-screen-page/index.js Will do it in a follow up * Allow editing existing footnote from formats toolbar (#52506) * Block Editor: Display variation icon in the 'BlockDraggable' component (#52502) * Patterns: add option to set sync status when adding from wp-admin patterns list (#52352) * Show a modal to set sync status if adding pattern from pattern list page * Make sure the modal loads if post settings panel not open * don't load modal component at all if not new post * Simplify the sync status so undefined always = synced * Update packages/editor/src/components/post-sync-status/index.js --------- Co-authored-by: Ramon <ramonjd@users.noreply.github.com> * Revise LinkControl suggestions UI to use MenuItem (#50978) * Use "link" instead of "URL" for URL_TYPE * Use MenuItem for search create button * Use sentence case for "Create page" * Use a MenuGroup for search results * Use MenuItem for search item * Refactoring styles (WIP) * Preserve whitespace in results text * Reinstate result item information including permalink * Remove debugging CSS code * Reinstate CSS to control size of rich previews favicon * Remove other commented out CSS code * Reinstate selected styles * Remove more redundant CSS * Add some basic results hover/focus styling. Needs improving * Improve icon alignment * Update tests to handle wording changes * Remove inconsistent hover/focus style MenuItem already has hover/focus styles * Reinstate is-selected visual state * Update test to make sense in context of #51011 See #51011 * Fix locator for result text --------- Co-authored-by: Dave Smith <getdavemail@gmail.com> * Fix LinkControl mark highlight to bold (#52517) * Add 'reusable' keyword to Pattern blocks (#52543) * Fix missing Add Template Part button in Template Parts page (#52542) * Tweak copy for the reusable block rename hint (#52581) * Trim footnote anchors from excerpts (#52518) * Trim footnote anchors from excerpts * Add comments, fix spacing, appease linter * Site Editor: Reset device preview type when exiting the editing mode (#52566) * Make "My patterns" permanently visible (#52531) * Hide site hub when resizing frame upwards to avoid overlap (#52180) * Fix "Manage all patterns" link appearance (#52532) * Fix "Manage all patterns" link * Update focus style * Update navigation menu title size & weight in detail panels (#52477) * Update menu title size * Adjust font weight * Site Editor Patterns: Ensure sidebar does not shrink when long pattern titles are used (#52547) * Edit Site: Select templateType from the store inside the useSiteEditorSettings hook (#52503) * Add width to ensure ellipsis is applied (#52575) * Cover Block: Fix block deprecation when fixed background is enabled (#51612) * ResizableFrame: Make keyboard accessible (#52443) * ResizableFrame: Make keyboard accessible * Fix outline in Safari * Use proper CSS modifier * Add aria-label to button * Keep handle enlarged when resizing (Safari) * Add back visually hidden help text * Don't switch to edit mode * Make the handle a role="separator" * Revert to `button` * Switch description text to `div hidden` * Prevent keydown event default when right/left arrow * Change minimum frame width to 320px * Mention shift key in description text * Only render resize handle when in View mode * Fix importing classic menus (#52573) * use the same create hook for classic import * Remove redundant arg to hook --------- Co-authored-by: Dave Smith <getdavemail@gmail.com> * Site Editor: Fix navigation menu sidebar actions order and label (#52592) * Avoid errors in Dimension visualizers when switching between iframed and non-iframed editors (#52588) * Patterns: Add client side pagination to patterns list (#52538) Co-authored-by: Saxon Fletcher <saxonafletcher@gmail.com> * Site Editor: Make sidebar back button go *back* instead of *up* if possible (#52456) * Navigator: Add replace option to goTo() and goToParent() * Site Editor: Make sidebar back button go back instead of up if possible * Adapt template part hint copy (#52527) * Try "panel" instead of "page" * Update template-part-hint.js --------- Co-authored-by: George Mamadashvili <georgemamadashvili@gmail.com> Co-authored-by: Glen Davies <glendaviesnz@users.noreply.github.com> Co-authored-by: Ramon <ramonjd@users.noreply.github.com> Co-authored-by: Miguel Fonseca <150562+mcsf@users.noreply.github.com> Co-authored-by: Rich Tabor <hi@richtabor.com> Co-authored-by: Dave Smith <getdavemail@gmail.com> Co-authored-by: Robert Anderson <robert@noisysocks.com> Co-authored-by: James Koster <james@jameskoster.co.uk> Co-authored-by: Andrew Serong <14988353+andrewserong@users.noreply.github.com> Co-authored-by: arthur791004 <arthur.chu@automattic.com> Co-authored-by: Aki Hamano <54422211+t-hamano@users.noreply.github.com> Co-authored-by: Lena Morita <lena@jaguchi.com> Co-authored-by: Andrei Draganescu <me@andreidraganescu.info> Co-authored-by: Saxon Fletcher <saxonafletcher@gmail.com>
Thanks @mcsf! |
What?
Fixes #52035
Strips footnote anchors from post content whenever excerpts are being computed by WordPress. That is, this removes the
<sup><a...></sup>
marks, so that a piece of content like "Hello1, world2" becomes "Hello, world".Why?
See parent issue.
How?
Hook into the
the_content
filter, but only when this is part of a widerget_the_excerpt
filter.Question: Is this the best approach to fix the issue?
Testing Instructions
Screenshots or screencast
On the left: a post with footnotes. On the right: the index page with post excerpts stripped of footnote anchors.