-
Notifications
You must be signed in to change notification settings - Fork 806
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor ImageCDN parsing to rely on HTML API instead of RegExps #32700
Conversation
Are you an Automattician? Please test your changes on all WordPress.com environments to help mitigate accidental explosions.
Interested in more tips and information?
|
Thank you for your PR! When contributing to Jetpack, we have a few suggestions that can help us test and review your patch:
This comment will be updated as you work on your PR and make changes. If you think that some of those checks are not needed for your PR, please explain why you think so. Thanks for cooperation 🤖 The e2e test report can be found here. Please note that it can take a few minutes after the e2e tests checks are complete for the report to be available. Follow this PR Review Process:
Still unsure? Reach out in #jetpack-developers for guidance! |
ba4718b
to
564599c
Compare
b318fbb
to
dc3a6b1
Compare
6b724e7
to
583e9ca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for proposing this! This is a nice way to use WP_HTML_Tag_Processor
, and something we can do now that we'll be requiring WordPress 6.2 (#31638).
I have not tested your PR, but left a couple of comments below about how things work in the monorepo, if that can help you discover how we usually do things :)
@haqadn @Automattic/heart-of-gold can you make sure you keep an eye on this one? |
484deb0
to
ffab63f
Compare
e51843b
to
5e23682
Compare
One note I should leave here is to ask about pulling in Core's The issue here is that the test output changes but in a semantically neutral way. I could rewrite the test so that it hardcodes the newer ordering of output, but that would leave this fragility in place for the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should I try and find someone to review this? are either of you happy with the proposed change?
Maybe @Automattic/heart-of-gold could take a look?
/** | ||
* Allow specific images to be skipped by Photon. | ||
* | ||
* @TODO: Does this need to pass the full HTML of the image tag? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at usage of this filter in other plugins in the WordPress.org plugin directory. While most plugins do not use this third parameter, there are a few that do, sometimes to look for a specific class name in the HTML. So I think we should keep this available if possible.
* | ||
* @see https://developer.wordpress.com/docs/photon/api/ | ||
* @TODO: What is the point of passing $images and $index. Are they required? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked around but couldn't figure out where that filter is being used today. It was used in the past, but may not be used anymore today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code changes look good to me and no problems found after testing. 👍🏼
P.S.: I only worked on migrating from a Jetpack module to image-cdn package. I don't have prior knowledge of the implementation details on the core functionality.
I will reconstruct the existing filter value before merge. |
@@ -178,6 +178,51 @@ protected function helper_remove_image_sizes() { | |||
remove_image_size( 'jetpack_soft_oversized_after_upload' ); | |||
} | |||
|
|||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these functions are okay here.
Personally, I would put them in the parent class at class-image-cdn-attachment-test-case.php
but, it looks like this is the only (current) test that extends this particular class.
Since we're late in the beta testing week, I'd say let's merge this next week after the branch cut so we get the longest amount of time to test it in the wild before the next dotorg release |
) The introduction of the HTML API into WordPress 6.2 offers a new method of matching and modifying HTML. In this patch we're replacing code that attempts to parse the input HTML and visit all images, then read and modify them based on the values of their attributes and computed Photon properties. In the previous code the `Image_CDN` class scanned the entire HTML document to generate a list of PREG image match objects, then iterated over those matches and performed string-replace operations on them. Now the class does a pass from start to finish, visting each image tag along the way, and making the appropriate modifications. Co-authored-by: Adnan Haque <3737780+haqadn@users.noreply.github.com> Co-authored-by: Brandon Kraft <public@brandonkraft.com> Co-authored-by: Jeremy Herve <jeremy@jeremy.hu> Co-authored-by: Mark George <thingalon@gmail.com> Co-authored-by: Osk <oskosk@users.noreply.github.com>
ab212e3
to
424e68a
Compare
) The introduction of the HTML API into WordPress 6.2 offers a new method of matching and modifying HTML. In this patch we're replacing code that attempts to parse the input HTML and visit all images, then read and modify them based on the values of their attributes and computed Photon properties. In the previous code the `Image_CDN` class scanned the entire HTML document to generate a list of PREG image match objects, then iterated over those matches and performed string-replace operations on them. Now the class does a pass from start to finish, visting each image tag along the way, and making the appropriate modifications. Co-authored-by: Adnan Haque <3737780+haqadn@users.noreply.github.com> Co-authored-by: Brandon Kraft <public@brandonkraft.com> Co-authored-by: Jeremy Herve <jeremy@jeremy.hu> Co-authored-by: Mark George <thingalon@gmail.com> Co-authored-by: Osk <oskosk@users.noreply.github.com>
424e68a
to
074a797
Compare
) The introduction of the HTML API into WordPress 6.2 offers a new method of matching and modifying HTML. In this patch we're replacing code that attempts to parse the input HTML and visit all images, then read and modify them based on the values of their attributes and computed Photon properties. In the previous code the `Image_CDN` class scanned the entire HTML document to generate a list of PREG image match objects, then iterated over those matches and performed string-replace operations on them. Now the class does a pass from start to finish, visting each image tag along the way, and making the appropriate modifications. Co-authored-by: Adnan Haque <3737780+haqadn@users.noreply.github.com> Co-authored-by: Brandon Kraft <public@brandonkraft.com> Co-authored-by: Jeremy Herve <jeremy@jeremy.hu> Co-authored-by: Mark George <thingalon@gmail.com> Co-authored-by: Osk <oskosk@users.noreply.github.com>
074a797
to
c7493a4
Compare
) The introduction of the HTML API into WordPress 6.2 offers a new method of matching and modifying HTML. In this patch we're replacing code that attempts to parse the input HTML and visit all images, then read and modify them based on the values of their attributes and computed Photon properties. In the previous code the `Image_CDN` class scanned the entire HTML document to generate a list of PREG image match objects, then iterated over those matches and performed string-replace operations on them. Now the class does a pass from start to finish, visting each image tag along the way, and making the appropriate modifications. Co-authored-by: Adnan Haque <3737780+haqadn@users.noreply.github.com> Co-authored-by: Brandon Kraft <public@brandonkraft.com> Co-authored-by: Jeremy Herve <jeremy@jeremy.hu> Co-authored-by: Mark George <thingalon@gmail.com> Co-authored-by: Osk <oskosk@users.noreply.github.com>
c7493a4
to
227be97
Compare
No `assertEquivalentMarkup` exists yet, so this gets around that without creating one.
Fixed broken test. @dmsnell |
@dilirity - thanks for fixing those tests - not sure how I copy/pasted the original typo on as far as I know it's ready to go, and it was ready to go, but I'm not really sure how to test and vet it to the level of quality at which I would have preferred (since I was jumping in to unfamiliar code and proposing a change in a system I'm not working in). if you think this is sound (and according to the tests and a code audit it seems ready) then it should be fine to merge. had I been in a place where I could thoroughly test or watch and react to failures post-merge I would have. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad. I was using two different images. Ignore this. Though I'm not sure why only one of these images' width/height is 150 when both are set to use it.
So I did some testing and found something odd. Markup for resulted images below.
If you add an image via the Image block and use these settings:
On the production version of Boost, it looks like this:
<img decoding="async" width="150" height="150" src="https://i0.wp.com/honest-shark.jurassic.ninja/wp-content/uploads/2024/08/donnie-rosie-O7L3MrlSAHA-unsplash.jpg?resize=150%2C150&ssl=1" alt="" class="wp-image-12" style="aspect-ratio:4/3;object-fit:cover" srcset="https://i0.wp.com/honest-shark.jurassic.ninja/wp-content/uploads/2024/08/donnie-rosie-O7L3MrlSAHA-unsplash-scaled.jpg?resize=150%2C150&ssl=1 150w, https://i0.wp.com/honest-shark.jurassic.ninja/wp-content/uploads/2024/08/donnie-rosie-O7L3MrlSAHA-unsplash-scaled.jpg?zoom=2&resize=150%2C150&ssl=1 300w, https://i0.wp.com/honest-shark.jurassic.ninja/wp-content/uploads/2024/08/donnie-rosie-O7L3MrlSAHA-unsplash-scaled.jpg?zoom=3&resize=150%2C150&ssl=1 450w" sizes="(max-width: 150px) 100vw, 150px" data-recalc-dims="1">
However, this branch running Boost seems to ignore the size:
<img fetchpriority="high" decoding="async" width="1707" height="2560" src="https://i0.wp.com/extraordinary-gazelle.jurassic.ninja/wp-content/uploads/2024/08/filip-zrnzevic-QsWG0kjPQRY-unsplash-scaled.jpg?resize=150%2C150&ssl=1" alt="" class="wp-image-14" style="aspect-ratio:4/3;object-fit:cover" srcset="https://i0.wp.com/extraordinary-gazelle.jurassic.ninja/wp-content/uploads/2024/08/filip-zrnzevic-QsWG0kjPQRY-unsplash-scaled.jpg?w=1707&ssl=1 1707w, https://i0.wp.com/extraordinary-gazelle.jurassic.ninja/wp-content/uploads/2024/08/filip-zrnzevic-QsWG0kjPQRY-unsplash-scaled.jpg?resize=200%2C300&ssl=1 200w, https://i0.wp.com/extraordinary-gazelle.jurassic.ninja/wp-content/uploads/2024/08/filip-zrnzevic-QsWG0kjPQRY-unsplash-scaled.jpg?resize=683%2C1024&ssl=1 683w, https://i0.wp.com/extraordinary-gazelle.jurassic.ninja/wp-content/uploads/2024/08/filip-zrnzevic-QsWG0kjPQRY-unsplash-scaled.jpg?resize=768%2C1152&ssl=1 768w, https://i0.wp.com/extraordinary-gazelle.jurassic.ninja/wp-content/uploads/2024/08/filip-zrnzevic-QsWG0kjPQRY-unsplash-scaled.jpg?resize=1024%2C1536&ssl=1 1024w, https://i0.wp.com/extraordinary-gazelle.jurassic.ninja/wp-content/uploads/2024/08/filip-zrnzevic-QsWG0kjPQRY-unsplash-scaled.jpg?resize=1365%2C2048&ssl=1 1365w" sizes="(max-width: 1000px) 100vw, 1000px">
I'm not sure if this is Boost related, but I don't think we've made any changes to Image CDN. Boost is running on the Free version with only Critical CSS
and Image CDN
enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I couldn't find anything broken. Code-wise it is good and tests seem to be happy.
I think we can
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving after merge with trunk.
) * Refactor ImageCDN parsing to rely on HTML API instead of RegExps (#32700) The introduction of the HTML API into WordPress 6.2 offers a new method of matching and modifying HTML. In this patch we're replacing code that attempts to parse the input HTML and visit all images, then read and modify them based on the values of their attributes and computed Photon properties. In the previous code the `Image_CDN` class scanned the entire HTML document to generate a list of PREG image match objects, then iterated over those matches and performed string-replace operations on them. Now the class does a pass from start to finish, visting each image tag along the way, and making the appropriate modifications. Co-authored-by: Adnan Haque <3737780+haqadn@users.noreply.github.com> Co-authored-by: Brandon Kraft <public@brandonkraft.com> Co-authored-by: Jeremy Herve <jeremy@jeremy.hu> Co-authored-by: Mark George <thingalon@gmail.com> Co-authored-by: Osk <oskosk@users.noreply.github.com> * Rearrange semantically equivalent test output to avoid false negatives. No `assertEquivalentMarkup` exists yet, so this gets around that without creating one. * Fix broken test * Remove unnecessary comment * Fix static analysis issues * Fix static analysis issue * Bump project version to 0.4.7-alpha * Fix project version --------- Co-authored-by: Adnan Haque <3737780+haqadn@users.noreply.github.com> Co-authored-by: Brandon Kraft <public@brandonkraft.com> Co-authored-by: Jeremy Herve <jeremy@jeremy.hu> Co-authored-by: Mark George <thingalon@gmail.com> Co-authored-by: Osk <oskosk@users.noreply.github.com> Co-authored-by: Peter Petrov <peter.petrov89@gmail.com>
Status
This is a work in progress and isn't tested or verified.This has been reviewed, but the filters aren't tested because it's unclear what code might rely on them.Due to the change in indentation the diff view is more associated with the actual changes if ignoring whitespace.
Proposed changes:
The introduction of the HTML API into WordPress 6.2 offers a new method of matching and modifying HTML. In this patch we're replacing code that attempts to parse the input HTML and extract images that are direct children of an anchor ("A" tag), then read and modify them based on the values of their attributes and computed Photon properties.
In the previous code the
Image_CDN
class scanned the entire HTML document to generate a list of PREG image match objects, then iterated over those matches and performed string-replace operations on them.Now the class does a pass from start to finish, visting each image tag along the way, and making the appropriate modifications. Extra care is taken to ensure that only images that are the single child of a link are matched.
In this change the values of the
tag
key in some of the filters has changed from the initial matched HTML snippet to the name of the image tag, which could beIMG
orAMP-IMG
orAMP-ANIM
. An update to the Tag Processor or a custom sub-class thereof could provide the original HTML snippet and match the existing behavior, but that hasn't been done in this patch yet given the author's uncertainty about the use and value of those snippets.Other information:
Jetpack product discussion
This mandates running on WordPress 6.2. If we prefer being able to run on older versions, we could also include a copy of the Tag Processor in the plugin.
Does this pull request change what data or activity we track or use?
No.
Testing instructions:
Test suite should pass. Manual review is necessary though.
I don't even know what all the function I modified is supposed to do fully, so code auditing and review of the modifications is critical.