Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Carousel: Improve DOM Parsing #13547

Closed
wants to merge 2 commits into from
Closed

Conversation

mdawaffe
Copy link
Member

@mdawaffe mdawaffe commented Sep 26, 2019

Fixes #13531, but not really.

Changes proposed in this Pull Request:

  • Separate ->add_data_to_container() into ->add_data_to_container() for the gallery_style filter (which filters only an opening <div> tag), and ->add_data_to_html() for general HTML.
  • Wrap input to DOMDocument->loadHTML() in fake root tag.
  • Specify blog's charset in input to DOMDocument->loadHTML().
  • Add tests for different charsets.

Is this a new feature or does it add/remove features to an existing part of Jetpack?

No: bug fix.

Testing instructions:

  • phpunit --testsuite=carousel
  • See that some tests fail :(

Proposed changelog entry for your changes:

Carousel: Improve HTML parsing.

* Separate `->add_data_to_container()` into `->add_data_to_container()`
  for the `gallery_style` filter (which filters only an opening `<div>`
  tag), and `->add_data_to_html()` for general HTML.
* Wrap input to `DOMDocument->loadHTML()` in fake root tag.
* Specify blog's charset in input to `DOMDocument->loadHTML()`.
* Add tests for different charsets.
@mdawaffe mdawaffe added [Feature] Carousel A fullscreen modal appearing when clicking on an image in a gallery or tiled gallery. [Status] Needs Review To request a review from fellow Jetpack developers. Label will be renamed soon. DO NOT MERGE don't merge it! labels Sep 26, 2019
@mdawaffe mdawaffe added this to the 7.8 milestone Sep 26, 2019
@mdawaffe mdawaffe requested review from jeherve and a team September 26, 2019 00:42
@mdawaffe mdawaffe self-assigned this Sep 26, 2019
@matticbot
Copy link
Contributor

Caution: This PR has changes that must be merged to WordPress.com
Hello mdawaffe! These changes need to be synced to WordPress.com - If you 're an a11n, please commandeer, review, and approve D33204-code before merging this PR. Thank you!

@jetpackbot
Copy link

jetpackbot commented Sep 26, 2019

Thank you for the great PR description!

When this PR is ready for review, please apply the [Status] Needs Review label. If you are an a11n, please have someone from your team review the code if possible. The Jetpack team will also review this PR and merge it to be included in the next Jetpack release.

Scheduled Jetpack release: October 1, 2019.
Scheduled code freeze: September 24, 2019

Generated by 🚫 dangerJS against b67eb64

@matticbot
Copy link
Contributor

mdawaffe, Your synced wpcom patch D33204-code has been updated.

@mdawaffe
Copy link
Member Author

mdawaffe commented Sep 26, 2019

This was the best I could come up with for improving HTML parsing via DOMDocument.

I don't think it's good enough :( As you can see from the failing (new) tests:

  1. Entities like &euml; in the input will be decoded (ë) in the output. It would be odd if we changed the bytes of a text node just to add an attribute. The ultimate output (the DOM as shown by the browser) would be the same, but something might be depending on certain characters being encoded as entities in the HTML, and this use of DOMDocument would break that.
  2. The output of ->saveHTML() is always UTF-8, no matter what the charset was of the input. This is definitely a blocker. Fixing would require using mb_convert_encoding().

The current code (#13446) also fails these tests. (The tests need to be tweaked to run with the code in #13446: there is no ->add_data_to_html() function in #13446.)

Should we switch back to the regex str_replace() for now?

@jeherve
Copy link
Member

jeherve commented Sep 27, 2019

Closing in favour of #13554

@jeherve jeherve closed this Sep 27, 2019
@kraftbj kraftbj removed the [Status] Needs Review To request a review from fellow Jetpack developers. Label will be renamed soon. label Sep 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DO NOT MERGE don't merge it! [Feature] Carousel A fullscreen modal appearing when clicking on an image in a gallery or tiled gallery. Touches WP.com Files
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Carousel: issues modifying markup in some environments
5 participants