Fix broken encoding when using document fragments #99

schlessera · 2021-03-16T13:46:29Z

This ensures that the http-equiv charset meta tag, which DOMDocument needs in order to handle UTF-8 encoding properly, is always added, even when all or part of the HTML document structure is missing.

The current implementation makes multiple regex calls in sequence. This optimizes for large documents, where a <head> tag is almost always present: for any full real-world HTML document, only the first regex ever runs. Actual document fragments are assumed to be small, so the extra regex traversals are cheap in those cases.
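A minimal sketch of this cascading approach, assuming the meta tag is inserted right after the first structural tag found. The function name `ensure_charset_meta`, the constant `CHARSET_META`, and the regex patterns are hypothetical illustrations, not the actual code in src/Dom/Document.php:

```php
<?php

// Hypothetical constant: the tag that tells DOMDocument to read the input as UTF-8.
const CHARSET_META = '<meta http-equiv="content-type" content="text/html; charset=utf-8">';

/**
 * Insert the charset meta tag, trying the most common document structure first
 * and falling back step by step for increasingly incomplete fragments.
 */
function ensure_charset_meta(string $html): string
{
    // 1. Common case: a <head> tag exists, so a single regex pass is enough.
    //    "\b" keeps the pattern from matching <header>.
    $result = preg_replace('/<head\b[^>]*>/i', '$0' . CHARSET_META, $html, 1, $count);
    if ($count > 0) {
        return $result;
    }

    // 2. No <head>, but an <html> tag: add a <head> that wraps the meta tag.
    $result = preg_replace('/<html\b[^>]*>/i', '$0<head>' . CHARSET_META . '</head>', $html, 1, $count);
    if ($count > 0) {
        return $result;
    }

    // 3. Only a doctype: add the <head> right after it.
    $result = preg_replace('/<!doctype\b[^>]*>/i', '$0<head>' . CHARSET_META . '</head>', $html, 1, $count);
    if ($count > 0) {
        return $result;
    }

    // 4. Bare fragment with no document structure at all: prepend the <head>.
    return '<head>' . CHARSET_META . '</head>' . $html;
}
```

In this sketch a full document returns after the first `preg_replace()` call, while a bare fragment such as `<p>é</p>` falls through all three patterns before the tag is prepended, which is the trade-off described above.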

These regex calls could be combined into one, but that would likely make the most common case perform much worse. Thoughts on that, @westonruter?

Fixes #28

codecov · 2021-03-16T13:46:41Z

Codecov Report

Merging #99 (4758e21) into main (7d6402e) will increase coverage by 0.09%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##               main      #99      +/-   ##
============================================
+ Coverage     80.87%   80.96%   +0.09%     
- Complexity      928      931       +3     
============================================
  Files            48       48              
  Lines          2311     2322      +11     
============================================
+ Hits           1869     1880      +11     
  Misses          442      442

| Flag | Coverage Δ | Complexity Δ |
|---|---|---|
| php | `80.96% <100.00%> (+0.09%)` | `0.00 <3.00> (ø)` |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| src/Dom/Document.php | `83.51% <100.00%> (+0.29%)` | `222.00 <3.00> (+3.00)` |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7d6402e...4758e21.

schlessera added 2 commits March 16, 2021 13:30

Add test to trigger bug

364e7dd

Fix test in SSR

0f2f06d

schlessera added Bug Something isn't working DOM labels Mar 16, 2021

schlessera added this to the 0.2.0 milestone Mar 16, 2021

schlessera requested a review from westonruter March 16, 2021 13:52

Compute fragment encoding test cases

68969ad

westonruter reviewed Mar 16, 2021

Remove unneeded compat code

4758e21

schlessera requested a review from westonruter March 17, 2021 11:01

westonruter approved these changes Mar 17, 2021

schlessera merged commit 9d70600 into main Mar 18, 2021

schlessera deleted the fix/28-broken-encoding-on-fragments-without-head branch March 18, 2021 15:45
