Fix broken encoding when using document fragments #99
This ensures that the `http-equiv` charset meta tag needed for making the `DOMDocument` work properly with UTF-8 encoding is always added, even when all or parts of the HTML document structure are missing.

The current implementation does multiple regex calls, on the assumption that this optimizes for large documents, where a `<head>` tag would normally always be present. So for any full real-world HTML document, only the first regex would ever be used. Actual document fragments are assumed to be small anyway, which makes multiple regex traversals cheap in those cases.

These multiple regex calls could be combined into one, but this would likely make the most common case much worse in terms of performance. Thoughts on that, @westonruter?
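
For discussion, the fallback order described above can be sketched roughly as follows. This is a minimal Python illustration of the control flow only, not the PHP code in this PR. The function name `ensure_charset_meta`, the regex patterns, and the exact meta tag string are assumptions made for the example.

```python
import re

# Assumed form of the http-equiv charset meta tag; the PR's exact string may differ.
CHARSET_META = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'

# Illustrative patterns only (hypothetical, not the PR's actual regexes).
# The "\s or >" requirement keeps <head> from matching <header>.
HEAD_OPEN = re.compile(r'<head(?:\s[^>]*)?>', re.IGNORECASE)
HTML_OPEN = re.compile(r'<html(?:\s[^>]*)?>', re.IGNORECASE)
DOCTYPE = re.compile(r'<!doctype[^>]*>', re.IGNORECASE)


def ensure_charset_meta(source: str) -> str:
    """Insert the charset meta tag, adding missing document structure as needed."""
    # 1. Common case: a full document with <head>. Only this regex runs.
    match = HEAD_OPEN.search(source)
    if match:
        return source[:match.end()] + CHARSET_META + source[match.end():]

    # 2. <html> is present but <head> is missing: add a <head> right after <html>.
    match = HTML_OPEN.search(source)
    if match:
        head = '<head>' + CHARSET_META + '</head>'
        return source[:match.end()] + head + source[match.end():]

    # 3. Only a doctype is present: wrap everything after it.
    match = DOCTYPE.search(source)
    if match:
        head = '<html><head>' + CHARSET_META + '</head>'
        return source[:match.end()] + head + source[match.end():] + '</html>'

    # 4. Bare fragment: wrap the whole input.
    return '<html><head>' + CHARSET_META + '</head>' + source + '</html>'
```

In this sketch a full document returns after the first search, so the later patterns only ever scan inputs that lack a `<head>`, which are expected to be short fragments.
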
Fixes #28