Description
This occurred when I upgraded from tidy-html5 5.4.0 to 5.6.0. I also upgraded PHP to 7.1.14 at the same time, but php-src/ext/tidy has no recent changes.
#207 (comment) seems related, but I don't see any discussion in #207 about how XHTML output would be affected.
http://validator.w3.org/check would warn about hello & bye as an XHTML fragment. If XHTML5 is HTML5 represented as XML, shouldn't the bare ampersand be fixed automatically by tidy-html5?
- If I'm missing any necessary options, please let me know.
The script below reproduces this bug. PHP 7.1.9 (running tidy-html5 5.4.0) escapes the ampersand, but PHP 7.1.14 (built with tidy-html5 5.6.0) leaves it as is:
<?php
// Related to https://github.com/htacg/tidy-html5/issues/526 ?
const CONFIG = [
    'output-xhtml' => true,
    'show-errors' => false,
];
function print_tidy(string $html, array $config) {
    $tidy = new tidy();
    $tidy->parseString($html, $config, 'utf8');
    $tidy->cleanRepair();
    // Errors are in $tidy->errorBuffer
    $errors = [];
    if (!empty($tidy->errorBuffer)) {
        $errors = array_merge($errors, explode("\n", $tidy->errorBuffer));
    }
    $clean = (string)$tidy;
    printf("In %s: %s (errors %s)\nconfig: %s\n", PHP_VERSION, $clean, json_encode($errors), json_encode($config));
}
print_tidy('hello & bye', CONFIG);
// EDIT: I've also tried print_tidy('<body>hello & bye</body>', CONFIG + ['preserve-entities' => true, 'quote-ampersand' => true, 'doctype' => 'strict', 'output-encoding' => 'ascii', 'input-encoding' => 'utf8']);
// but none of those additional options resulted in something different with the tidy-html5 bindings
/*
In 7.1.14: <!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
</head>
<body>
hello & bye
</body>
</html> (errors [])
config: {"output-xhtml":true,"show-errors":false}
*/
/*
In 7.1.9: <!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
</head>
<body>
hello &amp; bye
</body>
</html> (errors [])
config: {"output-xhtml":true,"show-errors":false}
*/
More details: This seems specific to the PHP bindings (or maybe the CLI sets options that I missed). echo 'hello & bye' | ./tidy -asxhtml works properly for me, generating hello &amp; bye in both tidy-html5 5.4.0 and 5.6.0 (running make clean, then rebuilding from git checkouts).
When I rebuilt PHP 7.1.14's tidy extension with tidy-html5-5.4.0, the result was hello &amp; bye, as expected/wanted.
For details on how I built php, see #673 (comment)
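For anyone comparing builds, a small check along these lines might make the regression easier to spot than reading the output by eye. This is only a sketch: has_bare_ampersand is a hypothetical helper of my own (not part of tidy or the PHP tidy extension), and its regex only approximates the XML rule that an ampersand must begin a named or numeric character reference.

```php
<?php
// Hypothetical helper, not part of the tidy extension.
// Returns true if $markup contains an '&' that does not begin
// a named (&amp;), decimal (&#38;) or hex (&#x26;) reference.
function has_bare_ampersand(string $markup): bool {
    $pattern = '/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)/';
    return preg_match($pattern, $markup) === 1;
}

var_dump(has_bare_ampersand('hello & bye'));     // bool(true)
var_dump(has_bare_ampersand('hello &amp; bye')); // bool(false)
```

Applied to the (string)$tidy value inside print_tidy, this should return true for the output that leaves the ampersand unescaped and false for the output that contains &amp;.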