
Unescaped & emitted despite using **output-xhtml** key bindings in 5.6.0 in PHP bindings #704

@TysonAndre

This started when I upgraded from tidy-html5 5.4.0 to 5.6.0. I upgraded PHP to 7.1.14 at the same time, but php-src/ext/tidy has no recent changes.

#207 (comment) seems related, but I don't see any discussion in #207 about how XHTML output would be affected.

http://validator.w3.org/check would warn about hello & bye as an XHTML fragment. If XHTML5 is HTML5 represented as XML, shouldn't that be automatically fixed by tidy-html5?

  • If I'm missing any necessary options, please let me know

The script below shows the difference between PHP 7.1.9 (running tidy-html5 5.4.0) and PHP 7.1.14 (built with tidy-html5 5.6.0). Only the 7.1.14 build emits the unescaped &.

<?php
// Related to https://github.com/htacg/tidy-html5/issues/526 ?
const CONFIG = [
  'output-xhtml' => true,
  'show-errors' => false,
];
 
function print_tidy(string $html, array $config) {
	$tidy = new tidy();
	$tidy->parseString($html, $config, 'utf8');
	$tidy->cleanRepair();
	// Errors, if any, are in $tidy->errorBuffer
	$errors = [];
	if (!empty($tidy->errorBuffer)) {
		$errors = array_merge($errors, explode("\n", $tidy->errorBuffer));
	}
	$clean = (string)$tidy;
	printf("In %s: %s (errors %s)\nconfig: %s\n", PHP_VERSION, $clean, json_encode($errors), json_encode($config));
}
print_tidy('hello & bye', CONFIG);
// EDIT: I've also tried print_tidy('<body>hello & bye</body>', CONFIG + ['preserve-entities' => true, 'quote-ampersand' => true, 'doctype' => 'strict', 'output-encoding' => 'ascii', 'input-encoding' => 'utf8']);
// but none of those additional options resulted in something different with the tidy-html5 bindings
/*
In 7.1.14: <!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
</head>
<body>
hello & bye
</body>
</html> (errors [])
config: {"output-xhtml":true,"show-errors":false}
*/
/*
In 7.1.9: <!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
</head>
<body>
hello &amp; bye
</body>
</html> (errors [])
config: {"output-xhtml":true,"show-errors":false}
*/

More details: This seems specific to the PHP bindings (or maybe the CLI sets options that I missed). echo 'hello & bye' | ./tidy -asxhtml works properly for me, generating hello &amp; bye in both tidy-html5 5.4.0 and 5.6.0 (built from git checkouts, running make clean before each rebuild).

When I rebuilt php 7.1.14's tidy extension with tidy-html5-5.4.0, the result was hello &amp; bye, as expected/wanted.
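
For anyone comparing builds, a small check like the one below can flag the regression automatically instead of eyeballing the output. It is only a sketch: has_bare_ampersand is a hypothetical helper, not part of tidy or the PHP tidy extension, and the regex is a rough approximation of "an & that does not start an entity or character reference", not a full XML well-formedness check.

```php
<?php
// Hypothetical helper (an assumption for illustration, not a tidy API):
// returns true when $markup contains an & that does not begin a named
// entity (&amp;), a decimal reference (&#38;) or a hex reference (&#x26;).
function has_bare_ampersand(string $markup): bool {
	$pattern = '/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)/';
	return preg_match($pattern, $markup) === 1;
}

// Body text as emitted by the 5.6.0 build in this report.
var_dump(has_bare_ampersand('hello & bye'));     // bool(true)
// Body text as emitted by the 5.4.0 build in this report.
var_dump(has_bare_ampersand('hello &amp; bye')); // bool(false)
```

Passing (string)$tidy from the script above to this function should return true only on the affected build, assuming the rest of the generated document contains no other ampersands.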

For details on how I built php, see #673 (comment)
