Problem with tests 427664 and 427672 on certain OSes! #3
Comments
Ok, I just added a test 427664-1 to the Then added a Now this new test passes both win32 and Ubuntu linux testing, no problems... but... BUT, RPI still insists on flagging a difference in the message output! ;=(( The Now the way I read it, neither of the test message outputs is correct!!! Why is the This points to a real character encoding problem in Of course the invalid attribute is not output to the html, so we can not see what tidy would do in that case... another test to be done... But it found a valid utf-8 input, but flagged it as a non-valid attribute Anyway, continuing to explore this problem... but any help, pointers, ideas, very welcome...
Back to looking more closely at the attribute When the
Namely, Now the attribute name fails in And tidy uses Ok, there's the problem! While the warning message is correctly encoded as utf-8, the The
Also during this careful analysis I found at least two instances where loading this And ensuring there is no sign extension helps a little in this case, the message output still contains the bad sequence of bytes So that clearly explains why there is a difference, but does nothing about fixing the problem, which is there in all OSes/architectures! Now, presumably, if this was not an attribute name, but just some text, then in outputting the utf-8 tidy would translate it back to latin1 in the html. That is still to be tested and verified. But that opens another question. Should the message file be utf-8 encoded? In this case the user has a configuration of Anyway, next is to explore carefully how tidy correctly outputs utf-8 to html. Obviously it can not be using a byte-by-byte output! There must be a way to correctly sequence utf-8... Or maybe instead of writing out these messages byte-by-byte, they are just written whole? In my debug mode, I do that to the log file, and correctly get And to consider filing a tidy issue on tidy-html5... Some things to think about here! Some feedback would be very encouraging...
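For anyone following along, here is a minimal sketch of what sign extension does to a byte like 0xC3. It is a Python stand-in for the C behaviour, not tidy's actual code, and the helper names are made up for illustration. The assumption is that plain `char` is commonly signed on x86 compilers and commonly unsigned on ARM Linux, which would be one way for the same source to produce different values on RPI.

```python
import struct


def as_signed_char(byte_value):
    """Interpret one byte the way C does when plain char is signed (hypothetical helper)."""
    return struct.unpack("b", bytes([byte_value]))[0]


def as_unsigned_char(byte_value):
    """Interpret one byte the way C does when plain char is unsigned (hypothetical helper)."""
    return struct.unpack("B", bytes([byte_value]))[0]


def widen_to_uint32(value):
    """Model storing a possibly negative char value in a 32-bit unsigned int."""
    return value & 0xFFFFFFFF


lead = 0xC3  # the high-bit byte discussed in this thread

signed_widened = widen_to_uint32(as_signed_char(lead))
unsigned_widened = widen_to_uint32(as_unsigned_char(lead))

# With a signed char the high bit is copied into the upper bytes.
print(hex(signed_widened), hex(unsigned_widened))
```

Bytes below 0x80 come out the same either way, so only the non-ASCII part of a message would be affected.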
Have opened an Issue 383 in the tidy-html5 repo to address this problem...
After applying a 'fix' in tidy-html5, in the Please checkout and build tidy-html5,
And it 100% passed in Ubuntu 14.04 LTS - YIPEE ;=)). And added a convenient Now to try Raspberry Pi 2, running Raspbian GNU/Linux 8.0 (jessie), with ARMv7 Processor rev 5 (v7l)...
Well that was a little slower! Had to modify the scripts to get things to run at all... seems again the bash is not exactly the same, or something else was wrong... so created a run-testsG.sh, testallG.sh, _environG.sh and testoneG.sh to get it all working smoothly... And boy do I hate using pico as an editor in my remote putty connection to RPI... feels like back in the dark ages ;=)) AND of course had to remember to add another tidy config of But eventually got it all working... And it PASSED 100% - all Have left the new case-427664-1.html there, but had to create its own config, case-427664-1.conf, since that test contains only valid utf-8, so can not use the If others could test and report in the MAC OS X, and/or in every other system, maybe this issue can be closed ;=))
@geoffmcl, sorry I've been slow to get to this, but I was able to do this much on Mac OS X:
testall.sh: Done 225 tests - see /Users/jderry/Development/htacg/tidy-html5-tests/cases/testbase-results.txt
run-tests.sh: Running 'diff -ua /Users/jderry/Development/htacg/tidy-html5-tests/cases/testbase-expects /Users/jderry/Development/htacg/tidy-html5-tests/cases/testbase-results'
run-tests.sh: Appear to have PASSED test 2
run-tests.sh: See full results in /Users/jderry/Development/htacg/tidy-html5-tests/cases/testbase-results.txt
@balthisar thanks for testing in OS X, and advising the compare fully PASSES ;=)) I have now also conducted the tests in 2 more systems -
That is now a full pass in six (6) systems! As a git test, pushed some small changes from RPI... seems all working now...
Perhaps it is time to merge this Will close this and shortly merge
There is a difference in output message text using ARM / Raspberry Pi 2, RPI, first reported as Issue 258, Issue 266, Issue 269, by @vielmetti, back on Sep 13, 2015. Thank you for that report. Maybe others...
One or both may also be a problem on Mac OS X, reported by @balthisar. To be verified.
First to try to examine the exact reason for these two tests...
Both these test inputs existed in SF CVS source, without a special config file. Both reported a segfault at that time! And both input files seem exactly the same as in this github tests repo, in a binary compare, namely in_427664.html == case-427664.html and in_427672.html == case-427672.html! So no change has been made in the inputs. Of course, the SF CVS has no `testbase-expects` output to compare with...
However, re-running `tidy04aug00`, even adding the suggested `-utf8` option, on each file, does NOT produce a segfault, as far as I can see...
But running `tidy04aug00`, for which I do not have the source, on both inputs, using DrMemory, does show it has -
But this is not exactly a segfault due to a NULL pointer! And repeating the tests using `tidy2000`, for which we do have the source, does not show any problems...
And while, for some reason I can not yet run DrMemory using the current tidy 5.1.45++, it also appears to not have a segfault! And need to also try in linux using valgrind, ASAN, testing...
But, for sure, that segfault seems to have been solved, the reason for the two tests.
So there remains this mystery of the character encoding differences in the message output in certain OS environments, which still need to be solved.
What is in the `testbase` input, and `testbase-expects`?
Essentially both input files have `<body name="xx">`. A comment in the files says the `name` is supposed to be 2 bytes hex c3 87, but it is not! Now maybe this is a corruption from a long way back, but even in SF CVS source the `name` is a 4 byte sequence of `C3 31 2F 32`.
Thus, in their present state, both inputs do not verify as valid utf-8 text. They would if changed back to the `c3 87` given in the comment, and it is yet to be tested whether that changes the situation.
In parsing this document, tidy finds this 4 byte sequence is not a valid attribute `name`, and outputs a warning. Now it is the value output for that `name` in the warning message that differs in RPI OS. And maybe in OS X, still to be verified.
Tidy in Windows, and Ubuntu linux, consistently outputs a 9 byte sequence `EF BF BF EF BF BF 31 2F 32`, and this is what is in `testbase-expects`, so the compare is exact. No problem.
While Tidy in RPI outputs, in `testbase-results`, a 7 byte sequence `c3 83 c2 83 31 2f 32`, so the diff fails. A problem.
What can we do?
- `1/2`, which is still invalid, so keeps the test's meaning.
- `c3 87`, change the expected accordingly - to be tested.
- `EF BF BF` output.
I seek ideas and comments on what would be best?
As previously expressed, I think it is important that we have a consistent set of tests across all OSes, and to not have to try and explain a difference every time someone stumbles across it.
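To make the byte sequences above easier to compare, here is a rough sketch that checks each one with Python's strict utf-8 decoder and lists the code points where decoding succeeds. The hex strings are copied from this report; the helper and variable names are only illustrative, and this says nothing about how tidy itself decodes or encodes.

```python
def is_valid_utf8(data):
    """Return True if the bytes decode under Python's strict utf-8 codec."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def code_points(data):
    """List the code points of a valid utf-8 byte string as U+XXXX labels."""
    return ["U+%04X" % ord(ch) for ch in data.decode("utf-8")]


# Byte sequences as quoted in this report.
name_in_input = bytes.fromhex("C3 31 2F 32")
name_in_comment = bytes.fromhex("c3 87")
expected_output = bytes.fromhex("EF BF BF EF BF BF 31 2F 32")
rpi_output = bytes.fromhex("c3 83 c2 83 31 2f 32")

for label, data in [
    ("input name", name_in_input),
    ("name per comment", name_in_comment),
    ("testbase-expects", expected_output),
    ("RPI testbase-results", rpi_output),
]:
    if is_valid_utf8(data):
        print(label, len(data), "bytes:", " ".join(code_points(data)))
    else:
        print(label, len(data), "bytes: not valid utf-8")
```

Read this way, the `c3 87` from the comment is the single character U+00C7, the expected output is two U+FFFF followed by `1/2`, and the RPI output is U+00C3 U+0083 followed by `1/2`. Both outputs are well-formed utf-8 on their own, but neither matches the input bytes.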
Help Needed