Problem with tests 427664 and 427672 on certain OSes!

The output message text differs on ARM / Raspberry Pi 2 (RPI). This was first reported by @vielmetti on Sep 13, 2015, as [Issue 258](https://github.com/htacg/tidy-html5/issues/258), [Issue 266](https://github.com/htacg/tidy-html5/issues/266), [Issue 269](https://github.com/htacg/tidy-html5/issues/269). Thank you for that report. There may be others...

One or both may also be a problem on Mac OS X, as reported by @balthisar. To be verified.

First, let us try to examine the exact **reason** for these two tests...

```
Test 427664 - now https://sourceforge.net/p/tidy/bugs/4/
#4 Missing attr values cause NULL segfault
Created: 2001-05-27 Creator: Terry Teague
```

```
Test 427672 - now https://sourceforge.net/p/tidy/bugs/10/
#10 Non-std attrs w/multibyte names segfault
Created: 2001-05-27 Creator: Terry Teague
```

Both these test inputs existed in the SF CVS source, without a special config file. Both reported a **segfault** at that time! Both input files seem to be exactly the same as in this github tests repo: a binary compare shows in_427664.html == case-427664.html and in_427672.html == case-427672.html! So no change has been made in the inputs. Of course, the SF CVS has no `testbase-expects` output to compare with...
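For anyone wanting to repeat the binary compare, here is a minimal sketch in Python. The helper name `same_bytes` is made up for this example, and it assumes the old SF CVS inputs and the current inputs have been copied into the same working directory under the file names given above.

```python
from pathlib import Path


def same_bytes(path_a, path_b):
    """Return True when both files hold exactly the same bytes."""
    return Path(path_a).read_bytes() == Path(path_b).read_bytes()


if __name__ == "__main__":
    # Assumed layout: SF CVS inputs and github inputs in the current directory.
    pairs = [
        ("in_427664.html", "case-427664.html"),
        ("in_427672.html", "case-427672.html"),
    ]
    for old, new in pairs:
        if Path(old).exists() and Path(new).exists():
            print(old, "==", new, ":", same_bytes(old, new))
        else:
            print("skipping", old, "/", new, "(file not found)")
```

Reading the files as raw bytes avoids any decoding step, which matters here because the inputs are not valid utf-8.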

However, re-running `tidy04aug00`, even adding the suggested `-utf8` option, on each file, does **NOT** produce a segfault, as far as I can see...

But running `tidy04aug00`, for which I do **not** have the source, on both inputs, using DrMemory, does show it has -

```
 Error #1: UNADDRESSABLE ACCESS beyond top of stack: reading 4 byte(s)
```

But this is **not** exactly a segfault due to a NULL pointer! And repeating the tests using `tidy2000`, for which we do have the source, does not show any problems...

And while, for some reason, I can not yet run DrMemory on the current tidy 5.1.45++, it also appears **not** to **segfault**! This still needs to be tried in linux using valgrind, ASAN, testing...

In any case, that **segfault**, the reason for the two tests, seems to have been **solved**.

So there remains this mystery of the character encoding differences in the message output in certain OS environments, which still needs to be **solved**.

#### What is in the `testbase` input, and `testbase-expects`?

Essentially both input files have `<body name="xx">`. A comment in the files says the `name` is supposed to be the 2 bytes hex `c3 87`, but it is **not**! Maybe this is a corruption from a long way back, but even in the SF CVS source the `name` is a 4 byte sequence of `C3 31 2F 32`.

Thus, in their present state, both inputs do **not** verify as valid utf-8 text. They would if changed back to the `c3 87` given in the comment, but it is yet to be tested whether that changes the situation.
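The utf-8 claim is easy to check by hand. The sketch below uses Python's strict utf-8 decoder as the validator; the helper name `is_valid_utf8` is invented for this example, and only the two attribute `name` byte sequences are tested, not the whole files.

```python
def is_valid_utf8(data):
    """Return True when the bytes decode cleanly as strict utf-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The attribute name as it is in the inputs today, and as the comment says it should be.
current_name = bytes.fromhex("C3 31 2F 32")
commented_name = bytes.fromhex("c3 87")

print(is_valid_utf8(current_name))    # False: C3 must be followed by a continuation byte
print(is_valid_utf8(commented_name))  # True
print("U+%04X" % ord(commented_name.decode("utf-8")))  # U+00C7
```

`C3` is the lead byte of a 2 byte utf-8 sequence, so it must be followed by a byte in the range `80` to `BF`. `31` is plain ASCII `1`, which is why the present sequence is rejected, while `c3 87` decodes to the single character U+00C7.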

In parsing this document, tidy finds this 4 byte sequence is **not** a valid attribute `name`, and outputs a warning. It is the value output for that `name` in the warning message that differs in RPI OS. And maybe in OS X, still to be verified.

Tidy in Windows and Ubuntu linux consistently outputs a 9 byte sequence `EF BF BF EF BF BF 31 2F 32`, and this is what is in `testbase-expects`, so the compare is exact. No problem.

Tidy in RPI, however, outputs in `testbase-results` a 7 byte sequence `c3 83 c2 83 31 2f 32`, so the diff fails. A problem.
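To see what the two outputs actually contain, the byte sequences can be decoded into code points. This is just an inspection sketch; the helper name `code_points` is made up here, and it says nothing about why Tidy produces the different bytes.

```python
def code_points(data):
    """Decode utf-8 bytes and list the code points as U+XXXX strings."""
    return ["U+%04X" % ord(ch) for ch in data.decode("utf-8")]


# Byte sequences quoted above from testbase-expects and the RPI testbase-results.
expected = bytes.fromhex("EF BF BF EF BF BF 31 2F 32")
rpi = bytes.fromhex("c3 83 c2 83 31 2f 32")

print(len(expected), code_points(expected))
# 9 ['U+FFFF', 'U+FFFF', 'U+0031', 'U+002F', 'U+0032']
print(len(rpi), code_points(rpi))
# 7 ['U+00C3', 'U+0083', 'U+0031', 'U+002F', 'U+0032']
```

Both outputs are themselves well formed utf-8 and both end in `1/2`. They differ only in the leading two code points: U+FFFF twice on Windows and Ubuntu, versus U+00C3 followed by U+0083 on RPI. Note that `EF BF BF` is U+FFFF, not the usual replacement character U+FFFD, which would be `EF BF BD`.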
#### What can we do?
- Reduce the attribute name to just `1/2`, which is still invalid, so it keeps the tests' meaning.
- Change the file back to valid utf-8 `c3 87`, and change the expected output accordingly - to be tested.
- Maybe a fix in Tidy code could force RPI to use `EF BF BF` output.
- If also a problem in OS X, maybe exclude the 2 tests.
- If only in RPI, be ready to explain that this difference exists in these 2 tests.
- Other choices?
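The first two options both come down to swapping the 4 byte attribute name for something else. A rough sketch of how that could be done at the byte level follows. The helper name `replace_attr_name` and the one-line sample document are made up for illustration; the real inputs would have to be read and written in binary mode, and the expected outputs regenerated afterwards.

```python
OLD_NAME = bytes.fromhex("C3 31 2F 32")  # attribute name as it is in the inputs today


def replace_attr_name(data, new_name, old_name=OLD_NAME):
    """Return data with every old_name replaced; raise if old_name is absent."""
    if old_name not in data:
        raise ValueError("old attribute name not found in input")
    return data.replace(old_name, new_name)


# Made-up stand-in for the relevant line of the test inputs.
sample = b"<body " + OLD_NAME + b'="xx">'

reduced = replace_attr_name(sample, b"1/2")                   # option 1
restored = replace_attr_name(sample, bytes.fromhex("c3 87"))  # option 2

print(reduced)   # b'<body 1/2="xx">'
print(restored)  # b'<body \xc3\x87="xx">'
```

Working on raw bytes keeps any editor or decoder from silently changing the invalid sequence while the file is being patched.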

I seek ideas and comments on what would be best.

As previously expressed, I think it is important that we have a **consistent** set of tests across **all** OSes, and to not have to try and explain a difference every time someone stumbles across it.

**Help Needed**

