Pushing for full HTML5 support #122

Closed
geoffmcl opened this issue Aug 4, 2014 · 25 comments
Comments

@geoffmcl
Contributor

geoffmcl commented Aug 4, 2014

Removed this rather broken original post, but still 'Pushing for full HTML5 support'.

Regards,
Geoff.

@geoffmcl
Contributor Author

geoffmcl commented Aug 4, 2014

Also another broken post...

Best regards,
Geoff.

@geoffmcl geoffmcl closed this as completed Aug 4, 2014
@skynet
Member

skynet commented Aug 25, 2014

Hello Geoff –

Just came across your tidy-fork repo and saw that yours is more frequently updated than w3c/tidy-html5.

Can you please tell us what’s new in your version and if you plan to document your changes. Is it possible to merge these changes back here?

Thank you.

IR

@geoffmcl
Contributor Author

Hi Ionel,

What a pleasure to get your email ;=)) It seemed no one was interested
in furthering tidy so I reverted back to other things...

what’s new in your version

First I swung over to a cmake build which supports cross platform
building using a wide variety of 'native' tools all through a single,
easy to maintain CMakeLists.txt. I currently build WIN32 and WIN64
versions in Windows, and an Ubuntu linux 14.04 64-bit version of tidy. At
present it is quite a basic form, but lots of other cmake 'features'
could be added... if needed... One obvious one on my TODO list is to
supply the RELEASE_DATE and PLATFORM_NAME from within CMakeLists.txt
(removing version.h)...

I now have quite a LOT of experience with cmake, and even have a
'cmake-test' repo - https://gitorious.org/fgtools/cmake-test - where I
experiment with lots of different 'features'...

Second, I gathered as much information from the W3C site, particularly,
but not restricted to,
http://www.w3schools.com/html/html5_new_elements.asp... and added about
30 test files in test/html5 to try to test each of the 'new' elements in
HTML5, making fixes to the code where it failed... some elements were
'missing' in the w3c/tidy-html5 source, perhaps deliberately...

Then added about a dozen pairs of html to test elements removed in 5 -
the pair consists of the element in HTML4, and then the element in
HTML5, again adding/fixing code where necessary to get through these
tests. The idea here is that if the document is found to be HTML4 then
there should be no warning, however if it is found to be HTML5 then there
should be a warning (or error), and if tidy is configured to clean, then
it should try to remove/fix what it can. As you may know tidy has always
replaced

with
    so some of this was already there... but more
    to do...
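The HTML4/HTML5 test pairs described above boil down to a rule of the form "warn about a removed element only when the document is HTML5". A minimal sketch of that rule follows. The `DocMode` enum, the `warn_removed_element` function, and the table are hypothetical names for illustration and are not taken from the tidy source; the table lists a few elements that HTML5 marks obsolete and is not meant to be complete. Element names are assumed to be lowercased already.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical document mode; not a type from the tidy library. */
typedef enum { DOC_HTML4, DOC_HTML5 } DocMode;

/* A few elements that are valid in HTML4 but obsolete in HTML5. */
static const char *const removed_in_html5[] = {
    "acronym", "applet", "basefont", "big", "center", "dir",
    "font", "frame", "frameset", "noframes", "strike", "tt"
};

/* Returns 1 when a warning should be issued for this element,
 * 0 otherwise. HTML4 documents never trigger the warning. */
int warn_removed_element(const char *name, DocMode mode)
{
    size_t i;

    assert(name != NULL);
    if (mode != DOC_HTML5)
        return 0;
    for (i = 0; i < sizeof removed_in_html5 / sizeof removed_in_html5[0]; ++i)
        if (strcmp(name, removed_in_html5[i]) == 0)
            return 1;
    return 0;
}
```

Each test pair then corresponds to two calls with the same element name and different modes, for example `warn_removed_element("center", DOC_HTML4)` and `warn_removed_element("center", DOC_HTML5)`.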

    Then back to the original nearly 250 'test/input' files, to try to check
    that the HTML5 additions/changes had NOT badly broken HTML4 and earlier...

    Here I started to run out of steam ;=(( With no one to discuss with,
    bounce ideas off, no feedback! There were things that I could not decide
    one way or another, so as stated drifted back to other things...

    Is it possible to merge these changes back here?

    That was always intended, but I do not have commit rights to
    w3c/tidy-html5, nor to the original cvs repo, although as you may know
    have been involved with tidy for quite a long LONG time... but no one
    was responding, testing...

    The Questions:

    My main concern is how to keep HTML4 and earlier support, while still
    being able to handle HTML5, since they do have some contradictory elements?

    Of course one course of action would be to separate the library into
    2 separate libraries, with one specifically for HTML5, and build a tidy5
    app, but that seems 'ugly'...

    In testing I have been using doctype: html5 to tell tidy to go HTML5
    'mode', and trigger extra HTML5 checks and fixes, but I think ideally
    tidy should 'detect' this, maybe because the document -

    (a) has the HTML5 doctype
    (b) has a without a content attribute
    (c) contains a new element found only in HTML5
    or a combination of these, or other things... but I never got this right...
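As an illustration of the detection idea, here is a rough sketch covering criteria (a) and (c) only; criterion (b) is left out because its element name did not survive in the text above. All function names are hypothetical and nothing here is tidy's actual logic. It scans raw text rather than a parsed tree, so markup inside comments or scripts would fool it, and the element list is a small sample of HTML5 additions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 when s begins with prefix, ignoring ASCII case. */
static int starts_with_ci(const char *s, const char *prefix)
{
    for (; *prefix; ++s, ++prefix)
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
    return 1;
}

/* (a) Looks for the short HTML5 doctype, <!DOCTYPE html>. */
static int has_html5_doctype(const char *html)
{
    const char *p = html;

    while ((p = strchr(p, '<')) != NULL) {
        if (starts_with_ci(p, "<!doctype"))
            break;
        ++p;
    }
    if (p == NULL)
        return 0;
    p += 9; /* length of "<!doctype" */
    if (!isspace((unsigned char)*p))
        return 0;
    while (isspace((unsigned char)*p))
        ++p;
    if (!starts_with_ci(p, "html"))
        return 0;
    p += 4; /* length of "html" */
    while (isspace((unsigned char)*p))
        ++p;
    return *p == '>'; /* anything else (PUBLIC, SYSTEM) is a longer doctype */
}

/* (c) Looks for a start tag of an element introduced by HTML5. */
static int has_html5_only_element(const char *html)
{
    static const char *const names[] = {
        "article", "aside", "audio", "canvas", "figure",
        "footer", "header", "nav", "section", "video"
    };
    size_t i;

    for (i = 0; i < sizeof names / sizeof names[0]; ++i) {
        const char *p = html;
        size_t len = strlen(names[i]);

        while ((p = strchr(p, '<')) != NULL) {
            ++p; /* step past '<' */
            if (starts_with_ci(p, names[i])) {
                char end = p[len];
                if (end == '>' || end == '/' || isspace((unsigned char)end))
                    return 1;
            }
        }
    }
    return 0;
}

/* Combines the two signals; either one is taken as "probably HTML5". */
int looks_like_html5(const char *html)
{
    assert(html != NULL);
    return has_html5_doctype(html) || has_html5_only_element(html);
}
```

Treating either signal as sufficient is a design choice made for the sketch; a stricter variant could require the doctype and use the element check only to report inconsistencies.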

    And maybe such a determination that this IS html5 could trigger some
    dynamic changes in the main element/attributes tables, but this seems a
    little difficult with the caching of these table elements...

    So here we are... I would be willing to put in some more time on tidy,
    but only if there is a solid feedback channel, and others doing some
    testing, reporting...

    Regards,
    Geoff.

@geoffmcl geoffmcl reopened this Aug 26, 2014
@skynet
Member

skynet commented Sep 4, 2014

Geoff

Thanks so much for the quick response and the insights.

While there are other libraries and tools for checking and cleaning up HTML, tidy-html5 is definitely the most reputable and it should be maintained, one way or another, but preferably under W3C. As of today (Sept 4, 2014) there are 59 open tickets, 12 pull requests and ... zero commits since Aug 21, 2012.

At this point there are two options: re-engage developers with commit access, or fork the repo and continue development. What do you, or others, think?

Thanks.

Ionel

@petdance
Contributor

petdance commented Sep 4, 2014

On Sep 4, 2014, at 2:00 PM, Ionel Roiban notifications@github.com wrote:

At this point there are two options: re-engage developers with commit access, or fork the repo and continue development. What do you, or others, think?

Who would be doing the "continue development"? It seems to me that the problem isn't the repo or the project but that we don't have developers with the time/motivation/etc to work on it.

If there are developers who are ready to do development on the project, then I wonder what they think next steps should be. You say "re-engage developers with commit access", which sounds like you're saying that more people would work on the project if they had commit access. Is that the case? Who are these developers that are being stymied by lack of commit access?

Andy Lester => www.petdance.com

@marcoscaceres
Contributor

I would strongly suggest that people who are willing to work on this fork this repo into a HTML5Tidy organization here on GitHub - or that @sideshowbarker donate the project to such an organization. Then, developers who are willing to maintain the project can better organize, review code, etc.

@petdance
Contributor

petdance commented Sep 4, 2014

Do you actually have "developers who are willing to maintain the project"? I don't think forking the project is going to make developers appear.

Don't get me wrong: I'd LOVE to have some forward direction on the project, and I'd contribute, but mostly it needs strong direction.

@skynet
Member

skynet commented Sep 4, 2014

The only problem with a fork is that it will lose the reputation that the W3C organization provides to such an important project. Prior to forking and moving development elsewhere due diligence should take place here, in order to get W3C to acknowledge the situation.

@marcoscaceres
Contributor

@skynet, that's a good point - but I don't think we can depend on the W3C on its own here. This is really just @sideshowbarker's pet project and not a W3C thing (i.e., they don't provide funding or support in any way). Clearly, @sideshowbarker doesn't have time to do any work on this (as he has stated in other bugs).

I'll wait for him to decide what to do. I've spoken to him in the past about opening it up, but he hasn't yet found anyone that he trusts in the community to take the project over. I don't see what option there is but to open it up to a few people who have shown themselves, through PRs, capable of reviewing and maintaining the code.

Maybe we can get a show of interest from people who could commit time to this?

@skynet
Member

skynet commented Sep 4, 2014

A better documentation website could also help make the project self-sustaining and increase traffic and adoption. http://w3c.github.io/tidy-html5/ is not enough for that purpose.

@geoffmcl
Contributor Author

geoffmcl commented Sep 5, 2014

Hi skynet, petdance, marcoscaceres,

Good to read and 'feel' some interest ;=)).

1: Where should this discussion take place?

I do not think issue #122 here is the right place!

I would wholeheartedly suggest tidy-dev/sourceforge
is where it should be at to -

(a) involve the current maintainer there, arnaud02,
charles, and bjoern... and other contributors now
years back.

(b) get an email when something is posted.

I can/will not check back at #122 all the time!
Just too inconvenient...

Ionel has been kind enough to cc me, otherwise
I would not have known about this discussion...

Except for me, admittedly not much has been happening
on tidy-dev of late, but a discussion like this could
possibly revive it.

I will try to cross post this reply... and include
the replies...

2: who can put in coding time

To repeat, I can, but to also repeat, I can/will not
do that in a vacuum ;=))

I have been contributing code fixes since around 2000,
some 14/15 years. Maybe longer...

But I need others to be testing, commenting, contributing,
debating... and hopefully some of those will have a
deeper understanding of what the W3C wants...

Ionel, you mentioned there are 59 open tickets, and 12
pull requests...

Well, I would look at these if -

(a) tidy-html5 is updated to where my tidy-fork is at,
and those needed are rechecked against the update.

(b) pull requests can only be merged by those with write
access... so I would need that access...

3: which repo

Do not really care, but would certainly these days
prefer a git-based repo...

(a) sideshowbarker was new to me, and still to explore
what is there... what is the aim... seems quite
active... at least on the 'validator' part...

(b) tidy-html5 - yes, subject to 2:(a) and (b) above.

(c) original sourceforge svn. Possibly, but only if
git is used which I understand is supported, and
then subject to 2: (a) and (b) again.

(d) My tidy-fork. Well this is where I will continue for
now...

Others???

4: Discussion on some of the conflicts between 4 and 5

As expressed, if the one 'library' is to do BOTH at
the same time, then it needs 'rules' when to treat
it as 4 or earlier, and when to treat a document as
5 and thus tidy/report/warn appropriately,

This is my current stalling point, and seek ideas...

Regards,
Geoff.

@skynet
Member

skynet commented Sep 7, 2014

Geoff -

We shall continue the due diligence for contacting and getting W3C to acknowledge the situation, but instead of waiting I would just fork and move on - I am sure there are a lot of developers waiting to contribute and help with testing. Thanks for your and others' interest in reviving this project.

Ionel

@geoffmcl
Contributor Author

Hi Ionel,

Yes, as you suggest am moving on with
my fork ;=)) But still not many 'testers'...

I just did a push to always show the
DOCTYPE information provided NOT quiet!

And in fact have accepted my first
feature request to support AngularJS
attributes...

Especially treating -
<script type="text/ng-template" id="id">
html content
</script>
as HTML5 content rather than javascript.

See here -
geoffmcl#2

Do you know what is the W3C position on
AngularJS?

I note they have a tutorial on it in
W3Schools...

All for a better tidy...

Regards,
Geoff.

@skynet
Member

skynet commented Sep 14, 2014

Glad to hear about your progress!

Just because they are named W3Schools does not mean they are associated in any way with W3C. Moreover their site is far from being a trustworthy reference source. It is a site built for SEO.

AngularJS is a Google project. My first guess is that, instead of micro-managing third-party attributes, you should make it possible for anyone to add custom attributes to the list of valid items.
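One way to picture that suggestion is a small, user-extensible list of accepted attribute prefixes, so that `ng-` (or any other third-party family) is registered by configuration rather than hard-coded. The sketch below is only an illustration: `CustomAttrList`, `custom_attr_allow_prefix`, and `custom_attr_is_allowed` are hypothetical names, not part of any tidy API, and the list stores the caller's pointers without copying, so the prefix strings must outlive the list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CUSTOM_PREFIXES 16

/* Hypothetical registry of user-approved attribute prefixes. */
typedef struct {
    const char *prefixes[MAX_CUSTOM_PREFIXES];
    size_t count;
} CustomAttrList;

/* Registers a prefix such as "ng-"; returns 1 on success,
 * 0 when the list is full or the prefix is empty. */
int custom_attr_allow_prefix(CustomAttrList *list, const char *prefix)
{
    assert(list != NULL && prefix != NULL);
    if (list->count >= MAX_CUSTOM_PREFIXES || *prefix == '\0')
        return 0;
    list->prefixes[list->count++] = prefix;
    return 1;
}

/* Returns 1 when attr starts with any registered prefix. */
int custom_attr_is_allowed(const CustomAttrList *list, const char *attr)
{
    size_t i;

    assert(list != NULL && attr != NULL);
    for (i = 0; i < list->count; ++i)
        if (strncmp(attr, list->prefixes[i], strlen(list->prefixes[i])) == 0)
            return 1;
    return 0;
}
```

A caller would start from an empty list (`CustomAttrList list = { {0}, 0 };`), register `"ng-"` once, and then consult `custom_attr_is_allowed` before reporting an unknown attribute.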

Cheers!

@skynet
Member

skynet commented Dec 16, 2014

Any news about this?

@geoffmcl
Contributor Author

Hi Ionel,

While most of the AngularJS stuff was
relatively easy to support, trying to support
the 'template' just got _TOO_ difficult,
since tidy5 would need to fall back to html
parsing while in a script, so I abandoned it ;=()

At this point my tidy5 fork does everything I
want from a tidy5, so have done nothing new of
late...

The last effort was a few weeks ago just to
blow away ALL the previous 'build' methods,
most of which would no longer work anyway,
leaving only CMake...

Regards,
Geoff.

@skynet
Member

skynet commented Dec 18, 2014

Sounds good, thanks!

@balthisar
Member

@geoffmcl, I can't do a PR to your fork, but perhaps you'll take a look at this:

It's your fork as of ca. November, with several of the PRs in this branch manually merged in, primarily support for all of the vast array of attributes (plus my own PR for a secondary messages callback filter that I need for language localization).

Because so many of the existing PRs are based on the master branch, there were tons and tons of conflicts I had to handle manually.

It's a shame this branch isn't maintained, but as far as I can tell from activity, your fork is a lot more canonical than mine, and it would be good to have many of these PR's merged into a fork with visibility.

@geoffmcl
Contributor Author

Hi balthisar,

Sorry for the delayed reply but still busy with
some other things for a little longer... but time
for a brief reply...

Wow, it is great to read LOTS of 'tidy' activity ;=))

Yes, after forking and adding what I wanted I did NOT
set up PR's because, quite frankly, nobody seemed
interested ;=(( There were already several outstanding
PR's and issues that did not seem to be getting any
attention...

As soon as I get a chance I will look at your -
https://github.com/balthisar/tidy-html5/tree/geoffmcl-restart

Now also watching htacg/tidy-html5...

And joined the HTML Tidy Advocacy Community Group right
after I 'joined' W3C, as 'geoffmcl' ;=))

Anyway, great to read lots... be back soonest... obviously
am still very interested in tidy's future...

Regards,
Geoff.

@balthisar
Member

Geoff, we all understand time commitments! I'm hopeful that we won't let things get too backlogged anymore. We're still trying to figure out a proper strategy for the outstanding PRs.

Have you run through the regression tests with my fork? It's on my very long to-do list. Given that it includes a good number of the PR's, it would be convenient if we could verify its trustworthiness. (Note, have to remove my private project branding, though!).

@geoffmcl
Contributor Author

Hi Jim,

My other projects have quietened down so found
some time to devote to tidy ;=))

  1. Cloned your fork - geoffmcl-restart branch

One item needed was to add a va_copy(a,b) to
compile in windows... in include\platform.h
added in the windows section -

#ifndef va_copy
#define va_copy(dest, src) (dest = src)
#endif
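To show why a message routine might need `va_copy` at all, here is a small sketch of a function that walks its variadic arguments twice: once to measure the output and once to write it. `format_message` is a hypothetical name and is not taken from the tidy source. The fallback macro mirrors the one quoted above; plain assignment only works where `va_list` is a simple scalar type (as the comment implies for that Windows compiler), and it is skipped wherever the platform's `<stdarg.h>` already defines `va_copy` as a macro, which is assumed for current GCC and Clang.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Fallback for toolchains without va_copy; valid only when
 * va_list can be copied by plain assignment. */
#ifndef va_copy
#define va_copy(dest, src) ((dest) = (src))
#endif

/* Formats into a freshly allocated string. The caller frees the
 * result. Returns NULL on a formatting or allocation failure. */
char *format_message(const char *fmt, ...)
{
    va_list args, args_copy;
    int len;
    char *buf;

    assert(fmt != NULL);
    va_start(args, fmt);
    va_copy(args_copy, args);            /* second cursor for pass two */

    len = vsnprintf(NULL, 0, fmt, args); /* pass one: measure only */
    va_end(args);
    if (len < 0) {
        va_end(args_copy);
        return NULL;
    }

    buf = malloc((size_t)len + 1);
    if (buf != NULL)
        vsnprintf(buf, (size_t)len + 1, fmt, args_copy); /* pass two: write */
    va_end(args_copy);
    return buf;
}
```

Without the copy, the second `vsnprintf` call would reuse an argument list that the first call has already consumed, which is undefined behaviour.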

  2. Merged yours with my tidy-fork... quite a
    number of conflicts but ALL easily resolved ;=))

(i) The biggest one was src/attrdict.c - not
sure why really but git merge always gets broken
up over lots of space changes...

You had carefully moved every existing definition
out 4 spaces to match the new largest attribute
TidyAttr_ARIA_ACTIVEDESCENDANT, lot of good work,
but git gets confused... and says 'both modified':

On carefully checking this was NOT true, and
I was able to just copy yours over mine, and
all is happy.

(ii) The only other conflicts were all in the
'build' files.

I have decided to go 100% CMake since it generates
native make files for just about any desired
system, so had DELETED ALL the old make, starting
with the root Makefile, then in build directory,
deleted dirs gmake, gnuauto, msvc2010, rpm...

And fixed the README.md and README.html to match...

When I deleted all these same things in your
clone, the merge went very smoothly, except for
(i) above, but that is just a copy.

  3. After a windows compile check, and having run a few
    tests, mainly on ARIA, but including other things,
    did a push to my fork... of course to a
    geoffmcl-restart branch at this time...

And did a compile check in my Ubuntu 14.04 linux...
NO PROBLEMS ;=))

  4. Began to attack the issues open against my
    fork, and was able to close a number (3 or 4) with
    mostly simple fixes...

The most difficult is one concerning treating
ids as case sensitive, since we have to retain
the warnings/errors for html4 while removing
them if html5...

But have a good trace on this, and should be
able to close it soon also...

Also picked up one issue from htacg/tidy-html5
which was able to close changing just one word ;=))

For <script src="..." async>, change CH_PCDATA
to CH_BOOL for TidyAttr_ASYNC...
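For context on why `async` is better modelled as a boolean attribute than as free text: under the HTML5 rules a boolean attribute may appear with no value, with an empty value, or with a value equal to its own name (ignoring ASCII case). The sketch below checks exactly that rule. `is_valid_boolean_attr` is a hypothetical helper written for illustration and does not claim to reproduce what tidy's `CH_BOOL` check does internally.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Compares two strings for equality, ignoring ASCII case. */
static int equal_ci(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        ++a;
        ++b;
    }
    return *a == *b; /* both must end at the same point */
}

/* value == NULL means the attribute was written without a value,
 * as in <script src="..." async>. Returns 1 when the value is
 * acceptable for a boolean attribute, 0 otherwise. */
int is_valid_boolean_attr(const char *name, const char *value)
{
    assert(name != NULL);
    if (value == NULL || *value == '\0')
        return 1;
    return equal_ci(name, value);
}
```

With this rule `async`, `async=""`, and `async="async"` are all accepted, while `async="true"` is not.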

Started to look at some others... there are 64...

And looking at the 14 PRs, but most of which would
NOT be needed by my current tidy_fork/geoffmcl-restart
branch...

  5. Regression test

I certainly agree this is becoming IMPORTANT ;=()

To me it is important that tidy5 can do ALL that
the previous tidy did without problems, while
still being able to handle all the new and changed
stuff from HTML5...

I have had it as a note to myself, as Issue #1,
opened 6 Aug 2014, so it is about time I got
around to it ;=))

Will certainly try to deal with that during this
week, or soonest...

  6. Where to next?

(a) At this time htacg/tidy-html5 now seems so far
behind ;=(( and no cmake!

(b) I could easily bring your balthisar/tidy-html5 up
to mine, and maybe we could continue to keep them
in sync...

Of course until I see a direction emerging I will
continue to 'fix', 'update', 'test' my tidy-fork...

What is your idea?

Regards,
Geoff.

@balthisar
Member

@geoffmcl, direction should be coming soon. Don't despair again.

@geoffmcl
Contributor Author

So have pushed develop-500 branch to this repo, and would really appreciate any testing and reviewing of this branch. $ git checkout develop-500

This should make my tidy-fork redundant...

Will try to add the infrastructure label to this, since it contains a lot of discussion... and have also posted a message to -
https://lists.w3.org/Archives/Public/public-htacg/2015Jan/thread.html
where some of this discussion could continue...

And hope any or all of you can help test this latest... Thanks...

@skynet
Member

skynet commented Jan 24, 2015

👍

@balthisar
Member

I will close this issue as "overly broad." :-)
