Omd_tyxml #211

shonfeder · 2020-06-28T21:00:41Z

This is just at an early exploratory stage, to map out the space and be sure I'm headed in the right direction before I set out in earnest.

Closes: #82

src/omd.ml

src/omd.mli

tests/dune.inc

tests/omd_tyxml.ml

dune-project

shonfeder · 2020-06-30T03:35:44Z

Thanks very much for the initial feedback, @nojb! And sorry for the very rough state! As I noted, I only meant to rough in some broad strokes to get direction on a few major points, and your replies here -- and on the originating issue -- have been super helpful and informative in this direction.

shonfeder

AH! I forgot to submit these comments. They should help things make more sense.

tests/omd_tyxml.ml

nojb · 2020-07-04T18:37:08Z

@shonfeder let me know when the code is ready for another review pass. Thanks!

shonfeder · 2020-07-06T03:29:54Z

I've still got some cleanup to do here, but the basic functionality is in place and, afaik, all the tests should be passing in spirit. I'm finding the tests quite testy, due to trivial whitespace differences and arbitrary formatting differences (some end with newlines, some don't; one way of generation produces breaks after certain tags, another way doesn't, etc.).

I pulled in lambdasoup as a testing dep hoping to fully normalize the HTML so we wouldn't have to fuss with string munging, but it's not working as well as expected. In fact, for some reason some (but not all!) of the HTML test cases are ending up with duplicated tags, causing spurious test failures. I have no idea what's causing these, but I'll look with fresh eyes later in the week.
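To make the "string munging" alternative concrete, here is a minimal stdlib-only sketch of a whitespace-insensitive comparison. The names `collapse_whitespace` and `same_modulo_whitespace` are hypothetical and are not the PR's `normalize_html`. The sketch assumes whitespace is never significant, which is wrong inside `<pre>` blocks, and it does not equate `<p>\n<a>` with `<p><a>`, so it only covers part of the differences described above.

```ocaml
(* Hypothetical helper, not taken from the PR: collapse every run of
   whitespace into a single space and trim both ends. *)
let collapse_whitespace (s : string) : string =
  let buf = Buffer.create (String.length s) in
  let pending_space = ref false in
  String.iter
    (fun c ->
      match c with
      | ' ' | '\t' | '\n' | '\r' ->
        (* Remember the gap, but only emit it if more text follows.
           Leading whitespace is dropped because the buffer is empty. *)
        if Buffer.length buf > 0 then pending_space := true
      | c ->
        if !pending_space then Buffer.add_char buf ' ';
        pending_space := false;
        Buffer.add_char buf c)
    s;
  Buffer.contents buf

(* Two fragments count as equal when they differ only in whitespace runs. *)
let same_modulo_whitespace (a : string) (b : string) : bool =
  String.equal (collapse_whitespace a) (collapse_whitespace b)
```

With this, `same_modulo_whitespace "<p>x</p>\n" "<p>x</p>"` holds, so a trailing newline no longer causes a spurious failure.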

nojb · 2020-07-06T03:41:57Z

> I've still got some cleanup to do here, but the basic functionality is in place and, afaik, all the tests should be passing in spirit. I'm finding the tests quite testy, due to trivial whitespace differences and arbitrary formatting differences (some end with newlines, some don't; one way of generation produces breaks after certain tags, another way doesn't, etc.).

One easy way around this is to pretty-print the expected HTML output found in the test files. The output is already being extracted to generate the .html files for each test. It is just a matter of inserting a pretty-print call before storing the .html field in the parse_test_spec function in extract_tests.ml. Then you won't need to do any normalization (apart from the pretty-printing).

I was resisting doing this because when pretty-printing there is always the chance that a bug may squeak in. But if it is getting too hard to nail the spec formatting on the head, I think this is a valid way to go.

nojb · 2020-07-06T03:48:46Z

> I pulled in lambdasoup as a testing dep hoping to fully normalize the HTML so we wouldn't have to fuss with string munging, but it's not working as well as expected. In fact, for some reason some (but not all!) of the HTML test cases are ending up with duplicated tags, causing spurious test failures. I have no idea what's causing these, but I'll look with fresh eyes later in the week.

Sounds good, let me know when it is ready for another review pass. Thanks!

Drup

Happy to see this going forward! Here's a bunch of preliminary reviews, don't hesitate to ask questions.

omd_tyxml/omd_tyxml.ml

Drup · 2020-07-10T13:56:07Z

omd_tyxml/omd_tyxml.ml

+  | Strong s   -> Html.[strong ~a:[] (of_inline s)]
+  | Hard_break -> Html.[br ~a:[] ()]
+  (* TODO Add option for verified html ?*)
+  | Html raw   -> Html.Unsafe.[data raw]


This is "correct" as long as the HTML is textual. As soon as you start using the non-textual Tyxml backends (for instance, `Tyxml_js`), this is broken and you have to do more work.

Long term, I think it would be beneficial to provide a functor over the HTML module to adapt to different Tyxml instantiations (and provide a pre-applied instance for text!). The "right" solution is then to decode the HTML with lambdasoup and build actual tyxml trees. Then you can build DOM trees from markdown directly, without going through text.
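The functor idea can be sketched without depending on Tyxml at all. Everything below is an assumption made for illustration: the `HTML` signature is a tiny stand-in for Tyxml's much richer signatures, the `inline` type is a cut-down version of omd's AST, and `Make`, `Text_html`, and `To_text` are hypothetical names. The point is only the shape: the converter is written once against a signature, and a pre-applied textual instance is provided alongside it.

```ocaml
(* A deliberately tiny stand-in for an HTML-building signature.
   Tyxml's real signatures are far richer; every name here is hypothetical. *)
module type HTML = sig
  type node
  val txt : string -> node
  val strong : node list -> node
  val br : unit -> node
end

(* A cut-down inline AST, standing in for omd's. *)
type inline =
  | Text of string
  | Strong of inline list
  | Hard_break

(* The converter is written once, against the signature only. *)
module Make (H : HTML) = struct
  let rec of_inline : inline -> H.node = function
    | Text s -> H.txt s
    | Strong children -> H.strong (List.map of_inline children)
    | Hard_break -> H.br ()
end

(* Pre-applied textual instance: nodes are plain strings.
   No escaping is performed, to keep the sketch short. *)
module Text_html : HTML with type node = string = struct
  type node = string
  let txt s = s
  let strong children = "<strong>" ^ String.concat "" children ^ "</strong>"
  let br () = "<br/>"
end

module To_text = Make (Text_html)
```

Here `To_text.of_inline (Strong [Text "hi"; Hard_break])` evaluates to `"<strong>hi<br/></strong>"`. A DOM-building instance would implement the same signature with a different `node` type, which is where raw HTML passed through as unparsed text stops working and a real parse into nodes becomes necessary.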

I'm not sure whether we want this generalization as part of the current PR, or as a followup, but I agree we should target it in the long run.

I think we first need to settle whether or not we can replace the bespoke html generator with Tyxml.

shonfeder · 2020-07-12T00:40:52Z

> I was resisting doing this because when pretty-printing there is always the chance that a bug may squeak in. But if it is getting too hard to nail the spec formatting on the head, I think this is a valid way to go.

Your fears have been validated, @nojb! I've been going a bit crazy trying to figure out why there is weird duplicate html in some of the generated test fragments (causing the parsing checks to fail). Turns out, a bug indeed squeaked in :)

utop # let html = {|<p><a href="foo\
bar"></p>
|};;
val html : string = "<p><a href=\"foo\\\nbar\"></p>\n"

utop # print_endline html;;
<p><a href="foo\
bar"></p>

- : unit = ()
utop # Soup.(parse html |> pretty_print) |> print_endline;;
<p>
 <a href="foo\
bar"></a>
</p>
<a href="foo\
bar">
</a>
- : unit = ()

Drup · 2020-07-12T09:55:27Z

Well, that's because your HTML is incorrect (an unclosed `a` tag that is not converted to a standalone element), which triggers recovery in lambdasoup.
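To illustrate what "unclosed" means for the fragment above, here is a naive stdlib-only detector. It is not the recovery algorithm used by lambdasoup or markup.ml; the functions `tags` and `unclosed` are hypothetical, and the scanner assumes there is no `>` or `<` inside attribute values, no comments, and no void elements such as `<br>`.

```ocaml
(* List the tags in [html] in document order as (is_closing, name) pairs.
   Naive: a tag name is the run of alphanumerics right after '<' or '</'. *)
let tags (html : string) : (bool * string) list =
  let n = String.length html in
  let is_name_char = function
    | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' -> true
    | _ -> false
  in
  let rec scan i acc =
    match String.index_from_opt html i '<' with
    | None -> List.rev acc
    | Some lt ->
      let closing = lt + 1 < n && html.[lt + 1] = '/' in
      let start = if closing then lt + 2 else lt + 1 in
      let stop = ref start in
      while !stop < n && is_name_char html.[!stop] do incr stop done;
      let name = String.sub html start (!stop - start) in
      let acc = if name = "" then acc else (closing, name) :: acc in
      scan !stop acc
  in
  scan 0 []

(* Names of elements that were opened but never properly closed,
   in the order they were opened. *)
let unclosed (html : string) : string list =
  (* Pop the stack until [name] is found; collect what was skipped. *)
  let rec pop_until name stack dropped =
    match stack with
    | [] -> None
    | top :: rest when top = name -> Some (rest, dropped)
    | top :: rest -> pop_until name rest (top :: dropped)
  in
  let step (stack, bad) (closing, name) =
    if not closing then (name :: stack, bad)
    else
      match pop_until name stack [] with
      | Some (rest, dropped) -> (rest, bad @ dropped)
      | None -> (stack, bad) (* stray closing tag: ignored in this sketch *)
  in
  let stack, bad = List.fold_left step ([], []) (tags html) in
  bad @ List.rev stack
```

Applied to the test fragment, `unclosed "<p><a href=\"foo\\\nbar\"></p>\n"` returns `["a"]`: the `</p>` arrives while `a` is still open, which is the situation a recovering parser has to repair somehow.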

shonfeder · 2021-01-10T01:56:14Z

Oof! I nearly forgot about this! Life has come at us fast this year 😬 -- still, sorry for letting this linger so long!

I don't have time to dig into this again this weekend, but I hope to revisit it next weekend. IIRC, this isn't far off really, and thanks to Drup's feedback it should be easy to make the suggested improvements and push through the final bit.

shonfeder

I think this is ready for review, and for us to decide whether we want the Tyxml backend to replace the bespoke html generation or not!

Thanks @nojb and @Drup for the patience, and sorry for the verrrrry looong turnaround here.

shonfeder · 2021-02-20T02:54:43Z

tests/common.ml

@@ -0,0 +1,4 @@
+let normalize_html s =


We won't need this as a separate module if we decide to go with one package.

shonfeder · 2021-02-20T02:55:23Z

tests/dune

 (modules extract_tests))

+;  Code shared between various parts of the testing apartus


Ditto re: not needing this if we go with one package.

shonfeder · 2022-05-24T02:31:40Z

None of this has any purchase in my working memory any more, the AST has changed, and there are numerous conflicts. So I'm gonna close this and will consider opening separately at some point.

shonfeder mentioned this pull request Jun 28, 2020

TyXML #82

Open

shonfeder force-pushed the omd-tyxml branch from 6a3827b to ec2379d Compare June 28, 2020 21:11

nojb suggested changes Jun 29, 2020

shonfeder commented Jun 30, 2020

shonfeder force-pushed the omd-tyxml branch from a9b8b69 to a36223b Compare July 6, 2020 02:38

shonfeder changed the title ~~WIP: Add otional omd_tyxml package~~ WIP: Add optional omd_tyxml package Jul 6, 2020

Drup reviewed Jul 10, 2020

shonfeder mentioned this pull request Jul 12, 2020

Some HTML is parsed incorrectly aantron/markup.ml#53

Closed

shonfeder force-pushed the omd-tyxml branch 2 times, most recently from 9409796 to 0f51039 Compare January 22, 2021 01:46

shonfeder added 12 commits January 21, 2021 20:48

Ignore _ocaml directory

ec4a88f

For sandboxed development

Add omd_tyxml package

111577b

Add tests for ocaml_tyxml

65ed352

Expose the Html submodule

ebe14dc

Add WIP tests

adf4c2e

These are just for exploring initial implementation approach. The actual tests should work on the specs the same way that the omd tests do.

Add initial explorations of omd_tyxml module

303864b

Remove file picked up during merge conflict resolution

3d077a3

I did not mean to add this file

Fix more merge conflict mistakes

e2dc9dc

Stub in missing values

22e4a46

Use diff testing based on markdown specs

33af31d

Using the same approach and infrastructure as the core omd package.

Correct type alias

408313e

Switch back to Omd.doc -> Tyxml.doc approach

a28ebcf

shonfeder added 9 commits January 21, 2021 20:48

Cover most inline elements

6fbd2e9

Add lambdasoup as test dependency

0304ce4

Restore block dropped during rebase

99bef0c

Add Tyxml backend for all element types

1b2c3c9

Some cleanup

423fcb4

Remove dead code

9476531

Fix type misnomers

ce8ab3f

Fix failing tests

35d5d38

(On my machine in any case.)

Remove unsafe coerce and improve conversion

6a7c590

Thanks to Drup for guidance here.

shonfeder force-pushed the omd-tyxml branch from 0f51039 to 6a7c590 Compare February 8, 2021 02:18

shonfeder added 8 commits February 7, 2021 21:23

Use cons_opt function instead of clumsy append

94b2f94

More guidance from Drup.

Convert > 6h into paragraphs

8cf71fb

As per the spec and current HTML implementation.

Don't coerce tight list items

204741d

Don't coerce in definition list translation

eb0be13

Clean up comments and reorganize

3735ab8

Fix type in omd_tyxml description

4266b36

Cleanup and document

e00103e

Update TODOs

a36925c

shonfeder changed the title ~~WIP: Add optional omd_tyxml package~~ Omd_tyxml Feb 20, 2021

shonfeder commented Feb 20, 2021

shonfeder requested a review from nojb February 20, 2021 02:57

Comment tweaks

f98007b

nojb force-pushed the master branch 2 times, most recently from a01807e to 4026767 Compare March 8, 2021 11:54

shonfeder added this to the 2.0 milestone Apr 14, 2021

shonfeder closed this May 24, 2022
