Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Omd_tyxml #211

Closed
wants to merge 36 commits into from
Closed

Omd_tyxml #211

wants to merge 36 commits into from

Conversation

shonfeder
Copy link
Collaborator

@shonfeder shonfeder commented Jun 28, 2020

This is just at an early exploratory stage, to explore out the space and be sure I'm headed in the right direction, before I actually set out in earnest.

Closes: #82

src/omd.ml Outdated Show resolved Hide resolved
src/omd.mli Outdated Show resolved Hide resolved
src/omd.mli Outdated Show resolved Hide resolved
src/omd.mli Outdated Show resolved Hide resolved
tests/dune.inc Outdated Show resolved Hide resolved
tests/omd_tyxml.ml Outdated Show resolved Hide resolved
dune-project Outdated Show resolved Hide resolved
dune-project Outdated Show resolved Hide resolved
@shonfeder
Copy link
Collaborator Author

Thanks very much for the initial feedback, @nojb! And sorry for the very rough state! As I noted, I only meant to rough in some broad strokes to get direction on a few major points, and your replies here -- and on the originating issue -- have been super helpful and informative in this direction.

Copy link
Collaborator Author

@shonfeder shonfeder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AH! I forgot to submit these comments. They help make more sense

tests/omd_tyxml.ml Outdated Show resolved Hide resolved
@nojb
Copy link
Contributor

nojb commented Jul 4, 2020

@shonfeder let me know when the code is ready for another review pass. Thanks!

@shonfeder shonfeder changed the title WIP: Add otional omd_tyxml package WIP: Add optional omd_tyxml package Jul 6, 2020
@shonfeder
Copy link
Collaborator Author

I've still got some cleanup to do here, but the basic functionality is in place and, afaik, all the tests should be passing in spirit. I'm finding the tests quite testy, due to trivial white space differences and arbitrary formatting differences (some end with new lines, some done, one way of generation produces breaks after certain tags, another way doesn't, etc).

I pulled in lambdasoup as a testing dep hoping to fully normalize the HTML so we wouldn't have to fuss with string munging, but it's not working as well as expected. In fact, for some reason some (but not all!) of the HTML test cases are ending up with duplicated tags, causing spurious test failures. I have no idea what's causing these, but I'll look with fresh eyes later in the week.

@nojb
Copy link
Contributor

nojb commented Jul 6, 2020

I've still got some cleanup to do here, but the basic functionality is in place and, afaik, all the tests should be passing in spirit. I'm finding the tests quite testy, due to trivial white space differences and arbitrary formatting differences (some end with new lines, some done, one way of generation produces breaks after certain tags, another way doesn't, etc).

One easy way around this is to pretty-print the expected HTML output found in the test files. The output is already being extracted to generate the .html files for each test. It is just a matter of inserting a pretty print call before storing the .html field in parse_test_spec function in extract_tests.ml. Then you won't need to do any normalization (apart from the pretty printing).

I was resisting to do this because when pretty-printing there is always the chance that a bug may squeak in. But if it is getting too hard to nail the spec formatting on the head, I think this is a valid way to go.

@nojb
Copy link
Contributor

nojb commented Jul 6, 2020

I pulled in lambdasoup as a testing dep hoping to fully normalize the HTML so we wouldn't have to fuss with string munging, but it's not working as well as expected. In fact, for some reason some (but not all!) of the HTML test cases are ending up with duplicated tags, causing spurious test failures. I have no idea what's causing these, but I'll look with fresh eyes later in the week.

Sounds good, let me know when it is ready for another review pass. Thanks!

Copy link
Member

@Drup Drup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to see this going forward! Here's a bunch of preliminary reviews, don't hesitate to ask questions.

omd_tyxml/omd_tyxml.ml Outdated Show resolved Hide resolved
omd_tyxml/omd_tyxml.ml Outdated Show resolved Hide resolved
omd_tyxml/omd_tyxml.ml Outdated Show resolved Hide resolved
omd_tyxml/omd_tyxml.ml Outdated Show resolved Hide resolved
| Strong s -> Html.[strong ~a:[] (of_inline s)]
| Hard_break -> Html.[br ~a:[] ()]
(* TODO Add option for verified html ?*)
| Html raw -> Html.Unsafe.[data raw]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is "correct" as long as the HTML is textual. As soon as you start using the non-textual Tyxml backends (for instance, Tyxml_js`), this is broken and you have to do more work.

Long term, I think it would be beneficial to provide a functor over the HTML module to adapt to different Tyxml instantiations (and provide a pre-applied instance for text!). The "right" solution is then to decode the HTML with lambdasoup and build actual tyxml trees. Then you can build DOM trees from markdown directly, without going through text.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether we want this generalization as part of the current PR, or as a followup, but I agree we should target it in the long run.

I think we first need to settle settle whether or not we can replace the bespoke html generator with Tyxml.

@shonfeder
Copy link
Collaborator Author

shonfeder commented Jul 12, 2020

I was resisting to do this because when pretty-printing there is always the chance that a bug may squeak in. But if it is getting too hard to nail the spec formatting on the head, I think this is a valid way to go.

Your fears have been validated, @nojb! I've been going a bit crazy trying to figure out why there is weird duplicate html in some of the generated test fragments (causing the parsing checks to fail). Turns out, a bug indeed squeaked in :)

utop # let html = {|<p><a href="foo\
bar"></p>
|};;
val html : string = "<p><a href=\"foo\\\nbar\"></p>\n"

utop # print_endline html;;
<p><a href="foo\
bar"></p>

- : unit = ()
utop # Soup.(parse html |> pretty_print) |> print_endline;;
<p>
 <a href="foo\
bar"></a>
</p>
<a href="foo\
bar">
</a>
- : unit = ()

@Drup
Copy link
Member

Drup commented Jul 12, 2020

Well, that's because your HTML is incorrect (unclosed a tag that is not converted to a standalone element), which triggers recovery in lambdasoup.

@shonfeder
Copy link
Collaborator Author

shonfeder commented Jan 10, 2021

Oof! I nearly forgot about this! Life has came at us fast this year 😬 -- still, sorry for letting this linger so long!

I don't have time to dig into this again this weekend, but I hope to revisit the next weekend. IIRC, this isn't far off really, and thanks to Drup's feedback it should be easily to make the suggested improvements and push through the final bit.

@shonfeder shonfeder force-pushed the omd-tyxml branch 2 times, most recently from 9409796 to 0f51039 Compare January 22, 2021 01:46
@shonfeder shonfeder changed the title WIP: Add optional omd_tyxml package Omd_tyxml Feb 20, 2021
Copy link
Collaborator Author

@shonfeder shonfeder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this ready for review, and for us to decide whether we want the Tyxml backend to replace the bespoke html generation or not!

Thanks @nojb and @Drup for the patience, and sorry for the verrrrry looong turnaround here.

@@ -0,0 +1,4 @@
let normalize_html s =
Copy link
Collaborator Author

@shonfeder shonfeder Feb 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We won't need this as a separate module if we decide to go with one package.

(modules extract_tests))

; Code shared between various parts of the testing apartus
Copy link
Collaborator Author

@shonfeder shonfeder Feb 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto re: not needing this if we go with one packages.

@shonfeder shonfeder requested a review from nojb February 20, 2021 02:57
@nojb nojb force-pushed the master branch 2 times, most recently from a01807e to 4026767 Compare March 8, 2021 11:54
@shonfeder shonfeder added this to the 2.0 milestone Apr 14, 2021
@shonfeder
Copy link
Collaborator Author

None of this has any purchase in my working memory any more, the AST has changed, and there are numerous conflicts. So I'm gonna close this and will consider opening separately at some point.

@shonfeder shonfeder closed this May 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TyXML
3 participants