-
-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compile to JS and publish in NPM #22
Comments
Using js_of_ocaml compiles Markup.ml to JS out of the box, but that's what it's designed to be able to do. Unfortunately the file is 262k of minified JS. JSOO does tend to produce quite large files though due to the way it works. Bucklescript is going to be tricky at the moment. The setup is easy; it compiles from source using npm-based dependencies, so you just add Unfortunately Bucklescript is still based on OCaml 4.02.3, so it's missing I don't know whether it's HTML5 compliant, but I've compiled a cut down version of |
hmm, I hadn't actually looked at the Makefile - I can see pre-4.03 compilation options in there and I just compiled it on 4.02.3 😂 Maybe it can work with bucklescript today. We'll need to figure out how to get I don't know much about how the different compilation targets work but maybe there will need to be a pre-processing script that lets bucklescript compile it? |
Markup.ml definitely supports OCaml < 4.03, as you saw :) Indeed, it builds down to 4.01, and its own codebase inherently (and deliberately) builds down to 3.12, Let me know if you want any assistance looking into this. I'm not sure about what preprocessing would be necessary – just haven't looked into yet. My hope would be that none is necessary, as none is necessary on OCaml... On the subject of Regarding Nethtml, it's definitely not HTML5 compliant. I actually am not sure if it's compliant with anything, but there wasn't much of an HTML spec before HTML5. In fact, I started with also a cut-down Nethtml in Lambda Soup, and then replaced it exactly for that reason. Besides not truly parsing HTML and not having error recovery, Nethtml is (or was at the time, still is AFAIK) limited to ASCII only (or Latin-1?), which was a real blocker. It also interpreted some attributes incorrectly, did not support namespaces, and did not do well on embedded SVG. So, I wrote Markup.ml to replace it in Lambda Soup. Because it doesn't implement the giant state machines of HTML, though, Nethtml is a much smaller parser, and it will probably always be smaller when compiled to JS. However, I have a long-standing task to optimize Markup.ml for both speed and size, so I'd be happy to work on that. Part of the reason for writing a thorough test suite was to be able to change the internal representation as needed. |
I'm not sure if preprocessing is required either. At this point I probably just need to understand more about how Markup.ml compiles, specifically when As a starting point I need to constrain which files Bucklescript deals with; by default it grabs everything in the source folder which means it's also failing on I'll see what I can come up with after digging through the build processes. Eliminating Nethtml doesn't explicitly support unicode as far as I can tell but I'm finding that it doesn't corrupt the characters either. It's very lenient with HTML which can be annoying but I'm working with fairly fixed source document structure. Size is definitely a positive for nethtml. Even with JSOO the 57k of bucklescript code I mentioned earlier only bloats out to 80k. |
Yes, The structure of Markup is a bunch of implementation modules I imagine, modulo the This may also be somehow useful. |
Concerning size, it would take work, but it's possible to replace the implementation with all sorts of things, like tables, etc., but of course it's better to see what the starting point for size is first :) You can also save on size by omitting the XML parser and writer, so I had a thought previously to break up Markup.ml into a bunch of packages, so you can just pick and choose what you want. |
(I "love" reading my own docs, now I have like ten TODOs of things to improve in CONTRIBUTING.md). |
Package uchar is a dependency of uutf since uutf 1.0.0. uutf is used by Markup.ml extensively; however, Markup.ml also uses module Uchar (from package uchar) directly in a few places. To be more clear, this commit makes uchar a direct dependency of package markup. The direct uses of the module Uchar were introduced in the same commit that adapted Markup.ml to uutf 1.0.0, 19f21c4 as this was forced by uutf 1.0.0. Related #22.
In case you already followed the link to |
OK, I've made decent progress. It was all fairly obvious once I realised Whether I delete the xml references or not doesn't make any difference to the final size; I'm using rollup.js to bundle the files together and it does tree-shaking. None of the My test app was simply:
The size difference over JSOO was less than I expected, only about 15% (268483 vs 228259 bytes). And Bucklescript is getting something wrong, there's a JavaScript error thrown in the HTML writer when the queue ( I'm not sure where to go from here. Due to the slapped-together nature of what I've done I can't easily make a branch to show any of this, although maybe I could do it with git submodules? 🤔 |
Didn't need submodules in the end, I just added basic |
A branch in a fork would have worked, but a repo is fine :) Let me know what you think is needed to get this into |
I did make branches for the extra files needed to work with bucklescript: My repo is just a test app linking to them. This is where removing Before we think about merging this though we need to report the bug in the bucklescript compiler. I mentioned that I'd hit a problem on discord and (as expected) was told it would really help if a smaller reproduction case can be found. I'm not sure how to even figure out what the problem is though much less build a simple case. Maybe if we put our heads together on it - I can definitely translate the JS stack into ocaml file references, and probably even step through the code to see what the surrounding conditions are before the error. |
The developer of bucklescript picked this up and found the issue without us needing to build a simple test case. It's an edge case in recursion, apparently. There should be a fix available soon! |
Great! Is there a PR/issue/discussion somewhere (Discord? haven't looked in a few days :/). I'm curious to know what the edge case is. |
Hi. I'm not sure what the state of this discussion is, but I can report on my recent experience of using markup with js_of_ocaml (wrt bucklescript, I would very much like to use jsoo and not BS, and do not really care for the size of the produced js file). Essentially it is very easy to run into a stack overflow, when using markup compiled to js. I suspect this is due to the CPS used in the implementation, which I assume is hard for jsoo to optimize into tail-recursion. One solution I see would be to manually trampoline in markup, which is a somewhat invasive change... Repro case: https://paste.isomorphis.me/O8j |
Well, Markup already abuses some references internally. The long-term roadmap for the library was to get it really really well-tested, then replace the innards with fully imperative code, or whatever is necessary to get desired performance characteristics (or non-stack-busting characteristics, as the case may be). Just, nobody has demanded this to date :) Taking a look at the repro case... |
Alright, so it's confirmed related almost certainly to jsoo. I expanded the lipsum text to about 11000 lines, and that ran just fine when compiled as bytecode and run natively. By comparison, the original few lines, run through jsoo, do break the stack in Node.js. |
I've opened an issue for this, #26. I don't know that much about how js_of_ocaml would interact with CPS, but I guess it's time to learn. |
It would be kind of cool if Markup.ml turned out to be the fastest HTML parser in JS – good PR for OCaml/ReasonML. But I don't know if this can be expected :) Maybe it's the slowest :)
Also, I didn't find an HTML5 parser for JS that was actually HTML5-compliant, in an earlier search. This is important because the actual HTML5 specification contains extensive error-recovery logic, which is the result of long discussions about best practices, held between the very well-informed people who created the HTML5 spec.
Not sure how huge the HTML parser would be when compiled to JS, though.
The text was updated successfully, but these errors were encountered: