Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build on OCaml 5 (river pin) #2609

Merged
merged 6 commits into from
Aug 9, 2024
Merged

Build on OCaml 5 (river pin) #2609

merged 6 commits into from
Aug 9, 2024

Conversation

aantron
Copy link
Contributor

@aantron aantron commented Jul 22, 2024

With this PR, it is possible to build and run OCaml.org on OCaml 5 with

make deps
make start

The build was failing due to problems with a configuration script in ocamlnet, a transitive dependency of OCaml.org, through package River. For details on the problem in ocamlnet, see aantron/ocamlnet@7ef255a:

Work around broken HAVE_BYTES detection

configure uses ocamlc -safe-string to determine whether the compiler has the bytes type. However, on OCaml 5, this command, without input files, exits with a non-zero exit code and prints No input files. This causes configure to wrongly believe that -safe-string is not supported, which then triggers a multitude of compilation errors.

This commit (in ocamlnet) changes the code to unconditionally assume that bytes is available. A better way is probably to adapt OASIS, or what generated the configure script.

The code in ocamlnet's configure script is here. cc @gerdstolpmann


This PR pins ocamlnet to a fork with the above workaround, until ocamlnet is fixed upstream and released. See the OCaml 5 compatibility MR in ocamlnet.


This PR points package ocamlnet to my own fork on GitHub, which could become a problem for git bisect or other purposes in the future, if the fork were to become inaccessible. However, I see that OCaml.org already has dependencies pointed to other repos, so perhaps this is fine. Please let me know if you'd like me to put the ocamlnet patch somewhere else.

aantron added 2 commits July 22, 2024 12:29
The build was failing due to problems with a configuration script in
ocamlnet, a transitive dependency of OCaml.org, through package river.
For details on the problem in ocamlnet, see

  aantron/ocamlnet@7ef255a

This commit pins ocamlnet to a fork with a workaround, until ocamlnet is
fixed upstream and released. See

  https://gitlab.com/gerdstolpmann/lib-ocamlnet3/-/merge_requests/24
@aantron
Copy link
Contributor Author

aantron commented Jul 22, 2024

I'm not sure what to do about the deployability check. The link, judging from the logs linked on it, is for a different PR, and the errors look unrelated to this one.

Copy link
Collaborator

@cuihtlauac cuihtlauac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My fix for docker build is in this PR: aantron#1

The other Dockerfile will also need to be updated

@aantron
Copy link
Contributor Author

aantron commented Jul 24, 2024

Thank you! I've merged that in and changed the base image in .devcontainer/Dockerfile, which is the other Dockerfile that I see in this repo.

Should I push branches directly to this repo rather than my own fork for collaboration?

@aantron
Copy link
Contributor Author

aantron commented Jul 24, 2024

A separate option regarding ocamlnet might be to port River to use Lambda Soup for HTML parsing. Lambda Soup itself originally used ocamlnet, but I later wrote Markup.ml as its new back end to parse HTML5 according to the actual HTML5 spec, it's focused on HTML only without any wider functionality, builds on OCaml 5, uses Dune, etc. cc @kayceesrk

@cuihtlauac
Copy link
Collaborator

Should I push branches directly to this repo rather than my own fork for collaboration?

If you have the rights to do it, yes that's more convenient.

@tmattio
Copy link
Collaborator

tmattio commented Jul 29, 2024

A separate option regarding ocamlnet might be to port River to use Lambda Soup for HTML parsing. Lambda Soup itself originally used ocamlnet, but I later wrote Markup.ml as its new back end to parse HTML5 according to the actual HTML5 spec, it's focused on HTML only without any wider functionality, builds on OCaml 5, uses Dune, etc. cc @kayceesrk

I think that would be a preferable option, I can merge a PR and cut a release of River if needed.

@aantron
Copy link
Contributor Author

aantron commented Aug 8, 2024

Can confirm that with river pinned to tarides/river#15, ocamlnet is no longer in the dependency cone of OCaml.org. I was able to install, build, and run OCaml.org main on 5.2.0 with the river pin instead of the ocamlnet pin as presently proposed in this PR.

If we choose to go that route, I would remove the ocamlnet pin from this PR, and keep everything else (updating the Dockerfiles, the Makefile, etc., to use 5.2.0). With a new release of River available, we shouldn't need any new pins or constraints.

ocamlorg.opam Outdated
@@ -70,9 +70,6 @@ build: [
]
dev-repo: "git+https://github.com/ocaml/ocaml.org.git"
pin-depends: [
# See https://gitlab.com/gerdstolpmann/lib-ocamlnet3/-/merge_requests/24
# See https://github.com/aantron/ocamlnet/commit/7ef255a0a994ef2721f3359077b36bcdd0428ee4
["ocamlnet.dev" "git+https://github.com/aantron/ocamlnet#7ef255a0a994ef2721f3359077b36bcdd0428ee4"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't been able to get this generated from a dune-workspace (pin ...) stanza. Do you have any experience as to why this isn't being generated in this commit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the .opam.template file. This is no longer an issue.

@aantron
Copy link
Contributor Author

aantron commented Aug 9, 2024

I've switched this PR to pin the River PR rather than ocamlnet.

Based on a recommendation from @sabine, to test, I ran

rm -rf data/planet/*
make scrape

to test for differences in the scraped blogs. Besides that for many sources the result was 404 or timeout, the differences actually due to the River PR were insignificant:

  • The literal texts of self-closing tags like <hr/> were replaced by <hr>, per the serialization algorithm in the HTML5 spec.
  • There is no aggressive encoding of characters like single quotes with e.g. &rsquo;. The serialization algorithm in the HTML5 spec specifies UTF-8 and does not call for encoding these as entities. Characters that must be encoded still are -- for example, &gt;. The previous serializer used in River encoded aggressively as if outputting Latin-1.
  • <pre> is followed by a newline. The HTML5 parser requires swallowing this newline when parsing, and the output is semantically equivalent.
  • Changes like c29f667#diff-460316799c4fad86d97128339c0d566dd168995dd19a3b3b832840c856ebc577R27 are the result of HTML5 error correction and represent how browsers load HTML in practice.

I've uploaded the data diff as a throwaway commit: c29f667

I wonder if the diff at c29f667#diff-e7720bbc02cf67fbb25f22d45e6c0e65048b0225ebbe9819beb97c6b159f54beL110 is due to the blog post being updated after it was scraped for OCaml.org.

Copy link
Collaborator

@sabine sabine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the diff in question seems related to the blog post being updated after scraping, and that's fine.

Looks good to me!

@sabine sabine changed the title Build on OCaml 5 (ocamlnet -safe-string workaround) Build on OCaml 5 (river pin) Aug 9, 2024
@sabine sabine merged commit cbf0dd3 into ocaml:main Aug 9, 2024
3 checks passed
sabine pushed a commit to ggsmith842/ocaml.org that referenced this pull request Nov 29, 2024
* Build on OCaml 5

The build was failing due to problems with a configuration script in
ocamlnet, a transitive dependency of OCaml.org, through package river.
For details on the problem in ocamlnet, see

  aantron/ocamlnet@7ef255a

This commit pins ocamlnet to a fork with a workaround, until ocamlnet is
fixed upstream and released. See

  https://gitlab.com/gerdstolpmann/lib-ocamlnet3/-/merge_requests/24

* Try with CI on 5.2.0

* Fix Dockerfile

* Fix .devcontainer/Dockerfile

* Update dune files .opam is generated from

* Switch to upgraded River instead of ocamlnet

---------

Co-authored-by: Cuihtlauac ALVARADO <cuihtmlauac@tarides.com>
sabine pushed a commit to ggsmith842/ocaml.org that referenced this pull request Dec 10, 2024
* Build on OCaml 5

The build was failing due to problems with a configuration script in
ocamlnet, a transitive dependency of OCaml.org, through package river.
For details on the problem in ocamlnet, see

  aantron/ocamlnet@7ef255a

This commit pins ocamlnet to a fork with a workaround, until ocamlnet is
fixed upstream and released. See

  https://gitlab.com/gerdstolpmann/lib-ocamlnet3/-/merge_requests/24

* Try with CI on 5.2.0

* Fix Dockerfile

* Fix .devcontainer/Dockerfile

* Update dune files .opam is generated from

* Switch to upgraded River instead of ocamlnet

---------

Co-authored-by: Cuihtlauac ALVARADO <cuihtmlauac@tarides.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants