Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support OCaml 5.2 #134

Open
wants to merge 94 commits into
base: main
Choose a base branch
from
Open

Support OCaml 5.2 #134

wants to merge 94 commits into from

Conversation

shym
Copy link
Contributor

@shym shym commented Apr 18, 2024

This PR brings support for OCaml 5.2 (5.2.0~beta2 at the moment).
It proposes to do so by revising substantially how the cross compiler is built and so, unfortunately, drops support for OCaml 4.x compilers.

This builds upon the great work that has been done in the following previous PRs:

and so I chose to branch it from #129’s branch, before the changes to the memory allocator (which are not necessary for OCaml 5.2), and rebased it all on main. All the comments in those PRs have been taken into account.

This PR is marked as draft as it should be merged only when OCaml 5.2 is released but it can be tested in a 5.2.0~beta2 switch.

Main changes

To support OCaml 5.2, this PR extends nolibc as necessary to build the runtime. Most of those extensions are inherited from those previous PRs.
Apart from that, the main changes of this PR are as follows.

Patches to the compiler sources

This PR proposes to change from the scheme of modifying the OCaml build system by seds and echos and instead brings a series of (version-specific) patches in patches/ that are applied on the sources when they are fetched.

There two main reasons for this change.

  • The first is that those patches (except the last one) were written in a way to improve the capability of the compiler build system to generate cross compilers, not only for Solo5, with the intention of upstreaming them. This would ease a lot the maintenance for OCaml/Solo5 (and hopefully similar projects). As we are quite advanced in the release cycle for 5.2 and so those patches would not get merged into that release, the idea is to take the time to test whether that’s really what we would require for Solo5 and propose them upstream when a strong case can be made for them.
  • The second reason is that having those modifications as full patches (they are git format-patch patches, so they contain a full explanation message for each change) makes them easier to review. (In my review of the previous PRs, the modified Makefile of the compiler was broken so make was generating warnings, even if it was still able to build the compiler; using actual patches makes it easier to make sure what modified build system we will actually rely on)

One interesting outcome of this deep revision of the compiler build system is the fact that the .opt versions of the compilers are built, which should give better performance, in particular to build large unikernels.

Open question for those patches: I kept git format-patch default of using a binary diff (for configure, as it is declared binary in the compiler) as what is useful to review is the changes to configure sources that are always bundled in the same patch. git format-patch -a patches would be bigger (twice the size, with the current patch series) but they might be preferred for other reasons.

A specific toolchain

The package now creates and installs a {aarch64,x86_64}-solo5-ocaml-* toolchain, in a similar fashion to what Solo5 does. The main reason for creating that toolchain in the first place was to avoid baking the build-time directories containing nolibc and openlibm into the generated OCaml compilers. Two versions of the toolchain are generated: one with the build-time directories, which is added to PATH during the build of the compiler; one with the final destination directories, which is installed in bin.
Besides making it possible to change the -I and -L between build-time and final versions, I further used this idea to:

  • store in the cc all its other required options (such as -include _solo5/overrides.h), since this machinery is rather well-placed there,
  • wrap all the binutils similarly; as an end-user, I find it handier to have all the prefixed tools available rather than having to remember for each of them which tool I should call (aarch64-solo5-none...*? aarch64-linux-gnu-*? ...). This fits in nicely with OCaml configure’s toolchain discovery so that stating the --target is enough to define all the tools of the toolchain.

Installation

This PR proposes to delegate the actual installation to OPAM (or opam-installer) by creating a ocaml-solo5.install file rather than invoking the install command itself. The initial reason for this choice is that it is fairly easy to bend OCaml’s make install into generating the list of all the files it could install and then tell OPAM to install them only if they were actually built. So we can skip building parts of the compiler (ocamllex, etc.) just to keep make install happy when they don’t make sense.

This comes with the natural consequence that it follows OPAM’s conventions when installing files, so the compiler is installed in %{prefix}%/lib/ocaml-solo5 instead of %{prefix}%/solo5-sysroot (I find following the OPAM convention a rather good idea anyhow). If we relax the name of the findlib toolchain, it becomes possible to install simultaneously ocaml-solo5 and ocaml-solo5-cross-aarch64 (in my testing, I didn’t switch between the two as much as I would have if they had been co-installable), since the only common file between those two packages is the findlib’s solo5.conf.

Smaller changes

Quite a few commits are clean-ups for things that were natural to fix while working on this (such as ensuring that make will rebuild nolibc and openlibm libraries iff that should be done). I feel that those small changes will make maintenance smoother.

Another change from previous PRs is the choice to drop stdatomic.h from nolibc and go with the one of the underlying compiler, to make sure the C compiler will compile C atomics as it is intended to do. This only needs a small patch to Solo5 for FreeBSD, to pull it along with the other freestanding headers when installing the Solo5 toolchain.

Testing

This has been tested on:

  • Debian x86_64 with gcc, target aarch64, building (and briefly running) the dns-resolver hvt mirage unikernel,
  • Debian x86_64 with gcc, target x86_64, at least running the internal tests,
  • FreeBSD 14 x86_64 with clang, target x86_64, running the internal tests; this needs a small patch on Solo5, to access the stdatomic.h header, see Fetch stdatomic.h from system on FreeBSD Solo5/solo5#574
  • OpenBSD 7.3 x86_64 with clang, target x86_64, running the internal tests
  • OpenBSD 7.5 x86_64 with clang: Solo5 is currently broken there, Initial support for OpenBSD 7.5 Solo5/solo5#573 has been opened there with some steps towards supporting the latest OpenBSD.

Open question

I have been wondering about how interrupts are set up in Solo5 (without taking the time to really dig, in particular because the precise set up depends on the binding), because the way the OCaml 5 runtime sets up its stacks is really different from OCaml 4.
In particular, the OCaml 5 runtime assumes that only OCaml code should ever be triggered on a fiber stack. Is Solo5 already currently set up so that it can be guaranteed?
Commit “WIP Disable the x86_64 red zone” is strongly related to that question, on a slightly different angle.

To do before merging

  • the questions in the “WIP” commits should be decided
  • the initial supported version of OCaml should be decided (last “WIP” commits)
  • README should be updated
  • dependency on ocaml-src should be restored
  • TODOs should be addressed
  • DROPME commits should be dropped

@dinosaure
Copy link
Member

I have been wondering about how interrupts are set up in Solo5 (without taking the time to really dig, in particular because the precise set up depends on the binding), because the way the OCaml 5 runtime sets up its stacks is really different from OCaml 4.
In particular, the OCaml 5 runtime assumes that only OCaml code should ever be triggered on a fiber stack. Is Solo5 already currently set up so that it can be guaranteed?
Commit “WIP Disable the x86_64 red zone” is strongly related to that question, on a slightly different angle.

To contextualize the Solo5 side, I found:

@shym
Copy link
Contributor Author

shym commented Apr 22, 2024

Thank you for the pointers. How to properly setup interrupt handling on x86_64 is quite some rabbit hole! To sum up what I learnt:

  • switching stack automatically on interrupt with no priviledge change is possible on x86_64 with IST,
  • setting up the hardware correctly for a real bare-metal kernel is terribly tricky (1),
  • but Solo5 makes a very limited and controlled use of interrupts, so that it is (hopefully) bypassing the reentrance problem,
  • if there is nevertheless a reentrance problem for interrupts, the problem is happening on the interrupt stacks, there should not be problems on the stacks as seen by the OCaml runtime.

So I’ve dropped the commit disabling the red zone.
(And pinned the branch of Solo5 to fix FreeBSD support)

@hannesm
Copy link
Member

hannesm commented May 6, 2024

Dear @shym, thanks for rebasing. Would you mind to mark it as ready for review? (Unless you still have some things you want to address first -- but in the meeting today it sounded like this is ready from your side)

@shym shym marked this pull request as ready for review May 6, 2024 09:29
@shym
Copy link
Contributor Author

shym commented May 6, 2024

I’ve rebased on main (cc @palainp) and added a few small fixes (left explicitly as fixup!... commits for now, to make it easier to resync for reviews that were already under way).
It was marked as draft (maybe wrongly, I see) as it should not be merged before OCaml 5.2 is out, but it is ready for review as is.

Copy link
Contributor Author

@shym shym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice only now that something doesn’t look just right in the implementation of mmap with a non-NULL argument (cc @palainp).

nolibc/mmap.c Outdated Show resolved Hide resolved
nolibc/stubs.c Show resolved Hide resolved
@shym
Copy link
Contributor Author

shym commented May 6, 2024

I dug deeper into the usage of mmap in the OCaml code base and got reassured: if I understand correctly, mmap is used with a (non-NULL) address only with MAP_FIXED to commit and decommit memory (set its access to rw- or ---) that has been previously reserved. So we should be fine just ignoring those mmaps, since we don’t handle memory protection. 😅
I added commit Handle mmap(addr..., MAP_FIXED, ...) uses to change the behaviour accordingly.
I would have been reassured about my understanding if I had managed to find an example that follows that code path, but the small programs I ran didn’t commit/decommit memory.

Finally, I also updated the PR to OCaml 5.2 RC1, released on Friday.

@palainp
Copy link
Member

palainp commented May 16, 2024

Thank you, sorry I've still not managed to test :(
But now that 5.2 is out, I guess it's ok to update this PR against the latest release :)

@shym
Copy link
Contributor Author

shym commented May 21, 2024

I’ve updated the PR to OCaml 5.2.0 as it was released last week.

Note: I’ve reverted to depend on the ocaml package (as it was before) rather than a precise dependency on a compiler package (ocaml-base-compiler or ocaml-variants, which was more relevant for pre-release versions of the compiler). But the build will fail later if the compiler in PATH is not exactly 5.2.0 because I left a strict test of version equality in the configure script. I don’t think it’s worth trying to relax that constraint (ie allow to build the 5.2.0 cross-compiler in a 5.2.0~rc1 switch) so I feel more confident keeping it that way.

@palainp
Copy link
Member

palainp commented May 21, 2024

Thank you @shym for that update! I can confirm that this PR compiles and qubes-mirage-firewall compiles & runs as with 4.14.2 (first fast.com tests show around 10-15% more bandwidth now but this wasn't conducted with a strict evaluation protocol :) ). I'll also try with longer runs.

@dinosaure
Copy link
Member

On my side, I approve this PR. It took time for several people and I think the review of the various maintainers and authors has been done. Thank you @shym for taking over and considering what may have blocked us in our previous PRs, I consider that this one can be merged 👍. /cc @mirage/core

@palainp
Copy link
Member

palainp commented May 22, 2024

I also approve this PR, but as it breaks the supported compilers versions, and I think we also need to keep a 4.4 branch corresponding to the current head before merging :)

@shym
Copy link
Contributor Author

shym commented Jul 19, 2024

I’ve just updated the patches to the compiler so that it contains the patches that are already integrated upstream and in the version integrated upstream: it will make it much easier to re-synchronize when updating the patches for the next version.
I’ve also been working on the other patches, to generalize them to cover more cross-compiler use cases, but I judged it best not to update them that late in the review process and considering that they are not yet upstreamed.
If you want to have a look, it might be easier to compare the branches:

  • before that update: solo5,
  • after that update: solo5-v1.5,
  • the currently cooking version of the patches, if you will adventurous: solo5-v2.

@hannesm
Copy link
Member

hannesm commented Nov 15, 2024

I merged the other PRs outstanding, Would you mind to rebase? Also, the cirrus CI should now that solo5 0.9 is released, work fine!?

@shym
Copy link
Contributor Author

shym commented Nov 18, 2024

I rebased on main. git range-diff bd6b8dfecdd2..0b3b7ff7ae4dc8 b993ec677dcb31..99c2ea45405e82 is as expected pointing only the commits changing CI.

@hannesm
Copy link
Member

hannesm commented Nov 18, 2024

I think you can remove the pin now, and depend on >= 0.9.0...

opam pin add -n solo5 'https://github.com/shym/solo5.git#freebsd-stdatomic'
[WARNING] Running as root is not recommended
[ERROR] Could not synchronize /.opam/5.2.0/.opam-switch/sources/solo5 from "git+https://github.com/shym/solo5.git#freebsd-stdatomic":
        Git repository seems just initialized, try again after your first commit
[solo5.0.9.0] fetching sources failed: git+https://github.com/shym/solo5.git#freebsd-stdatomic
[ERROR] Error getting source from git+https://github.com/shym/solo5.git#freebsd-stdatomic:
          - git+https://github.com/shym/solo5.git#freebsd-stdatomic

so, get rid of 95b7fbd should be fine.

then the question is: 5.2.1 support? 5.3? what should we target with an initial merge and release?

@hannesm
Copy link
Member

hannesm commented Nov 18, 2024

another remark: we used to depend on the ocaml-src package, now with e6caff2 there's (instead?) and extra-spirce introduced. why? from my point of view, keeping the unnecessary changes minimal would be appreciated. if you're eager for such a change, could you propose such a pr for the main branch?

README.md Outdated
@@ -71,8 +71,7 @@ Run: `solo5-hvt _build/solo5/main.exe`

## Supported compiler versions

Tested against OCaml 4.12.1 through 4.14.1. Other versions may require
changing `configure.sh`.
Tested against OCaml 5.0. Other versions may require changing `configure.sh`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5.2

Add two scripts to create an .install file:
- `gen_ocaml_install.sh` drives (bends...) OCaml's `make install` in order
  to generate two subfiles, namely `_build/ocaml.install.lib{,exec}`,
  containing the lines to install the compiler itself, into the `lib`
  and `libexec` sections
- `gen_dot_install.sh` generates the complete `.install` file, including
  the content of the subfiles for the compiler

The generated `.install` file diverges from the current installation
process in that it tries to follow the standard organisation of the OPAM
directory: the OCaml cross-compiler is installed in `%{_:lib}%` (ie
`%{prefix}%/lib/ocaml-solo5`) instead of `%{prefix}%/solo5-sysroot`.
Note that the paths fed to `configure.sh` must be consistent with that
for the installed compiler to work as expected.
In the standard use case, namely installation through OPAM, those
targets are no longer necessary, since a `.install` file is generated so
that OPAM does the installation per se internally
Remove install.sh and uninstall.sh
@shym
Copy link
Contributor Author

shym commented Nov 26, 2024

I updated the branch so that it relies on ocaml-src to get the compiler sources and ocamlfind to find those sources.
It also now provides an implementation of strerror_r as it is actually used by the runtime. So I’ve reverted to using sequential integers for error codes, so that the implementation of strerror_r can simply pick the error message in an array with no useless empty cells. The error messages are placed in a separate file, borrowed from OpenBSD but keeping only the entries we use.

I’ve also added a small (silly) example unikernel that raises an exception using an strerror_r-generated payload, if only to make sure that it is indeed used. As expected, without the implementation of strerror_r, the unikernel terminated with:

STUB: abort: strerror_r() called

instead of:

Fatal error: exception Sys_error("/tmp: Function not implemented")

(which admittedly reads funny).

shym and others added 19 commits November 26, 2024 13:17
To drop when ocaml-src 5.2.1 is released
Co-authored-by: Fabrice Buoro <fabrice@tarides.com>
Rephrase the comments to catch up with the evolutions of `signal.h`

Co-authored-by: Fabrice Buoro <fabrice@tarides.com>
Follow a suggestion by Christiano Haesbaert, as Solo5 doesn't provide
signals, it might just be ignored
As of OCaml 5.2.0 beta1, `sigfillset` is only used in functions creating
threads, and so that will abort in any case. Ignoring the function may
make it more future-proof, though.
Make extra sure that, if `pthread_create` is called, the caller will
know it failed
Since the main Makefile doesn't know the dependencies for the nolibc and
openlibm libraries, create intermediate phony targets.
That scheme ensures, even when one library already exists, that:
- when building a target that depend on that library, a recursive make
  is invoked to check that it is up-to-date,
- if the recursive make doesn't update the library, the dependency on
  the library is not a reason to rebuild the target.
OCaml calls mmap with a non-null address and with the `MAP_FIXED` flag
only on already reserved memory to commit or decommit that memory block,
ie to set its protection to `PROT_READ|PROT_WRITE` or to `PROT_NONE`, in
the `caml_mem_commit` and `caml_mem_decommit` functions
So we accept this particular case without allocating memory that would
leak since the OCaml code base simply ignores the returned value (as
`MAP_FIXED` enforces the returned value to be either `addr` or
`MAP_FAILED`)
Accordingly, remove the man page extract that says that `addr` is just a
hint since OCaml always sets `MAP_FIXED` when `addr` isn't null
`strerror_r` is used by the runtime to render some system errors so
implement it to get useful information in such cases
`strdup` is not used in the OCaml runtime, as it is explicitly
reimplemented to use the runtime allocator
@shym
Copy link
Contributor Author

shym commented Nov 26, 2024

I updated the PR with 5.2.1 support (with a temporary overlay for ocaml-src). I plan to drop 5.2.0 when ocaml-src.5.2.1 is available in your nearest opam repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants