Generate linux syscalls via the Linux source tree #11447

Merged
merged 1 commit into from
May 17, 2022

The-King-of-Toasters
Contributor

Currently, all supported Linux architectures have a hand-written SYS enum. Updating the list is done in an ad-hoc, manual process:

  1. You have to remember that a new kernel version added some syscalls.
  2. You need to go check each architecture's list and copy the definitions accordingly. You can't blindly copy numbers between archs, as each one has its own unique numbering scheme.
  3. You need to make sure you add the arch-specific syscalls, which usually live outside the main list.
  4. You also need to rename certain syscalls to fit in with the rest of the stdlib code (e.g. newfstatat -> fstatat64).
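
As a rough illustration of the rename step, here is a minimal sketch of a lookup that maps a kernel syscall name to the name used in the stdlib. The `renames` table and the `stdlibName` helper are hypothetical names made up for this example; only the newfstatat -> fstatat64 pair comes from the description above.

```zig
const std = @import("std");

/// Hypothetical rename table: kernel name first, stdlib name second.
/// Only the newfstatat pair is taken from the PR description.
const renames = [_][2][]const u8{
    .{ "newfstatat", "fstatat64" },
};

/// Returns the stdlib name for a kernel syscall name, or the name
/// unchanged when no rename applies.
fn stdlibName(kernel_name: []const u8) []const u8 {
    for (renames) |pair| {
        if (std.mem.eql(u8, kernel_name, pair[0])) return pair[1];
    }
    return kernel_name;
}

pub fn main() void {
    std.debug.print("{s}\n", .{stdlibName("newfstatat")});
}
```

Keeping the renames in one table means a generator only has to be taught about each exception once, instead of someone remembering it per architecture.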

This process is error prone, and can introduce subtle errors in the list.
For example, the aarch64 list contains arch_specific_syscall - this is an offset, not a syscall. See the riscv64 list for how it handles this.

The solution would be to automate the list generation directly from the Linux source tree, which is what this commit does.
A new tool, generate_linux_syscalls.zig, generates the SYS enum for each arch using the syscall tables in the Linux source tree. On architectures without a table, it runs zig cc as a preprocessor to extract the syscall numbers from the Linux headers.
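
To give a feel for the table-driven path, here is a minimal sketch of parsing one row of a syscall table. It assumes rows of the form `<number> <abi> <name> [entry point]` separated by tabs or spaces, with `#` starting a comment. The `Entry` type and the `nextField` and `parseLine` helpers are hypothetical and are not taken from the actual tool.

```zig
const std = @import("std");

/// One row of a syscall table (hypothetical helper type).
const Entry = struct {
    number: u32,
    abi: []const u8,
    name: []const u8,
};

/// Returns the next whitespace-separated field of `rest` and advances
/// `rest` past it, or null when no field is left.
fn nextField(rest: *[]const u8) ?[]const u8 {
    var s = rest.*;
    // Skip leading blanks.
    while (s.len > 0 and (s[0] == ' ' or s[0] == '\t')) s = s[1..];
    if (s.len == 0) {
        rest.* = s;
        return null;
    }
    var end: usize = 0;
    while (end < s.len and s[end] != ' ' and s[end] != '\t') end += 1;
    rest.* = s[end..];
    return s[0..end];
}

/// Parses a table row; returns null for blank lines and `#` comments.
fn parseLine(line: []const u8) !?Entry {
    var rest = line;
    const number_str = nextField(&rest) orelse return null;
    if (number_str[0] == '#') return null;
    const abi = nextField(&rest) orelse return error.MissingAbi;
    const name = nextField(&rest) orelse return error.MissingName;
    return Entry{
        .number = try std.fmt.parseInt(u32, number_str, 10),
        .abi = abi,
        .name = name,
    };
}

pub fn main() !void {
    // Sample row in the style of the x86_64 table.
    const sample = "0\tcommon\tread\t\t\tsys_read";
    if (try parseLine(sample)) |e| {
        std.debug.print("{s} = {d}\n", .{ e.name, e.number });
    }
}
```

The splitting is done by hand here only to avoid depending on a particular `std.mem` tokenizer name, since those have changed between Zig releases.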

I'm marking this as a draft to see if any of my TODOs could be resolved. Otherwise, I think it's ready as is - barring CI errors.

@The-King-of-Toasters
Contributor Author

Also, I'm generating the new list based on the 5.17 stable release. I checked 5.18 and there were no changes. This means that some new entries have been added since the lists were last updated.

@daurnimator
Contributor

2. You can't blindly copy numbers between archs, as each one has its own unique numbering scheme.

New syscalls are meant to be consistent between architectures. Only older syscalls (for which the work has already been done) should differ.

@The-King-of-Toasters
Contributor Author

New syscalls are meant to be consistent between architectures. Only older syscalls (for which the work has already been done) should differ.

True, I think alpha is the only exception. It seems all the archs sync up from 403 onwards, as per this comment:
https://github.com/torvalds/linux/blob/59250f8a7f3a60a2661b84cbafc1e0eb5d05ec9b/include/uapi/asm-generic/unistd.h#L781

@matu3ba
Contributor

matu3ba commented Apr 16, 2022

I think you need to rebase against master, as in CI zig fmt broke for test/compile_errors:

+ cd /workspace
+ /workspace/_debug/staging/bin/zig fmt --check . --exclude test/compile_errors/
./_debug/staging/lib/zig/std/os/linux/syscalls.zig
./lib/std/os/linux/syscalls.zig

Cool stuff, by the way. It also sounds useful to store this information in a portable format (e.g. JSON) for other projects to reuse.

@The-King-of-Toasters
Contributor Author

I think you need to rebase against master, as in CI zig fmt broke for test/compile_errors:

Done. I also missed a space for the switch, which was probably causing the CI to fail.

@@ -0,0 +1,3487 @@
// This file is automatically generated.
// See tools/generate_linux_syscalls.zig for more info.
pub const SYS = switch (@import("builtin").cpu.arch) {
Contributor

If this were a fn which returned the enum type instead:

  • zls would autocomplete it for the current architecture
  • applications could use the syscall numbers for a different architecture (I don't know the use case)

For example:

pub fn NewSYS(comptime arch: @TypeOf(@import("builtin").cpu.arch)) type {
   return switch (arch) {
      // rest of code
   };
}

pub const SYS = NewSYS(@import("builtin").cpu.arch);
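
A slightly fuller sketch of the same idea, with the elided switch body replaced by two deliberately tiny enums so that it compiles. The name `SysFor` and the choice of entries are hypothetical; the real generated file covers every syscall and more architectures. The sketch assumes Zig 0.11 or later, where the builtin is spelled `@intFromEnum`.

```zig
const std = @import("std");
const builtin = @import("builtin");

/// Returns the syscall enum for `arch` (hypothetical name; only a few
/// well-known entries are listed to keep the sketch short).
pub fn SysFor(comptime arch: std.Target.Cpu.Arch) type {
    return switch (arch) {
        .x86_64 => enum(usize) { read = 0, write = 1, openat = 257 },
        .aarch64 => enum(usize) { openat = 56, read = 63, write = 64 },
        else => @compileError("unsupported architecture in this sketch"),
    };
}

/// The enum for the architecture being compiled for.
pub const SYS = SysFor(builtin.cpu.arch);

pub fn main() void {
    // Look up a number for an architecture other than the host.
    const nr = @intFromEnum(SysFor(.aarch64).openat);
    std.debug.print("aarch64 openat = {d}\n", .{nr});
}
```

Because `arch` is a comptime parameter, only the selected prong is analyzed, so the `@compileError` fires only when an unsupported architecture is actually requested.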

Contributor Author

@The-King-of-Toasters The-King-of-Toasters Apr 17, 2022

My first versions had separate enums for each arch and a public SYS that switched for the correct one, similar to how it is now with linux.arch_bits. The only use for different variables would be checking the syscall numbers in a seccomp program.

When you say:

zls would autocomplete it for the current architecture

is this something that can be fixed in zls?

@matu3ba
Contributor

matu3ba commented Apr 23, 2022

Some questions on the generality of the approach:

  1. Do you think this approach can be generalized to arbitrary single-line macro definitions? As a motivating example, take gcc handling FreeBSD definitions like this (@HAVE_POSIX_SPAWN@ and @REPLACE_POSIX_SPAWN@ are replaced by the user's build system):
/* Flags to be set in the 'posix_spawnattr_t'.  */
#if @HAVE_POSIX_SPAWN@
/* Use the values from the system, but provide the missing ones.  */
# ifndef POSIX_SPAWN_SETSCHEDPARAM
#  define POSIX_SPAWN_SETSCHEDPARAM 0
# endif
# ifndef POSIX_SPAWN_SETSCHEDULER
#  define POSIX_SPAWN_SETSCHEDULER 0
# endif
#else
# if @REPLACE_POSIX_SPAWN@
/* Use the values from the system, for better compatibility.  */
/* But this implementation does not support AIX extensions.  */
#  undef POSIX_SPAWN_FORK_HANDLERS
# else
#  define POSIX_SPAWN_RESETIDS           0x01
#  define POSIX_SPAWN_SETPGROUP          0x02
#  define POSIX_SPAWN_SETSIGDEF          0x04
#  define POSIX_SPAWN_SETSIGMASK         0x08
#  define POSIX_SPAWN_SETSCHEDPARAM      0x10
#  define POSIX_SPAWN_SETSCHEDULER       0x20
# endif
#endif
  2. Did you play around with computing the diff via comptime, i.e. including a check for the fields that occur (with some nice output showing how field names were changed)?
  3. Side question: things like char signedness can be overridden by the user or by macros, so the only reliable way is to detect it via a macro (i.e. one of the options in https://exchangetuts.com/what-causes-a-char-to-be-signed-or-unsigned-when-using-gcc-1639592646188352). Did you play with that?
  4. Can we leverage stdint.h / limits.h as suggested in "Leverage compiler system defines to determine types and sizes for IntelliSense" (microsoft/vscode-cpptools#7355), with an additional check for the existence of both on the platform? See https://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html. The purpose would be user-generated ABI checks (both for C compilers via static_assert and for Zig compilers) to decide whether code has the expected (identical) sizes. It would also improve the LSP.
    "You should not use these macros directly; instead, include the appropriate headers. Some of these macros may not be defined on particular systems if GCC does not provide a stdint.h header on those systems."
    As I understand it, it should be fine to check __has_include("limits.h") and __has_include("stdint.h"), check that the macros are defined, and error out otherwise.
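
For the comptime diff question above, one possible shape is a helper that walks the fields of an older enum and reports the names that no longer exist in a newer one. This is only a sketch: `Old`, `New`, and `missingCount` are hypothetical, and the two enums are tiny stand-ins rather than real snapshots of the list.

```zig
const std = @import("std");

// Hypothetical before/after snapshots of one arch's enum.
const Old = enum(usize) { read = 63, write = 64, fstatat = 79 };
const New = enum(usize) { read = 63, write = 64, fstatat64 = 79 };

/// Counts the fields of `OldT` whose name does not exist in `NewT`,
/// printing each missing name along the way.
fn missingCount(comptime OldT: type, comptime NewT: type) usize {
    var n: usize = 0;
    inline for (std.meta.fieldNames(OldT)) |name| {
        // `name` is comptime-known inside an inline for, so @hasField works.
        if (!@hasField(NewT, name)) {
            std.debug.print("missing in new list: {s}\n", .{name});
            n += 1;
        }
    }
    return n;
}

pub fn main() void {
    std.debug.print("{d} field(s) missing\n", .{missingCount(Old, New)});
}
```

A name-only check like this cannot tell a rename from a removal; comparing the integer values as well would be needed to pair `fstatat` with `fstatat64`.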

reminder for me: https://dev.midipix.org/runtime/psxtypes/issue/1

@The-King-of-Toasters
Contributor Author

Since a month has passed and no-one has commented on my TODOs, I'm going to mark this as ready for review after some changes:

  1. Rebase on master.
  2. Split the per-arch numbers into separate public declarations (e.g. SysMips) and have SYS be a switch statement, similar to how it is already. This should help with Jared's comments.
  3. Make unsupported archs issue a @compileError instead of unreachable.
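
Roughly, the layout described in points 2 and 3 could look like the sketch below. The enum names `X64` and `Arm64` and the error message are placeholders, and the enums are cut down to two entries each. It assumes Zig 0.11 or later if you read the values back with `@intFromEnum`.

```zig
const builtin = @import("builtin");

// Deliberately tiny, illustrative enums; a generated file would hold
// the full list for each architecture.
pub const X64 = enum(usize) { read = 0, write = 1 };
pub const Arm64 = enum(usize) { read = 63, write = 64 };

/// Picks the enum for the target architecture at compile time.
pub const SYS = switch (builtin.cpu.arch) {
    .x86_64 => X64,
    .aarch64 => Arm64,
    // A compile error names the problem at build time, whereas
    // `unreachable` would only say that an impossible path was hit.
    else => @compileError("no syscall numbers for this architecture in this sketch"),
};
```

Since the per-arch enums are public on their own, code such as a seccomp filter builder can refer to `Arm64.write` directly, whatever the host architecture is.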

@andrewrk
Member

I'm generally in favor of this approach. Looking forward to reviewing it, when it is ready for review.

@andrewrk added the enhancement, standard library, and os-linux labels on May 14, 2022
@The-King-of-Toasters The-King-of-Toasters marked this pull request as ready for review May 14, 2022 13:32
Member

@andrewrk andrewrk left a comment

Thanks for this work!

I have some small requested changes to move some stuff around, but if those are addressed then I think this is ready to be merged.

Don't worry about what ZLS can or can't do. It's a third party project that is implemented in a fundamentally limited way; I have plans for more sophisticated IDE integration that are directly part of the Zig project itself.

Previously, updating the `SYS` enum for each architecture required
manually looking at the syscall tables and inserting any new additions.

This commit adds a tool, `generate_linux_syscalls.zig`, that automates
this process using the syscall tables in the Linux source tree. On
architectures without a table, it runs `zig cc` as a pre-processor to
extract the system-call numbers from the Linux headers.
@andrewrk andrewrk merged commit a436991 into ziglang:master May 17, 2022