Skip to content

Fix assembly for x32 ABI #438

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

abatyiev
Copy link

@abatyiev abatyiev commented Jan 3, 2025

The x32 ABI is special ABI for long mode on x64, that uses 32 bit integers and pointers instead of 64 bit. This patch fixes assembly code for x32 ABI.

See https://bugs.gentoo.org/942562
https://wiki.debian.org/X32Port

@oconnor663 oconnor663 requested a review from sneves January 3, 2025 19:45
@sneves
Copy link
Collaborator

sneves commented Feb 10, 2025

I'll be honest, I thought x32 had been deprecated years ago. Is this something popular enough to warrant support?

@abatyiev
Copy link
Author

Gentoo does support it and I'm using it.
The problem for me is that mesa uses BLAKE3 now and therefore I need a fix here.

@oconnor663
Copy link
Member

What would be our options for CI testing this? Apart from what GH Actions provides directly, we use QEMU via cross-rs to test esoteric architectures (big endian etc.), but it doesn't look like cross-rs supports x32.

@abatyiev
Copy link
Author

It seems like you are using Ubuntu: it has x32 support in regular x86_64 releases. (There are packages like gcc-13-x86-64-linux-gnux32, libc6-x32, binutils-x86-64-linux-gnux32 etc). One way to provide correct compiler for the build system is to use CMake's toolchain file ( https://cmake.org/cmake/help/latest/manual/cmake-toolchains.7.html

Basically you need a regular amd64 kernel with CONFIG_X86_X32=y and a toolchain. Regular amd64 ABI and x32 ABI can coexist (together with i686 ABI) on the same machine.

@nabijaczleweli
Copy link

nabijaczleweli commented Jun 1, 2025

Can repro the fix on my SSE2-capable x32 machine:

$ ninja -C .b && cc -L.b -I. ~/tarta.d/uwu/a.c -lblake3 && ./a.out
ninja: Entering directory `.b'
[2/2] Linking C static library libblake3.a
Segmentation fault
$ curl -SL https://github.com/BLAKE3-team/BLAKE3/pull/438.patch | git am
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 16720  100 16720    0     0  20279      0 --:--:-- --:--:-- --:--:--  291k
Applying: Fix assembly for x32 ABI
$ ninja -C .b && cc -L.b -I. ~/tarta.d/uwu/a.c -lblake3 && ./a.out
ninja: Entering directory `.b'
[5/5] Linking C static library libblake3.a
3b66b313c1481abbe678cc31e692937404b855a7a37803ee0759905f7e6fa53b38f21da38d5e0973bd2c261090c14ddf6ed7c9ab920e051c8b9a60a3d0c0a585c93357411ee2d1248cdbd77efbfd39ecb0f3071c21c0f31b1056cb5d117dc64186710ca9416639ccc36e3023ce0d0312e350db02b39336cf978325bd1d76a636041c6dc135ee7fb50c048d8599bf57ca0367ac3e4cccec1fdeedbaa587d510edca6d0473a887a7307034146e4e329fda36918aa3eb0bdf356b244fa509f447065c6035f8e897d182725b793abd77b99a66529b6e7bf8f16f012de94bb46881a4211b1847e8a223c13789ed46695087c8ef0377f2269ff18866cf7f5c8c157f9e
$ cat /etc/debian_version
trixie/sid
$ uname -a
Linux szarotka 6.7.7-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.7.7-1 (2024-03-02) x86_64 GNU/Linux
$ dpkg --list
--list       --listfiles
$ dpkg --print-architecture
x32
$ ninja -C .b && cc -L.b -I. ~/tarta.d/uwu/a.c -lblake3 && ./a.out
ninja: Entering directory `.b'
[2/2] Linking C static library libblake3.a
Segmentation fault
$ curl -SL https://github.com/BLAKE3-team/BLAKE3/pull/438.patch | git am
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 16720  100 16720    0     0  20279      0 --:--:-- --:--:-- --:--:--  291k
Applying: Fix assembly for x32 ABI
$ ninja -C .b && cc -L.b -I. ~/tarta.d/uwu/a.c -lblake3 && ./a.out
ninja: Entering directory `.b'
[5/5] Linking C static library libblake3.a
3b66b313c1481abbe678cc31e692937404b855a7a37803ee0759905f7e6fa53b38f21da38d5e0973bd2c261090c14ddf6ed7c9ab920e051c8b9a60a3d0c0a585c93357411ee2d1248cdbd77efbfd39ecb0f3071c21c0f31b1056cb5d117dc64186710ca9416639ccc36e3023ce0d0312e350db02b39336cf978325bd1d76a636041c6dc135ee7fb50c048d8599bf57ca0367ac3e4cccec1fdeedbaa587d510edca6d0473a887a7307034146e4e329fda36918aa3eb0bdf356b244fa509f447065c6035f8e897d182725b793abd77b99a66529b6e7bf8f16f012de94bb46881a4211b1847e8a223c13789ed46695087c8ef0377f2269ff18866cf7f5c8c157f9e
$ cat /etc/debian_version
trixie/sid
$ uname -a
Linux szarotka 6.7.7-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.7.7-1 (2024-03-02) x86_64 GNU/Linux
$ dpkg --list
--list       --listfiles
$ dpkg --print-architecture
x32
$ dpkg --print-foreign-architectures
amd64
i386

for

#include <stdio.h>
#include "blake3.h"
static uint8_t data[1024 * 1024 * 100];
int main() {
  blake3_hasher h;
  blake3_hasher_init(&h);
  blake3_hasher_update(&h, data, sizeof(data));
  blake3_hasher_finalize(&h, data, 256);
  for(int i = 0; i < 256; ++i)
    printf("%02x", data[i]);
  putchar('\n');
}

@oconnor663
Copy link
Member

oconnor663 commented Jun 1, 2025

Another option might be to just fall back to the intrinsics implementation on x32? Am I right to think that those should actually work?

@nabijaczleweli
Copy link

@hvdijk
Copy link

hvdijk commented Jul 28, 2025

I had not seen that this PR was already here when I submitted #500, apologies. In my PR, note the additional mov esi, esi / mov edx, edx compared to this PR. They are required, this PR is not enough, the ABI does not guarantee that these registers are zero-extended on entry so without those instructions, when the functions later use rsi/rdx they behave unreliably.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants