Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add flags to support OpenBSD #157

Merged
merged 1 commit into from
Jun 21, 2024
Merged

Add flags to support OpenBSD #157

merged 1 commit into from
Jun 21, 2024

Conversation

ron-at-swgy
Copy link
Contributor

@ron-at-swgy ron-at-swgy commented Jun 20, 2024

Summary

There are two primary changes included here. This commit aims to address #156

Type length

This commit includes OpenBSD in the platform checks used to check for size_t, uint64_t, unsigned long and unsigned long long all being 64 bits in length.

Consider the following C code compiled and run on the target platform

#include <unistd.h>
#include <stdio.h>

int
main(int argc, char **argv)
{
        printf("sizeof(unsigned long) = %d\n", sizeof(unsigned long));
        printf("sizeof(long) = %d\n", sizeof(long));
        printf("sizeof(int64_t = %d\n", sizeof(int64_t));
        printf("sizeof(unsigned long long) = %d\n", sizeof(unsigned long long));
        printf("sizeof(uint64_t) = %d\n", sizeof(uint64_t));
        printf("sizeof(long *) = %d\n", sizeof(long *));
        printf("sizeof(uint64_t *) = %d\n", sizeof(uint64_t *));
        return 0;
}

When compiled and run:

 $ ./sizes 
sizeof(unsigned long) = 8
sizeof(long) = 8
sizeof(int64_t = 8
sizeof(unsigned long long) = 8
sizeof(uint64_t) = 8
sizeof(long *) = 8
sizeof(uint64_t *) = 8

Invalid CPU Feature String

Addtionally, the checks for some avx513* flags are removed, as avx512fp16 yield an error: "invalid cpu feature string for builtin" when compiling with clang.

Possibly related to llvm/llvm-project#65320

Testing

The compilation step succeeds and results in the following:

host$ pwd
/home/user/x86-simd-sort/builddir
host$ ls -l lib/
total 144
-rw-r--r--  1 user  user  21368 Jun 20 13:43 liblibavx.a
drwxr-xr-x  2 user  user    512 Jun 20 13:43 liblibavx.a.p
-rw-r--r--  1 user  user   2506 Jun 20 13:43 liblibicl.a
drwxr-xr-x  2 user  user    512 Jun 20 13:42 liblibicl.a.p
-rw-r--r--  1 user  user  20704 Jun 20 13:43 liblibskx.a
drwxr-xr-x  2 user  user    512 Jun 20 13:43 liblibskx.a.p
-rw-r--r--  1 user  user   1378 Jun 20 13:43 liblibspr.a
drwxr-xr-x  2 user  user    512 Jun 20 13:42 liblibspr.a.p

I attempted to build numpy updating the submodule for this project to use my fork. It passes the original failure I encountered but still fails. This indicates I have other dependencies to validate. I can conclusively say the changes here do not make the situation worse!

Copy link
Contributor

@r-devulap r-devulap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for fixing this! Best way to test the changes is by adding a CI run on an openBSD docker container.

@ron-at-swgy
Copy link
Contributor Author

Thank you for your review, I have added the clang version 18 condition as requested. I'm not familiar with any OpenBSD Docker images. I'll try swapping the numpy submodule to use my fork and see if it works

@ron-at-swgy ron-at-swgy marked this pull request as ready for review June 20, 2024 21:59
r-devulap
r-devulap previously approved these changes Jun 20, 2024
Copy link
Contributor

@r-devulap r-devulap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. thanks!

This commit includes OpenBSD in the platform checks used to check for
size_t, uint64_t, unsigned long and unsigned long long all being 64
bits in length.

Addtionally, the checks for some `avx513*` flags are removed, as
`avx512fp16` yield an error: "invalid cpu feature string for builtin"
when compiling with clang.

This commit aims to address intel#156
@ron-at-swgy
Copy link
Contributor Author

I amended the commit so the changes would be formated with clang-format-16. Hopefully this will pass CI now

@r-devulap
Copy link
Contributor

Could you confirm that this fixes the problem?

@ron-at-swgy
Copy link
Contributor Author

Could you confirm that this fixes the problem?

Ragu, thank you for following up. My goal is to build numpy 2.0 - so this change fixes one of the errors I encounter along the way:

host$ make
c++ -o kvsort -mavx512vl -mavx512dq -I../src -std=c++17 -O3 avx512-kv.cpp
c++ -o qsortavx2 -mavx2 -I../src -std=c++17 -O3 skx-avx2.cpp
c++ -o qsortavx512 -mavx512vl -mavx512dq -I../src -std=c++17 -O3 skx-avx2.cpp
c++ -o qsortspr -mavx512vl -mavx512dq -mavx512vbmi2 -mavx512fp16  -I../src -std=c++17 -O3 spr-16bit.cpp
c++ -o qsorticl -mavx512vl -mavx512bw -mavx512dq -mavx512vbmi2 -I../src -std=c++17 -O3 icl-16bit.cpp
host$ for file in !(Makefile|*.cpp)      
> do                            
> echo "Running example $file"  
> ./$file                       
> echo "Result was $?"          
> done
Running example kvsort
Illegal instruction (core dumped) 
Result was 132
Running example qsortavx2
Result was 0
Running example qsortavx512
Illegal instruction (core dumped) 
Result was 132
Running example qsorticl
Illegal instruction (core dumped) 
Result was 132
Running example qsortspr
Illegal instruction (core dumped) 
Result was 132

When trying to build and run the /examples, they compile but only qsortavx2 exits successfully. The others dump cores with illegal hardware instruction errors. This may be expected as all the other examples expect avx512 support.

The same behavior was found in both of the following environments:

# host 1, a Dell running in a datacenter
machdep.cpuvendor=GenuineIntel
machdep.cpuid=0x306c3
machdep.cpufeature=0xbffbfbff
# host 2, a small hp computer
machdep.cpuvendor=GenuineIntel
machdep.cpuid=0x506e3
machdep.cpufeature=0xbffbfbff

Again, this is probably expected. If there is other code I could run to exercise the xf86-simd-sort functionality, I can do so, but I think this changeset accomplishes its goal. What is your opinion?

@r-devulap
Copy link
Contributor

you can configue meson to build a test suite with -Dbuild_tests=true, and then run meson test in the builddir to run the test suite.

@ron-at-swgy
Copy link
Contributor Author

host$ meson test
ninja: Entering directory `/home/user/x86-simd-sort/builddir'
[18/18] Linking target testexe
test-keyvalue.cpp(libtests_kvsort.a.p/test-keyvalue.cpp.o:(std::__1::vector<double, std::__1::allocator<double>> get_array<double>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, unsigned long, double, double)) in archive tests/libtests_kvsort.a): warning: rand() may return deterministic values, is that what you want?
1/1 x86 simd sort tests        TIMEOUT        30.02s   killed by signal 15 SIGTERM
>>> MALLOC_PERTURB_=76 MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 LD_LIBRARY_PATH=/home/user/x86-simd-sort/builddir/ ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 /home/user/x86-simd-sort/builddir/testexe


Ok:                 0   
Expected Fail:      0   
Fail:               0   
Unexpected Pass:    0   
Skipped:            0   
Timeout:            1   

Full log written to /home/user/x86-simd-sort/builddir/meson-logs/testlog.txt

The test times out on both of my available machines.

@r-devulap
Copy link
Contributor

sorry, didn't realize meson test times out in 30s by default. You could directly just run ./testexe by itself or use meson test -t 100 (on my SKX it needs about < 3 min to run the whole thing).

@ron-at-swgy
Copy link
Contributor Author

success! 183 seconds on one platform, 189 on the other

@r-devulap
Copy link
Contributor

Perfect, will merge this then. And will update NumPy soon after. Thanks for your contribution @ron-at-swgy!

@r-devulap r-devulap merged commit 9a1b616 into intel:main Jun 21, 2024
11 checks passed
@r-devulap
Copy link
Contributor

@ron-at-swgy Porting this to numpy. See numpy/numpy#26797

@ron-at-swgy
Copy link
Contributor Author

I just checked out and installed your branch of numpy and I'm excited to report total success!!

In [5]: np.version.full_version
Out[5]: '2.1.0.dev0+git20240625.f160b2a'

In [6]: arr = np.array(range(100))

In [7]: arr
Out[7]: 
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

In [9]: arr.std()
Out[9]: np.float64(28.86607004772212)

In [10]: arr.mean()
Out[10]: np.float64(49.5)

In [11]:                                                                          
Do you really want to exit ([y]/n)? y

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants