Regarding detection and use of processor feature sets #535

anzz1 · 2023-03-26T17:37:18Z

Would it be a good idea to detect and enable/disable the various instruction sets at runtime, instead of relying on the current compile-time defines? For the Windows release builds, and possibly future statically linked releases for Unix-like systems (Flatpaks, static Docker binary builds, macOS DMGs, etc.), this would also remove the need to ship multiple builds for the different feature sets.

This would be easy to implement in a very fast and OS-independent way by querying the CPU directly. On x86 it can be done with the cpuid instruction (opcode 0F A2), which reports the feature flags, and therefore the instruction set extensions, supported by the current processor.

This is already how I check for AVX512F availability in the windows-latest-cmake GitHub Actions runner: https://github.com/ggerganov/llama.cpp/blob/34c1072e497eb92d81ee7c0e12aa6741496a41c6/.github/workflows/build.yml#L174-L181

The code part being:

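  unsigned int u;

  __asm{
    push ebx        ; cpuid overwrites ebx, so save it first
    xor ecx,ecx     ; subleaf 0
    mov eax, 7      ; leaf 7: structured extended feature flags
    cpuid
    shr ebx, 16     ; move bit 16 (AVX512F) down to bit 0
    and bx, 1       ; keep only that bit (upper half is already zero after the shift)
    mov u, ebx      ; u = 1 if AVX512F is reported, otherwise 0
    pop ebx
  }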

I'm not too familiar with ARM, but I believe the mrs instruction is the one to use there.
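For the ARM side, here is a minimal sketch of one alternative to issuing mrs directly. On AArch64 the ID registers are system registers, so whether user space may read them depends on the OS. On Linux the kernel also publishes feature bits through getauxval(AT_HWCAP), which is what this sketch uses. The function name cpu_has_asimd is made up for this example, and builds that are not Linux on AArch64 simply report 0.

```c
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_ASIMD */
#define HAVE_AARCH64_HWCAP 1
#else
#define HAVE_AARCH64_HWCAP 0
#endif

/* Hypothetical helper: returns 1 if the kernel reports Advanced SIMD (NEON)
 * support, 0 if it does not or if this is not a Linux AArch64 build. */
int cpu_has_asimd(void)
{
#if HAVE_AARCH64_HWCAP
    return (getauxval(AT_HWCAP) & HWCAP_ASIMD) != 0;
#else
    return 0;
#endif
}
```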

The only minor issue is that some compilers (looking at you, MSVC) do not support inline assembly for x64, for the reason of being too lazy to make a decent compiler. Fortunately this can be circumvented by simply putting the instruction bytes in an array like this. That way the same machine code can be used on x64 and works with all compilers.

slaren · 2023-03-26T20:58:38Z

Maintainer

I think this will need to happen at some time, but it is going to require a major refactor.

16 replies

slaren Mar 27, 2023
Maintainer

> but how the cpuid is implemented is really a non-issue here anyway, the right question to ask is whether it should be done.

What would be the alternative for distributing binaries? I am not sure that anyone downstream could effectively use llama.cpp as a library until this is fixed.

anzz1 Mar 28, 2023
Author

> What would be the alternative for distributing binaries? I am not sure that anyone downstream could effectively use llama.cpp as a library until this is fixed.

By using git submodules? Like the example shown here in this recent issue #560
https://github.com/Bip-Rep/sherpa/tree/main/src

Isn't the unix ecosystem designed exactly around compiling everything from source? There are always compatibility problems with precompiled binaries on Linux/Unix, and the way libraries vary greatly between distributions makes static linking a non-option in most cases. Or at least I've always had trouble with binaries on Linux.

I'm not arguing against you here, merely asking for clarification.

anzz1 Mar 28, 2023
Author

Anyway, I put up an example implementation of a feature like this here, in case someone is interested or needs such a thing:
https://github.com/anzz1/cpuid

slaren Mar 28, 2023
Maintainer

> Isn't the unix ecosystem designed exactly around compiling everything from source?

I think by far the most common way to distribute software in the linux ecosystem is through pre-compiled binaries, not from source. Distributions like gentoo that insist on having the user compile everything are the exception, not the norm.

anzz1 Mar 28, 2023
Author

> > Isn't the unix ecosystem designed exactly around compiling everything from source?

> I think by far the most common way to distribute software in the linux ecosystem is through pre-compiled binaries, not from source. Distributions like gentoo that insist on having the user compile everything are the exception, not the norm.

Oh I see, that shows my lack of experience, as I haven't used desktop Linux much at all. I have a fair amount of experience in the server space, but there it has pretty much always been (not counting recent developments like Docker) a choice between building from source or using the package manager, where the distro maintainers handle the logistics of building the source and distributing the packages.

kunnis · 2024-04-11T05:03:14Z

For Windows, you can use the __cpuid and __cpuidex intrinsics (https://learn.microsoft.com/en-us/cpp/intrinsics/cpuid-cpuidex?view=msvc-170) to make the cpuid calls (I think MinGW can use the same includes). By wrapping the cpuid calls, you only need to write one compiler-specific version of cpuid and one of cpuidex. On Linux, GCC and Clang provide cpuid.h, which has equivalent wrappers (__get_cpuid and __get_cpuid_count). Since these come with the compilers rather than being written as inline assembly, you don't have to worry about which compiler supports inline assembly. Not sure about FreeBSD, but at least you won't have to deal with MSVC there.
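The wrapping idea could be sketched roughly as below. This is only an illustration under stated assumptions: cpuid_count, ebx_has_avx512f and cpu_has_avx512f are hypothetical names, the MSVC branch only validates basic leaves, and non-x86 builds report that nothing is supported.

```c
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>     /* __cpuid, __cpuidex */
#define CPUID_AVAILABLE 1
#elif (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>      /* __get_cpuid_count */
#define CPUID_AVAILABLE 1
#else
#define CPUID_AVAILABLE 0
#endif

/* Hypothetical wrapper: fills regs[0..3] with EAX, EBX, ECX, EDX for the
 * given leaf/subleaf. Returns nonzero on success, 0 if the leaf is not
 * supported or the target is not x86. */
int cpuid_count(unsigned int leaf, unsigned int subleaf, unsigned int regs[4])
{
    regs[0] = regs[1] = regs[2] = regs[3] = 0;
#if CPUID_AVAILABLE && defined(_MSC_VER)
    int out[4];
    __cpuid(out, 0);                           /* EAX = highest basic leaf */
    if ((unsigned int)out[0] < leaf) return 0;
    __cpuidex(out, (int)leaf, (int)subleaf);
    for (int i = 0; i < 4; i++) regs[i] = (unsigned int)out[i];
    return 1;
#elif CPUID_AVAILABLE
    return __get_cpuid_count(leaf, subleaf, &regs[0], &regs[1], &regs[2], &regs[3]);
#else
    (void)leaf; (void)subleaf;
    return 0;
#endif
}

/* Leaf 7, subleaf 0: bit 16 of EBX is the AVX512F flag. */
int ebx_has_avx512f(unsigned int ebx) { return (int)((ebx >> 16) & 1u); }

int cpu_has_avx512f(void)
{
    unsigned int r[4];
    return cpuid_count(7, 0, r) && ebx_has_avx512f(r[1]);
}
```

Note that cpuid only says what the processor implements. A complete check for AVX or AVX-512 would also confirm, via XGETBV, that the OS has enabled the corresponding register state, which this sketch leaves out.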

For non-x86 platforms, you already need a separate binary, so runtime processor detection is not a concern there (until you want to target specific ARM processors, and you can let the ARM devs handle that).

0 replies

kunnis · 2024-04-13T02:42:06Z

Groups like KoboldCPP have a lot of problems with this because they mostly release a binary, so they have to target the lowest common denominator CPU instead of using detection.

I was thinking of looking into this. I think the first step would be making a file for processor feature detection, and having someone review it. Then we can start migrating all the guards in ggml-impl.c to use those at runtime instead of at compile time. I was thinking that this new ggml-cpuid.h would expose a series of global bools for the various features, and an init() function to initialize all the flags based on the detected processor. You could also set the bools after initialization to disable specific processor flags for testing or comparison without recompiling.
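A rough sketch of what such a file might contain, purely as an illustration. All of the names here (ggml_cpu_has_*, ggml_cpuid_decode, ggml_cpuid_init) are hypothetical and not existing ggml symbols. It only handles GCC/Clang on x86 and does not check whether the OS has enabled the AVX register state.

```c
#include <stdbool.h>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_CPUID_H 1
#else
#define HAVE_CPUID_H 0
#endif

/* Hypothetical global flags, one per feature. */
bool ggml_cpu_has_avx     = false;
bool ggml_cpu_has_avx2    = false;
bool ggml_cpu_has_fma     = false;
bool ggml_cpu_has_avx512f = false;

/* Decode raw CPUID registers into the flags. Kept separate from init so
 * the bit positions can be exercised without depending on the host CPU. */
void ggml_cpuid_decode(unsigned int leaf1_ecx, unsigned int leaf7_ebx)
{
    ggml_cpu_has_fma     = ((leaf1_ecx >> 12) & 1u) != 0;  /* leaf 1, ECX bit 12 */
    ggml_cpu_has_avx     = ((leaf1_ecx >> 28) & 1u) != 0;  /* leaf 1, ECX bit 28 */
    ggml_cpu_has_avx2    = ((leaf7_ebx >>  5) & 1u) != 0;  /* leaf 7, EBX bit 5  */
    ggml_cpu_has_avx512f = ((leaf7_ebx >> 16) & 1u) != 0;  /* leaf 7, EBX bit 16 */
}

/* Query the processor once and set all flags. On targets without cpuid.h
 * every flag ends up false. */
void ggml_cpuid_init(void)
{
    unsigned int a = 0, b = 0, c = 0, d = 0, ecx1 = 0, ebx7 = 0;
#if HAVE_CPUID_H
    if (__get_cpuid(1, &a, &b, &c, &d)) ecx1 = c;
    if (__get_cpuid_count(7, 0, &a, &b, &c, &d)) ebx7 = b;
#endif
    (void)a; (void)b; (void)c; (void)d;
    ggml_cpuid_decode(ecx1, ebx7);
}
```

Disabling a feature for testing would then be a plain assignment after init, for example setting ggml_cpu_has_avx2 to false before running a benchmark.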

I'd considered other ideas, but I don't want anything that might decrease performance in performance-critical code. One thing mentioned was function pointers, but that would require every place that switches on platform-specific code to make a function call that can't be inlined, instead of just a conditional jump.
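To make the two options concrete, here is a small sketch. Both kernels are plain C stand-ins (a real build would use SIMD intrinsics in the second one), and every name in it is made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>

bool use_unrolled = false;   /* stand-in for a detected CPU feature flag */

/* Baseline kernel. */
float dot_scalar(const float *x, const float *y, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) s += x[i] * y[i];
    return s;
}

/* Stand-in for a feature-specific kernel: two accumulators, then a tail. */
float dot_unrolled(const float *x, const float *y, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += x[i] * y[i];
        s1 += x[i + 1] * y[i + 1];
    }
    if (i < n) s0 += x[i] * y[i];
    return s0 + s1;
}

/* Option 1: conditional jump on a global flag at the call site. Both
 * callees are visible to the compiler and can be inlined. */
float dot_branch(const float *x, const float *y, size_t n)
{
    return use_unrolled ? dot_unrolled(x, y, n) : dot_scalar(x, y, n);
}

/* Option 2: resolve once into a function pointer, then call indirectly.
 * There is no per-call test, but the call generally cannot be inlined. */
typedef float (*dot_fn)(const float *, const float *, size_t);
dot_fn dot_ptr = dot_scalar;

void dot_select(void)
{
    dot_ptr = use_unrolled ? dot_unrolled : dot_scalar;
}
```

Which of the two is cheaper in practice depends on how much work each call does, so it is worth measuring rather than assuming.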

@ggerganov @slaren do you have any thoughts?

3 replies

ggerganov Apr 14, 2024
Maintainer

AFAIU proper support requires some major changes in ggml. This does not seem very high-priority to me, since I believe there are alternative solutions: ggml-org/whisper.cpp#1939 (comment). How difficult is it for projects such as KoboldCPP to build multiple ggml binaries and dynamically load the correct one? This way we delegate the feature set detection outside of ggml and don't have to make changes to support this, unless there are some major obstacles to doing that in practice.

There is also the following attempt: ggml-org/whisper.cpp#1261. But I haven't looked into the details yet.

anzz1 Apr 14, 2024
Author

For maximum performance, you'd definitely want to eliminate as many branches/conditional jumps as possible. In other words, the more static and less dynamic code you have, the faster it is. Put ggml, which contains the performance-critical calculations, inside a library, make the GitHub runner build this library for each possible feature-set combination, then have the frontend (KoboldCPP) detect the feature set on startup and load the correct library accordingly. This way you get the best of both worlds: a one-size-fits-all solution with the performance of optimizations allowed by static compilation for a specific feature set.
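The selection step on the frontend side might look something like this. It is only a sketch: the struct, the function and the library file names are all hypothetical, and the feature flags are assumed to have been filled in by whatever detection the frontend uses.

```c
#include <stdbool.h>

/* Hypothetical feature summary filled in by the frontend at startup. */
struct cpu_features {
    bool avx;
    bool avx2;
    bool avx512f;
};

/* Pick the most capable prebuilt library the CPU can run. The file names
 * are placeholders for one ggml build per feature level. */
const char *pick_ggml_library(struct cpu_features f)
{
    if (f.avx512f) return "libggml-avx512.so";
    if (f.avx2)    return "libggml-avx2.so";
    if (f.avx)     return "libggml-avx.so";
    return "libggml-generic.so";
}
```

The frontend would then hand the chosen name to dlopen (or LoadLibrary on Windows) and look up the entry points with dlsym (or GetProcAddress). All branching on features happens once at startup, and the code inside each library stays statically compiled for its feature set.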

kunnis Apr 14, 2024

Ah, I should have thought of compiling the library multiple times with different settings. Good call.
