I think this will need to happen at some point, but it is going to require a major refactor.
-
For Windows, you can use https://learn.microsoft.com/en-us/cpp/intrinsics/cpuid-cpuidex?view=msvc-170 to make the cpuid calls (and I think MinGW can use the same includes). By wrapping the cpuid calls, you only need one compiler-specific version of cpuid and one of cpuidex. On Linux, GCC and Clang ship cpuid.h, which has wrapper functions for cpuid and cpuidex (__get_cpuid and __get_cpuid_count). These headers come with the compilers rather than the OS, so the wrappers really split by compiler family (MSVC vs. GCC/Clang), and you don't have to hand-roll anything beyond those two. Not sure about FreeBSD, but at least you won't have to deal with MSVC there. For non-x86 platforms you already need a separate binary, so runtime processor detection isn't a concern (until you want to build for specific ARM processors, and you can let the ARM devs handle that).
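A minimal sketch of the wrapper idea, assuming an x86/x86-64 target: one function built on MSVC's `__cpuidex` from `<intrin.h>` and on the `__cpuid_count` macro from GCC/Clang's `<cpuid.h>`, with a plain cpuid wrapper layered on top. The names `cpu_cpuidex`, `cpu_cpuid` and `cpu_vendor` are hypothetical, not existing ggml functions.

```c
#include <stdint.h>
#include <string.h>

#if !(defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86))
#error "these cpuid wrappers only make sense on x86/x86-64"
#endif

#if defined(_MSC_VER)
#include <intrin.h>   /* __cpuidex (MSVC, clang-cl) */
#else
#include <cpuid.h>    /* __cpuid_count (GCC/Clang, including MinGW GCC) */
#endif

/* Hypothetical wrapper: run CPUID with EAX = leaf and ECX = subleaf.
   regs receives EAX, EBX, ECX, EDX in that order. */
static void cpu_cpuidex(uint32_t leaf, uint32_t subleaf, uint32_t regs[4]) {
#if defined(_MSC_VER)
    int r[4];
    __cpuidex(r, (int)leaf, (int)subleaf);
    for (int i = 0; i < 4; i++) regs[i] = (uint32_t)r[i];
#else
    unsigned int a, b, c, d;
    __cpuid_count(leaf, subleaf, a, b, c, d);
    regs[0] = a; regs[1] = b; regs[2] = c; regs[3] = d;
#endif
}

/* Plain CPUID: the same call with subleaf 0. */
static void cpu_cpuid(uint32_t leaf, uint32_t regs[4]) {
    cpu_cpuidex(leaf, 0, regs);
}

/* Leaf 0 returns the vendor string in EBX, EDX, ECX (e.g. "GenuineIntel"). */
static void cpu_vendor(char out[13]) {
    uint32_t r[4];
    cpu_cpuid(0, r);
    memcpy(out,     &r[1], 4);  /* EBX */
    memcpy(out + 4, &r[3], 4);  /* EDX */
    memcpy(out + 8, &r[2], 4);  /* ECX */
    out[12] = '\0';
}
```

Routing the plain call through the cpuidex wrapper keeps a single code path per compiler, so everything else in the detection code can stay compiler-agnostic.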
-
Groups like KoboldCPP have a lot of problems with this because they mostly release a binary, so they have to target the lowest-common-denominator CPU instead of using detection. I was thinking of looking into this. I think the first step would be writing a file for processor feature detection and having someone review it. Then we can start migrating all the guards in ggml-impl.c to use those checks at runtime instead of at compile time. I was thinking that this new … I'd considered other ideas, but I don't want anything that might decrease performance in performance-critical code. One thing mentioned was function pointers, but that would mean every place that switches on platform-specific code ends up making a function call that can't be inlined, instead of just taking a conditional jump. @ggerganov @slaren do you have any thoughts?
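A rough sketch of the "conditional jump instead of a function pointer" approach, assuming GCC or Clang on x86-64. The feature flag is detected once (here with the compilers' `__builtin_cpu_supports`, standing in for the proposed detection file) and the kernel branches on the cached flag once per call, outside the hot loop. `g_has_avx2`, `sum_f32` and the other names are hypothetical. One caveat: GCC and Clang generally won't inline a `target("avx2")` function into a caller compiled without AVX2, so what the branch mainly saves over a function pointer is the indirect call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <immintrin.h>

/* Hypothetical cached flag, filled once at startup. */
static bool g_has_avx2 = false;

static void cpu_features_init(void) {
    __builtin_cpu_init();   /* only strictly needed if this runs before constructors */
    g_has_avx2 = __builtin_cpu_supports("avx2") != 0;
}

/* Portable fallback path. */
static float sum_f32_scalar(const float *x, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) s += x[i];
    return s;
}

/* AVX2 is enabled for this function only, so the rest of the binary
   still runs on CPUs without it. */
__attribute__((target("avx2")))
static float sum_f32_avx2(const float *x, size_t n) {
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(x + i));
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float s = 0.0f;
    for (int k = 0; k < 8; k++) s += lanes[k];
    for (; i < n; i++) s += x[i];   /* leftover tail */
    return s;
}

/* One well-predicted branch per call instead of an indirect call. */
static float sum_f32(const float *x, size_t n) {
    if (g_has_avx2)
        return sum_f32_avx2(x, n);
    return sum_f32_scalar(x, n);
}
```

Because the flag never changes after startup, the branch predictor settles on one side almost immediately, which is the property the function-pointer alternative gives up.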
-
I was wondering whether it would be a good idea to detect and enable/disable the various instruction sets at runtime instead of relying on the current compile-time defines. For the Windows release builds, and possibly future statically linked releases for Unix-like systems (Flatpaks, static Docker binary builds, macOS DMGs, etc.), this would also remove the need for separate builds for each feature set.
This would be easy to implement in a fast, OS-independent way by querying the CPU directly. On x86 it can be done with the cpuid instruction (opcode 0F A2): by querying its different leaves, you can retrieve every instruction-set extension the current processor supports. This is already how I check for AVX512F availability on the windows-latest-cmake GitHub Actions runner; the relevant code is at https://github.com/ggerganov/llama.cpp/blob/34c1072e497eb92d81ee7c0e12aa6741496a41c6/.github/workflows/build.yml#L174-L181
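A sketch of what such a check could look like in C, assuming a recent GCC or Clang on x86-64. Besides the CPUID feature bits (leaf 1 ECX bit 28 for AVX, leaf 7 EBX bit 5 for AVX2 and bit 16 for AVX512F), it reads XCR0 through XGETBV, because the OS also has to enable saving of the YMM/ZMM register state before AVX or AVX-512 code is safe to run. `struct cpu_features`, `read_xcr0` and `cpu_detect` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>
#include <cpuid.h>   /* __get_cpuid, __get_cpuid_count (GCC/Clang) */

/* Hypothetical result struct. */
struct cpu_features {
    bool avx;
    bool avx2;
    bool avx512f;
};

/* Read extended control register XCR0; only valid when OSXSAVE is set. */
static uint64_t read_xcr0(void) {
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((uint64_t)hi << 32) | lo;
}

static struct cpu_features cpu_detect(void) {
    struct cpu_features f = { false, false, false };
    unsigned int a, b, c, d;

    if (!__get_cpuid(1, &a, &b, &c, &d))
        return f;
    bool osxsave = (c >> 27) & 1;   /* OS uses XSAVE/XRSTOR, so XGETBV is usable */
    bool hw_avx  = (c >> 28) & 1;
    if (!osxsave)
        return f;

    uint64_t xcr0 = read_xcr0();
    bool os_ymm = (xcr0 & 0x06) == 0x06;   /* XMM and YMM state enabled */
    bool os_zmm = (xcr0 & 0xE6) == 0xE6;   /* plus opmask, ZMM_Hi256, Hi16_ZMM */

    f.avx = hw_avx && os_ymm;

    /* Leaf 7, subleaf 0; __get_cpuid_count returns 0 if leaf 7 is unsupported. */
    if (__get_cpuid_count(7, 0, &a, &b, &c, &d)) {
        f.avx2    = f.avx && ((b >> 5) & 1);
        f.avx512f = f.avx && os_zmm && ((b >> 16) & 1);
    }
    return f;
}
```

Calling `cpu_detect()` once at startup and caching the struct keeps CPUID out of any hot path.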
I'm not too familiar with ARM, but I believe the mrs instruction is the one to use there. The only minor issue is that some compilers (looking at you, MSVC) don't support inline assembly on x64, apparently because their developers were too lazy to make a decent compiler. Fortunately this can be circumvented by putting the instruction bytes in an array and executing them, like this. That way you effectively get inline assembly on x64, and it works with every compiler.
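Here is a hypothetical sketch of the byte-array idea for x86-64 Linux, assuming the System V calling convention and POSIX `mmap`/`mprotect`. One detail worth spelling out: a plain `static` array sits in non-executable memory on systems with NX/DEP, so the bytes are copied into a fresh page that is then switched to read+execute. On Windows the same idea would go through `VirtualAlloc`/`VirtualProtect`, and the bytes would differ because the Microsoft x64 convention passes arguments in RCX, RDX and R8. `cpuid_code`, `cpuid_fn` and `make_cpuid_fn` are made-up names.

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#if !defined(__x86_64__)
#error "this byte sequence is x86-64 System V only"
#endif

typedef void (*cpuid_fn)(uint32_t leaf, uint32_t subleaf, uint32_t out[4]);

/* Machine code for: void f(uint32_t leaf, uint32_t subleaf, uint32_t out[4])
   System V ABI: leaf in EDI, subleaf in ESI, out in RDX. */
static const unsigned char cpuid_code[] = {
    0x53,                   /* push rbx        ; CPUID clobbers callee-saved RBX */
    0x49, 0x89, 0xD0,       /* mov  r8, rdx    ; keep out pointer, CPUID overwrites EDX */
    0x89, 0xF8,             /* mov  eax, edi   ; leaf */
    0x89, 0xF1,             /* mov  ecx, esi   ; subleaf */
    0x0F, 0xA2,             /* cpuid */
    0x41, 0x89, 0x00,       /* mov  [r8], eax */
    0x41, 0x89, 0x58, 0x04, /* mov  [r8+4], ebx */
    0x41, 0x89, 0x48, 0x08, /* mov  [r8+8], ecx */
    0x41, 0x89, 0x50, 0x0C, /* mov  [r8+12], edx */
    0x5B,                   /* pop  rbx */
    0xC3                    /* ret */
};

/* Copy the bytes into a writable page, then flip it to read+execute.
   Returns NULL if the OS refuses executable memory. */
static cpuid_fn make_cpuid_fn(void) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return NULL;
    void *mem = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return NULL;
    memcpy(mem, cpuid_code, sizeof cpuid_code);
    if (mprotect(mem, (size_t)page, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, (size_t)page);
        return NULL;
    }
    return (cpuid_fn)mem;   /* object-to-function pointer cast, permitted by POSIX */
}
```

Writing first and only then making the page executable avoids needing a writable-and-executable mapping, which some hardened systems refuse.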