6600/6600 XT/6650 XT gfx1032 libraries for compilation of Kobold.cpp #655
Information
EDIT: The "lazy" library I created seems to be missing something, so I created a "non-lazy" one with rocBLAS and Tensile rel-5.7.1 and I am attaching it. With this commit, which was merged last week, I was able to generate the "lazy" library for gfx1032 without any patch: ROCm/Tensile@efbe0c0. I used rocBLAS's develop branch. I will explain step by step how I did it below.
Setup
Install Git for Windows
ROCm Windows SDK (I used 5.7.1)
Add to PATH:
CMake and Ninja:
Git:
Perl:
RC (when compiling KoboldCpp I got an error saying rc was not found, so I added it to PATH):
ROCm:
vcpkg
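A quick way to confirm the PATH entries listed above took effect is to look each tool up from Python before starting the build. This is only a sketch: the executable names and the use of the HIP_PATH environment variable to locate the HIP SDK are assumptions, not part of the original steps.

```python
import os
import shutil

# Executables the PATH steps above should make available. The names are
# assumptions based on the usual Windows binaries; adjust to your setup.
REQUIRED_TOOLS = ["cmake", "ninja", "git", "perl", "rc", "vcpkg"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that shutil.which() cannot find on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def hip_sdk_path():
    """Return the HIP SDK root from HIP_PATH, or None if it is not set.

    Assumption: the HIP SDK installer sets HIP_PATH; if yours does not,
    check your ROCm install folder by hand instead.
    """
    return os.environ.get("HIP_PATH")


if __name__ == "__main__":
    missing = missing_tools()
    print("Not found on PATH:", ", ".join(missing) if missing else "(none)")
    print("HIP_PATH:", hip_sdk_path() or "(not set)")
```

Run it from the same x64 Native Tools console you will build in, since PATH changes only apply to consoles opened after they were made.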
rocBLAS
Go to another folder, for example Downloads.
Open the x64 Native Tools Command Prompt as Administrator and go to the rocBLAS folder.
A note here: I struggled with the rmake.py command for two days. Even when I passed -a with gfx1032 or other parameters, I still got an error. If you are luckier than me, you may not get an error here. It doesn't matter if you do get one; the command just needs to generate some files and put them in place.
After the rmake.py command has finished (continue in the x64 Native Tools console):
For non-lazy:
For lazy:
TensileCreateLibrary generated the kernel and TensileLibrary files without any error. We now have our kernel and TensileLibrary files in the C:\SomeOutputFolder folder.
Attachments
Files I generated for gfx1032.
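Putting the generated files in place is a plain copy into rocBLAS's library folder. The sketch below selects files by name and copies them; the selection rule and the function names are hypothetical, based only on file names quoted in this thread (Kernels.so-000-gfx1032.hsaco, TensileLibrary.dat).

```python
import shutil
from pathlib import Path


def library_files(output_dir, arch):
    """Return the kernel and TensileLibrary files to install for one arch.

    Hypothetical rule: keep files whose name contains the architecture,
    plus a plain TensileLibrary.dat.
    """
    out = Path(output_dir)
    return sorted(
        p for p in out.rglob("*")
        if p.is_file() and (arch in p.name or p.name == "TensileLibrary.dat")
    )


def install(output_dir, library_dir, arch, dry_run=True):
    """Copy the selected files into rocBLAS's library folder.

    With dry_run=True (the default) nothing is copied; the destinations
    are only returned so they can be checked first. Existing files with
    the same name, such as TensileLibrary.dat, are overwritten.
    """
    dest = Path(library_dir)
    sources = library_files(output_dir, arch)
    planned = [dest / src.name for src in sources]
    if not dry_run:
        dest.mkdir(parents=True, exist_ok=True)
        for src, target in zip(sources, planned):
            shutil.copy2(src, target)
    return planned
```

For example, `install(r"C:\SomeOutputFolder", r"C:\Program Files\AMD\ROCm\5.7\bin\rocblas\library", "gfx1032", dry_run=False)`; the `C:\Program Files` prefix is an assumption about where the HIP SDK is installed, and writing there needs an Administrator console. Back up the existing TensileLibrary.dat first, because it will be replaced.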
I use openhermes-2.5-mistral-7b.Q6_K.gguf. I put the kernel and TensileLibrary files I shared above (https://github.com/LostRuins/koboldcpp/files/14129073/gfx1032_none_lazy-rocm-5.7.1.zip) under AMD\ROCm\5.7\bin\rocblas\library. I compiled the latest koboldcpp-rocm version myself for gfx1032, using HIP SDK 5.7.1. My initial parameters for OpenHermes are as follows (.kcpps):
{
  "model": null,
  "model_param": "D:/Ai/models/openhermes-2.5-mistral-7b.Q6_K.gguf",
  "port": 5001,
  "port_param": 5000,
  "host": "",
  "launch": false,
  "lora": null,
  "config": null,
  "threads": 8,
  "blasthreads": 8,
  "highpriority": false,
  "contextsize": 8192,
  "blasbatchsize": 512,
  "ropeconfig": [1.0, 10000.0],
  "smartcontext": false,
  "noshift": false,
  "bantokens": null,
  "forceversion": 0,
  "nommap": false,
  "usemlock": false,
  "noavx2": false,
  "debugmode": 0,
  "skiplauncher": false,
  "hordeconfig": null,
  "noblas": false,
  "useclblast": null,
  "usecublas": ["normal", "0"],
  "usevulkan": null,
  "gpulayers": 33,
  "tensor_split": null,
  "onready": "",
  "multiuser": 1,
  "remotetunnel": false,
  "foreground": false,
  "preloadstory": null,
  "quiet": false,
  "checkforupdates": 0,
  "ssl": null
}
This is the result:
Processing Prompt [BLAS] (316 / 316 tokens)
Generating (250 / 250 tokens)
ContextLimit: 566/8192, Processing:2.01s (6.4ms/T), Generation:11.61s (46.4ms/T), Total:13.62s (54.5ms/T = 18.36T/s)
I see 7.7 GB of VRAM usage in Task Manager; I think the result is great. I can say I no longer need to dual-boot for LLMs :) If you want to spare gfx1032 owners from compiling their own .exe like me, you can add these files to the prebuilt binaries 😄 @YellowRoseCx
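The figures in that stats line can be recomputed from the token counts, which helps when comparing runs. This sketch parses the line quoted above; the regular expression assumes exactly this log format. The numbers are consistent with the context use being 316 prompt + 250 generated tokens, and with the "Total" rate being total time divided by the 250 generated tokens only.

```python
import re

STATS = ("ContextLimit: 566/8192, Processing:2.01s (6.4ms/T), "
         "Generation:11.61s (46.4ms/T), Total:13.62s (54.5ms/T = 18.36T/s)")


def parse_stats(line):
    """Extract context use and the three timings (seconds) from a stats line."""
    m = re.search(
        r"ContextLimit: (\d+)/(\d+), Processing:([\d.]+)s .*?"
        r"Generation:([\d.]+)s .*?Total:([\d.]+)s",
        line,
    )
    used, limit, proc_s, gen_s, total_s = m.groups()
    return int(used), int(limit), float(proc_s), float(gen_s), float(total_s)


def derived_rates(prompt_tokens, gen_tokens, proc_s, gen_s, total_s):
    """Recompute the per-token figures from token counts and times."""
    return {
        "prompt_ms_per_tok": round(proc_s / prompt_tokens * 1000, 1),
        "gen_ms_per_tok": round(gen_s / gen_tokens * 1000, 1),
        # Total time divided by generated tokens only (not all 566).
        "total_ms_per_tok": round(total_s / gen_tokens * 1000, 1),
        "tokens_per_s": round(gen_tokens / total_s, 2),
    }


used, limit, proc, gen, total = parse_stats(STATS)
print(used == 316 + 250, derived_rates(316, 250, proc, gen, total))
```

This reproduces the 6.4, 46.4 and 54.5 ms/T and 18.36 T/s reported by KoboldCpp.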
Adding them to KoboldCpp-ROCm 1.57.1.yr1, hopefully everything works as intended xD
I realized later that the "lazy" one I shared was incomplete and even unusable, so I added a note at the top of this post (#655 (comment)), then created and attached a "non-lazy" one for HIP SDK 5.7.1. The "non-lazy" one works smoothly and properly, so I recommend adding it in the new version. I saw that the "lazy" one was added in the new version, which unfortunately will not work :( I am adding the link again to avoid confusion. @YellowRoseCx
I can't use the non-lazy one, because then I can't use the other ones for gfx1031: it would overwrite the TensileLibrary.dat file.
Yes, that would be a problem; I didn't think about that. gfx1032 owners will have to compile it themselves then. I wrote on Discord how to compile it and create an exe, and I'll share it here:
Copy make_pyinstaller_exe_rocm_only.bat to a new .bat file and change only the ROCm version from 5.5 to 5.7, then run that .bat file. It will create the exe under koboldcpp-rocm\dist.
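The "copy the .bat and change 5.5 to 5.7" step can be scripted. This is a hedged sketch: the contents of make_pyinstaller_exe_rocm_only.bat are not shown in this thread, so it assumes the version appears in ROCm install paths such as `...\ROCm\5.5\bin`, and the output file name is hypothetical.

```python
import re
from pathlib import Path


def retarget_rocm(text, old="5.5", new="5.7"):
    """Swap the version number that directly follows "ROCm\\" or "ROCm/".

    Assumption: the .bat refers to ROCm through install paths; any other
    "5.5" strings, and longer versions such as 5.5.1, are left untouched.
    """
    pattern = re.compile(r"(rocm[\\/])" + re.escape(old) + r"(?![\d.])",
                         re.IGNORECASE)
    return pattern.sub(lambda m: m.group(1) + new, text)


def make_rocm57_copy(src="make_pyinstaller_exe_rocm_only.bat",
                     dst="make_pyinstaller_exe_rocm57.bat"):
    """Write a retargeted copy of the build script (dst name is hypothetical)."""
    text = Path(src).read_text()
    Path(dst).write_text(retarget_rocm(text))
    return dst
```

Open the new file before running it and check that every ROCm path now points at 5.7; if the script spells the version differently, edit it by hand instead.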
Could you try compiling for GPU targets gfx1031 and gfx1032 together? It should output only one TensileLibrary.dat then.
I'm glad you told me that :) I compiled it without any problems, using the rocBLAS and Tensile rel-5.7.1 branches:
python rmake.py -a gfx1031;gfx1032 --merge-architectures --no-lazy-library-loading -t "D:\Ai\5-7-1\Tensile" -d -j 16 -v
I'll explain step by step how I compiled it a little later, just for information :)
Attachments
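The merged multi-architecture build differs from a single-card build only in the `-a` value and the merge/lazy flags. A sketch that assembles the same argument list, so the variants used in this thread (merged non-lazy, and the lazy/non-merged one) are not retyped by hand; the function names are hypothetical, and the flags are copied from the commands in this thread.

```python
import subprocess


def rmake_args(archs, tensile_dir, jobs=16, merge=True, lazy=False):
    """Build an rmake.py argument list; archs are joined with ';'."""
    args = ["python", "rmake.py", "-a", ";".join(archs)]
    args.append("--merge-architectures" if merge else "--no-merge-architectures")
    args.append("--lazy-library-loading" if lazy else "--no-lazy-library-loading")
    args += ["-t", tensile_dir, "-d", "-j", str(jobs), "-v"]
    return args


def as_command_line(args):
    """Render the list as one Windows command line, quoting where needed."""
    return subprocess.list2cmdline(args)


def run_rmake(rocblas_dir, args):
    """Run rmake.py from the rocBLAS checkout; raises if the build fails."""
    # A list (no shell=True) passes "gfx1031;gfx1032" to rmake.py unchanged.
    subprocess.run(args, cwd=rocblas_dir, check=True)
```

`as_command_line(rmake_args(["gfx1031", "gfx1032"], r"D:\Ai\5-7-1\Tensile"))` gives the same command as above (without quotes around the path, which has no spaces).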
Install Git for Windows
ROCm Windows SDK (I used 5.7.1)
Add to PATH:
CMake and Ninja:
Git:
Perl:
ROCm:
vcpkg
rocBLAS
Go to another folder, for example Downloads.
Tensile
Go to another folder.
Open the x64 Native Tools Command Prompt (without Administrator) and go to the rocBLAS folder.
After the rmake.py command has finished, open the x64 Native Tools Command Prompt as Administrator.
I always get this error when compiling with the parameters --lazy-library-loading --no-merge-architectures. If someone can tell me how to solve it, I can also compile the "lazy" one for gfx1031 and gfx1032. I don't understand why, since it compiles with --merge-architectures --no-lazy-library-loading without any error.
Reading logic files: Launching 16 threads for 108 tasks...
Reading logic files: Done.
[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||] 100% (0.4 secs elapsed)
Using fallback for arch: gfx1031
Using fallback for arch: gfx1032
# Writing Custom CMake
# Writing Kernels...
Generating kernels: Launching 16 threads...
Generating kernels: Done.
*
Compiling source kernels: Launching 16 threads...
Compiling source kernels: Done.
# Kernel Building elapsed time = 82.0 secs
# Tensile Library Writer DONE
################################################################################
[4/257] library\src\CMakeFiles\TENSILE_LIBRARY_TARGET.dir\utility.bat ecc6f16db1efb076
FAILED: library/src/CMakeFiles/TENSILE_LIBRARY_TARGET.util
library\src\CMakeFiles\TENSILE_LIBRARY_TARGET.dir\utility.bat ecc6f16db1efb076
Error copying file (if different) from "D:\Ai\5-7-1\rocBLAS\build\release\Tensile\library\TensileLibrary_lazy_gfx1032.dat" to "D:/Ai/5-7-1/rocBLAS/build/release/Tensile/library".
Batch file failed at line 61 with errorcode 1
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "D:\Ai\5-7-1\rocBLAS\rmake.py", line 512, in <module>
    main()
  File "D:\Ai\5-7-1\rocBLAS\rmake.py", line 505, in main
    if run_cmd(exe, opts):
       ^^^^^^^^^^^^^^^^^^
  File "D:\Ai\5-7-1\rocBLAS\rmake.py", line 468, in run_cmd
    proc = subprocess.run(program, check=True, stderr=subprocess.STDOUT, shell=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.2288.0_x64__qbz5n2kfra8p0\Lib\subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'ninja.exe -j 16 --verbose all' returned non-zero exit status 1
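The failing step reports that it could not copy TensileLibrary_lazy_gfx1032.dat into the library folder. One possible explanation, and it is only a guess, is that the per-architecture lazy file was never written (the log also shows "Using fallback for arch" for both targets). A small check of which expected files exist; the function name is hypothetical and the file-name pattern is copied from the error message.

```python
from pathlib import Path


def missing_lazy_libraries(library_dir, archs):
    """Return the TensileLibrary_lazy_<arch>.dat files absent from library_dir."""
    lib = Path(library_dir)
    expected = [lib / f"TensileLibrary_lazy_{arch}.dat" for arch in archs]
    return [p.name for p in expected if not p.is_file()]
```

For example, `missing_lazy_libraries(r"D:\Ai\5-7-1\rocBLAS\build\release\Tensile\library", ["gfx1031", "gfx1032"])`. An empty result would rule this explanation out.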
I'm building a new koboldcpp version now to see if it works.
By the way, one thing I noticed is that the TensileLibrary.dat file seems to depend on the Tensile version rather than on the cards: when I do a SHA check, it gives the same result as my previous build. I also compared it with the kernel and library files from your first build that supported gfx1031. I think that build's compiler (rocBLAS, Tensile) used HIP SDK 5.5.1, and that's why neither the kernel nor the TensileLibrary SHAs match. So as new HIP versions and card support arrive, it seems to work fine if you pick a base version (SDK, Tensile, rocBLAS), ask card owners to compile with that version, and have them send you the kernel file.
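A SHA comparison like the one described can be scripted for both the kernel and the TensileLibrary files. The thread does not say which SHA variant was used, so SHA-256 here is an assumption, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large .hsaco/.dat files are not loaded at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_builds(dir_a, dir_b, names):
    """Map each file name to True if both build folders hold identical copies."""
    return {n: sha256_of(Path(dir_a) / n) == sha256_of(Path(dir_b) / n)
            for n in names}
```

Identical TensileLibrary.dat hashes with differing kernel hashes would match the observation above that the library file tracks the Tensile version while the kernels track the architecture and SDK.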
I have a 6600 XT card now. Can I just use the zip file, or do I have to do the build steps like you? @jasyuiop It seems like a bit much for me.
You don't need to bother compiling the kernel or KoboldCpp. I compiled the kernel for gfx1032 and @YellowRoseCx added it to the new releases; just do the following and you're good:
Aw so sweet, thank you so much @jasyuiop
@jasyuiop For me it is stuck at:
Should I wait even longer, or has the command already finished doing what's needed?
No, you should wait. But if you proceed as in the message you quoted, you may get an error. If you follow all the steps as I describe them here, you should not get any error: #655 (comment). The reason I got an error there was that I was missing something; I realized it too late :)
Thanks. I'm actually trying to build for gfx1010; how should I adapt the process in the quoted comment?
If you followed exactly the same path, you only need to change the architecture parameter to gfx1010 (don't forget to change the path to your Tensile folder, and set the -j parameter according to how many cores you have):
python rmake.py -a gfx1010 --merge-architectures --no-lazy-library-loading -t "D:\Ai\5-7-1\Tensile" -d -j 16 -v
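The two things to adjust for another machine, the Tensile path and -j, can be checked before starting a long build. This sketch verifies the folder exists and defaults -j to `os.cpu_count()`; note that this counts logical processors (hardware threads), so it may be higher than the physical core count. Function names are hypothetical.

```python
import os
from pathlib import Path


def preflight(tensile_dir, jobs=None):
    """Check the Tensile folder exists and pick a -j value."""
    if not Path(tensile_dir).is_dir():
        raise FileNotFoundError(f"Tensile folder not found: {tensile_dir}")
    return jobs or os.cpu_count() or 1


def gfx1010_command(tensile_dir, jobs=None):
    """Format the gfx1010 rmake.py command with a checked path and -j value."""
    j = preflight(tensile_dir, jobs)
    return (f"python rmake.py -a gfx1010 --merge-architectures "
            f'--no-lazy-library-loading -t "{tensile_dir}" -d -j {j} -v')
```

For example, `print(gfx1010_command(r"D:\Ai\5-7-1\Tensile"))` prints the command with -j set to the machine's logical processor count.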
Information
I have an RX 6600 (gfx1032) video card. I can use rocBLAS on Linux with "export HSA_OVERRIDE_GFX_VERSION=10.3.0", but there is no kernel or TensileLibrary support for gfx1032 in rocBLAS on Windows.
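The override value follows the gfx target naming convention: gfx1030 corresponds to version 10.3.0, so, as generally understood, the override makes the ROCm runtime treat the gfx1032 card as gfx1030, a target rocBLAS already ships kernels for. A sketch of that naming convention (the last two characters are the minor version and stepping as hex digits; the rest is the major version); the function name is hypothetical.

```python
def gfx_to_version(target):
    """Convert a gfx target name to "major.minor.stepping", e.g. gfx1032 -> 10.3.2."""
    digits = target.removeprefix("gfx")  # requires Python 3.9+
    major, minor, stepping = digits[:-2], digits[-2], digits[-1]
    return f"{int(major)}.{int(minor, 16)}.{int(stepping, 16)}"


print(f"export HSA_OVERRIDE_GFX_VERSION={gfx_to_version('gfx1030')}")
```

This prints the same `export HSA_OVERRIDE_GFX_VERSION=10.3.0` line used above.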
I had ROCm 5.5.1 installed on my system, and I used the rocm-5.5.1 branches of rocBLAS and Tensile.
I applied this patch to Tensile: https://raw.githubusercontent.com/ulyssesrr/docker-rocm-xtra/f25f12835c1d0a5efa80763b5381accf175b200e/rocm-xtra-rocblas-builder/patches/Tensile-fix-fallback-arch-build.patch
Resources I followed:
ggerganov#1087 (comment)
#441
https://www.reddit.com/r/LocalLLaMA/comments/16d1hi0/guide_build_llamacpp_on_windows_with_amd_gpus_and/
Using the information there, I was able to create a "non-lazy merged library" for gfx1032. I could not create the "lazy" one no matter what I did.
Results
Using the generated Kernels.so-000-gfx1032.hsaco and TensileLibrary.dat files, I was able to load a 7B LLM completely on the GPU in koboldcpp-rocm, and I got an average speed of 25 T/s in a new chat.
Progress
I installed ROCm 5.7.1 and I am trying to build lazy and non-lazy versions for gfx1032 without any patches, using the release/rocm-rel-5.7 branches of Tensile and rocBLAS. I don't know if I can compile them successfully; if I succeed, I will add those files.
The last word
I would appreciate it if you added these files to the prebuilt binaries in future releases. @YellowRoseCx
Attachments
gfx1032_none_lazy.zip