Merging tensors of larger models #1

kir-gadjello · 2023-03-10T21:33:07Z

Currently, only LLaMA-7B is supported since I haven't figured out how to merge the tensors of the bigger models. However, in theory, you should be able to run 65B on a 64GB MacBook.

It shouldn't be hard to merge tensors with my https://github.com/kir-gadjello/zipslicer library, but it's pure Python! If you want to keep the project pure C++ you might want to write a standalone gist script that uses zipslicer to unpack weight shards into binary files.

ggerganov · 2023-03-10T22:02:58Z

Thanks! The bigger problem now is that I am out of disk space, haha!
Anyway, will try to figure out something later

theontho · 2023-03-11T19:37:15Z

Leave a tip jar to get @ggerganov a bigger SSD and/or MacBook :D

eous · 2023-03-11T19:55:50Z

It's kinda pointless now, but I was able to merge the 30B and 65B with this core bit of hackery added to the convert script.

```diff
+    fname_model = sys.argv[1] + "/consolidated." + str(i).zfill(2) + ".pth"
+    model_i = torch.load(fname_model, map_location="cpu")
+
+    # Since the models are split, we need to append the tensors changing the shape/size
+    for k, v in model_i.items():
+        if k in model:
+            if model[k].dtype != v.dtype:
+                print("ERROR: Tensor types do not match: ", model[k].dtype, " vs ", v.dtype)
+                sys.exit(1)
+            elif len(model[k].shape) == 1:
+                print("Skipping tensor: " + k + " with shape: ", v.shape, " and type: ", v.dtype)
+                continue
+            elif k == "output.weight":
+                print("Concatenating tensor: " + k + " with shape: ", v.shape, " and type: ", v.dtype)
+                model[k] = torch.cat((model[k], v), dim=0)
+                print("New shape: ", model[k].shape)
+                continue
+            elif "tok_embeddings" in k:
+                print("Concatenating tensor: " + k + " with shape: ", v.shape, " and type: ", v.dtype)
+                model[k] = torch.cat((model[k], v), dim=1)
+                print("New shape: ", model[k].shape)
+                continue
+            elif "attention.wo" in k:
+                print("Concatenating tensor: " + k + " with shape: ", v.shape, " and type: ", v.dtype)
+                model[k] = torch.cat((model[k], v), dim=1)
+                print("New shape: ", model[k].shape)
+                continue
+            elif "feed_forward.w2" in k:
+                print("Concatenating tensor: " + k + " with shape: ", v.shape, " and type: ", v.dtype)
+                model[k] = torch.cat((model[k], v), dim=1)
+                print("New shape: ", model[k].shape)
+            else:
+                print("Concatenating tensor: " + k + " with shape: ", v.shape, " and type: ", v.dtype, " with shape: ", model[k].shape)
+                model[k] = torch.cat((model[k], v), dim=0)
+                print("New shape: ", model[k].shape)
+        else:
+            print("Adding tensor: " + k + " with shape: ", v.shape, " and type: ", v.dtype)
+            model[k] = v
+    del model_i
```
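
For readers skimming the patch, the branch chain seems to reduce to a small name-to-axis rule. The sketch below restates that rule in plain Python, using nested lists in place of torch tensors so it runs without any dependencies. The names `COLUMN_SPLIT`, `concat_dim` and `merge` are made up for illustration and are not part of the convert script.

```python
# Hypothetical helpers: they restate the branches of the patch above,
# using nested lists as stand-ins for 2-D torch tensors.
COLUMN_SPLIT = ("tok_embeddings", "attention.wo", "feed_forward.w2")


def concat_dim(name, ndim):
    """Return the axis along which shards of `name` are joined.

    None means the tensor is skipped and the first shard's copy is kept,
    which is what the patch does for 1-D tensors.
    """
    if ndim == 1:
        return None
    if name == "output.weight":
        return 0
    if any(tag in name for tag in COLUMN_SPLIT):
        return 1
    return 0  # every other tensor falls through to dim=0 in the patch


def merge(a, b, dim):
    """Concatenate two 2-D nested lists along dim 0 (rows) or dim 1 (columns)."""
    if dim == 0:
        return a + b
    return [row_a + row_b for row_a, row_b in zip(a, b)]


shard0 = [[1, 2], [3, 4]]
shard1 = [[5, 6], [7, 8]]

# attention.wq falls through to the default branch: rows are stacked.
print(merge(shard0, shard1, concat_dim("layers.0.attention.wq.weight", 2)))
# → [[1, 2], [3, 4], [5, 6], [7, 8]]

# attention.wo is joined column-wise.
print(merge(shard0, shard1, concat_dim("layers.0.attention.wo.weight", 2)))
# → [[1, 2, 5, 6], [3, 4, 7, 8]]
```

With real checkpoints the same rule would pick the `dim` argument passed to `torch.cat`, as the patch does.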

ggerganov · 2023-03-12T06:22:47Z

Fixed with 007a8f6

On startup, we go through all the parts and merge them dynamically in the ggml buffers.
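
To give a rough idea of what merging parts into a preallocated buffer at load time can look like, here is a minimal Python sketch. It is not the code from the commit: the function name `load_part_into`, the flat row-major layout, and the even split across parts are all assumptions made for illustration.

```python
def load_part_into(buffer, rows, cols, part, part_idx, n_parts, split_dim):
    """Copy one shard into its slot of a flat, row-major buffer.

    `buffer` has rows * cols elements and is allocated once up front;
    `part` is a nested list holding this shard's rows. Assumes the split
    dimension divides evenly by n_parts (an assumption of this sketch).
    """
    if split_dim == 0:
        # The shard is a contiguous block of full rows.
        part_rows = rows // n_parts
        base = part_idx * part_rows * cols
        for r, row in enumerate(part):
            buffer[base + r * cols : base + (r + 1) * cols] = row
    else:
        # The shard is a block of columns inside every row.
        part_cols = cols // n_parts
        for r, row in enumerate(part):
            start = r * cols + part_idx * part_cols
            buffer[start : start + part_cols] = row
    return buffer


# Two column-split parts of a 2 x 4 tensor, written into one buffer.
full = [0] * (2 * 4)
load_part_into(full, 2, 4, [[1, 2], [5, 6]], 0, 2, split_dim=1)
load_part_into(full, 2, 4, [[3, 4], [7, 8]], 1, 2, split_dim=1)
print(full)  # → [1, 2, 3, 4, 5, 6, 7, 8]
```

The point of this approach is that no merged checkpoint has to be written to disk: each part is read once and copied straight to its final offset.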


* a chinese word formed of 3 chinese charcters but the first 2 is not word
* tokenizer-fix
* E5 Pretokenizer bugfix
* whitespace fix
* remove extra wpm

---------

Co-authored-by: Mike Fan <60965742+mike-fzy@users.noreply.github.com>
Co-authored-by: Oliver Ye <OliverY@MacBook-Pro.local>

* [example] batched-bench "segmentation fault"

When `llama-batched-bench` is invoked _without_ setting `-npl` ("number of parallel prompts"), it segfaults. The segfault is caused by calling `max_element()` on a zero-length vector, `n_pl`.

This commit addresses that by first checking whether the number of parallel prompts is zero. If it is, the maximum sequence size is set to 1; otherwise it is set to the original value, the result of `max_element()`.

Fixes the following crash, seen when running `lldb build/bin/llama-batched-bench -- -m models/Meta-Llama-3-8B.gguf`:

```
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x000000010000366c llama-batched-bench`main(argc=3, argv=0x000000016fdff268) at batched-bench.cpp:72:28
   69       llama_context_params ctx_params = llama_context_params_from_gpt_params(params);
   70
   71       // ensure enough sequences are available
-> 72       ctx_params.n_seq_max = *std::max_element(n_pl.begin(), n_pl.end());
```

* Update examples/batched-bench/batched-bench.cpp

Co-authored-by: compilade <git@compilade.net>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: compilade <git@compilade.net>
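
The guard described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the actual patch: the helper name `max_parallel_or_one` is hypothetical, and the element type of `n_pl` is assumed to be `int`. The only behaviour taken from the commit message is "use 1 when `n_pl` is empty, otherwise use the result of `max_element()`".

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (name and signature are assumptions, not llama.cpp API).
// Returns the largest entry of n_pl, or 1 when n_pl is empty.
static int max_parallel_or_one(const std::vector<int> & n_pl) {
    // For an empty range std::max_element returns end(); dereferencing that
    // iterator is undefined behaviour, which is what crashed at line 72.
    if (n_pl.empty()) {
        return 1;
    }
    return *std::max_element(n_pl.begin(), n_pl.end());
}

// Small self-check covering the empty and non-empty cases.
static void check_examples() {
    assert(max_parallel_or_one({}) == 1);            // -npl not given
    assert(max_parallel_or_one({1, 2, 8, 4}) == 8);  // e.g. -npl 1,2,8,4
}
```

Under these assumptions, the crashing line would become something like `ctx_params.n_seq_max = max_parallel_or_one(n_pl);`. The real fix may equally well be written inline as a conditional expression.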

#1)

* Fixed a bug where debug code was included in the release, resulting in an undefined function error.
* Change the path of the QNN library when building in termux environment
* Revert "Change the path of the QNN library when building in termux environment"

This reverts commit c6e26a3.

* Changed so that GGML_QNN_DEFAULT_LIB_SEARCH_PATH can be set from command line arguments

ggerganov closed this as completed Mar 12, 2023

ggerganov mentioned this issue Mar 12, 2023

What is the meaning of hacked? #33

Closed

gjmulder added the enhancement New feature or request label Mar 15, 2023

SavageShrimp mentioned this issue Mar 20, 2023

segmentation fault Alpaca #317

Closed

bogdad mentioned this issue Mar 29, 2023

Support tensors with 64-bit number of elements in ggml #599

Closed

FNsi mentioned this issue Apr 1, 2023

Performance investigation using AMD BLIS instead of OpenBLAS on 16 core AMD Zen1 #637

Closed

4 tasks

sha0coder mentioned this issue Apr 5, 2023

[Bug] dequantize_row_q4_0 segfaults #791

Closed

Zeki-Zhang mentioned this issue Apr 29, 2023

quantization command in README.md #1227

Closed

4 tasks

mqy added a commit to mqy/llama.cpp that referenced this issue May 26, 2023

ggml.c: bugfix CBLAS profile ggerganov#1 was not executed; misc minor…

4d7c7b8

… refactors

mqy added a commit to mqy/llama.cpp that referenced this issue May 26, 2023

ggml.c: bugfix CBLAS profile ggerganov#1 was not executed; misc minor…

26eb856

… refactors

richkcho mentioned this issue May 29, 2023

Merged lora model forgets lora when converted to ggml. (with llama-cpp-python, DOES NOT repro with ./main) #1631

Closed

mqy added a commit to mqy/llama.cpp that referenced this issue May 29, 2023

ggml.c: bugfix CBLAS profile ggerganov#1 was not executed; misc minor…

2ea239a

… refactors

Kangmo mentioned this issue May 31, 2023

[User] error: inlining failed in call to 'always_inline' 'vfmaq_f16': target specific option mismatch #1655

Closed

mqy added a commit to mqy/llama.cpp that referenced this issue May 31, 2023

mulmat-tune: fixed wrong result file name; decrease hist buf size;

c0d321f

broken change: delete original profile ggerganov#1 from q_f32 profiles

syoyo pushed a commit to syoyo/llama.cpp that referenced this issue May 31, 2023

Merge pull request ggerganov#1 from togethercomputer/support_redpajama

ecd78a6

Support redpajama

mqy added a commit to mqy/llama.cpp that referenced this issue Jun 4, 2023

mulmat-tune: fixed wrong result file name; decrease hist buf size;

c67cb1b

broken change: delete original profile ggerganov#1 from q_f32 profiles

mqy mentioned this issue Jun 8, 2023

Fine tune MUL_MAT, new threading (spin+wait/notify), speedup q_f32 BLAS by splitting COMPUTE stage #1632

Closed

AphidGit mentioned this issue Jun 13, 2023

LLaMA NUMA could be better #1437

Closed

rshenoy2000 mentioned this issue Jul 3, 2023

[User] Converting GPT4All Model fails #2080

Closed

adaaaaaa mentioned this issue Jul 5, 2023

illegal instrution #2090

Closed

This was referenced Jul 9, 2023

Implement classifier-free guidance #2135

Merged

Pool Android performance and GPU not used at all when built with OpenCL #2052

Closed

coadmonky mentioned this issue Jul 24, 2023

Decrease in Performance #2355

Closed

rooprob pushed a commit to rooprob/llama.cpp that referenced this issue Aug 2, 2023

Merge pull request ggerganov#1 from sumo43/master

deb3818

Fix streaming

funnbot pushed a commit to funnbot/llama.cpp that referenced this issue Aug 8, 2023

headers fix; add kquants_iter for hipblas and add gfx803 (ggerganov#1)

bb16eff

* kquants_iter for hipblas and add gfx803 * Update CMakeLists.txt with hipblas kquants_iter and DMMV_F16 * remove dmmv_f16 for now

shahizat mentioned this issue Jan 6, 2024

MPI issue on the Nvidia Jetson Cluster #4792

Closed

segmond mentioned this issue Jan 14, 2024

train-text-from-scratch oom (in tokenizer?) #4300

Closed

4 tasks

java63940 mentioned this issue Jan 16, 2024

Adreno gpu run crash #4973

Closed

freeone3000 mentioned this issue Mar 27, 2024

Compilation issue for CUDA #6350

Closed

qtyandhasee mentioned this issue Mar 31, 2024

use vulkan on jetson Jetson Xavier NX could not convert error #6406

Closed

schmorp mentioned this issue Apr 7, 2024

GGML_ASSERT: llama.cpp/ggml-cuda/argsort.cu:48: (ncols & (ncols - 1)) == 0 #6527

Closed

dranger003 pushed a commit to dranger003/llama.cpp that referenced this issue Apr 8, 2024

@RefractAI

Merge pull request ggerganov#1 from slaren/cmrp-fixes

8b6577b

slaren: Cmrp fixes

schmorp mentioned this issue Apr 10, 2024

Segmentation fault during IQ3_XS generation. #6597

Closed

crimson-knight mentioned this issue Apr 11, 2024

Bug - gguf-common.h cannot be found by metal when executing the binary from a symbolic link #6608

Closed

David-AU-github mentioned this issue Apr 12, 2024

b2447 (c47cf41) decreased output quality #6571

Closed

Lucidology mentioned this issue Apr 21, 2024

Unable to make LLAMA_CUBLAS=1 Unknown option forward-unknown-to-host-compiler #1404

Closed

steampunque mentioned this issue May 21, 2024

b2950 broke RPC mode #7427

Closed

HanClinto pushed a commit to HanClinto/llama.cpp that referenced this issue Jun 10, 2024

Merge pull request ggerganov#1 from HanClinto/bins-rename-nits

82df7f9

Nits found in binary renames

takosalad mentioned this issue Jun 24, 2024

Bug: Crashes at the end of startup during first prompt processing #8096

Closed

micsthepick mentioned this issue Jul 1, 2024

Bug: GGML assert with bf16, RTX3090 #8234

Closed

ko-alex mentioned this issue Jul 4, 2024

Bug: gemma 2 27B GGML_ASSERT n_dims <= ne0 #8246

Closed

apresence mentioned this issue Jul 10, 2024

Bug: InternLM 2.5 Chat Tool Calls: Incorrect and Inconsistent Formatting #8405

Closed

chraac mentioned this issue Jul 16, 2024

ggml-qnn: add Qualcomm QNN(Qualcomm Neural Network,aka Qualcomm AI Engine Direct) backend #6869

Closed

4 tasks

m828 mentioned this issue Jul 16, 2024

Bug: ROCm CUDA error #8504

Closed

ggerganov pushed a commit that referenced this issue Aug 6, 2024

Merge pull request #1 from harvestingmoon/minicpm-v2.5

b31f51f

Fixed Line

fan-chao mentioned this issue Aug 13, 2024

[CANN] Support Q4_0 for Ascend NPU #8822

Merged

4 tasks

slaren mentioned this issue Aug 15, 2024

Threadpool: take 2 #8672

Merged

4 tasks

znzjugod mentioned this issue Aug 30, 2024

Bug: A crash occurs when llama-bench is running on multiple cann devices. #9250

Closed

jeroen-mostert pushed a commit to jeroen-mostert/llama.cpp that referenced this issue Aug 30, 2024

Streamline with fstrings (ggerganov#1006)

ce971a0

* fstring ggerganov#1 * fstring ggerganov#2

jeroen-mostert pushed a commit to jeroen-mostert/llama.cpp that referenced this issue Aug 30, 2024

Streamline with dictionaries (ggerganov#1005)

7de1ebf

* dictionary ggerganov#1 * dictionary ggerganov#2
