Cherry 1118 #5

arthw · 2024-11-19T00:09:37Z

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

Cherry-pick to the 11-18 version.

* Add scaffolding for ggml logging macros
* Metal backend now uses GGML logging
* Cuda backend now uses GGML logging
* Cann backend now uses GGML logging
* Add enum tag to parameters
* Use C memory allocation funcs
* Fix compile error
* Use GGML_LOG instead of GGML_PRINT
* Rename llama_state to llama_logger_state
* Prevent null format string
* Fix whitespace
* Remove log callbacks from ggml backends
* Remove cuda log statement

* Single allocation of encode_async block with non-ARC capture in ggml-metal.m
* Moving Block_release to the deallocation code
* Release encode block when re-setting encoding buffer count if needed
* Update ggml/src/ggml-metal.m

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* ggml : add metal backend registry / device ggml-ci
* metal : fix names [no ci]
* metal : global registry and device instances ggml-ci
* cont : alternative initialization of global objects ggml-ci
* llama : adapt to backend changes ggml-ci
* fixes
* metal : fix indent
* metal : fix build when MTLGPUFamilyApple3 is not available ggml-ci
* fix merge
* metal : avoid unnecessary singleton accesses ggml-ci
* metal : minor fix [no ci]
* metal : g_state -> g_ggml_ctx_dev_main [no ci]
* metal : avoid reference of device context in the backend context ggml-ci
* metal : minor [no ci]
* metal : fix maxTransferRate check
* metal : remove transfer rate stuff

---------

Co-authored-by: slaren <slarengh@gmail.com>

Flake lock file updates:

• Updated input 'flake-parts':
    'github:hercules-ci/flake-parts/bcef6817a8b2aa20a5a6dbb19b43e63c5bf8619a?narHash=sha256-HO4zgY0ekfwO5bX0QH/3kJ/h4KvUDFZg8YpkNwIbg1U%3D' (2024-09-12)
  → 'github:hercules-ci/flake-parts/3d04084d54bedc3d6b8b736c70ef449225c361b1?narHash=sha256-K5ZLCyfO/Zj9mPFldf3iwS6oZStJcU4tSpiXTMYaaL0%3D' (2024-10-01)
• Updated input 'flake-parts/nixpkgs-lib':
    'https://github.com/NixOS/nixpkgs/archive/356624c12086a18f2ea2825fed34523d60ccc4e3.tar.gz?narHash=sha256-Ss8QWLXdr2JCBPcYChJhz4xJm%2Bh/xjl4G0c0XlP6a74%3D' (2024-09-01)
  → 'https://github.com/NixOS/nixpkgs/archive/fb192fec7cc7a4c26d51779e9bab07ce6fa5597a.tar.gz?narHash=sha256-0xHYkMkeLVQAMa7gvkddbPqpxph%2BhDzdu1XdGPJR%2BOs%3D' (2024-10-01)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/1925c603f17fc89f4c8f6bf6f631a802ad85d784?narHash=sha256-J%2BPeFKSDV%2BpHL7ukkfpVzCOO7mBSrrpJ3svwBFABbhI%3D' (2024-09-26)
  → 'github:NixOS/nixpkgs/bc947f541ae55e999ffdb4013441347d83b00feb?narHash=sha256-NOiTvBbRLIOe5F6RbHaAh6%2B%2BBNjsb149fGZd1T4%2BKBg%3D' (2024-10-04)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* ggml : do not use BLAS with types without to_float
* ggml : return pointer from ggml_internal_get_type_traits to avoid unnecessary copies
* ggml : rename ggml_internal_get_type_traits -> ggml_get_type_traits
  it's not really internal if everybody uses it
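
For readers unfamiliar with the pattern, the two ideas in this commit message (handing out a pointer to a traits record instead of a copy, and skipping BLAS for types that cannot be converted to float) can be sketched roughly as below. This is a minimal illustration under assumptions, not the ggml code: the struct, the table contents, and the names `type_traits_sketch`, `get_type_traits_sketch`, and `can_use_blas_sketch` are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for a per-type traits record.
struct type_traits_sketch {
    const char * name;
    size_t       type_size;
    bool         has_to_float; // whether a conversion-to-float routine exists
};

// Made-up table for illustration only; the real table lives in ggml.
static const type_traits_sketch k_type_traits[] = {
    { "f32",    4, true  },
    { "opaque", 1, false }, // invented type without a to_float routine
};

static const int k_type_count = sizeof(k_type_traits) / sizeof(k_type_traits[0]);

// Returning a pointer into the static table avoids copying the struct on every lookup.
static const type_traits_sketch * get_type_traits_sketch(int type) {
    assert(type >= 0 && type < k_type_count);
    return &k_type_traits[type];
}

// A BLAS path that first converts its inputs to float can only serve
// types that actually provide that conversion.
static bool can_use_blas_sketch(int type) {
    return get_type_traits_sketch(type)->has_to_float;
}
```

Callers in this sketch read fields through the returned pointer, e.g. `get_type_traits_sketch(t)->type_size`, rather than holding their own copy of the record.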

* metal : add kernel arg structs (wip)
* metal : fattn args ggml-ci
* metal : cont + avoid potential int overflow [no ci]
* metal : mul mat struct (wip)
* cont : mul mat vec
* cont : pass by reference
* cont : args is first argument
* cont : use char ptr
* cont : shmem style
* cont : thread counters style
* cont : mul mm id ggml-ci
* cont : int safety + register optimizations ggml-ci
* metal : GGML_OP_CONCAT ggml-ci
* metal : GGML_OP_ADD, GGML_OP_SUB, GGML_OP_MUL, GGML_OP_DIV
* metal : GGML_OP_REPEAT
* metal : GGML_OP_CPY
* metal : GGML_OP_RMS_NORM
* metal : GGML_OP_NORM
* metal : add TODOs for rest of ops
* ggml : add ggml-metal-impl.h ggml-ci

bandoti and others added 30 commits November 18, 2024 14:43

ggml-backend : add device description to CPU backend (ggerganov#9720)

417a60f

metal : fix compute pass descriptor autorelease crash (ggerganov#9718)

e49b066

ggml: refactor cross entropy loss CPU impl. (ggml/976)

1db66e4

ggml/ex: calculate accuracy in graph, adapt MNIST (ggml/980)

88c1e20

sync : ggml

a6ddb63

metal : remove abort (skip) (ggml/0)

57cc9ed

Fixed RNG seed docs (ggerganov#9723)

0ea000d

* Update README.md fixed RNG seed info
* changed print format to unsigned
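
As an illustration of why the print format matters, here is a small sketch assuming the seed is a 32-bit unsigned value; the helper `format_seed` is hypothetical and not taken from the repository. A seed above `INT32_MAX` is only shown correctly with an unsigned conversion such as `%u`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a 32-bit unsigned seed for display.
static std::string format_seed(uint32_t seed) {
    char buf[32];
    // %u matches an unsigned argument; with %d a seed above INT32_MAX
    // would not be printed as the value the user supplied.
    const int n = std::snprintf(buf, sizeof(buf), "seed = %u", static_cast<unsigned>(seed));
    assert(n > 0 && n < static_cast<int>(sizeof(buf)));
    return std::string(buf);
}
```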

ci : fine-grant permission (ggerganov#9710)

052a4f5

ggml : fixes after sync (ggml/983)

41b26c1

ggml : remove test-backend-buffer
ggml : fix CUDA build warnings

ggml : fix typo in example usage ggml_gallocr_new (ggml/984)

be46474

sync : ggml

c814a97

Add Llama Assistant (ggerganov#9744)

85a4276

metal : zero-init buffer contexts (whisper/0)

4254a77

sync : ggml

3d10f6a

rerank : use [SEP] token instead of [BOS] (ggerganov#9737)

3ee6af2

* rerank : use [SEP] token instead of [BOS] ggml-ci
* common : sanity check for non-NULL tokens ggml-ci
* ci : adjust rank score interval ggml-ci
* ci : add shebang to run.sh ggml-ci

vulkan : retry allocation with fallback flags (whisper/2451)

9bc55e4

Co-authored-by: Samuel Morris <samuel.morris@artlist.io>

sync : llama.cpp

f14574e

readme : fix typo [no ci]

2cf7ec1

contrib : simplify + minor edits [no ci]

e81a97b

Update building for Android (ggerganov#9672)

5fdda9c

* docs : clarify building Android on Termux
* docs : update building Android on Termux
* docs : add cross-compiling for Android
* cmake : link dl explicitly for Android

ggml : add backend registry / device interfaces to BLAS backend (ggerganov#9752)

817e714

* ggml : add backend registry / device interfaces to BLAS backend
* fix mmap usage when using host buffers

scripts : fix spelling typo in messages and comments (ggerganov#9782)

62bc3ec

Signed-off-by: Masanari Iida <standby24x7@gmail.com>

server : better security control for public deployments (ggerganov#9776)

fde264c

* server : more explicit endpoint access settings
* protect /props endpoint
* fix tests
* update server docs
* fix typo
* fix tests

examples : remove llama.vim

4f12202

An updated version will be added in ggerganov#9787

perplexity : fix integer overflow (ggerganov#9783)

77279b7

* perplexity : fix integer overflow ggml-ci
* perplexity : keep n_vocab as int and make appropriate casts ggml-ci
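
The class of bug named here can be sketched as follows. This is an illustration under assumptions, not the code changed in the commit: it supposes a flat logits buffer indexed by `i * n_vocab`, and the helpers `logits_offset` and `would_overflow_int` are hypothetical. The count stays an `int`, and the cast to a wider type happens before the multiplication.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: offset of row `i` in a flat buffer with `n_vocab`
// entries per row. Both operands are widened before multiplying, so the
// product is computed in size_t rather than in int.
static size_t logits_offset(int i, int n_vocab) {
    assert(i >= 0 && n_vocab >= 0);
    return static_cast<size_t>(i) * static_cast<size_t>(n_vocab);
}

// Reports whether the same product would exceed the range of int,
// i.e. whether a plain `i * n_vocab` in int arithmetic would overflow.
static bool would_overflow_int(int i, int n_vocab) {
    const int64_t wide = static_cast<int64_t>(i) * static_cast<int64_t>(n_vocab);
    return wide > std::numeric_limits<int>::max();
}
```

With a large vocabulary the threshold is reached quickly: 20000 rows of 128256 entries already gives an offset of 2565120000, which is above the largest 32-bit signed value.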

ggerganov and others added 13 commits November 18, 2024 16:38

make : add ggml-opt (#0)

2d1d5c4

ggml-ci

ggml : adapt AMX to tensor->grad removal (#0)

ac4cea5

ggml-ci

ggml : inttypes.h -> cinttypes (#0)

3e16a42

ggml-ci

ggml : fix possible buffer use after free in sched reserve (ggerganov#9930)

05475dd

CMake: default to -arch=native for CUDA build (ggerganov#10320)

d84419b

CUDA: remove DMMV, consolidate F16 mult mat vec (ggerganov#10318)

2a2d6aa

ggml : fix undefined reference to 'getcpu' (ggerganov#10354)

cc1e405

ggerganov#10352

gitignore : ignore local run scripts [no ci]

57cbdbc

llama : only use default buffer types for the KV cache (ggerganov#10358)

e0e284e

CMake: fix typo in comment [no ci] (ggerganov#10360)

7af08cb

CUDA: fix MMV kernel being used for FP16 src1 (ggerganov#10357)

1af6853

docker: use GGML_NATIVE=OFF (ggerganov#10368)

0ed117a

github-actions bot added the documentation, SYCL, ggml, Kompute, Apple Metal, Nvidia GPU, testing, build, examples, devops, python, script, android, server, and nix labels Nov 19, 2024

fix for windows building

a979201

arthw merged commit 8dcc98f into master Nov 19, 2024
57 checks passed
