
Temp #26 (Merged)
72 commits merged on Dec 1, 2024.

The diff below shows the changes from 1 commit.

Commits (72):
b756441
metal : minor code formatting
ggerganov Nov 25, 2024
f6d12e7
tests : fix compile warning
ggerganov Nov 25, 2024
5931c1f
ggml : add support for dynamic loading of backends (#10469)
slaren Nov 25, 2024
9ca2e67
server : add speculative decoding support (#10455)
ggerganov Nov 25, 2024
a9a678a
Add download chat feature to server chat (#10481)
brucepro Nov 25, 2024
1f92225
Github: update issue templates [no ci] (#10489)
JohannesGaessler Nov 25, 2024
10bce04
llama : accept a list of devices to use to offload a model (#10497)
slaren Nov 25, 2024
80acb7b
Rename Olmo1124 to Olmo2 (#10500)
2015aroras Nov 25, 2024
106964e
metal : enable mat-vec kernels for bs <= 4 (#10491)
ggerganov Nov 25, 2024
47f931c
server : enable cache_prompt by default (#10501)
ggerganov Nov 25, 2024
9fd8c26
server : add more information about error (#10455)
ggerganov Nov 25, 2024
50d5cec
ci : build docker images only once daily (#10503)
slaren Nov 25, 2024
0cc6375
Introduce llama-run (#10291)
ericcurtin Nov 25, 2024
0eb4e12
vulkan: Fix a vulkan-shaders-gen arugment parsing error (#10484)
sparkleholic Nov 26, 2024
7066b4c
CANN: RoPE and CANCAT operator optimization (#10488)
noemotiovon Nov 26, 2024
9a4b79b
CANN: Improve the Inferencing Performance for Ascend NPU Device (#10454)
shen-shanshan Nov 26, 2024
811872a
speculative : simplify the implementation (#10504)
ggerganov Nov 26, 2024
84e1c33
server : fix parallel speculative decoding (#10513)
ggerganov Nov 26, 2024
25669aa
ggml-cpu: cmake add arm64 cpu feature check for macos (#10487)
chaxu01 Nov 26, 2024
c6807b3
ci : add ubuntu cuda build, build with one arch on windows (#10456)
slaren Nov 26, 2024
7db3846
ci : publish the docker images created during scheduled runs (#10515)
slaren Nov 26, 2024
ab96610
cmake : enable warnings in llama (#10474)
ggerganov Nov 26, 2024
0bbd226
restore the condistion to build & update pacakge when merge (#10507)
NeoZhangJianyu Nov 26, 2024
45abe0f
server : replace behave with pytest (#10416)
ngxson Nov 26, 2024
904109e
vulkan: fix group_norm (#10496)
jeffbolznv Nov 26, 2024
249cd93
mtgpu: Add MUSA_DOCKER_ARCH in Dockerfiles && update cmake and make (…
yeahdongcn Nov 26, 2024
be0e350
Fix HIP flag inconsistency & build docs (#10524)
tristandruyen Nov 26, 2024
30ec398
llama : disable warnings for 3rd party sha1 dependency (#10527)
slaren Nov 26, 2024
5a349f2
ci : remove nix workflows (#10526)
slaren Nov 26, 2024
de50973
Add OLMo 2 model in docs (#10530)
2015aroras Nov 26, 2024
c9b00a7
ci : fix cuda releases (#10532)
slaren Nov 26, 2024
4a57d36
vulkan: optimize Q2_K and Q3_K mul_mat_vec (#10459)
jeffbolznv Nov 27, 2024
71a6498
vulkan: skip integer div/mod in get_offsets for batch_idx==0 (#10506)
jeffbolznv Nov 27, 2024
249a790
vulkan: further optimize q5_k mul_mat_vec (#10479)
jeffbolznv Nov 27, 2024
5b3466b
vulkan: Handle GPUs with less shared memory (#10468)
jeffbolznv Nov 27, 2024
c31ed2a
vulkan: define all quant data structures in types.comp (#10440)
jeffbolznv Nov 27, 2024
9150f8f
Do not include arm_neon.h when compiling CUDA code (ggml/1028)
frankier Nov 26, 2024
fee824a
sync : ggml
ggerganov Nov 27, 2024
9e2301f
metal : fix group_norm support condition (#0)
ggerganov Nov 27, 2024
46c69e0
ci : faster CUDA toolkit installation method and use ccache (#10537)
slaren Nov 27, 2024
3ad5451
Add some minimal optimizations for CDNA (#10498)
IMbackK Nov 27, 2024
9f91251
common : fix duplicated file name with hf_repo and hf_file (#10550)
ngxson Nov 27, 2024
b742013
CANN: ROPE operator optimization (#10540)
noemotiovon Nov 28, 2024
605fa66
CANN: Fix SOC_TYPE compile bug (#10519)
leo-pony Nov 28, 2024
c6bc739
CANN: Update cann.md to display correctly in CLion (#10538)
HRXWEB Nov 28, 2024
2025fa6
kompute : improve backend to pass test_backend_ops (#10542)
slp Nov 28, 2024
c202cef
ggml-cpu: support IQ4_NL_4_4 by runtime repack (#10541)
FanShupei Nov 28, 2024
eea986f
cmake : fix ARM feature detection (#10543)
ggerganov Nov 28, 2024
76b27d2
ggml : fix row condition for i8mm kernels (#10561)
ggerganov Nov 28, 2024
e90688e
ci : fix tag name in cuda and hip releases (#10566)
slaren Nov 28, 2024
7281cf1
docs: fix outdated usage of llama-simple (#10565)
rand-fly Nov 28, 2024
8907193
common: fix warning message when no GPU found (#10564)
JohannesGaessler Nov 28, 2024
6c59567
server : (tests) don't use thread for capturing stdout/stderr, bump o…
ngxson Nov 28, 2024
4c0a95b
llama : add missing model types
ggerganov Nov 28, 2024
dc22344
ggml : remove redundant copyright notice + update authors
ggerganov Nov 28, 2024
678d799
llava: return false instead of exit (#10546)
tinglou Nov 29, 2024
f095a64
vulkan: get the first command buffer submitted sooner (#10499)
jeffbolznv Nov 29, 2024
938f608
CANN: RoPE operator optimization (#10563)
noemotiovon Nov 29, 2024
266b851
sycl : Reroute permuted mul_mats through oneMKL (#10408)
Alcpz Nov 29, 2024
0f77aae
sycl : offload of get_rows set to 0 (#10432)
Alcpz Nov 29, 2024
4b3242b
ggml-cpu: fix typo in gemv/gemm iq4_nl_4_4 (#10580)
FanShupei Nov 29, 2024
f0678c5
ggml : fix I8MM Q4_1 scaling factor conversion (#10562)
ggerganov Nov 29, 2024
a3a3048
cleanup UI link list (#10577)
slaren Nov 29, 2024
3a8e9af
imatrix : support combine-only (#10492)
robbiemu Nov 29, 2024
b782e5c
server : add more test cases (#10569)
ngxson Nov 29, 2024
7cc2d2c
ggml : move AMX to the CPU backend (#10570)
slaren Nov 29, 2024
0533e7f
vulkan: Dynamic subgroup size support for Q6_K mat_vec (#10536)
netrunnereve Nov 30, 2024
abadba0
readme : refresh (#10587)
ggerganov Nov 30, 2024
3e0ba0e
readme : remove old badge
ggerganov Nov 30, 2024
0c39f44
ggml-cpu: replace AArch64 NEON assembly with intrinsics in ggml_gemv_…
angt Nov 30, 2024
43957ef
build: update Makefile comments for C++ version change (#10598)
wangqin0 Dec 1, 2024
cf80952
Merge branch 'master' into Temp
apicalshark Dec 1, 2024
ci : add ubuntu cuda build, build with one arch on windows (ggml-org#…
slaren authored Nov 26, 2024
commit c6807b3f28cc3dbfda3ec390bcb87e69fb5634e2
15 changes: 5 additions & 10 deletions .github/labeler.yml
@@ -3,19 +3,18 @@ Kompute:
  - changed-files:
    - any-glob-to-any-file:
      - ggml/include/ggml-kompute.h
      - ggml/src/ggml-kompute.cpp
      - ggml/src/ggml-kompute/**
      - README-kompute.md
Apple Metal:
  - changed-files:
    - any-glob-to-any-file:
      - ggml/include/ggml-metal.h
      - ggml/src/ggml-metal.cpp
      - ggml/src/ggml-metal/**
      - README-metal.md
SYCL:
  - changed-files:
    - any-glob-to-any-file:
      - ggml/include/ggml-sycl.h
      - ggml/src/ggml-sycl.cpp
      - ggml/src/ggml-sycl/**
      - docs/backend/SYCL.md
      - examples/sycl/**
@@ -27,8 +26,8 @@ Nvidia GPU:
Vulkan:
  - changed-files:
    - any-glob-to-any-file:
      - ggml/ggml_vk_generate_shaders.py
      - ggml/src/ggml-vulkan*
      - ggml/include/ggml-vulkan.h
      - ggml/src/ggml-vulkan/**
documentation:
  - changed-files:
    - any-glob-to-any-file:
@@ -75,11 +74,7 @@ server:
ggml:
  - changed-files:
    - any-glob-to-any-file:
      - ggml/include/ggml*.h
      - ggml/src/ggml*.c
      - ggml/src/ggml*.cpp
      - ggml/src/ggml*.h
      - ggml-cuda/**
      - ggml/**
nix:
  - changed-files:
    - any-glob-to-any-file:
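The labeler globs above move from per-file patterns (e.g. `ggml/src/ggml-kompute.cpp`) to directory globs like `ggml/src/ggml-kompute/**`, tracking the backend sources that were moved into subdirectories. As a rough illustration only — actions/labeler uses minimatch-style globbing, and Python's `fnmatch` is just an approximation of it (its `*` also crosses `/`) — the file paths below are hypothetical:

```python
from fnmatch import fnmatch

# Hypothetical changed-file paths, for illustration only.
old_layout = "ggml/src/ggml-kompute.cpp"               # pre-move, single file
new_layout = "ggml/src/ggml-kompute/ggml-kompute.cpp"  # post-move, nested in a directory

# The updated glob from labeler.yml:
glob = "ggml/src/ggml-kompute/**"

print(fnmatch(new_layout, glob))  # True: files under the new directory match
print(fnmatch(old_layout, glob))  # False: the old single-file path does not
```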
59 changes: 58 additions & 1 deletion .github/workflows/build.yml
@@ -871,8 +871,65 @@ jobs:
          path: llama-${{ steps.tag.outputs.name }}-bin-win-${{ matrix.build }}.zip
          name: llama-bin-win-${{ matrix.build }}.zip

  ubuntu-latest-cmake-cuda:
    runs-on: ubuntu-latest
    container: nvidia/cuda:12.6.2-devel-ubuntu24.04

    steps:
      - name: Clone
        id: checkout
        uses: actions/checkout@v4

      - name: Install dependencies
        env:
          DEBIAN_FRONTEND: noninteractive
        run: |
          apt update
          apt install -y cmake build-essential ninja-build libgomp1 git

      - name: Build with CMake
        run: |
          cmake -S . -B build -G Ninja -DCMAKE_BUILD_TYPE=Release -DGGML_NATIVE=OFF -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=89-real -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined -DLLAMA_FATAL_WARNINGS=ON
          cmake --build build
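The `-DCMAKE_CUDA_ARCHITECTURES=89-real` flag is what "build with one arch" refers to: it emits SASS for a single compute capability and skips PTX, keeping the CI build fast. A small sketch of how CMake interprets the suffixes, per the `CMAKE_CUDA_ARCHITECTURES` documentation (the helper function is illustrative, not part of the build):

```python
def cuda_arch_outputs(entry: str):
    """Interpret one CMAKE_CUDA_ARCHITECTURES entry.

    '89'         -> real (SASS) + virtual (PTX) for compute capability 8.9
    '89-real'    -> SASS only (smallest, fastest build; no forward compatibility)
    '89-virtual' -> PTX only (JIT-compiled by the driver at load time)
    """
    if entry.endswith("-real"):
        return entry[: -len("-real")], ["sass"]
    if entry.endswith("-virtual"):
        return entry[: -len("-virtual")], ["ptx"]
    return entry, ["sass", "ptx"]

print(cuda_arch_outputs("89-real"))  # ('89', ['sass'])
```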

  windows-latest-cmake-cuda:
    runs-on: windows-latest

    strategy:
      matrix:
        cuda: ['12.6.2']
        build: ['cuda']

    steps:
      - name: Clone
        id: checkout
        uses: actions/checkout@v4

      - name: Install CUDA toolkit
        id: cuda-toolkit
        uses: Jimver/cuda-toolkit@v0.2.19
        with:
          cuda: ${{ matrix.cuda }}
          method: 'network'
          sub-packages: '["nvcc", "cudart", "cublas", "cublas_dev", "thrust", "visual_studio_integration"]'

      - name: Install Ninja
        id: install_ninja
        run: |
          choco install ninja

      - name: Build
        id: cmake_build
        shell: cmd
        run: |
          call "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\Build\vcvars64.bat"
          cmake -S . -B build -G "Ninja Multi-Config" -DGGML_NATIVE=OFF -DGGML_CUDA=ON -DBUILD_SHARED_LIBS=ON -DGGML_RPC=ON -DCMAKE_CUDA_ARCHITECTURES=89-real
          cmake --build build --config Release -t ggml-cuda
          cmake --build build --config Release

  windows-2019-cmake-cuda:
    runs-on: windows-2019
    if: ${{ github.event == 'push' && github.ref == 'refs/heads/master' }}

    strategy:
      matrix:
@@ -1173,7 +1230,7 @@ jobs:
      - macOS-latest-make
      - macOS-latest-cmake
      - windows-latest-cmake
      - windows-latest-cmake-cuda
      - windows-2019-cmake-cuda
      - windows-latest-cmake-hip-release
      - macOS-latest-cmake-arm64
      - macOS-latest-cmake-x64
2 changes: 2 additions & 0 deletions .github/workflows/nix-ci.yml
@@ -5,8 +5,10 @@ on:
   push:
     branches:
       - master
+    paths: ['.github/workflows/nix-ci.yml', '**/flake.nix', '**/flake.lock', '**/CMakeLists.txt', '**/Makefile', '**/*.h', '**/*.hpp', '**/*.c', '**/*.cpp', '**/*.cu', '**/*.cuh', '**/*.swift', '**/*.m', '**/*.metal']
   pull_request:
     types: [opened, synchronize, reopened]
+    paths: ['.github/workflows/nix-ci.yml', '**/flake.nix', '**/flake.lock', '**/CMakeLists.txt', '**/Makefile', '**/*.h', '**/*.hpp', '**/*.c', '**/*.cpp', '**/*.cu', '**/*.cuh', '**/*.swift', '**/*.m', '**/*.metal']
 
 concurrency:
   group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
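The concurrency `group` relies on GitHub's expression semantics, where `&&` and `||` return one of their operands rather than a boolean (as in many scripting languages): on pull requests `github.head_ref` is set, so runs on the same ref share a group and stale runs get cancelled, while other events fall back to the unique run id. A sketch of that evaluation, with hypothetical values:

```python
def concurrency_group(workflow: str, head_ref: str, ref: str, run_id: int) -> str:
    # Mirrors `${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}`.
    # Python's `and`/`or` short-circuit the same way GitHub expressions do:
    # a truthy head_ref yields ref; an empty head_ref falls through to run_id.
    return f"{workflow}-{(head_ref and ref) or run_id}"

# Pull request: head_ref is set, so runs on the same ref share one group.
print(concurrency_group("Nix CI", "my-branch", "refs/pull/26/merge", 1234))  # Nix CI-refs/pull/26/merge
# Push to master: head_ref is empty, so every run gets its own group.
print(concurrency_group("Nix CI", "", "refs/heads/master", 1234))  # Nix CI-1234
```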
9 changes: 8 additions & 1 deletion .github/workflows/python-lint.yml
@@ -1,6 +1,13 @@
 name: flake8 Lint
 
-on: [push, pull_request]
+on:
+  push:
+    branches:
+      - master
+    paths: ['.github/workflows/python-lint.yml', '**/*.py']
+  pull_request:
+    types: [opened, synchronize, reopened]
+    paths: ['.github/workflows/python-lint.yml', '**/*.py']
 
 concurrency:
   group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
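Previously `on: [push, pull_request]` ran the lint workflow for every push and pull-request event; after this change a push triggers it only on `master`, and only when a matching path changed. A simplified sketch of the resulting trigger predicate — the changed-file paths are hypothetical, and GitHub's minimatch-style path filters are only approximated here with `fnmatch`:

```python
from fnmatch import fnmatch

PATHS = [".github/workflows/python-lint.yml", "**/*.py"]

def lint_triggers(event: str, branch: str, changed_files: list[str]) -> bool:
    # Simplified model of the updated `on:` block above.
    if event not in ("push", "pull_request"):
        return False
    if event == "push" and branch != "master":
        return False
    return any(fnmatch(f, p) for f in changed_files for p in PATHS)

print(lint_triggers("push", "master", ["scripts/run.py"]))   # True: Python file on master
print(lint_triggers("push", "feature", ["scripts/run.py"]))  # False: push to a non-master branch
```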