Conversation

ggerganov (Member)

Alternative to #15985 and #16565.

The gpuAddress property is not always available. Replace it with an atomic counter.

@github-actions github-actions bot added ggml changes relating to the ggml tensor library for machine learning Apple Metal https://en.wikipedia.org/wiki/Metal_(API) labels Oct 14, 2025
jjerphan added a commit to jjerphan/llama.cpp-feedstock that referenced this pull request Oct 14, 2025
See: ggml-org/llama.cpp#16576

Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
@ggerganov ggerganov merged commit fa882fd into master Oct 14, 2025
66 of 70 checks passed
@ggerganov ggerganov deleted the gg/metal-avoid-gpu-address branch October 14, 2025 17:33
jjerphan added a commit to conda-forge/llama.cpp-feedstock that referenced this pull request Oct 14, 2025
See: ggml-org/llama.cpp#16576

Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
ddh0 added a commit to ddh0/llama.cpp that referenced this pull request Oct 14, 2025
* cuda : remove legacy copy-op pointer indirection code (ggml-org#16485)

* remove legacy copy-op pointer indirection code

* further removal of copy-op indirection code

* renamed check_node_graph_compatibility_and_refresh_copy_ops function

* CUDA: add fp kernel for larger batch size MoE (ggml-org#16512)

* CUDA: kernel for larger batch sizes for MoE

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* fixup

* tests

* Move mmq_ids_helper to mmid

* cleanup

* Remove redundant checks

* CUDA: use fastdiv + ggml_cuda_mad for mmvf (ggml-org#16557)

* CUDA: use fastdiv + ggml_cuda_mad for mmvf

* use bf16 directly + fix formatting

* Add exception for HIP code

* CUDA: enable FA for FP32 KV cache (ggml-org#16546)

* vulkan: Improve build time for MSVC (ggml-org#16545)

Enable CMP0147 so custom build steps (invoking vulkan-shader-gen) are run in parallel.

Enable /MP so source files are compiled in parallel.
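The two settings above can be sketched as a CMake fragment; this is an illustrative config, not the exact lines from the PR:

```cmake
# Run custom build steps (e.g. vulkan-shader-gen invocations) in parallel
# under Visual Studio generators (CMake >= 3.27).
cmake_policy(SET CMP0147 NEW)

if (MSVC)
    # Compile source files in parallel within each target
    add_compile_options(/MP)
endif()
```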

* vulkan: Support FA with K/V in F32 (ggml-org#16543)

* CUDA + openCL: fix bug in accessing rms_norm->src while doing fusion (ggml-org#16577)

* vulkan: Add ACC_TYPE_VEC2 implementation (ggml-org#16203)

Signed-off-by: Stefan Savic <stefan.savic@huawei.com>
Co-authored-by: Stefan Savic <stefan.savic@huawei.com>

* metal : avoid using Metal's gpuAddress property (ggml-org#16576)

* metal : avoid using Metal's gpuAddress property

* metal : fix rope kernels buffer check

---------

Signed-off-by: Stefan Savic <stefan.savic@huawei.com>
Co-authored-by: Anav Prasad <anavp@nvidia.com>
Co-authored-by: Aman Gupta <amangupta052@gmail.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
Co-authored-by: SavicStefan <50296686+SavicStefan@users.noreply.github.com>
Co-authored-by: Stefan Savic <stefan.savic@huawei.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
yael-works pushed a commit to yael-works/llama.cpp that referenced this pull request Oct 15, 2025
* metal : avoid using Metal's gpuAddress property

* metal : fix rope kernels buffer check
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Oct 15, 2025
* origin/master:
Add server-driven parameter defaults and syncing (ggml-org#16515)
metal: optimise `GGML_OP_SUM` (ggml-org#16559)
server : fix img token logs (ggml-org#16595)
llama-quant: add support for mmproj (ggml-org#16592)
CUDA: Changing the CUDA scheduling strategy to spin (ggml-org#16585)
server : fix mtmd checkpoints (ggml-org#16591)
metal : avoid using Metal's gpuAddress property (ggml-org#16576)
vulkan: Add ACC_TYPE_VEC2 implementation (ggml-org#16203)
CUDA + openCL: fix bug in accessing rms_norm->src while doing fusion (ggml-org#16577)
vulkan: Support FA with K/V in F32 (ggml-org#16543)
vulkan: Improve build time for MSVC (ggml-org#16545)
CUDA: enable FA for FP32 KV cache (ggml-org#16546)
CUDA: use fastdiv + ggml_cuda_mad for mmvf (ggml-org#16557)
CUDA: add fp kernel for larger batch size MoE (ggml-org#16512)
cuda : remove legacy copy-op pointer indirection code (ggml-org#16485)
server : dynamic token limit for prompt cache (ggml-org#16560)
kyano pushed a commit to kyano/llama.cpp that referenced this pull request Oct 17, 2025