
[pull] main from abetlen:main #1


Open
wants to merge 1,180 commits into base: main from abetlen:main

Conversation


@pull pull bot commented May 22, 2024

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

Summary by CodeRabbit

  • New Features

    • Updated installation instructions for the llama-cpp-python project for better clarity and correctness.
    • Introduced daily updates for dependencies in the dependabot configuration.
    • New workflows added for building wheels with CUDA support and for macOS.
    • Added automation for deploying a wheels index to GitHub Pages.
    • Introduced a comprehensive grammar parsing system to enhance language processing capabilities.
    • Updated the versioning system to reflect the latest release.
    • Added the LlamaProxy class for enhanced management of multiple Llama models.
    • Expanded functionality descriptions in the README, including multi-modal support and function calling capabilities.
    • Enhanced high-level API with new methods for chat completion and embeddings.
    • Introduced a tokenizer framework for processing text into tokens and vice versa.
    • Added support for arm64 architecture in wheel building.
    • Enhanced error handling and logging mechanisms in the server application.
    • Introduced new endpoints for tokenization and detokenization in the API.
    • Added a new logging module for structured error handling in the FastAPI application.
  • Bug Fixes

    • Enhanced testing coverage for tokenization and completion functionalities in the llama_cpp library.
    • Addressed various performance issues and security vulnerabilities related to model handling.
  • Chores

    • Significant restructuring of the project configuration in pyproject.toml for improved dependency management.
    • Introduced a script for automating the generation of an HTML index for releases.
    • Expanded the .gitignore file to manage ignored files more effectively.
    • Updated the commit references for the llama.cpp subproject to include the latest enhancements.

abetlen and others added 13 commits May 13, 2024 09:39
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.17.0 to 2.18.0.
- [Release notes](https://github.com/pypa/cibuildwheel/releases)
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
- [Commits](pypa/cibuildwheel@v2.17.0...v2.18.0)

---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Special tokens are already mapped from metadata by llama.cpp
…1333)

* implement min_tokens

* set default to 0

* pass min_tokens

* fix

* remove copy

* implement MinTokensLogitsProcessor

* format

* fix condition

coderabbitai bot commented May 22, 2024

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The recent updates to the llama-cpp-python project introduce enhancements in model management, CI/CD workflows, and project configuration. Key features include the addition of a LlamaProxy class for managing multiple models, updated installation instructions, improved dependency management through GitHub Actions, and the introduction of a tokenizer framework. These changes streamline the development process and enhance usability.

Changes

| File/Path | Change Summary |
|-----------|----------------|
| .github/workflows/build-and-release.yaml | Updated OS matrix, action versions, and added steps for building wheels for arm64 architecture. |
| .github/workflows/build-wheels-cuda.yaml | Introduced new workflow for building Python wheels with CUDA support. |
| .github/workflows/build-wheels-metal.yaml | Added workflow for building wheels for different macOS versions. |
| .github/workflows/generate-index-from-release.yaml | New workflow for generating an index from releases for GitHub Pages. |
| .github/workflows/publish-to-test.yaml | Updated to include dynamic versioning and improved dependency installation steps. |
| .github/workflows/publish.yaml | Enhanced dependency management and updated build commands. |
| .github/workflows/test-pypi.yaml | Updated Python version matrix and installation steps for dependencies. |
| .github/workflows/test.yaml | Added download-model job and updated build jobs to depend on it. |
| CHANGELOG.md | Updated to reflect version change from 0.1.73 to 0.3.8, detailing new features and fixes. |
| CMakeLists.txt | Restructured installation process for shared libraries and updated build configurations. |
| Makefile | Modified build and dependency management targets, introducing new build configurations. |
| README.md | Expanded documentation for installation, features, and usage examples. |
| examples/notebooks/Batching.ipynb | New notebook demonstrating usage of the llama_cpp library. |
| llama_cpp/__init__.py | Added version declaration for the module. |
| llama_cpp/_ctypes_extensions.py | New file for loading shared libraries using ctypes. |
| llama_cpp/_ggml.py | New internal module for working with ggml tensors. |
| llama_cpp/_internals.py | New file providing wrappers for model management and context handling. |
| llama_cpp/_logger.py | New logging module for structured error handling and logging. |
| llama_cpp/llama_cache.py | New caching classes for RAM and disk-based caching. |
| llama_cpp/llama_grammar.py | New file implementing a grammar parser for the Llama language. |
| llama_cpp/llama_tokenizer.py | New tokenizer framework for processing text into tokens. |
| llama_cpp/server/app.py | Enhanced FastAPI application with new features and improved request handling. |
| llama_cpp/server/errors.py | New error handling module for standardized error responses. |
| llama_cpp/server/model.py | Introduced LlamaProxy class for managing multiple models. |
| llama_cpp/server/settings.py | New configuration management for model and server settings. |
| pyproject.toml | Restructured project configuration, moving to a PEP 517/518 compliant format. |
| tests/test_llama.py | Updated test suite with new test functions and improved structure for testing functionalities. |
| tests/test_llama_grammar.py | New test file for the LlamaGrammar class with various grammar definitions. |
| vendor/llama.cpp | Updated commit reference for the subproject, indicating enhancements. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant LlamaProxy
    participant Model

    User->>LlamaProxy: Request Model
    LlamaProxy->>Model: Load/Access Model
    Model-->>LlamaProxy: Return Model Instance
    LlamaProxy-->>User: Provide Model

🐇 Hop along, with joy we cheer,
New features bloom, the path is clear!
With builds and docs, all set to play,
In the world of code, we leap and sway!
So gather 'round, let's celebrate,
For changes bright, oh, isn’t it great? 🌼


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@pull pull bot added the ⤵️ pull label May 22, 2024
abetlen and others added 22 commits December 6, 2024 07:36
* fix: correct issue with handling lock during streaming

move locking for streaming into get_event_publisher call so it is locked and unlocked in the correct task for the streaming response

* fix: simplify exit stack management for create_chat_completion and create_completion

* fix: correct missing `async with` and format code

* fix: remove unnecessary explicit use of AsyncExitStack

fix: correct type hints for body_model

---------

Co-authored-by: Andrei <abetlen@gmail.com>
* feat: Sync with llama.cpp

Add `no_perf` field to `llama_context_params` to optionally disable performance timing measurements.

* fix: Display performance metrics by default

---------

Co-authored-by: Andrei <abetlen@gmail.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (4)
llama_cpp/llama_cache.py (1)

28-32: 🛠️ Refactor suggestion

Mark _find_longest_prefix_key as abstract.
Since _find_longest_prefix_key in BaseLlamaCache has no implementation, marking it with @abstractmethod clarifies that child classes must override it.

- def _find_longest_prefix_key(
+ @abstractmethod
+ def _find_longest_prefix_key(
🧰 Tools
🪛 Ruff (0.8.2)

28-32: BaseLlamaCache._find_longest_prefix_key is an empty method in an abstract base class, but has no abstract decorator

(B027)

llama_cpp/server/model.py (1)

43-44: 🛠️ Refactor suggestion

Combine nested if statements for clarity.

Simplify the code by merging the nested if statements into a single if. This follows the Ruff SIM102 recommendation and improves readability.

Apply this diff:

-if model == self._current_model_alias:
-    if self._current_model is not None:
-        return self._current_model
+if model == self._current_model_alias and self._current_model is not None:
+    return self._current_model
🧰 Tools
🪛 Ruff (0.8.2)

43-44: Use a single if statement instead of nested if statements

(SIM102)

llama_cpp/llama_tokenizer.py (1)

11-11: 🛠️ Refactor suggestion

Remove overshadowing import.

List from llama_cpp.llama_types is redefining List from typing. Removing or renaming the import helps avoid confusion and potential conflicts.

- from llama_cpp.llama_types import List
🧰 Tools
🪛 Ruff (0.8.2)

11-11: Redefinition of unused List from line 5

(F811)

llama_cpp/server/app.py (1)

129-130: 🛠️ Refactor suggestion

Raise an explicit exception for missing settings.

Using assert for validation in production code can hide errors if assertions are disabled. Prefer explicit exceptions for clarity and maintainability.

- assert (
-     server_settings is not None and model_settings is not None
- ), "server_settings and model_settings must be provided together"
+ if server_settings is None or model_settings is None:
+     raise ValueError("Both server_settings and model_settings must be provided together")
🧹 Nitpick comments (31)
llama_cpp/_logger.py (1)

29-43: Further refine the llama_log_callback.

  1. The global _last_log_level might introduce thread-safety concerns under heavy parallel usage. You may consider thread-local storage or synchronization if concurrency is planned.
  2. The # TODO comment suggests additional work is needed to implement “continue previous log” logic. A possible approach is buffering partial logs for level 5 messages.
  3. Printing directly to stderr is valid for strictly console-based logs, but you might prefer using logger.log(log_level, ...) for production to leverage logging handlers and formats.
-if logger.level <= GGML_LOG_LEVEL_TO_LOGGING_LEVEL[level]:
-    print(text.decode("utf-8"), end="", flush=True, file=sys.stderr)
+if logger.isEnabledFor(log_level):
+    logger.log(log_level, text.decode("utf-8"), stacklevel=2)
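Regarding the first point above, a thread-local holder is one low-effort option if the global ever becomes a real concurrency concern. This is only a sketch with illustrative names, not the module's current API:

```python
import threading

# Sketch: track the previously seen ggml log level per thread instead of
# in a module-level global (illustrative names, not the current API).
_log_state = threading.local()


def remember_log_level(level: int) -> None:
    _log_state.last_level = level


def last_log_level(default: int = 0) -> int:
    return getattr(_log_state, "last_level", default)
```
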
llama_cpp/llama_cache.py (1)

70-72: Remove unnecessary .keys() when iterating.
Iterating over self.cache_state directly is more efficient than calling .keys().

- for k in self.cache_state.keys()
+ for k in self.cache_state
🧰 Tools
🪛 Ruff (0.8.2)

71-71: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

examples/notebooks/Batching.ipynb (1)

1-464: Notebook demonstration is comprehensive.
The example thoroughly showcases model loading, tokenization, and decoding. A few optional enhancements:

  • Add commentary about potential memory constraints when loading large models.
  • Consider capturing and handling exceptions in the sampling loop to avoid partial outputs.
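For the second point, a minimal sketch of defensive generation using the high-level API is shown below. The notebook itself uses the low-level batching calls, so treat this as illustrative rather than a drop-in cell; the model path is a placeholder.

```python
from llama_cpp import Llama

# Wrap the sampling loop so a failure mid-generation still surfaces
# whatever partial output was produced before the error.
llm = Llama(model_path="./models/model.gguf", verbose=False)

partial_output = []
try:
    for chunk in llm("Q: Name the planets in the solar system. A:", max_tokens=64, stream=True):
        partial_output.append(chunk["choices"][0]["text"])
except (RuntimeError, ValueError) as err:
    print(f"generation interrupted: {err}")
finally:
    print("".join(partial_output))
```
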
llama_cpp/_ctypes_extensions.py (1)

126-129: Complete or clarify the _byref function.
The _byref function implementation is empty. If offset support is needed, consider implementing it or remove the parameter to avoid confusion.
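If offset support is actually wanted, ctypes.byref already accepts an optional byte offset, so a completed helper could be as small as the sketch below (the helper name is illustrative, not the module's existing API):

```python
import ctypes


def byref_with_offset(obj, offset: int = 0):
    """Return a light-weight pointer to `obj`, shifted by `offset` bytes."""
    # ctypes.byref natively supports an optional byte offset, so the
    # helper can simply delegate to it.
    return ctypes.byref(obj, offset)


# Example: point at the second int in a small array.
arr = (ctypes.c_int * 4)(10, 20, 30, 40)
p = byref_with_offset(arr, ctypes.sizeof(ctypes.c_int))
```
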

tests/test_llama_grammar.py (1)

12-12: Use or remove the unused grammar variables.
Currently, grammar is assigned but not utilized at lines 12, 50, and 76. You can either remove the assignment or re-enable the commented-out assertions to make use of the variable:

 def test_grammar_from_string():
-    grammar = llama_cpp.LlamaGrammar.from_string(tree)
-    # assert grammar._n_rules == 3
-    # assert grammar._start_rule_index == 2
-    # assert grammar.grammar is not None
+    grammar = llama_cpp.LlamaGrammar.from_string(tree)
+    assert grammar._n_rules == 3
+    assert grammar._start_rule_index == 2
+    assert grammar.grammar is not None

Also applies to: 50-50, 76-76

🧰 Tools
🪛 Ruff (0.8.2)

12-12: Local variable grammar is assigned to but never used

Remove assignment to unused variable grammar

(F841)

.github/workflows/test-pypi.yaml (1)

25-25: Remove trailing spaces to address YAMLlint errors.
These lines contain trailing spaces flagged by YAMLlint. Please remove them to resolve the formatting errors:

-    RUST_LOG=trace python -m uv pip install llama-cpp-python[all] --verbose 
+    RUST_LOG=trace python -m uv pip install llama-cpp-python[all] --verbose

Also applies to: 31-31, 35-35, 37-37, 55-55, 61-61, 67-67, 71-71, 73-73, 90-90, 97-97, 103-103, 107-107, 109-109

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 25-25: trailing spaces

(trailing-spaces)

.github/workflows/build-wheels-cuda.yaml (1)

5-7: Consider reducing GitHub Actions permissions
You have set contents: write under permissions. If read-only access or a more restricted scope suffices for your workflow, consider lowering these permissions to minimize security risks.

.github/workflows/build-wheels-metal.yaml (1)

26-26: Remove trailing spaces
Lines 26, 38, and 72 have trailing whitespace, which can trigger lint warnings and reduce clarity.

-  ...
+  ...

Also applies to: 38-38, 72-72

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 26-26: trailing spaces

(trailing-spaces)

llama_cpp/llama_grammar.py (3)

29-36: Refine exception handling
Catching broad Exceptions can obscure specific errors. Consider handling more precise exceptions (OSError, IOError, etc.) to aid debugging.

-        except Exception as err:
+        except (OSError, IOError) as err:

42-43: Check for a minor spelling error
The phrase "params_grammer" appears to be a typographical error. Changing it to "params_grammar" may clarify the message.

-        f"{cls.from_file.__name__}: error parsing grammar file: params_grammer is empty"
+        f"{cls.from_file.__name__}: error parsing grammar file: params_grammar is empty"

352-352: Follow through on the TODO for URI/Email formats
Support for "uri" and "email" string formats remains unimplemented. If you need this feature, I can help you flesh it out.

tests/test_llama.py (3)

1-1: Remove unused imports.

According to the static analysis hints, the ctypes, numpy, and scipy.special.log_softmax imports are unused. Removing them helps maintain cleanliness and reduce confusion.

Proposed patch:

-import ctypes
 import multiprocessing

-import numpy as np
-from scipy.special import log_softmax
 from huggingface_hub import hf_hub_download

Also applies to: 4-5

🧰 Tools
🪛 Ruff (0.8.2)

1-1: ctypes imported but unused

Remove unused import: ctypes

(F401)


22-35: Consider avoiding direct access to private context attributes.

The _ctx.ctx attribute is presumably private and might change in future library updates. Relying on public APIs would increase test stability.


117-219: Consider splitting into smaller test functions for clarity.

test_real_llama covers multiple behaviors (completions, grammar usage, seed setting, state saving/loading). Splitting these into smaller, more focused tests would improve readability and maintainability.
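For example, a possible split might look like the sketch below. The fixture name comes from the existing llama_cpp_model_path fixture; the prompts, parameters, and assertions are placeholders:

```python
import llama_cpp


def test_completion_basic(llama_cpp_model_path):
    # One behavior per test: plain completion returns non-empty text.
    llm = llama_cpp.Llama(model_path=llama_cpp_model_path, n_ctx=128, verbose=False)
    out = llm("The capital of France is", max_tokens=8)
    assert out["choices"][0]["text"]


def test_completion_is_deterministic_at_zero_temperature(llama_cpp_model_path):
    # Greedy sampling should produce the same text on repeated calls.
    llm = llama_cpp.Llama(model_path=llama_cpp_model_path, n_ctx=128, verbose=False)
    first = llm("2 + 2 =", max_tokens=4, temperature=0.0)
    second = llm("2 + 2 =", max_tokens=4, temperature=0.0)
    assert first["choices"][0]["text"] == second["choices"][0]["text"]
```
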

.github/workflows/test.yaml (1)

42-42: Remove trailing spaces.

Trailing spaces often trigger linting errors. Consider removing them for a cleaner codestyle.

Sample diff (showing one line as example; repeat for each flagged line):

-  submodules: "recursive"  
+  submodules: "recursive"

Also applies to: 73-73, 92-92, 107-107, 125-125, 145-145

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 42-42: trailing spaces

(trailing-spaces)

.github/workflows/publish-to-test.yaml (1)

22-22: Remove trailing spaces for lint compliance.

Trimming trailing whitespace helps avoid lint warnings and maintains clean diffs.

Example diff:

-  submodules: "recursive"  
+  submodules: "recursive"

Also applies to: 28-28, 35-35, 47-47, 53-53, 57-57

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 22-22: trailing spaces

(trailing-spaces)

llama_cpp/server/errors.py (2)

150-153: Consider using logging instead of direct stderr printing.

Instead of printing error messages to stderr, it might be cleaner and more flexible to utilize a logging library. This would allow you to configure log levels, formats, and destinations more systematically.

- print(f"Exception: {str(error)}", file=sys.stderr)
- traceback.print_exc(file=sys.stderr)
+ import logging
+ logger = logging.getLogger(__name__)
+ logger.exception("Exception occurred")

162-213: Add or expand unit tests.

The custom RouteErrorHandler introduces specialized logic for error handling, but it’s not clear if there is dedicated testing to cover:

  • Regex pattern matching for context length and model not found errors
  • General fallback cases
  • Request body parsing

Would you like help adding tests to ensure robust coverage for these scenarios?
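As a starting point, a hedged sketch of pattern-level tests is shown below; the regex and sample messages are illustrative and should be replaced with the actual patterns from llama_cpp/server/errors.py:

```python
import re

# Illustrative pattern mimicking a context-length error message; the real
# regexes live in llama_cpp/server/errors.py and may differ.
CONTEXT_LENGTH_PATTERN = re.compile(
    r"Requested tokens \((\d+)\) exceed context window of (\d+)"
)


def test_context_length_pattern_matches():
    message = "Requested tokens (1024) exceed context window of 512"
    match = CONTEXT_LENGTH_PATTERN.search(message)
    assert match is not None
    assert match.groups() == ("1024", "512")


def test_context_length_pattern_ignores_unrelated_errors():
    # The fallback path should be taken for unrelated error messages.
    assert CONTEXT_LENGTH_PATTERN.search("Model file not found") is None
```
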

Makefile (2)

8-10: Add version pinning or constraints to prevent unexpected upgrades.

Installing dependencies with --upgrade can lead to unexpected behavior if major versions introduce breaking changes. Pinning or constraining versions in a requirements file can help maintain predictable builds.

 deps:
   python3 -m pip install --upgrade pip
-  python3 -m pip install -e ".[all]"
+  python3 -m pip install --upgrade --requirement requirements.txt

15-23: Consider unifying debug build configuration with an environment variable.

Multiple debug-oriented CMake arguments are scattered across the build.debug target. Centralizing these debug flags can simplify maintenance and reduce duplication.

 build.debug:
   python3 -m pip install \
     --verbose \
     --config-settings=cmake.verbose=true \
     --config-settings=logging.level=INFO \
     --config-settings=install.strip=false  \
-    --config-settings=cmake.args="-DCMAKE_BUILD_TYPE=Debug;-DCMAKE_C_FLAGS='-ggdb -O0';-DCMAKE_CXX_FLAGS='-ggdb -O0'" \
+    --config-settings=cmake.args="$DEBUG_BUILD_ARGS" \
     --editable .
README.md (1)

62-62: Add a hyphen to "backend specific" when used as a modifier.

- as well as backend specific options
+ as well as backend-specific options
🧰 Tools
🪛 LanguageTool

[grammar] ~62-~62: Possible agreement error. The noun hardware seems to be uncountable; consider using: “much hardware”, “a good deal of hardware”.
Context: ...ion Configuration llama.cpp supports a number of hardware acceleration backends to speed up infer...

(MANY_NN_U)


[uncategorized] ~62-~62: Possible missing comma found.
Context: ...dware acceleration backends to speed up inference as well as backend specific options. Se...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~62-~62: When ‘backend-specific’ is used as a modifier, it is usually spelled with a hyphen.
Context: ...ckends to speed up inference as well as backend specific options. See the [llama.cpp README](htt...

(SPECIFIC_HYPHEN)

.github/workflows/build-and-release.yaml (2)

37-37: Remove trailing spaces.

Line 37 has trailing whitespace, which may break stricter YAML parsers.

Apply this diff to remove the trailing space:

-        os: [ubuntu-20.04, windows-2019, macos-13] 
+        os: [ubuntu-20.04, windows-2019, macos-13]
🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 37-37: trailing spaces

(trailing-spaces)


112-112: Remove trailing spaces.

Line 112 has trailing whitespace, which may cause lint failures.

Apply this diff:

-          RUST_LOG: trace        
+          RUST_LOG: trace
🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 112-112: trailing spaces

(trailing-spaces)

llama_cpp/server/model.py (1)

188-188: Use a context manager when opening files.

Use a with statement to ensure the file is correctly closed after reading.

Apply this diff:

-json.load(open(settings.hf_tokenizer_config_path))
+with open(settings.hf_tokenizer_config_path) as f:
+    json.load(f)
🧰 Tools
🪛 Ruff (0.8.2)

188-188: Use a context manager for opening files

(SIM115)

llama_cpp/llava_cpp.py (1)

42-42: Validate library loading exceptions.

Although loading the shared library is handled by load_shared_library, consider wrapping it in a try-except block to raise more explicit errors for easier debugging.
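A minimal sketch of such a wrapper is below; the base-path handling and exception types are assumptions, and load_shared_library's exact signature should be checked against llama_cpp/_ctypes_extensions.py:

```python
import os
import pathlib

from llama_cpp._ctypes_extensions import load_shared_library

# Assumed layout: shared libraries live next to this module in a "lib" folder.
_base_path = pathlib.Path(os.path.abspath(os.path.dirname(__file__))) / "lib"

try:
    _libllava = load_shared_library("llava", _base_path)
except (OSError, RuntimeError) as err:
    # Re-raise with an explicit message so import failures are easier to debug.
    raise RuntimeError(
        f"Failed to load shared library 'llava' from {_base_path}: {err}"
    ) from err
```
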

CMakeLists.txt (2)

5-6: Add simple documentation for these build options.

The new LLAMA_BUILD and LLAVA_BUILD options are a great addition for controlling library builds. Consider adding inline comments or README documentation to explain their intended usage.


8-46: Streamline repeated library installation logic.

The function llama_cpp_python_install_target is nicely defined to handle installation. However, consider reducing repetition across UNIX, Apple, and Windows sections by factoring out the target property settings (RPATH logic, etc.) into helper snippets if more targets are added in the future.

pyproject.toml (1)

31-62: Check optional dependencies for correct grouping and coverage.

Restructuring dependencies under [project.optional-dependencies] is straightforward. Double-check that each group (server, test, dev, and all) fully mirrors what was previously in Poetry to avoid missing or redundant packages.

llama_cpp/_internals.py (3)

359-446: Reflect on the multi-sampler design.

All sample_* methods raise NotImplementedError for strategies like top-k, top-p, or grammar sampling. If these functionalities are critical for advanced usage, consider either removing the stubs until implemented or returning a helpful error message describing the current limitations.
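If the stubs are kept, a slightly friendlier failure mode could look like this sketch (the method signature is approximated from the stubs described above; adapt it to the actual context class):

```python
class _LlamaSamplingStub:
    # Sketch of the "helpful error message" option for the unimplemented
    # sample_* stubs described above.
    def sample_top_k(self, candidates, k: int, min_keep: int = 1):
        raise NotImplementedError(
            "top-k sampling is not wired up in this low-level wrapper yet; "
            "use the high-level Llama API (e.g. create_completion(top_k=...)) instead"
        )
```
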

🧰 Tools
🪛 Ruff (0.8.2)

359-359: _LlamaTokenDataArray may be undefined, or defined from star imports

(F405)


365-365: _LlamaTokenDataArray may be undefined, or defined from star imports

(F405)


371-371: _LlamaTokenDataArray may be undefined, or defined from star imports

(F405)


378-378: _LlamaTokenDataArray may be undefined, or defined from star imports

(F405)


385-385: _LlamaTokenDataArray may be undefined, or defined from star imports

(F405)


391-391: _LlamaTokenDataArray may be undefined, or defined from star imports

(F405)


401-401: _LlamaTokenDataArray may be undefined, or defined from star imports

(F405)


419-419: _LlamaTokenDataArray may be undefined, or defined from star imports

(F405)


433-433: _LlamaTokenDataArray may be undefined, or defined from star imports

(F405)


440-440: _LlamaTokenDataArray may be undefined, or defined from star imports

(F405)


465-525: Review LlamaBatch memory usage.

You are creating a llama_batch instance for up to n_seq_max sequences. If n_seq_max is large, consider whether partial allocations (or on-demand expansions) might be more efficient. Also, double-check that the final token is always marked with logits=True only when appropriate.


720-720: Remove redundant import statements.

“from typing import List, Callable, Optional, Union” at this line conflicts with earlier imports and triggers redefinition warnings. Consider removing the repeated references to keep the import list clean.

-from typing import List, Callable, Optional, Union
+from typing import Callable
🧰 Tools
🪛 Ruff (0.8.2)

720-720: Redefinition of unused List from line 8

Remove definition: List

(F811)


720-720: Redefinition of unused Optional from line 10

Remove definition: Optional

(F811)


720-720: typing.Union imported but unused

Remove unused import: typing.Union

(F401)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9769e57 and 344c106.

📒 Files selected for processing (31)
  • .github/workflows/build-and-release.yaml (1 hunks)
  • .github/workflows/build-wheels-cuda.yaml (1 hunks)
  • .github/workflows/build-wheels-metal.yaml (1 hunks)
  • .github/workflows/generate-index-from-release.yaml (1 hunks)
  • .github/workflows/publish-to-test.yaml (1 hunks)
  • .github/workflows/publish.yaml (1 hunks)
  • .github/workflows/test-pypi.yaml (1 hunks)
  • .github/workflows/test.yaml (1 hunks)
  • CHANGELOG.md (1 hunks)
  • CMakeLists.txt (1 hunks)
  • Makefile (3 hunks)
  • README.md (3 hunks)
  • examples/notebooks/Batching.ipynb (1 hunks)
  • llama_cpp/__init__.py (1 hunks)
  • llama_cpp/_ctypes_extensions.py (1 hunks)
  • llama_cpp/_ggml.py (1 hunks)
  • llama_cpp/_internals.py (1 hunks)
  • llama_cpp/_logger.py (1 hunks)
  • llama_cpp/llama_cache.py (1 hunks)
  • llama_cpp/llama_grammar.py (1 hunks)
  • llama_cpp/llama_tokenizer.py (1 hunks)
  • llama_cpp/llama_types.py (3 hunks)
  • llama_cpp/llava_cpp.py (1 hunks)
  • llama_cpp/server/app.py (1 hunks)
  • llama_cpp/server/errors.py (1 hunks)
  • llama_cpp/server/model.py (1 hunks)
  • llama_cpp/server/settings.py (1 hunks)
  • pyproject.toml (1 hunks)
  • tests/test_llama.py (1 hunks)
  • tests/test_llama_grammar.py (1 hunks)
  • vendor/llama.cpp (1 hunks)
✅ Files skipped from review due to trivial changes (3)
  • vendor/llama.cpp
  • llama_cpp/__init__.py
  • llama_cpp/_ggml.py
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/test-pypi.yaml

36-36: shell name "cmd" is invalid on macOS or Linux. available names are "bash", "pwsh", "python", "sh"

(shell-name)


108-108: shell name "cmd" is invalid on macOS or Linux. available names are "bash", "pwsh", "python", "sh"

(shell-name)

.github/workflows/publish.yaml

40-40: shell name "cmd" is invalid on macOS or Linux. available names are "bash", "pwsh", "python", "sh"

(shell-name)

.github/workflows/publish-to-test.yaml

52-52: shell name "cmd" is invalid on macOS or Linux. available names are "bash", "pwsh", "python", "sh"

(shell-name)

.github/workflows/build-and-release.yaml

118-118: shell name "cmd" is invalid on macOS or Linux. available names are "bash", "pwsh", "python", "sh"

(shell-name)

🪛 YAMLlint (1.35.1)
.github/workflows/test-pypi.yaml

[error] 25-25: trailing spaces

(trailing-spaces)


[error] 31-31: trailing spaces

(trailing-spaces)


[error] 35-35: trailing spaces

(trailing-spaces)


[error] 37-37: trailing spaces

(trailing-spaces)


[error] 55-55: trailing spaces

(trailing-spaces)


[error] 61-61: trailing spaces

(trailing-spaces)


[error] 67-67: trailing spaces

(trailing-spaces)


[error] 71-71: trailing spaces

(trailing-spaces)


[error] 73-73: trailing spaces

(trailing-spaces)


[error] 90-90: trailing spaces

(trailing-spaces)


[error] 97-97: trailing spaces

(trailing-spaces)


[error] 103-103: trailing spaces

(trailing-spaces)


[error] 107-107: trailing spaces

(trailing-spaces)


[error] 109-109: trailing spaces

(trailing-spaces)

.github/workflows/test.yaml

[error] 42-42: trailing spaces

(trailing-spaces)


[error] 73-73: trailing spaces

(trailing-spaces)


[error] 92-92: trailing spaces

(trailing-spaces)


[error] 107-107: trailing spaces

(trailing-spaces)


[error] 125-125: trailing spaces

(trailing-spaces)


[error] 145-145: trailing spaces

(trailing-spaces)

.github/workflows/build-wheels-metal.yaml

[error] 26-26: trailing spaces

(trailing-spaces)


[error] 38-38: trailing spaces

(trailing-spaces)


[error] 72-72: trailing spaces

(trailing-spaces)

.github/workflows/publish-to-test.yaml

[error] 22-22: trailing spaces

(trailing-spaces)


[error] 28-28: trailing spaces

(trailing-spaces)


[error] 35-35: trailing spaces

(trailing-spaces)


[error] 47-47: trailing spaces

(trailing-spaces)


[error] 53-53: trailing spaces

(trailing-spaces)


[error] 57-57: trailing spaces

(trailing-spaces)

.github/workflows/build-and-release.yaml

[error] 37-37: trailing spaces

(trailing-spaces)


[error] 112-112: trailing spaces

(trailing-spaces)

🪛 Ruff (0.8.2)
llama_cpp/_ctypes_extensions.py

69-69: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

tests/test_llama_grammar.py

12-12: Local variable grammar is assigned to but never used

Remove assignment to unused variable grammar

(F841)


50-50: Local variable grammar is assigned to but never used

Remove assignment to unused variable grammar

(F841)


76-76: Local variable grammar is assigned to but never used

Remove assignment to unused variable grammar

(F841)

llama_cpp/server/model.py

43-44: Use a single if statement instead of nested if statements

(SIM102)


188-188: Use a context manager for opening files

(SIM115)

llama_cpp/llama_cache.py

14-14: from .llama_types import * used; unable to detect undefined names

(F403)


28-32: BaseLlamaCache._find_longest_prefix_key is an empty method in an abstract base class, but has no abstract decorator

(B027)


71-71: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

tests/test_llama.py

1-1: ctypes imported but unused

Remove unused import: ctypes

(F401)


4-4: numpy imported but unused

Remove unused import: numpy

(F401)


5-5: scipy.special.log_softmax imported but unused

Remove unused import: scipy.special.log_softmax

(F401)

llama_cpp/_internals.py

19-19: from .llama_types import * used; unable to detect undefined names

(F403)


334-334: _LlamaTokenDataArray may be undefined, or defined from star imports

(F405)


352-352: _LlamaTokenDataArray may be undefined, or defined from star imports

(F405)


359-359: _LlamaTokenDataArray may be undefined, or defined from star imports

(F405)


365-365: _LlamaTokenDataArray may be undefined, or defined from star imports

(F405)


371-371: _LlamaTokenDataArray may be undefined, or defined from star imports

(F405)


378-378: _LlamaTokenDataArray may be undefined, or defined from star imports

(F405)


385-385: _LlamaTokenDataArray may be undefined, or defined from star imports

(F405)


391-391: _LlamaTokenDataArray may be undefined, or defined from star imports

(F405)


401-401: _LlamaTokenDataArray may be undefined, or defined from star imports

(F405)


419-419: _LlamaTokenDataArray may be undefined, or defined from star imports

(F405)


433-433: _LlamaTokenDataArray may be undefined, or defined from star imports

(F405)


440-440: _LlamaTokenDataArray may be undefined, or defined from star imports

(F405)


720-720: Redefinition of unused List from line 8

Remove definition: List

(F811)


720-720: Redefinition of unused Optional from line 10

Remove definition: Optional

(F811)


720-720: typing.Union imported but unused

Remove unused import: typing.Union

(F401)


722-722: Redefinition of unused llama_cpp from line 23

(F811)


727-727: typing may be undefined, or defined from star imports

(F405)

llama_cpp/server/app.py

242-242: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


368-368: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


410-479: Do not perform function call Body in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


536-536: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


563-563: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


578-578: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


593-593: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

llama_cpp/llama_tokenizer.py

11-11: Redefinition of unused List from line 5

(F811)


113-116: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🪛 LanguageTool
CHANGELOG.md

[grammar] ~24-~24: The operating system from Apple is written “macOS”.
Context: ...f05f - fix(ci): Fix release by updating macos runner image to non-deprecated version ...

(MAC_OS)


[grammar] ~29-~29: The operating system from Apple is written “macOS”.
Context: ...## [0.3.4] - fix(ci): Build wheels for macos 13-15, cuda 12.1-12.4 by @abetlen in ca...

(MAC_OS)


[grammar] ~132-~132: The operating system from Apple is written “macOS”.
Context: ...4c25dd9f6a421cacec7604c6 - fix(ci): Fix MacOS release, use macos-12 image instead of ...

(MAC_OS)


[grammar] ~138-~138: Did you mean “too False to”?
Context: ...ate default config value for embeddings to False to fix error in text generation where logi...

(TOO_ADJECTIVE_TO)


[typographical] ~288-~288: If you want to indicate numerical ranges or time ranges, consider using an en dash.
Context: ... feat: Binary wheels for CPU, CUDA (12.1 - 12.3), Metal by @abetlen, @jllllll, and ...

(DASH_RULE)


[duplication] ~297-~297: Possible typo: you repeated a word.
Context: ...11 - fix: set LLAMA_METAL_EMBED_LIBRARY=on on MacOS arm64 by @bretello in abetlen#1289 - fea...

(ENGLISH_WORD_REPEAT_RULE)


[grammar] ~297-~297: The operating system from Apple is written “macOS”.
Context: ...ix: set LLAMA_METAL_EMBED_LIBRARY=on on MacOS arm64 by @bretello in abetlen#1289 - feat: Add...

(MAC_OS)


[duplication] ~299-~299: Possible typo: you repeated a word.
Context: ...nged local API doc references to hosted by by @lawfordp2017 in abetlen#1317 ## [0.2.57] - ...

(ENGLISH_WORD_REPEAT_RULE)


[duplication] ~450-~450: Possible typo: you repeated a word.
Context: ...91cf5dddbc4b6041aead4accc7c7a2d - feat: Add add_generation_prompt option for jinja2chat...

(ENGLISH_WORD_REPEAT_RULE)


[grammar] ~642-~642: “Windows” (operating system by Microsoft) is a proper noun and needs to be capitalized.
Context: ...ed runtime dlls to package directory on windows by @abetlen in 8d75016...

(A_WINDOWS)


[uncategorized] ~721-~721: This expression is usually spelled with a hyphen.
Context: ...iring new model format) ## [0.1.78] - Grammar based sampling via LlamaGrammar which can be ...

(BASED_HYPHEN)


[grammar] ~762-~762: This verb might not be in the correct form.
Context: ...[0.1.69] - (server) Streaming requests can are now interrupted pre-maturely when a con...

(MD_BE)


[style] ~762-~762: To form a complete sentence, be sure to include a subject.
Context: ...rely when a concurrent request is made. Can be controlled with the `interrupt_reque...

(MISSING_IT_THERE)


[uncategorized] ~812-~812: Use a comma before ‘so’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...Truncate max_tokens in create_completion so requested tokens doesn't exceed context...

(COMMA_COMPOUND_SENTENCE_2)


[grammar] ~819-~819: The operating system from Apple is written “macOS”.
Context: ...th .so and .dylib for libllama on MacOS ## [v0.1.58] - (llama.cpp) Metal Sili...

(MAC_OS)

README.md

[grammar] ~39-~39: The operating system from Apple is written “macOS”.
Context: ...Windows: Visual Studio or MinGW - MacOS: Xcode To install the package, run: `...

(MAC_OS)


[grammar] ~62-~62: Possible agreement error. The noun hardware seems to be uncountable; consider using: “much hardware”, “a good deal of hardware”.
Context: ...ion Configuration llama.cpp supports a number of hardware acceleration backends to speed up infer...

(MANY_NN_U)


[uncategorized] ~62-~62: Possible missing comma found.
Context: ...dware acceleration backends to speed up inference as well as backend specific options. Se...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~62-~62: When ‘backend-specific’ is used as a modifier, it is usually spelled with a hyphen.
Context: ...ckends to speed up inference as well as backend specific options. See the [llama.cpp README](htt...

(SPECIFIC_HYPHEN)


[grammar] ~237-~237: The operating system from Apple is written “macOS”.
Context: .... ### MacOS Notes Detailed MacOS Metal GPU install documentation is avai...

(MAC_OS)


[grammar] ~237-~237: The operating system from Apple is written “macOS”.
Context: ...mentation is available at [docs/install/macos.md](https://llama-cpp-python.readthedoc...

(MAC_OS)


[uncategorized] ~291-~291: Did you mean: “By default,”?
Context: ...ll create_completion print(output) ``` By default llama-cpp-python generates completion...

(BY_DEFAULT_COMMA)


[style] ~316-~316: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...Llama.create_completion) methods of the Llama class. ### Pulling models from Huggin...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[uncategorized] ~330-~330: Did you mean: “By default,”?
Context: ...="*q8_0.gguf", verbose=False ) ``` By default [from_pretrained](https://llama-cpp-p...

(BY_DEFAULT_COMMA)


[uncategorized] ~338-~338: The abbreviation “i.e.” (= that is) requires two periods.
Context: ...this using pre-registered chat formats (ie. chatml, llama-2, gemma, etc) or b...

(I_E)


[style] ~338-~338: In American English, abbreviations like “etc.” require a period.
Context: ...mats (ie. chatml, llama-2, gemma, etc) or by providing a custom chat handler ...

(ETC_PERIOD)


[duplication] ~340-~340: Possible typo: you repeated a word.
Context: ... custom chat handler object. The model will will format the messages into a single promp...

(ENGLISH_WORD_REPEAT_RULE)


[uncategorized] ~367-~367: Possible missing comma found.
Context: ...Llama.create_chat_completion_openai_v1) method which will return pydantic models inste...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~372-~372: Possible missing comma found.
Context: ...s to only valid JSON or a specific JSON Schema use the response_format argument in [...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~398-~398: Possible missing comma found.
Context: ...the response further to a specific JSON Schema add the schema to the schema property...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~425-~425: Do not mix variants of the same word (‘pre-train’ and ‘pretrain’) within a single text.
Context: ...s is possible through the functionary pre-trained models chat format or through the gener...

(EN_WORD_COHERENCY)


[style] ~475-~475: As a shorter alternative for ‘able to’, consider using “can”.
Context: ...://huggingface.co/meetkai). Functionary is able to intelligently call functions and also a...

(BE_ABLE_TO)


[uncategorized] ~477-~477: Possible missing article found.
Context: ...the default llama.cpp tokenizer used in Llama class. The tokenizer files are already ...

(AI_HYDRA_LEO_MISSING_THE)


[uncategorized] ~490-~490: Possible missing comma found.
Context: ...ide the default system messages used in Functionary as they are added automatically in the ...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~603-~603: Possible missing comma found.
Context: ...llama-cpp-python supports speculative decoding which allows the model to generate comp...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~621-~621: Possible missing comma found.
Context: ...) ``` ### Embeddings To generate text embeddings use [create_embedding](https://llama-...

(AI_HYDRA_LEO_MISSING_COMMA)


[uncategorized] ~637-~637: A comma may be missing after the conjunctive/linking adverb ‘Thus’.
Context: ...s, one for each token in each sequence. Thus the dimensionality of the return type w...

(SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA)


[uncategorized] ~639-~639: Possible missing comma found.
Context: ... oriented model to yield sequence level embeddings is currently not possible, but you can ...

(AI_HYDRA_LEO_MISSING_COMMA)


[style] ~645-~645: Consider using an alternative to strengthen your wording.
Context: ...our requirements. For instance, if you want to work with larger contexts, you can e...

(WANT_KEEN)


[style] ~654-~654: In American English, abbreviations like “etc.” require a period.
Context: ...e client (language libraries, services, etc). To install the server package and ge...

(ETC_PERIOD)


[uncategorized] ~681-~681: Possible missing article found.
Context: ...will format the prompt according to how model expects it. You can find the prompt for...

(AI_HYDRA_LEO_MISSING_THE)


[style] ~736-~736: Consider using a different verb to strengthen your wording.
Context: ...ama-cpp-python.readthedocs.io/). If you find any issues with the documentation, plea...

(FIND_ENCOUNTER)


[style] ~785-~785: This expression is wordy and can make your writing hard to follow, try rephrasing to be more direct.
Context: ...install from source as described above. The reason for this is that llama.cpp is built with compiler opti...

(THE_REASON_FOR_THIS)


[style] ~787-~787: To reduce wordiness, try specifying a number or using “many” or “numerous” instead.
Context: ...bling these optimizations or supporting a large number of pre-built binaries for each platform. ...

(LARGE_NUMBER_OF)


[formatting] ~788-~788: Use a comma after introducing a concessive statement.
Context: ... pre-built binaries for each platform. That being said there are some pre-built binaries avail...

(EVEN_SO_PREMIUM)


[uncategorized] ~791-~791: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...binaries and wheels for common platforms and I'm happy to accept any useful contribu...

(COMMA_COMPOUND_SENTENCE)

🪛 markdownlint-cli2 (0.17.2)
README.md

2-2: Images should have alternate text (alt text)
null

(MD045, no-alt-text)


5-5: Multiple spaces after hash on atx style heading
null

(MD019, no-multiple-space-atx)


13-13: No empty links
null

(MD042, no-empty-links)


20-20: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


21-21: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


22-22: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


24-24: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


25-25: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


26-26: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


27-27: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


35-35: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


36-36: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


37-37: Unordered list indentation
Expected: 2; Actual: 6

(MD007, ul-indent)


38-38: Unordered list indentation
Expected: 2; Actual: 6

(MD007, ul-indent)


39-39: Unordered list indentation
Expected: 2; Actual: 6

(MD007, ul-indent)


51-51: Emphasis used instead of a heading
null

(MD036, no-emphasis-as-heading)


124-124: Emphasis used instead of a heading
null

(MD036, no-emphasis-as-heading)


161-161: Emphasis used instead of a heading
null

(MD036, no-emphasis-as-heading)


341-341: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


342-342: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


343-343: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


344-344: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)

🔇 Additional comments (82)
llama_cpp/_logger.py (6)

1-4: Clean and necessary imports.
These imports are essential for handling standard IO, C function callbacks, and Python logging. No issues found here.


5-6: Consistent module import.
Importing llama_cpp at the top-level is aligned with Python's standard practices. Looks good.


15-22: Clear mapping of ggml to logging levels.
The dictionary correctly maps the ggml log levels to Python logging constants. This setup is straightforward and should handle the defined log levels well.


24-25: Logger naming.
Using "llama-cpp-python" as the logger name is descriptive and consistent. No issues.


26-27: Initialization of _last_log_level.
Storing the initial log level in a global variable is fine here, but be mindful of concurrency or re-entrancy if the library grows in scope.


46-48: Toggle verbosity with set_verbose.
Switching between logging.DEBUG and logging.ERROR is a clear, minimal approach to adjusting verbosity.

CHANGELOG.md (8)

132-132: Capitalize “macOS”.
This line refers to “MacOS” instead of “macOS.” Properly spelling Apple’s operating system helps maintain consistency.

- fix(ci): Fix MacOS release, use macos-12 image instead of ...
+ fix(ci): Fix macOS release, use macos-12 image instead of ...
🧰 Tools
🪛 LanguageTool

[grammar] ~132-~132: The operating system from Apple is written “macOS”.
Context: ...4c25dd9f6a421cacec7604c6 - fix(ci): Fix MacOS release, use macos-12 image instead of ...

(MAC_OS)


297-297: Eliminate repeated word “by”.
The phrase “hosted by by” is duplicated. Remove the extra “by” to correct the sentence.

- - fix: Changed local API doc references to hosted by by @lawfordp2017 in #1317
+ - fix: Changed local API doc references to hosted by @lawfordp2017 in #1317
🧰 Tools
🪛 LanguageTool

[duplication] ~297-~297: Possible typo: you repeated a word.
Context: ...11 - fix: set LLAMA_METAL_EMBED_LIBRARY=on on MacOS arm64 by @bretello in abetlen#1289 - fea...

(ENGLISH_WORD_REPEAT_RULE)


[grammar] ~297-~297: The operating system from Apple is written “macOS”.
Context: ...ix: set LLAMA_METAL_EMBED_LIBRARY=on on MacOS arm64 by @bretello in abetlen#1289 - feat: Add...

(MAC_OS)


299-299: Avoid repeated “by by”.
Ensure that repeated words are removed in all related references.

- fix: Changed local API doc references to hosted by by @lawfordp2017 in #1317
+ fix: Changed local API doc references to hosted by @lawfordp2017 in #1317
🧰 Tools
🪛 LanguageTool

[duplication] ~299-~299: Possible typo: you repeated a word.
Context: ...nged local API doc references to hosted by by @lawfordp2017 in abetlen#1317 ## [0.2.57] - ...

(ENGLISH_WORD_REPEAT_RULE)


450-450: Remove repeated word “Add”.
Repeating “Add” (“Add add_generation_prompt”) is likely a typo.

- feat: Add add_generation_prompt option for jinja2chatformatter
+ feat: Add generation_prompt option for jinja2chatformatter
🧰 Tools
🪛 LanguageTool

[duplication] ~450-~450: Possible typo: you repeated a word.
Context: ...91cf5dddbc4b6041aead4accc7c7a2d - feat: Add add_generation_prompt option for jinja2chat...

(ENGLISH_WORD_REPEAT_RULE)


642-642: Capitalize “Windows”.
“windows” should be capitalized to “Windows” since it is a proper noun.

- Install required runtime dlls to package directory on windows
+ Install required runtime dlls to package directory on Windows
🧰 Tools
🪛 LanguageTool

[grammar] ~642-~642: “Windows” (operating system by Microsoft) is a proper noun and needs to be capitalized.
Context: ...ed runtime dlls to package directory on windows by @abetlen in 8d75016...

(A_WINDOWS)


721-721: Hyphenate “Grammar-based”.
The expression “Grammar based” should be hyphenated for clarity.

- Grammar based sampling via LlamaGrammar
+ Grammar-based sampling via LlamaGrammar
🧰 Tools
🪛 LanguageTool

[uncategorized] ~721-~721: This expression is usually spelled with a hyphen.
Context: ...iring new model format) ## [0.1.78] - Grammar based sampling via LlamaGrammar which can be ...

(BASED_HYPHEN)


762-762: Refine sentence structure.
“can are now interrupted” is grammatically incorrect. Use “can now be interrupted prematurely”.

- Streaming requests can are now interrupted pre-maturely
+ Streaming requests can now be interrupted prematurely
🧰 Tools
🪛 LanguageTool

[grammar] ~762-~762: This verb might not be in the correct form.
Context: ...[0.1.69] - (server) Streaming requests can are now interrupted pre-maturely when a con...

(MD_BE)


[style] ~762-~762: To form a complete sentence, be sure to include a subject.
Context: ...rely when a concurrent request is made. Can be controlled with the `interrupt_reque...

(MISSING_IT_THERE)


819-819: Capitalize “macOS”.
“MacOS” should be spelled “macOS.”

- Support both `.so` and `.dylib` for `libllama` on MacOS
+ Support both `.so` and `.dylib` for `libllama` on macOS
🧰 Tools
🪛 LanguageTool

[grammar] ~819-~819: The operating system from Apple is written “macOS”.
Context: ...th .so and .dylib for libllama on MacOS ## [v0.1.58] - (llama.cpp) Metal Sili...

(MAC_OS)

llama_cpp/llama_cache.py (2)

14-14: Avoid wildcard imports.
Using from .llama_types import * can lead to naming conflicts and less maintainable code. Consider explicitly importing only the required names.

- from .llama_types import *
+ from .llama_types import LlamaState, LlamaParams  # or the specific types needed
🧰 Tools
🪛 Ruff (0.8.2)

14-14: from .llama_types import * used; unable to detect undefined names

(F403)


145-155: Use logging instead of debug prints.
These print statements appear to be for debugging. Consider removing them or switching to the configured logger for consistency in logging behavior.

- print("LlamaDiskCache.__setitem__: called", file=sys.stderr)
- print("LlamaDiskCache.__setitem__: delete", file=sys.stderr)
- print("LlamaDiskCache.__setitem__: set", file=sys.stderr)
- print("LlamaDiskCache.__setitem__: trim", file=sys.stderr)
+ import logging
+ logging.debug("LlamaDiskCache.__setitem__: called")
# ... and so on
llama_cpp/llama_types.py (8)

1-8: Docstring additions look good.
These lines provide helpful context about the file’s purpose and align with best practices for maintainable code.


14-17: Consider a more structured JSON representation.
Replacing the recursive JsonType with Any-based union offers flexibility, but it may lose nested type safety. If you need deeper type constraints, consider a partial schema or typed config classes to balance flexibility with validation.
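If stronger nesting guarantees are desired, a recursive alias is one option. This is only a sketch; it is valid at runtime but needs a type checker with recursive-alias support (recent mypy or pyright) to be useful statically:

```python
from typing import Dict, List, Union

# Recursive alias that keeps nested structure instead of falling back to Any.
JsonType = Union[None, bool, int, float, str, List["JsonType"], Dict[str, "JsonType"]]
```
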


25-28: Verify that embedding shape variations are intentional.
Using Union[List[float], List[List[float]]] can introduce complexity if consumers expect a single dimension. Confirm that multi-dimensional support is indeed required.


72-77: Clarify removal timeline for deprecated fields.
You marked function_call as DEPRECATED. Outline specific removal plans or confirm if backward compatibility is still needed.


82-82: Offer assistance for the TODO comment.
This TODO indicates that parameters needs more specific constraints.

Would you like assistance in refining the parameters field to be more specific?


166-170: Offer assistance for the repeated TODO comment.
This line mirrors the existing TODO regarding type specificity for parameters.

Let me know if you need help defining a stricter schema for function parameters.


272-273: Another TODO: refine ChatCompletionFunctionParameters.
This repeats the same concern about lacking specificity in the type definition.

Happy to provide a more detailed approach or open an issue to track tighter parameter typing.


303-317: No concerns with aliasing removed types.
The alias definitions appear consistent with the rest of the module and maintain backward compatibility.

.github/workflows/generate-index-from-release.yaml (1)

47-48: Verify the CUDA version patterns.
The commented patterns for cu125 and cu126 reference -cu124. This mismatch may be unintentional:

-# ./scripts/releases-to-pep-503.sh index/whl/cu125 '^[v]?[0-9]+\.[0-9]+\.[0-9]+-cu124$'
-# ./scripts/releases-to-pep-503.sh index/whl/cu126 '^[v]?[0-9]+\.[0-9]+\.[0-9]+-cu124$'
+# ./scripts/releases-to-pep-503.sh index/whl/cu125 '^[v]?[0-9]+\.[0-9]+\.[0-9]+-cu125$'
+# ./scripts/releases-to-pep-503.sh index/whl/cu126 '^[v]?[0-9]+\.[0-9]+\.[0-9]+-cu126$'
.github/workflows/test-pypi.yaml (1)

36-36: Shell name “cmd” usage is valid only for Windows.
Actionlint flags these lines because “cmd” is unavailable on macOS or Linux. However, the step is guarded by if: runner.os == 'Windows', rendering this a false positive. You may ignore or disable the linter rule for these lines.

Also applies to: 108-108

🧰 Tools
🪛 actionlint (1.7.4)

36-36: shell name "cmd" is invalid on macOS or Linux. available names are "bash", "pwsh", "python", "sh"

(shell-name)

llama_cpp/server/settings.py (2)

13-14: Avoid modifying protected namespaces directly.
This line modifies BaseSettings.model_config["protected_namespaces"], which can lead to unexpected global behavior. Consider subclassing BaseSettings or another approach.
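One way to avoid the global mutation is to scope the override to a local base class, roughly as sketched below (pydantic-settings v2 API assumed; the field defaults are placeholders, not the real ModelSettings definition):

```python
from typing import Optional

from pydantic_settings import BaseSettings, SettingsConfigDict


class LlamaBaseSettings(BaseSettings):
    # Scope the protected-namespace override here instead of mutating
    # BaseSettings.model_config for every consumer of pydantic-settings.
    model_config = SettingsConfigDict(protected_namespaces=())


class ModelSettings(LlamaBaseSettings):
    model: str  # placeholder fields; mirror the real settings here
    model_alias: Optional[str] = None
```
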


188-199: Ensure proper validation for dynamic defaults.
Revisit the set_dynamic_defaults method to confirm threads are set as intended. Adding assertions or logging can help verify correctness.
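A hedged sketch of what such a check could look like is below; the -1 "auto" sentinel and the n_threads field name are assumptions based on the settings introduced in this PR, so adjust to the real method on ModelSettings:

```python
import logging
import multiprocessing

logger = logging.getLogger("llama-cpp-python")


def set_dynamic_defaults(settings) -> None:
    # Resolve an "auto" value to a host-derived thread count and make the
    # outcome observable via logging and a final sanity assertion.
    if getattr(settings, "n_threads", None) == -1:
        settings.n_threads = max(multiprocessing.cpu_count() // 2, 1)
        logger.debug("n_threads resolved to %d", settings.n_threads)
    assert settings.n_threads and settings.n_threads > 0, "n_threads must be positive"
```
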

.github/workflows/build-wheels-cuda.yaml (2)

16-31: Revisit cross-platform matrix definition
This matrix construction relies on PowerShell commands. As previously noted, you may wish to switch to a more universally supported shell or approach for cross-platform builds.


78-87: Verify reliability of remote integration source
This step downloads Visual Studio integration files from a third-party repository (Jimver/cuda-toolkit). If that link changes or becomes unavailable, builds will fail. Confirm that this dependency is acceptable for your long-term needs.

.github/workflows/publish.yaml (4)

13-15: Confirm submodule recursion necessity
Submodules can slow down checkout if they are not strictly required. Please verify that you need them before continuing with a recursive checkout.


18-20: Validate Python 3.9 choice
Ensure Python 3.9 is the intended target environment. If you expect users on older or newer versions, consider a matrix strategy to broaden coverage.


31-41: Ignore actionlint’s warning about 'cmd' shell
Since this block only runs on Windows runners, the usage of shell: cmd is valid. The static analysis complaint does not apply here.

🧰 Tools
🪛 actionlint (1.7.4)

40-40: shell name "cmd" is invalid on macOS or Linux. available names are "bash", "pwsh", "python", "sh"

(shell-name)


44-44: Use of modern build invocation is approved
Switching to python -m build --sdist aligns with current Python packaging best practices.

.github/workflows/build-wheels-metal.yaml (2)

14-14: Check viability of “macos-15” GitHub runner
At present, GitHub Actions commonly supports macos-12, macos-13, etc. Verify that this runner exists or that you’re intentionally referencing a future environment.


27-44: Re-evaluate Windows-specific install steps in a macOS matrix
Your matrix includes only macOS runners, but there's a conditional block for Windows. Confirm whether these steps will ever run, and remove them if unnecessary.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 38-38: trailing spaces

(trailing-spaces)

llama_cpp/llama_grammar.py (1)

452-466: Strengthen error handling for remote schema fetching
When retrieving external schemas via network, consider potential failures or invalid content. Retry mechanisms, timeouts, and robust exception handling help maintain reliability.
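A hedged illustration of those guard rails using only the standard library (the grammar module may use a different HTTP client, and the function name here is a placeholder):

import json
import urllib.request
from urllib.error import URLError

def fetch_remote_schema(url: str, timeout: float = 10.0) -> dict:
    # Fail loudly on network errors, timeouts, and malformed JSON instead of
    # letting a partial schema propagate into grammar conversion.
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (URLError, TimeoutError, ValueError) as exc:
        raise RuntimeError(f"Could not fetch JSON schema from {url}") from exc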

tests/test_llama.py (6)

18-20: Version check test looks good.

This straightforward check ensures that llama_cpp.__version__ is defined.


36-41: Tokenization test without BOS works as expected.

The flow here is correct. Having a separate test for the case when BOS is excluded is helpful in ensuring thorough coverage.


43-51: EOS token checks look good.

Verifying that the </s> token is (or isn't) interpreted as the EOS token fosters confidence in correct handling of special tokens.


52-57: Empty text tokenization scenario is properly tested.

This edge case testing with an empty string is valuable to ensure consistent behavior.


59-64: Fixture approach for model path is appropriate.

Using llama_cpp_model_path keeps the tests clean and separates environment-specific logic (like model download) from the test logic itself.


67-116: Model decode flow is well-structured.

Fetching tokens, decoding, and sampling in a step-by-step manner thoroughly validates the pipeline.

.github/workflows/test.yaml (6)

10-12: New environment variables look appropriate.

Defining REPO_ID and MODEL_FILE at the workflow level keeps them consistent across all jobs.


32-62: Linux build matrix changes look consistent.

Adding Python 3.12 while reusing the existing steps looks correct. Good job.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 42-42: trailing spaces

(trailing-spaces)


63-96: Windows build matrix changes look consistent.

Same matrix update as Linux. The caching strategy is aligned with the rest of the workflow.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 73-73: trailing spaces

(trailing-spaces)


[error] 92-92: trailing spaces

(trailing-spaces)


97-137: macOS build matrix with updated Python versions is valid.

Ensuring consistent environment steps across platforms improves maintainability.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 107-107: trailing spaces

(trailing-spaces)


[error] 125-125: trailing spaces

(trailing-spaces)


138-172: New macOS Metal build job is well-structured.

Enabling the Metal backend for macOS is a welcome addition for performance gains.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 145-145: trailing spaces

(trailing-spaces)


15-31:

❓ Verification inconclusive

Confirm the model is downloaded correctly.

Introducing a dedicated download-model job promotes reuse and centralizes the model caching logic. However, consider verifying the model retrieval process in case of version changes or repository updates.


🏁 Script executed:

#!/bin/bash

# Verify that the specified model file was indeed cached after download-model job.
# This script checks for the existence of the model file in the huggingface cache folder.
fd -tf "$MODEL_FILE" ~/.cache/huggingface/hub

Length of output: 174


Action: Verify Model Download and Cache Behavior

The dedicated download-model job helps centralize the caching logic, but the automated verification script revealed issues. Specifically, the error indicates that either the MODEL_FILE variable is empty or the cache directory (~/.cache/huggingface/hub) is missing/incorrectly referenced. This uncertainty could lead to false negatives in model caching verification.

  • Ensure that the MODEL_FILE environment variable is properly set in the workflow.
  • Confirm that the cache directory exists (or is created) before it is used.
  • Please conduct a manual verification of the model download and caching process—especially if repository updates or version changes occur—to ensure reliable behavior.
.github/workflows/publish-to-test.yaml (6)

5-11: Support for dev_version input is a good enhancement.

Providing a dynamic dev version helps manage pre-release distributions smoothly.


19-22: Upgraded actions/checkout with submodules set to recursive.

Consistent with other workflow changes. Allows submodules to be fetched reliably.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 22-22: trailing spaces

(trailing-spaces)


29-35: Appending the dev version to __version__ is appropriate.

This step ensures the published version reflects the dev build number for ephemeral or staging releases.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 35-35: trailing spaces

(trailing-spaces)


36-53: Windows shell usage is selectively valid under the Windows runner.

The static analysis warning about invalid shell name does not apply here because runner.os == 'Windows' is enforced.

🧰 Tools
🪛 actionlint (1.7.4)

52-52: shell name "cmd" is invalid on macOS or Linux. available names are "bash", "pwsh", "python", "sh"

(shell-name)

🪛 YAMLlint (1.35.1)

[error] 47-47: trailing spaces

(trailing-spaces)


[error] 53-53: trailing spaces

(trailing-spaces)


56-57: Switch to python -m build --sdist is aligned with modern packaging.

This approach is more flexible and recommended over direct setup script invocation.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 57-57: trailing spaces

(trailing-spaces)


62-62: Publishing to TestPyPI is configured correctly.

Using repository-url: https://test.pypi.org/legacy/ is the recommended approach for test releases.

llama_cpp/server/errors.py (1)

125-135: Consider adding coverage for embedding requests.

Currently, the specialized error formatters cover text/chat completions but not embedding requests. Although context-length-exceeded errors may not apply to embeddings, ensuring consistent coverage (or explicitly documenting why it is omitted) helps avoid confusion and potential unhandled edge cases.

Makefile (1)

45-47: Double-check Vulkan vs Kompute usage.

You’ve introduced separate build targets for Vulkan (build.vulkan) and Kompute (build.kompute), both of which rely on Vulkan. Ensure that users are aware of the difference between these configurations and that any duplicative logic is minimized.

Also applies to: 48-50

README.md (5)

13-13: Fix empty link for GitHub Downloads badge.

- [![Github All Releases](https://img.shields.io/github/downloads/abetlen/llama-cpp-python/total.svg?label=Github%20Downloads)]()
+ [![Github All Releases](https://img.shields.io/github/downloads/abetlen/llama-cpp-python/total.svg?label=Github%20Downloads)](https://github.com/abetlen/llama-cpp-python/releases)
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

13-13: No empty links
null

(MD042, no-empty-links)


20-27: Fix unordered list indentation.

-    - OpenAI-like API
-    - [LangChain compatibility](https://python.langchain.com/docs/integrations/llms/llamacpp)
-    - [LlamaIndex compatibility](https://docs.llamaindex.ai/en/stable/examples/llm/llama_2_llama_cpp.html)
-    - [Local Copilot replacement](https://llama-cpp-python.readthedocs.io/en/latest/server/#code-completion)
-    - [Function Calling support](https://llama-cpp-python.readthedocs.io/en/latest/server/#function-calling)
-    - [Vision API support](https://llama-cpp-python.readthedocs.io/en/latest/server/#multimodal-models)
-    - [Multiple Models](https://llama-cpp-python.readthedocs.io/en/latest/server/#configuration-and-multi-model-support)
+  - OpenAI-like API
+  - [LangChain compatibility](https://python.langchain.com/docs/integrations/llms/llamacpp)
+  - [LlamaIndex compatibility](https://docs.llamaindex.ai/en/stable/examples/llm/llama_2_llama_cpp.html)
+  - [Local Copilot replacement](https://llama-cpp-python.readthedocs.io/en/latest/server/#code-completion)
+  - [Function Calling support](https://llama-cpp-python.readthedocs.io/en/latest/server/#function-calling)
+  - [Vision API support](https://llama-cpp-python.readthedocs.io/en/latest/server/#multimodal-models)
+  - [Multiple Models](https://llama-cpp-python.readthedocs.io/en/latest/server/#configuration-and-multi-model-support)
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

20-20: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


21-21: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


22-22: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


24-24: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


25-25: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


26-26: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


27-27: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


39-39: Spell "macOS" correctly.

-       - MacOS: Xcode
+       - macOS: Xcode
🧰 Tools
🪛 LanguageTool

[grammar] ~39-~39: The operating system from Apple is written “macOS”.
Context: ...Windows: Visual Studio or MinGW - MacOS: Xcode To install the package, run: `...

(MAC_OS)

🪛 markdownlint-cli2 (0.17.2)

39-39: Unordered list indentation
Expected: 2; Actual: 6

(MD007, ul-indent)


291-291: Add a comma after introductory phrase "By default".

- By default `llama-cpp-python` generates completions
+ By default, `llama-cpp-python` generates completions
🧰 Tools
🪛 LanguageTool

[uncategorized] ~291-~291: Did you mean: “By default,”?
Context: ...ll create_completion print(output) ``` By default llama-cpp-python generates completion...

(BY_DEFAULT_COMMA)


340-340: Remove repeated word "will".

- The model will will format the messages
+ The model will format the messages
🧰 Tools
🪛 LanguageTool

[duplication] ~340-~340: Possible typo: you repeated a word.
Context: ... custom chat handler object. The model will will format the messages into a single promp...

(ENGLISH_WORD_REPEAT_RULE)

.github/workflows/build-and-release.yaml (2)

14-14: Pinning specific OS versions is good practice.

Switching from ubuntu-latest, windows-latest, and macos-latest to pinned versions (ubuntu-20.04, windows-2019, and macos-13) helps ensure more consistent builds and avoids unexpected changes when default images update.


118-118:

✅ Verification successful

Validate shell usage on non-Windows runners.

The static analysis tool flags usage of shell: cmd as invalid on macOS or Linux, but here it is conditionally applied only if runner.os == 'Windows'. Please confirm no accidental usage occurs outside Windows.

Run the following script to check if any job uses shell: cmd outside the Windows condition:


🏁 Script executed:

#!/bin/bash
# Searching for 'shell: cmd' usage in the entire repository to ensure it's only used for Windows
rg -A 3 'shell:\s*cmd'

Length of output: 24


I'll run an additional check to inspect the surrounding context of the "shell: cmd" usage in the workflow file. The initial script returned no output, which makes it hard to be 100% confident that "shell: cmd" isn’t accidentally applied outside a Windows runner. Let's verify by printing the context around line 118 in the file to ensure that the "shell: cmd" line is indeed within a conditional block for Windows only.

Below is a new shell script to display that context:

Please run the script below and manually verify that:

  • The "shell: cmd" line at or near line 118 is enclosed within a block that only applies when runner.os == 'Windows'.
  • No occurrence of "shell: cmd" appears without this Windows condition.

Once you confirm that the context is correct, you can safely resolve this review comment.

Action Required: Verify that 'shell: cmd' is only used under Windows conditions in the workflow file.


🏁 Script executed:

#!/bin/bash
# Display context around the "shell: cmd" line in the GitHub workflow file to verify its conditional usage
sed -n '110,130p' .github/workflows/build-and-release.yaml

Length of output: 703


Verified: "shell: cmd" usage is conditionally applied only for Windows.

The context check confirms that the line is properly nested under the conditional if: runner.os == 'Windows', ensuring it will not execute on non-Windows runners. No accidental usage was found elsewhere in the code.

🧰 Tools
🪛 actionlint (1.7.4)

118-118: shell name "cmd" is invalid on macOS or Linux. available names are "bash", "pwsh", "python", "sh"

(shell-name)

llama_cpp/llava_cpp.py (1)

110-114: Confirm memory freeing success.

llava_image_embed_free frees the embedding, but it doesn’t return a result status. Ensure subsequent calls do not use the freed pointer. Consider adding post-checks or setting the pointer to None in higher-level code.

Would you like a script to search for misuse of freed embeddings?
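A hedged sketch of the higher-level hygiene in mind (the wrapper and attribute names are illustrative; only llava_image_embed_free itself comes from the bindings):

from llama_cpp import llava_cpp

def free_image_embed(holder) -> None:
    # 'holder' is whatever object owns the ctypes pointer; attribute name assumed.
    embed = getattr(holder, "embed", None)
    if embed is not None:
        llava_cpp.llava_image_embed_free(embed)
        holder.embed = None  # prevent accidental reuse of the freed pointer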

CMakeLists.txt (4)

1-1: Confirm minimum required CMake version alignment.

Raising the minimum CMake version to 3.21 may exclude some users with older setups. Verify that any 3.21-specific features used here cannot be supported in earlier versions if backward-compatibility is a priority.

Would you like a shell script to scan any additional CMake files in the repository to check for advanced CMake features that might still work in older versions?


115-133: Check DLL installation on Windows.

This block installs runtime DLLs for llama and ggml. Confirm that:

  1. All dependent DLLs are copied (including those from transitive dependencies).
  2. The final location in ${SKBUILD_PLATLIB_DIR}/llama_cpp/lib is the correct path for packaging.

135-176: Ensure LLAVA build logic is fully isolated.

Within the LLAVA_BUILD block, you define CUDA/Metal checks, compile definitions, and subdirectory references. If someone disables LLAVA_BUILD, confirm that no references to llava or llava_shared appear elsewhere in the build that might cause linking errors.


48-114:

✅ Verification successful

Verify subdirectory build order and shared library consistency.

When LLAMA_BUILD is enabled, multiple shared libraries (llama, ggml, etc.) are built and installed. Ensure that:

  1. The subdirectory vendor/llama.cpp is added before building or referencing these targets.
  2. All needed macros (like GGML_* definitions) are appropriately set before including related subdirectories.

🏁 Script executed:

#!/bin/bash
# Searching for the add_subdirectory call(s) to confirm build order
rg -A 5 'add_subdirectory' 

Length of output: 1142


Subdirectory Order and Macro Setup Verified

The add_subdirectory(vendor/llama.cpp) call is correctly positioned before all the llama_cpp_python_install_target invocations, ensuring that shared library targets are built in the proper order. Additionally, the necessary macros (e.g., GGML_* definitions) are set prior to including related subdirectories.

No further changes are needed.

pyproject.toml (3)

1-4: Migrate any legacy build configurations if removing Poetry.

Switching to [build-system] with scikit-build-core is a solid approach. If you previously used Poetry for extra tasks (e.g., version bumping or packaging extras), ensure those tasks are replicated or replaced in the new setup so no functionalities are lost.


5-29: Verify project metadata consistency.

The newly introduced [project] section is well-structured. Confirm that the new dynamic version approach (using version under [tool.scikit-build.metadata.version]) correctly syncs with your release strategy.


63-82: Align scikit-build settings with the new build logic.

You have set wheel.packages, cmake.minimum-version, and sdist.include. Make sure these reflect the final structure of the Python package and that any necessary submodules or vendor directories are included if needed.

llama_cpp/_internals.py (4)

19-234: Validate memory management in LlamaModel.

The logic to free resources in __del__ and the close() method is correct. However, verify that the ExitStack and the callbacks are invoked in the correct order, especially if any part of model creation fails (e.g., partial initialization). This ensures no leftover references and minimal risk of memory leaks.

🧰 Tools
🪛 Ruff (0.8.2)

19-19: from .llama_types import * used; unable to detect undefined names

(F403)
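On the memory-management point above, a minimal sketch of the LIFO callback pattern being relied on (allocate/free are hypothetical stand-ins, not the actual llama.cpp calls):

import contextlib

def allocate():
    return object()  # hypothetical resource handle

def free(handle) -> None:
    pass  # hypothetical release call

class Resource:
    def __init__(self):
        self._stack = contextlib.ExitStack()
        handle = allocate()
        # Register cleanup immediately after a successful allocation so a
        # failure in any later initialization step still frees the handle.
        self._stack.callback(free, handle)

    def close(self) -> None:
        self._stack.close()  # callbacks run in reverse (LIFO) order

    def __del__(self) -> None:
        self.close()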


297-303: Missing state-management implementations.

These TODO comments match a previous review comment about copy_state_data, set_state_data, llama_load_session_file, and llama_save_session_file. Implement them or clarify if they’re postponed for future versions.


758-880: Check custom sampler lifecycle.

CustomSampler objects are appended to self.custom_samplers, then removed in reverse order from the chain before freeing. This approach is solid, but be sure no references to apply_wrapper or free_wrapper remain after sampler removal; this prevents potential memory or reference leaks in long-running processes.


1-18: 🛠️ Refactor suggestion

Avoid star imports to prevent potential naming conflicts.

from .llama_types import * can obscure undefined symbols like _LlamaTokenDataArray. It also complicates static analysis. Consider importing only what you specifically need or creating a well-defined namespace.

-from .llama_types import *
+from . import llama_types

Likely an incorrect or invalid review comment.

llama_cpp/llama_tokenizer.py (1)

113-116: Use raise ... from None to clarify the exception chain.

This is a repeated suggestion from a previous comment.

-except ImportError:
-    raise ImportError(
-        "The `transformers` library is required ..."
-    )
+except ImportError:
+    raise ImportError(
+        "The `transformers` library is required ..."
+    ) from None
🧰 Tools
🪛 Ruff (0.8.2)

113-116: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

llama_cpp/server/app.py (5)

50-55: Encapsulate the global _server_settings.

Similar to past suggestions, global variables can lead to side effects. Consider a class-based or module-based approach for maintainability.


62-71: Encapsulate the global _llama_proxy.

Using a global proxy can complicate concurrency and testing. Reuse or encapsulate it in a class or factory method to reduce shared state risks.


92-98: Encapsulate the global _ping_message_factory.

Relying on a global factory can result in hidden dependencies. Consider passing it explicitly to where it’s needed.
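A hedged sketch of one encapsulation option (names are placeholders; the real app wires these through set_* helpers and FastAPI dependencies):

from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class ServerState:
    server_settings: Optional[Any] = None
    llama_proxy: Optional[Any] = None
    ping_message_factory: Optional[Any] = None

# One module-level container instead of three separate globals keeps the
# shared state explicit and easier to reset in tests.
state = ServerState()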


241-259: Confirm usage of Depends in parameter defaults.

Static analysis flags this (B008), but it's common in FastAPI. If you prefer to silence the warning, consider injecting dependencies inside the function body or configuring Ruff to ignore this rule.

🧰 Tools
🪛 Ruff (0.8.2)

242-242: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)
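If silencing the rule is undesirable, FastAPI's Annotated-style dependencies (supported in recent FastAPI versions) avoid the call-in-default pattern that B008 flags; the provider below is a placeholder, not the app's actual dependency:

from typing import Annotated, Dict
from fastapi import Depends, FastAPI

app = FastAPI()

def get_settings() -> Dict[str, str]:
    # Placeholder provider; the real app resolves server/model settings here.
    return {"alias": "default"}

@app.get("/v1/models")
async def get_models(settings: Annotated[Dict[str, str], Depends(get_settings)]):
    # No function call appears in a parameter default, so B008 is not triggered.
    return {"object": "list", "data": [settings["alias"]]}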


529-550: Implementation for retrieving models looks correct.

Returning a list of models aligns with the required schema. No further remarks.

🧰 Tools
🪛 Ruff (0.8.2)

536-536: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

Comment on lines +68 to +69
            except Exception as e:
                raise RuntimeError(f"Failed to load shared library '{lib_path}': {e}")

🛠️ Refactor suggestion

Adopt Python 3 exception chaining.
To clarify the original exception context, raise the new exception with from e or from None:

-            except Exception as e:
-                raise RuntimeError(f"Failed to load shared library '{lib_path}': {e}")
+            except Exception as e:
+                raise RuntimeError(f"Failed to load shared library '{lib_path}'") from e
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-            except Exception as e:
-                raise RuntimeError(f"Failed to load shared library '{lib_path}': {e}")
+            except Exception as e:
+                raise RuntimeError(f"Failed to load shared library '{lib_path}'") from e
🧰 Tools
🪛 Ruff (0.8.2)

69-69: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

Comment on lines +56 to +58
def __getitem__(self, model: str):
    return self._model_settings_dict[model].model_dump()


🛠️ Refactor suggestion

Handle potential missing model keys.

Accessing self._model_settings_dict[model] may raise a KeyError if the given model alias does not exist. Consider adding a fallback or an explicit error message.

E.g.:

 def __getitem__(self, model: str):
-    return self._model_settings_dict[model].model_dump()
+    if model in self._model_settings_dict:
+        return self._model_settings_dict[model].model_dump()
+    else:
+        raise KeyError(f"Model alias '{model}' not found.")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 def __getitem__(self, model: str):
-    return self._model_settings_dict[model].model_dump()
+    if model in self._model_settings_dict:
+        return self._model_settings_dict[model].model_dump()
+    else:
+        raise KeyError(f"Model alias '{model}' not found.")

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (7)
CHANGELOG.md (7)

28-33: Capitalize “macOS” Correctly

Several changelog entries that refer to Apple’s operating system should consistently use “macOS” (with a lowercase “m”). Please review all such instances in this section to ensure adherence to this standard.

🧰 Tools
🪛 LanguageTool

[grammar] ~28-~28: The operating system from Apple is written “macOS”.
Context: ...f05f - fix(ci): Fix release by updating macos runner image to non-deprecated version ...

(MAC_OS)


[grammar] ~33-~33: The operating system from Apple is written “macOS”.
Context: ...## [0.3.4] - fix(ci): Build wheels for macos 13-15, cuda 12.1-12.4 by @abetlen in ca...

(MAC_OS)


292-292: Typographical Improvement: Use En Dash for Numerical Ranges

In the entry describing binary wheels (e.g., “CUDA (12.1 - 12.3)”), consider replacing the hyphen with an en dash to denote numerical ranges (i.e. “12.1–12.3”). This small change improves readability.

🧰 Tools
🪛 LanguageTool

[typographical] ~292-~292: If you want to indicate numerical ranges or time ranges, consider using an en dash.
Context: ... feat: Binary wheels for CPU, CUDA (12.1 - 12.3), Metal by @abetlen, @jllllll, and ...

(DASH_RULE)


301-303: Duplicate Word Removal

The changelog entry:

  "Changed local API doc references to hosted by by @lawfordp2017 in abetlen#1317"

contains a duplicate “by”. Please remove the extra word so that it reads correctly (i.e. “…hosted by @lawfordp2017…”).

🧰 Tools
🪛 LanguageTool

[duplication] ~301-~301: Possible typo: you repeated a word.
Context: ...11 - fix: set LLAMA_METAL_EMBED_LIBRARY=on on MacOS arm64 by @bretello in abetlen#1289 - fea...

(ENGLISH_WORD_REPEAT_RULE)


[grammar] ~301-~301: The operating system from Apple is written “macOS”.
Context: ...ix: set LLAMA_METAL_EMBED_LIBRARY=on on MacOS arm64 by @bretello in abetlen#1289 - feat: Add...

(MAC_OS)


[duplication] ~303-~303: Possible typo: you repeated a word.
Context: ...nged local API doc references to hosted by by @lawfordp2017 in abetlen#1317 ## [0.2.57] - ...

(ENGLISH_WORD_REPEAT_RULE)


454-454: Likely False Positive: Repeated Word in Feature Entry

The feature description:

  "Add add_generation_prompt option for jinja2chatformatter"

is flagged for a repeated “add,” but add_generation_prompt is the literal option name, so the entry is most likely correct as written. Consider wrapping the option name in backticks to make this clearer and avoid the lint warning.

🧰 Tools
🪛 LanguageTool

[duplication] ~454-~454: Possible typo: you repeated a word.
Context: ...91cf5dddbc4b6041aead4accc7c7a2d - feat: Add add_generation_prompt option for jinja2chat...

(ENGLISH_WORD_REPEAT_RULE)


725-725: Hyphenate Compound Adjective Correctly

In the entry for version 0.1.78, the phrase “Grammar based sampling via LlamaGrammar…” appears. It is recommended to hyphenate compound adjectives; change this to “Grammar-based sampling via LlamaGrammar…” to enhance readability.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~725-~725: This expression is usually spelled with a hyphen.
Context: ...iring new model format) ## [0.1.78] - Grammar based sampling via LlamaGrammar which can be ...

(BASED_HYPHEN)


766-766: Grammatical Correction in Streaming Description

The changelog currently reads:
  "Streaming requests can are now interrupted pre-maturely when a concurrent request is made."

A clearer phrasing would be:
  "Streaming requests can now be interrupted prematurely when a concurrent request is made."

Please update this entry accordingly.

🧰 Tools
🪛 LanguageTool

[grammar] ~766-~766: This verb might not be in the correct form.
Context: ...[0.1.69] - (server) Streaming requests can are now interrupted pre-maturely when a con...

(MD_BE)


[style] ~766-~766: To form a complete sentence, be sure to include a subject.
Context: ...rely when a concurrent request is made. Can be controlled with the `interrupt_reque...

(MISSING_IT_THERE)


816-816: Punctuation Enhancement for Clarity

In the entry that begins with “Truncate max_tokens in create_completion…”, consider inserting a comma before “so” to improve the sentence structure. For example:
  "Truncate max_tokens in create_completion, so requested tokens don’t exceed context size."

🧰 Tools
🪛 LanguageTool

[uncategorized] ~816-~816: Use a comma before ‘so’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...Truncate max_tokens in create_completion so requested tokens doesn't exceed context...

(COMMA_COMPOUND_SENTENCE_2)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e232fae and 37eb5f0.

📒 Files selected for processing (2)
  • CHANGELOG.md (1 hunks)
  • llama_cpp/__init__.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • llama_cpp/init.py
🧰 Additional context used
🪛 LanguageTool
CHANGELOG.md

[grammar] ~28-~28: The operating system from Apple is written “macOS”.
Context: ...f05f - fix(ci): Fix release by updating macos runner image to non-deprecated version ...

(MAC_OS)


[grammar] ~33-~33: The operating system from Apple is written “macOS”.
Context: ...## [0.3.4] - fix(ci): Build wheels for macos 13-15, cuda 12.1-12.4 by @abetlen in ca...

(MAC_OS)


[grammar] ~136-~136: The operating system from Apple is written “macOS”.
Context: ...4c25dd9f6a421cacec7604c6 - fix(ci): Fix MacOS release, use macos-12 image instead of ...

(MAC_OS)


[grammar] ~142-~142: Did you mean “too False to”?
Context: ...ate default config value for embeddings to False to fix error in text generation where logi...

(TOO_ADJECTIVE_TO)


[typographical] ~292-~292: If you want to indicate numerical ranges or time ranges, consider using an en dash.
Context: ... feat: Binary wheels for CPU, CUDA (12.1 - 12.3), Metal by @abetlen, @jllllll, and ...

(DASH_RULE)


[duplication] ~301-~301: Possible typo: you repeated a word.
Context: ...11 - fix: set LLAMA_METAL_EMBED_LIBRARY=on on MacOS arm64 by @bretello in abetlen#1289 - fea...

(ENGLISH_WORD_REPEAT_RULE)


[grammar] ~301-~301: The operating system from Apple is written “macOS”.
Context: ...ix: set LLAMA_METAL_EMBED_LIBRARY=on on MacOS arm64 by @bretello in abetlen#1289 - feat: Add...

(MAC_OS)


[duplication] ~303-~303: Possible typo: you repeated a word.
Context: ...nged local API doc references to hosted by by @lawfordp2017 in abetlen#1317 ## [0.2.57] - ...

(ENGLISH_WORD_REPEAT_RULE)


[duplication] ~454-~454: Possible typo: you repeated a word.
Context: ...91cf5dddbc4b6041aead4accc7c7a2d - feat: Add add_generation_prompt option for jinja2chat...

(ENGLISH_WORD_REPEAT_RULE)


[grammar] ~646-~646: “Windows” (operating system by Microsoft) is a proper noun and needs to be capitalized.
Context: ...ed runtime dlls to package directory on windows by @abetlen in 8d75016...

(A_WINDOWS)


[uncategorized] ~725-~725: This expression is usually spelled with a hyphen.
Context: ...iring new model format) ## [0.1.78] - Grammar based sampling via LlamaGrammar which can be ...

(BASED_HYPHEN)


[grammar] ~766-~766: This verb might not be in the correct form.
Context: ...[0.1.69] - (server) Streaming requests can are now interrupted pre-maturely when a con...

(MD_BE)


[style] ~766-~766: To form a complete sentence, be sure to include a subject.
Context: ...rely when a concurrent request is made. Can be controlled with the `interrupt_reque...

(MISSING_IT_THERE)


[uncategorized] ~816-~816: Use a comma before ‘so’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...Truncate max_tokens in create_completion so requested tokens doesn't exceed context...

(COMMA_COMPOUND_SENTENCE_2)


[grammar] ~823-~823: The operating system from Apple is written “macOS”.
Context: ...th .so and .dylib for libllama on MacOS ## [v0.1.58] - (llama.cpp) Metal Sili...

(MAC_OS)

🔇 Additional comments (4)
CHANGELOG.md (4)

10-12: New Version Header for 0.3.8

The new changelog section for version 0.3.8 is clearly introduced. Please confirm that the commit reference hash (7841fc723e059d1fd9640e5c0ef19050fcc7c698) is correct and that the version‐header format follows your project’s changelog style guidelines.


14-18: Changelog Entries for 0.3.7

The entries for version 0.3.7 are well structured; both the feature and fix items are clearly listed with appropriate commit references. No issues observed.


20-24: Changelog Section for 0.3.6

The version 0.3.6 section is formatted consistently. If possible, consider adding a bit more detail about the fixes (e.g. context for the CUDA workflow fix) to aid readers in understanding the impact of these changes.


25-840: Overall Changelog Consistency and Formatting

The changelog is impressively comprehensive, covering all updates from version 0.3.8 down to earlier releases. Please take a moment to ensure overall consistency in style:

  • Operating System Names: Verify that “macOS” is used consistently (with a lowercase “m”) rather than variants like “MacOS” or “macos.”
  • Numerical Ranges: When indicating ranges (e.g. “12.1 - 12.3”), consider replacing the hyphen with an en dash (“12.1–12.3”) for enhanced typographical clarity.
  • Duplicate Word Issues: There are a couple of entries with duplicated words (see details in later comments).
  • Minor Grammatical Issues: For example, the description of streaming interruptions could be refined for clarity.

Standardizing these details will improve clarity and maintain a high documentation quality.

🧰 Tools
🪛 LanguageTool

[grammar] ~28-~28: The operating system from Apple is written “macOS”.
Context: ...f05f - fix(ci): Fix release by updating macos runner image to non-deprecated version ...

(MAC_OS)


[grammar] ~33-~33: The operating system from Apple is written “macOS”.
Context: ...## [0.3.4] - fix(ci): Build wheels for macos 13-15, cuda 12.1-12.4 by @abetlen in ca...

(MAC_OS)


[grammar] ~136-~136: The operating system from Apple is written “macOS”.
Context: ...4c25dd9f6a421cacec7604c6 - fix(ci): Fix MacOS release, use macos-12 image instead of ...

(MAC_OS)


[grammar] ~142-~142: Did you mean “too False to”?
Context: ...ate default config value for embeddings to False to fix error in text generation where logi...

(TOO_ADJECTIVE_TO)


[typographical] ~292-~292: If you want to indicate numerical ranges or time ranges, consider using an en dash.
Context: ... feat: Binary wheels for CPU, CUDA (12.1 - 12.3), Metal by @abetlen, @jllllll, and ...

(DASH_RULE)


[duplication] ~301-~301: Possible typo: you repeated a word.
Context: ...11 - fix: set LLAMA_METAL_EMBED_LIBRARY=on on MacOS arm64 by @bretello in abetlen#1289 - fea...

(ENGLISH_WORD_REPEAT_RULE)


[grammar] ~301-~301: The operating system from Apple is written “macOS”.
Context: ...ix: set LLAMA_METAL_EMBED_LIBRARY=on on MacOS arm64 by @bretello in abetlen#1289 - feat: Add...

(MAC_OS)


[duplication] ~303-~303: Possible typo: you repeated a word.
Context: ...nged local API doc references to hosted by by @lawfordp2017 in abetlen#1317 ## [0.2.57] - ...

(ENGLISH_WORD_REPEAT_RULE)


[duplication] ~454-~454: Possible typo: you repeated a word.
Context: ...91cf5dddbc4b6041aead4accc7c7a2d - feat: Add add_generation_prompt option for jinja2chat...

(ENGLISH_WORD_REPEAT_RULE)


[grammar] ~646-~646: “Windows” (operating system by Microsoft) is a proper noun and needs to be capitalized.
Context: ...ed runtime dlls to package directory on windows by @abetlen in 8d75016...

(A_WINDOWS)


[uncategorized] ~725-~725: This expression is usually spelled with a hyphen.
Context: ...iring new model format) ## [0.1.78] - Grammar based sampling via LlamaGrammar which can be ...

(BASED_HYPHEN)


[grammar] ~766-~766: This verb might not be in the correct form.
Context: ...[0.1.69] - (server) Streaming requests can are now interrupted pre-maturely when a con...

(MD_BE)


[style] ~766-~766: To form a complete sentence, be sure to include a subject.
Context: ...rely when a concurrent request is made. Can be controlled with the `interrupt_reque...

(MISSING_IT_THERE)


[uncategorized] ~816-~816: Use a comma before ‘so’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...Truncate max_tokens in create_completion so requested tokens doesn't exceed context...

(COMMA_COMPOUND_SENTENCE_2)


[grammar] ~823-~823: The operating system from Apple is written “macOS”.
Context: ...th .so and .dylib for libllama on MacOS ## [v0.1.58] - (llama.cpp) Metal Sili...

(MAC_OS)
