[pull] main from abetlen:main #1
Conversation
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.17.0 to 2.18.0.
- [Release notes](https://github.com/pypa/cibuildwheel/releases)
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
- [Commits](pypa/cibuildwheel@v2.17.0...v2.18.0)

---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Special tokens are already mapped from metadata by llama.cpp
…1333)
* implement min_tokens
* set default to 0
* pass min_tokens
* fix
* remove copy
* implement MinTokensLogitsProcessor
* format
* fix condition
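For context, the sketch below shows what a minimal min-tokens logits processor typically looks like under llama-cpp-python's `(input_ids, scores)` calling convention; the exact class added in this commit may differ, and `prompt_length` and `eos_token_id` here are illustrative parameters.

```python
import numpy as np

class MinTokensLogitsProcessor:
    """Suppress EOS until at least `min_tokens` tokens have been generated."""

    def __init__(self, min_tokens: int, eos_token_id: int, prompt_length: int):
        self.min_tokens = min_tokens
        self.eos_token_id = eos_token_id
        self.prompt_length = prompt_length

    def __call__(self, input_ids: np.ndarray, scores: np.ndarray) -> np.ndarray:
        generated = len(input_ids) - self.prompt_length
        if generated < self.min_tokens:
            # Make EOS unselectable until the minimum length is reached.
            scores[self.eos_token_id] = -np.inf
        return scores
```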
Important: Review skipped (bot user detected).
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant LlamaProxy
    participant Model
    User->>LlamaProxy: Request Model
    LlamaProxy->>Model: Load/Access Model
    Model-->>LlamaProxy: Return Model Instance
    LlamaProxy-->>User: Provide Model
```
Co-authored-by: Andrei <abetlen@gmail.com>
updated-dependencies:
- dependency-name: pypa/cibuildwheel
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Andrei <abetlen@gmail.com>
* fix: correct issue with handling lock during streaming: move locking for streaming into the get_event_publisher call so it is locked and unlocked in the correct task for the streaming response
* fix: simplify exit stack management for create_chat_completion and create_completion
* fix: correct missing `async with` and format code
* fix: remove unnecessary explicit use of AsyncExitStack
* fix: correct type hints for body_model

Co-authored-by: Andrei <abetlen@gmail.com>
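A rough illustration of the locking pattern this fix describes, with `llama_lock`, `send_chunk`, and `chunks` as assumed names standing in for the server's real plumbing:

```python
import asyncio

llama_lock = asyncio.Lock()  # assumed module-level lock guarding the shared model

async def get_event_publisher(send_chunk, chunks):
    # Acquire and release inside the task that streams the response, so the
    # lock is taken and released by the same task rather than by the handler
    # that merely schedules the stream.
    async with llama_lock:
        async for chunk in chunks:  # `chunks` is assumed to be an async iterator
            await send_chunk(chunk)
```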
* feat: Sync with llama.cpp. Add `no_perf` field to `llama_context_params` to optionally disable performance timing measurements.
* fix: Display performance metrics by default

Co-authored-by: Andrei <abetlen@gmail.com>
Actionable comments posted: 2
♻️ Duplicate comments (4)
llama_cpp/llama_cache.py (1)

28-32: 🛠️ Refactor suggestion

Mark `_find_longest_prefix_key` as abstract.

Since `_find_longest_prefix_key` in `BaseLlamaCache` has no implementation, marking it with `@abstractmethod` clarifies that child classes must override it.

```diff
- def _find_longest_prefix_key(
+ @abstractmethod
+ def _find_longest_prefix_key(
```

🧰 Tools
🪛 Ruff (0.8.2)
28-32: `BaseLlamaCache._find_longest_prefix_key` is an empty method in an abstract base class, but has no abstract decorator (B027)
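For reference, a minimal sketch of the suggested pattern; the signature shown is an assumption for illustration, not copied from the module:

```python
from abc import ABC, abstractmethod
from typing import Optional, Tuple

class BaseLlamaCache(ABC):
    @abstractmethod
    def _find_longest_prefix_key(
        self, key: Tuple[int, ...]
    ) -> Optional[Tuple[int, ...]]:
        """Return the cached key sharing the longest prefix with `key`."""
        raise NotImplementedError
```

With the decorator in place, instantiating a subclass that forgets to override the method raises `TypeError` immediately instead of failing later at call time.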
llama_cpp/server/model.py (1)

43-44: 🛠️ Refactor suggestion

Combine nested `if` statements for clarity.

Simplify the code by merging the nested `if` statements into a single `if`. This follows the Ruff SIM102 recommendation and improves readability. Apply this diff:

```diff
-if model == self._current_model_alias:
-    if self._current_model is not None:
-        return self._current_model
+if model == self._current_model_alias and self._current_model is not None:
+    return self._current_model
```

🧰 Tools
🪛 Ruff (0.8.2)
43-44: Use a single `if` statement instead of nested `if` statements (SIM102)
llama_cpp/llama_tokenizer.py (1)

11-11: 🛠️ Refactor suggestion

Remove overshadowing import.

`List` from `llama_cpp.llama_types` redefines `List` from `typing`. Removing or renaming the import helps avoid confusion and potential conflicts.

```diff
- from llama_cpp.llama_types import List
```
🧰 Tools
🪛 Ruff (0.8.2)
11-11: Redefinition of unused `List` from line 5 (F811)
llama_cpp/server/app.py (1)

129-130: 🛠️ Refactor suggestion

Raise an explicit exception for missing settings.

Using `assert` for validation in production code can hide errors if assertions are disabled. Prefer explicit exceptions for clarity and maintainability.

```diff
- assert (
-     server_settings is not None and model_settings is not None
- ), "server_settings and model_settings must be provided together"
+ if server_settings is None or model_settings is None:
+     raise ValueError("Both server_settings and model_settings must be provided together")
```
🧹 Nitpick comments (31)
llama_cpp/_logger.py (1)

29-43: Further refine the `llama_log_callback`.

- The global `_last_log_level` might introduce thread-safety concerns under heavy parallel usage. You may consider thread-local storage or synchronization if concurrency is planned.
- The `# TODO` comment suggests additional work is needed to implement “continue previous log” logic. A possible approach is buffering partial logs for level 5 messages.
- Printing directly to `stderr` is valid for strictly console-based logs, but you might prefer using `logger.log(log_level, ...)` in production to leverage logging handlers and formats.

```diff
-if logger.level <= GGML_LOG_LEVEL_TO_LOGGING_LEVEL[level]:
-    print(text.decode("utf-8"), end="", flush=True, file=sys.stderr)
+if logger.isEnabledFor(log_level):
+    logger.log(log_level, text.decode("utf-8"), stacklevel=2)
```
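If concurrency does become a concern, one option is thread-local state instead of a module-level global; a sketch under that assumption, not the module's current code:

```python
import logging
import threading

_local = threading.local()

def _remember_level(level: int) -> None:
    # Each thread tracks its own "last level seen", avoiding cross-thread races.
    _local.last_log_level = level

def _last_level(default: int = logging.INFO) -> int:
    return getattr(_local, "last_log_level", default)
```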
llama_cpp/llama_cache.py (1)

70-72: Remove unnecessary `.keys()` when iterating.

Iterating over `self.cache_state` directly is more efficient than calling `.keys()`.

```diff
- for k in self.cache_state.keys()
+ for k in self.cache_state
```

🧰 Tools
🪛 Ruff (0.8.2)
71-71: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)
examples/notebooks/Batching.ipynb (1)
1-464: Notebook demonstration is comprehensive.
The example thoroughly showcases model loading, tokenization, and decoding. A few optional enhancements:
- Add commentary about potential memory constraints when loading large models.
- Consider capturing and handling exceptions in the sampling loop to avoid partial outputs; a sketch follows below.
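On the second point, a hedged sketch of guarding the sampling loop so partial output survives an interruption; `sample_next_token` is a stand-in for the notebook's actual llama.cpp decode/sample step:

```python
import random

max_new_tokens = 16

def sample_next_token() -> int:
    # Stand-in for the notebook's llama.cpp sampling call.
    return random.randrange(32000)

tokens_out = []
try:
    for _ in range(max_new_tokens):
        tokens_out.append(sample_next_token())
except (RuntimeError, KeyboardInterrupt):
    pass  # keep whatever was generated so far instead of losing the batch
print(f"generated {len(tokens_out)} tokens")
```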
llama_cpp/_ctypes_extensions.py (1)
126-129: Complete or clarify the `_byref` function.

The `_byref` function implementation is empty. If offset support is needed, consider implementing it or remove the parameter to avoid confusion.
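If offset support is desired, one possibility is simply delegating to `ctypes.byref`, which already accepts an optional byte offset; a sketch under that assumption:

```python
import ctypes
from typing import Optional

def _byref(obj: ctypes.Structure, offset: Optional[int] = None):
    # ctypes.byref natively supports an optional byte offset.
    if offset is None:
        return ctypes.byref(obj)
    return ctypes.byref(obj, offset)
```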
tests/test_llama_grammar.py (1)

12-12: Use or remove the unused `grammar` variables.

Currently, `grammar` is assigned but not utilized at lines 12, 50, and 76. You can either remove the assignment or re-enable the commented-out assertions to make use of the variable:

```diff
 def test_grammar_from_string():
     grammar = llama_cpp.LlamaGrammar.from_string(tree)
-    # assert grammar._n_rules == 3
-    # assert grammar._start_rule_index == 2
-    # assert grammar.grammar is not None
+    assert grammar._n_rules == 3
+    assert grammar._start_rule_index == 2
+    assert grammar.grammar is not None
```

Also applies to: 50-50, 76-76
🧰 Tools
🪛 Ruff (0.8.2)
12-12: Local variable `grammar` is assigned to but never used. Remove assignment to unused variable `grammar` (F841)
.github/workflows/test-pypi.yaml (1)
25-25: Remove trailing spaces to address YAMLlint errors.

These lines contain trailing spaces flagged by YAMLlint. Please remove them to resolve the formatting errors:

```diff
- RUST_LOG=trace python -m uv pip install llama-cpp-python[all] --verbose 
+ RUST_LOG=trace python -m uv pip install llama-cpp-python[all] --verbose
```

Also applies to: 31-31, 35-35, 37-37, 55-55, 61-61, 67-67, 71-71, 73-73, 90-90, 97-97, 103-103, 107-107, 109-109

🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 25-25: trailing spaces
(trailing-spaces)
.github/workflows/build-wheels-cuda.yaml (1)
5-7: Consider reducing GitHub Actions permissions.

You have set `contents: write` under `permissions`. If read-only access or a more restricted scope suffices for your workflow, consider lowering these permissions to minimize security risks.

.github/workflows/build-wheels-metal.yaml (1)

26-26: Remove trailing spaces.

Lines 26, 38, and 72 have trailing whitespace, which can trigger lint warnings and reduce clarity.

Also applies to: 38-38, 72-72

🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 26-26: trailing spaces
(trailing-spaces)
llama_cpp/llama_grammar.py (3)
29-36: Refine exception handling.
Catching broad exceptions can obscure specific errors. Consider handling more precise exceptions (`OSError`, `IOError`, etc.) to aid debugging.

```diff
- except Exception as err:
+ except (OSError, IOError) as err:
```

42-43: Check for a minor spelling error.
The phrase "params_grammer" appears to be a typographical error. Changing it to "params_grammar" may clarify the message.

```diff
- f"{cls.from_file.__name__}: error parsing grammar file: params_grammer is empty"
+ f"{cls.from_file.__name__}: error parsing grammar file: params_grammar is empty"
```

352-352: Follow through on the TODO for URI/Email formats.
Support for "uri" and "email" string formats remains unimplemented. If you need this feature, I can help you flesh it out.

tests/test_llama.py (3)

1-1: Remove unused imports.
According to the static analysis hints, the `ctypes`, `numpy`, and `scipy.special.log_softmax` imports are unused. Removing them helps maintain cleanliness and reduce confusion.

Proposed patch:

```diff
-import ctypes
 import multiprocessing
-import numpy as np
-from scipy.special import log_softmax
 from huggingface_hub import hf_hub_download
```

Also applies to: 4-5
🧰 Tools
🪛 Ruff (0.8.2)
1-1: `ctypes` imported but unused. Remove unused import: `ctypes` (F401)
22-35
: Consider avoiding direct access to private context attributes.
The `_ctx.ctx` attribute is presumably private and might change in future library updates. Relying on public APIs would increase test stability.

117-219: Consider splitting into smaller test functions for clarity.
`test_real_llama` covers multiple behaviors (completions, grammar usage, seed setting, state saving/loading). Splitting these into smaller, more focused tests would improve readability and maintainability.

.github/workflows/test.yaml (1)
42-42
: Remove trailing spaces.
Trailing spaces often trigger linting errors. Consider removing them for a cleaner code style.

Sample diff (showing one line as an example; repeat for each flagged line):

```diff
- submodules: "recursive" 
+ submodules: "recursive"
```

Also applies to: 73-73, 92-92, 107-107, 125-125, 145-145
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 42-42: trailing spaces
(trailing-spaces)
.github/workflows/publish-to-test.yaml (1)
22-22
: Remove trailing spaces for lint compliance.Trimming trailing whitespace helps avoid lint warnings and maintains clean diffs.
Example diff:
- submodules: "recursive" + submodules: "recursive"Also applies to: 28-28, 35-35, 47-47, 53-53, 57-57
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 22-22: trailing spaces
(trailing-spaces)
llama_cpp/server/errors.py (2)
150-153: Consider using logging instead of direct stderr printing.

Instead of printing error messages to stderr, it might be cleaner and more flexible to utilize a logging library. This would allow you to configure log levels, formats, and destinations more systematically.

```diff
- print(f"Exception: {str(error)}", file=sys.stderr)
- traceback.print_exc(file=sys.stderr)
+ import logging
+ logger = logging.getLogger(__name__)
+ logger.exception("Exception occurred")
```
162-213: Add or expand unit tests.

The custom `RouteErrorHandler` introduces specialized logic for error handling, but it's not clear if there is dedicated testing to cover:
- Regex pattern matching for context length and model not found errors
- General fallback cases
- Request body parsing
Would you like help adding tests to ensure robust coverage for these scenarios?
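As a starting point, tests for the regex layer might look like the sketch below; the pattern shown is a hypothetical stand-in, so substitute the actual expressions defined in `llama_cpp/server/errors.py`:

```python
import re

# Hypothetical pattern resembling the context-length check in errors.py.
CONTEXT_LENGTH_RE = re.compile(
    r"Requested tokens \((\d+)\) exceed context window of (\d+)"
)

def test_context_length_error_is_matched():
    msg = "Requested tokens (5000) exceed context window of 4096"
    match = CONTEXT_LENGTH_RE.search(msg)
    assert match is not None
    assert match.groups() == ("5000", "4096")

def test_unrelated_errors_fall_through():
    assert CONTEXT_LENGTH_RE.search("Model not found: foo") is None
```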
Makefile (2)
8-10: Add version pinning or constraints to prevent unexpected upgrades.

Installing dependencies with `--upgrade` can lead to unexpected behavior if major versions introduce breaking changes. Pinning or constraining versions in a requirements file can help maintain predictable builds.

```diff
 deps:
 	python3 -m pip install --upgrade pip
-	python3 -m pip install -e ".[all]"
+	python3 -m pip install --upgrade --requirement requirements.txt
```

15-23: Consider unifying debug build configuration with an environment variable.

Multiple debug-oriented CMake arguments are scattered across the `build.debug` target. Centralizing these debug flags can simplify maintenance and reduce duplication (note the `$(VAR)` form, which Make requires inside a recipe):

```diff
 build.debug:
 	python3 -m pip install \
 		--verbose \
 		--config-settings=cmake.verbose=true \
 		--config-settings=logging.level=INFO \
 		--config-settings=install.strip=false \
-		--config-settings=cmake.args="-DCMAKE_BUILD_TYPE=Debug;-DCMAKE_C_FLAGS='-ggdb -O0';-DCMAKE_CXX_FLAGS='-ggdb -O0'" \
+		--config-settings=cmake.args="$(DEBUG_BUILD_ARGS)" \
 		--editable .
```

README.md (1)
62-62
: Add a hyphen to "backend specific" when used as a modifier.

```diff
- as well as backend specific options
+ as well as backend-specific options
```

🧰 Tools
🪛 LanguageTool
[grammar] ~62-~62: Possible agreement error. The noun hardware seems to be uncountable; consider using: “much hardware”, “a good deal of hardware”.
Context: ...ion Configurationllama.cpp
supports a number of hardware acceleration backends to speed up infer...(MANY_NN_U)
[uncategorized] ~62-~62: Possible missing comma found.
Context: ...dware acceleration backends to speed up inference as well as backend specific options. Se...(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~62-~62: When ‘backend-specific’ is used as a modifier, it is usually spelled with a hyphen.
Context: ...ckends to speed up inference as well as backend specific options. See the [llama.cpp README](htt...(SPECIFIC_HYPHEN)
.github/workflows/build-and-release.yaml (2)
37-37: Remove trailing spaces.
Line 37 has trailing whitespace, which may break stricter YAML parsers. Apply this diff to remove the trailing space:

```diff
- os: [ubuntu-20.04, windows-2019, macos-13] 
+ os: [ubuntu-20.04, windows-2019, macos-13]
```

🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 37-37: trailing spaces
(trailing-spaces)

112-112: Remove trailing spaces.
Line 112 has trailing whitespace, which may cause lint failures. Apply this diff:

```diff
- RUST_LOG: trace 
+ RUST_LOG: trace
```

🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 112-112: trailing spaces
(trailing-spaces)
llama_cpp/server/model.py (1)
188-188: Use a context manager when opening files.

Use a `with` statement to ensure the file is correctly closed after reading. Apply this diff:

```diff
-json.load(open(settings.hf_tokenizer_config_path))
+with open(settings.hf_tokenizer_config_path) as f:
+    json.load(f)
```

🧰 Tools
🪛 Ruff (0.8.2)
188-188: Use a context manager for opening files
(SIM115)
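One nuance worth noting: `json.load(...)` presumably has its return value assigned in the real code, so the with-block form should keep that assignment. A self-contained sketch with a placeholder helper name:

```python
import json

def load_hf_tokenizer_config(path: str) -> dict:
    # Keeps the parsed result, which the bare diff above would discard.
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```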
llama_cpp/llava_cpp.py (1)
42-42: Validate library loading exceptions.

Although loading the shared library is handled by `load_shared_library`, consider wrapping it in a try-except block to raise more explicit errors for easier debugging.
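A hedged sketch of what such a wrapper could look like; `lib_path` stands in for whatever path `llava_cpp` actually resolves:

```python
import ctypes

def load_or_raise(lib_path: str) -> ctypes.CDLL:
    try:
        return ctypes.CDLL(lib_path)
    except OSError as err:
        # Surface the offending path instead of a bare loader failure.
        raise RuntimeError(f"failed to load shared library {lib_path!r}") from err
```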
CMakeLists.txt (2)

5-6: Add simple documentation for these build options.

The new `LLAMA_BUILD` and `LLAVA_BUILD` options are a great addition for controlling library builds. Consider adding inline comments or README documentation to explain their intended usage.

8-46: Streamline repeated library installation logic.

The function `llama_cpp_python_install_target` is nicely defined to handle installation. However, consider reducing repetition across the UNIX, Apple, and Windows sections by factoring out the target property settings (RPATH logic, etc.) into helper snippets if more targets are added in the future.
pyproject.toml (1)

31-62: Check optional dependencies for correct grouping and coverage.

Restructuring dependencies under `[project.optional-dependencies]` is straightforward. Double-check that each group (`server`, `test`, `dev`, and `all`) fully mirrors what was previously in Poetry to avoid missing or redundant packages.

llama_cpp/_internals.py (3)
359-446: Reflect on the multi-sampler design.

All `sample_*` methods raise `NotImplementedError` for strategies like top-k, top-p, or grammar sampling. If these functionalities are critical for advanced usage, consider either removing the stubs until implemented or returning a helpful error message describing the current limitations.

🧰 Tools
🪛 Ruff (0.8.2)
359-359: `_LlamaTokenDataArray` may be undefined, or defined from star imports (F405)
365-365: `_LlamaTokenDataArray` may be undefined, or defined from star imports (F405)
371-371: `_LlamaTokenDataArray` may be undefined, or defined from star imports (F405)
378-378: `_LlamaTokenDataArray` may be undefined, or defined from star imports (F405)
385-385: `_LlamaTokenDataArray` may be undefined, or defined from star imports (F405)
391-391: `_LlamaTokenDataArray` may be undefined, or defined from star imports (F405)
401-401: `_LlamaTokenDataArray` may be undefined, or defined from star imports (F405)
419-419: `_LlamaTokenDataArray` may be undefined, or defined from star imports (F405)
433-433: `_LlamaTokenDataArray` may be undefined, or defined from star imports (F405)
440-440: `_LlamaTokenDataArray` may be undefined, or defined from star imports (F405)
465-525: Review `LlamaBatch` memory usage.

You are creating a `llama_batch` instance for up to `n_seq_max` sequences. If `n_seq_max` is large, consider whether partial allocations (or on-demand expansions) might be more efficient. Also, double-check that the final token is always marked with `logits=True` only when appropriate.
720-720: Remove redundant import statements.

“from typing import List, Callable, Optional, Union” at this line conflicts with earlier imports and triggers redefinition warnings. Consider removing the repeated references to keep the import list clean.

```diff
-from typing import List, Callable, Optional, Union
+from typing import Callable, Union
```

🧰 Tools
🪛 Ruff (0.8.2)
720-720: Redefinition of unused `List` from line 8. Remove definition: `List` (F811)
720-720: Redefinition of unused `Optional` from line 10. Remove definition: `Optional` (F811)
720-720: `typing.Union` imported but unused. Remove unused import: `typing.Union` (F401)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (31)
- .github/workflows/build-and-release.yaml (1 hunks)
- .github/workflows/build-wheels-cuda.yaml (1 hunks)
- .github/workflows/build-wheels-metal.yaml (1 hunks)
- .github/workflows/generate-index-from-release.yaml (1 hunks)
- .github/workflows/publish-to-test.yaml (1 hunks)
- .github/workflows/publish.yaml (1 hunks)
- .github/workflows/test-pypi.yaml (1 hunks)
- .github/workflows/test.yaml (1 hunks)
- CHANGELOG.md (1 hunks)
- CMakeLists.txt (1 hunks)
- Makefile (3 hunks)
- README.md (3 hunks)
- examples/notebooks/Batching.ipynb (1 hunks)
- llama_cpp/__init__.py (1 hunks)
- llama_cpp/_ctypes_extensions.py (1 hunks)
- llama_cpp/_ggml.py (1 hunks)
- llama_cpp/_internals.py (1 hunks)
- llama_cpp/_logger.py (1 hunks)
- llama_cpp/llama_cache.py (1 hunks)
- llama_cpp/llama_grammar.py (1 hunks)
- llama_cpp/llama_tokenizer.py (1 hunks)
- llama_cpp/llama_types.py (3 hunks)
- llama_cpp/llava_cpp.py (1 hunks)
- llama_cpp/server/app.py (1 hunks)
- llama_cpp/server/errors.py (1 hunks)
- llama_cpp/server/model.py (1 hunks)
- llama_cpp/server/settings.py (1 hunks)
- pyproject.toml (1 hunks)
- tests/test_llama.py (1 hunks)
- tests/test_llama_grammar.py (1 hunks)
- vendor/llama.cpp (1 hunks)
✅ Files skipped from review due to trivial changes (3)
- vendor/llama.cpp
- llama_cpp/__init__.py
- llama_cpp/_ggml.py
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/test-pypi.yaml
36-36: shell name "cmd" is invalid on macOS or Linux. available names are "bash", "pwsh", "python", "sh"
(shell-name)
108-108: shell name "cmd" is invalid on macOS or Linux. available names are "bash", "pwsh", "python", "sh"
(shell-name)
.github/workflows/publish.yaml
40-40: shell name "cmd" is invalid on macOS or Linux. available names are "bash", "pwsh", "python", "sh"
(shell-name)
.github/workflows/publish-to-test.yaml
52-52: shell name "cmd" is invalid on macOS or Linux. available names are "bash", "pwsh", "python", "sh"
(shell-name)
.github/workflows/build-and-release.yaml
118-118: shell name "cmd" is invalid on macOS or Linux. available names are "bash", "pwsh", "python", "sh"
(shell-name)
🪛 YAMLlint (1.35.1)
.github/workflows/test-pypi.yaml
[error] 25-25: trailing spaces
(trailing-spaces)
[error] 31-31: trailing spaces
(trailing-spaces)
[error] 35-35: trailing spaces
(trailing-spaces)
[error] 37-37: trailing spaces
(trailing-spaces)
[error] 55-55: trailing spaces
(trailing-spaces)
[error] 61-61: trailing spaces
(trailing-spaces)
[error] 67-67: trailing spaces
(trailing-spaces)
[error] 71-71: trailing spaces
(trailing-spaces)
[error] 73-73: trailing spaces
(trailing-spaces)
[error] 90-90: trailing spaces
(trailing-spaces)
[error] 97-97: trailing spaces
(trailing-spaces)
[error] 103-103: trailing spaces
(trailing-spaces)
[error] 107-107: trailing spaces
(trailing-spaces)
[error] 109-109: trailing spaces
(trailing-spaces)
.github/workflows/test.yaml
[error] 42-42: trailing spaces
(trailing-spaces)
[error] 73-73: trailing spaces
(trailing-spaces)
[error] 92-92: trailing spaces
(trailing-spaces)
[error] 107-107: trailing spaces
(trailing-spaces)
[error] 125-125: trailing spaces
(trailing-spaces)
[error] 145-145: trailing spaces
(trailing-spaces)
.github/workflows/build-wheels-metal.yaml
[error] 26-26: trailing spaces
(trailing-spaces)
[error] 38-38: trailing spaces
(trailing-spaces)
[error] 72-72: trailing spaces
(trailing-spaces)
.github/workflows/publish-to-test.yaml
[error] 22-22: trailing spaces
(trailing-spaces)
[error] 28-28: trailing spaces
(trailing-spaces)
[error] 35-35: trailing spaces
(trailing-spaces)
[error] 47-47: trailing spaces
(trailing-spaces)
[error] 53-53: trailing spaces
(trailing-spaces)
[error] 57-57: trailing spaces
(trailing-spaces)
.github/workflows/build-and-release.yaml
[error] 37-37: trailing spaces
(trailing-spaces)
[error] 112-112: trailing spaces
(trailing-spaces)
🪛 Ruff (0.8.2)
llama_cpp/_ctypes_extensions.py
69-69: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
tests/test_llama_grammar.py
12-12: Local variable grammar
is assigned to but never used
Remove assignment to unused variable grammar
(F841)
50-50: Local variable grammar
is assigned to but never used
Remove assignment to unused variable grammar
(F841)
76-76: Local variable grammar
is assigned to but never used
Remove assignment to unused variable grammar
(F841)
llama_cpp/server/model.py
43-44: Use a single if
statement instead of nested if
statements
(SIM102)
188-188: Use a context manager for opening files
(SIM115)
llama_cpp/llama_cache.py
14-14: from .llama_types import *
used; unable to detect undefined names
(F403)
28-32: BaseLlamaCache._find_longest_prefix_key
is an empty method in an abstract base class, but has no abstract decorator
(B027)
71-71: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
tests/test_llama.py
1-1: ctypes
imported but unused
Remove unused import: ctypes
(F401)
4-4: numpy
imported but unused
Remove unused import: numpy
(F401)
5-5: scipy.special.log_softmax
imported but unused
Remove unused import: scipy.special.log_softmax
(F401)
llama_cpp/_internals.py
19-19: from .llama_types import *
used; unable to detect undefined names
(F403)
334-334: _LlamaTokenDataArray
may be undefined, or defined from star imports
(F405)
352-352: _LlamaTokenDataArray
may be undefined, or defined from star imports
(F405)
359-359: _LlamaTokenDataArray
may be undefined, or defined from star imports
(F405)
365-365: _LlamaTokenDataArray
may be undefined, or defined from star imports
(F405)
371-371: _LlamaTokenDataArray
may be undefined, or defined from star imports
(F405)
378-378: _LlamaTokenDataArray
may be undefined, or defined from star imports
(F405)
385-385: _LlamaTokenDataArray
may be undefined, or defined from star imports
(F405)
391-391: _LlamaTokenDataArray
may be undefined, or defined from star imports
(F405)
401-401: _LlamaTokenDataArray
may be undefined, or defined from star imports
(F405)
419-419: _LlamaTokenDataArray
may be undefined, or defined from star imports
(F405)
433-433: _LlamaTokenDataArray
may be undefined, or defined from star imports
(F405)
440-440: _LlamaTokenDataArray
may be undefined, or defined from star imports
(F405)
720-720: Redefinition of unused List
from line 8
Remove definition: List
(F811)
720-720: Redefinition of unused Optional
from line 10
Remove definition: Optional
(F811)
720-720: typing.Union
imported but unused
Remove unused import: typing.Union
(F401)
722-722: Redefinition of unused llama_cpp
from line 23
(F811)
727-727: typing
may be undefined, or defined from star imports
(F405)
llama_cpp/server/app.py
242-242: Do not perform function call Depends
in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
368-368: Do not perform function call Depends
in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
410-479: Do not perform function call Body
in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
536-536: Do not perform function call Depends
in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
563-563: Do not perform function call Depends
in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
578-578: Do not perform function call Depends
in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
593-593: Do not perform function call Depends
in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
llama_cpp/llama_tokenizer.py
11-11: Redefinition of unused List
from line 5
(F811)
113-116: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
🪛 LanguageTool
CHANGELOG.md
[grammar] ~24-~24: The operating system from Apple is written “macOS”.
Context: ...f05f - fix(ci): Fix release by updating macos runner image to non-deprecated version ...
(MAC_OS)
[grammar] ~29-~29: The operating system from Apple is written “macOS”.
Context: ...## [0.3.4] - fix(ci): Build wheels for macos 13-15, cuda 12.1-12.4 by @abetlen in ca...
(MAC_OS)
[grammar] ~132-~132: The operating system from Apple is written “macOS”.
Context: ...4c25dd9f6a421cacec7604c6 - fix(ci): Fix MacOS release, use macos-12 image instead of ...
(MAC_OS)
[grammar] ~138-~138: Did you mean “too False to”?
Context: ...ate default config value for embeddings to False to fix error in text generation where logi...
(TOO_ADJECTIVE_TO)
[typographical] ~288-~288: If you want to indicate numerical ranges or time ranges, consider using an en dash.
Context: ... feat: Binary wheels for CPU, CUDA (12.1 - 12.3), Metal by @abetlen, @jllllll, and ...
(DASH_RULE)
[duplication] ~297-~297: Possible typo: you repeated a word.
Context: ...11 - fix: set LLAMA_METAL_EMBED_LIBRARY=on on MacOS arm64 by @bretello in abetlen#1289 - fea...
(ENGLISH_WORD_REPEAT_RULE)
[grammar] ~297-~297: The operating system from Apple is written “macOS”.
Context: ...ix: set LLAMA_METAL_EMBED_LIBRARY=on on MacOS arm64 by @bretello in abetlen#1289 - feat: Add...
(MAC_OS)
[duplication] ~299-~299: Possible typo: you repeated a word.
Context: ...nged local API doc references to hosted by by @lawfordp2017 in abetlen#1317 ## [0.2.57] - ...
(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~450-~450: Possible typo: you repeated a word.
Context: ...91cf5dddbc4b6041aead4accc7c7a2d - feat: Add add_generation_prompt option for jinja2chat...
(ENGLISH_WORD_REPEAT_RULE)
[grammar] ~642-~642: “Windows” (operating system by Microsoft) is a proper noun and needs to be capitalized.
Context: ...ed runtime dlls to package directory on windows by @abetlen in 8d75016...
(A_WINDOWS)
[uncategorized] ~721-~721: This expression is usually spelled with a hyphen.
Context: ...iring new model format) ## [0.1.78] - Grammar based sampling via LlamaGrammar which can be ...
(BASED_HYPHEN)
[grammar] ~762-~762: This verb might not be in the correct form.
Context: ...[0.1.69] - (server) Streaming requests can are now interrupted pre-maturely when a con...
(MD_BE)
[style] ~762-~762: To form a complete sentence, be sure to include a subject.
Context: ...rely when a concurrent request is made. Can be controlled with the `interrupt_reque...
(MISSING_IT_THERE)
[uncategorized] ~812-~812: Use a comma before ‘so’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...Truncate max_tokens in create_completion so requested tokens doesn't exceed context...
(COMMA_COMPOUND_SENTENCE_2)
[grammar] ~819-~819: The operating system from Apple is written “macOS”.
Context: ...th .so
and .dylib
for libllama
on MacOS ## [v0.1.58] - (llama.cpp) Metal Sili...
(MAC_OS)
README.md
[grammar] ~39-~39: The operating system from Apple is written “macOS”.
Context: ...Windows: Visual Studio or MinGW - MacOS: Xcode To install the package, run: `...
(MAC_OS)
[grammar] ~62-~62: Possible agreement error. The noun hardware seems to be uncountable; consider using: “much hardware”, “a good deal of hardware”.
Context: ...ion Configuration llama.cpp
supports a number of hardware acceleration backends to speed up infer...
(MANY_NN_U)
[uncategorized] ~62-~62: Possible missing comma found.
Context: ...dware acceleration backends to speed up inference as well as backend specific options. Se...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~62-~62: When ‘backend-specific’ is used as a modifier, it is usually spelled with a hyphen.
Context: ...ckends to speed up inference as well as backend specific options. See the [llama.cpp README](htt...
(SPECIFIC_HYPHEN)
[grammar] ~237-~237: The operating system from Apple is written “macOS”.
Context: ....
(MAC_OS)
[grammar] ~237-~237: The operating system from Apple is written “macOS”.
Context: ...mentation is available at [docs/install/macos.md](https://llama-cpp-python.readthedoc...
(MAC_OS)
[uncategorized] ~291-~291: Did you mean: “By default,”?
Context: ...ll create_completion print(output) ``` By default llama-cpp-python
generates completion...
(BY_DEFAULT_COMMA)
[style] ~316-~316: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...Llama.create_completion) methods of the Llama
class. ### Pulling models from Huggin...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[uncategorized] ~330-~330: Did you mean: “By default,”?
Context: ...="*q8_0.gguf", verbose=False ) ``` By default [from_pretrained
](https://llama-cpp-p...
(BY_DEFAULT_COMMA)
[uncategorized] ~338-~338: The abbreviation “i.e.” (= that is) requires two periods.
Context: ...this using pre-registered chat formats (ie. chatml
, llama-2
, gemma
, etc) or b...
(I_E)
[style] ~338-~338: In American English, abbreviations like “etc.” require a period.
Context: ...mats (ie. chatml
, llama-2
, gemma
, etc) or by providing a custom chat handler ...
(ETC_PERIOD)
[duplication] ~340-~340: Possible typo: you repeated a word.
Context: ... custom chat handler object. The model will will format the messages into a single promp...
(ENGLISH_WORD_REPEAT_RULE)
[uncategorized] ~367-~367: Possible missing comma found.
Context: ...Llama.create_chat_completion_openai_v1) method which will return pydantic models inste...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~372-~372: Possible missing comma found.
Context: ...s to only valid JSON or a specific JSON Schema use the response_format
argument in [...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~398-~398: Possible missing comma found.
Context: ...the response further to a specific JSON Schema add the schema to the schema
property...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~425-~425: Do not mix variants of the same word (‘pre-train’ and ‘pretrain’) within a single text.
Context: ...s is possible through the functionary
pre-trained models chat format or through the gener...
(EN_WORD_COHERENCY)
[style] ~475-~475: As a shorter alternative for ‘able to’, consider using “can”.
Context: ...://huggingface.co/meetkai). Functionary is able to intelligently call functions and also a...
(BE_ABLE_TO)
[uncategorized] ~477-~477: Possible missing article found.
Context: ...the default llama.cpp tokenizer used in Llama class. The tokenizer files are already ...
(AI_HYDRA_LEO_MISSING_THE)
[uncategorized] ~490-~490: Possible missing comma found.
Context: ...ide the default system messages used in Functionary as they are added automatically in the ...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~603-~603: Possible missing comma found.
Context: ...llama-cpp-python
supports speculative decoding which allows the model to generate comp...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~621-~621: Possible missing comma found.
Context: ...) ``` ### Embeddings To generate text embeddings use [create_embedding
](https://llama-...
(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~637-~637: A comma may be missing after the conjunctive/linking adverb ‘Thus’.
Context: ...s, one for each token in each sequence. Thus the dimensionality of the return type w...
(SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA)
[uncategorized] ~639-~639: Possible missing comma found.
Context: ... oriented model to yield sequence level embeddings is currently not possible, but you can ...
(AI_HYDRA_LEO_MISSING_COMMA)
[style] ~645-~645: Consider using an alternative to strengthen your wording.
Context: ...our requirements. For instance, if you want to work with larger contexts, you can e...
(WANT_KEEN)
[style] ~654-~654: In American English, abbreviations like “etc.” require a period.
Context: ...e client (language libraries, services, etc). To install the server package and ge...
(ETC_PERIOD)
[uncategorized] ~681-~681: Possible missing article found.
Context: ...will format the prompt according to how model expects it. You can find the prompt for...
(AI_HYDRA_LEO_MISSING_THE)
[style] ~736-~736: Consider using a different verb to strengthen your wording.
Context: ...ama-cpp-python.readthedocs.io/). If you find any issues with the documentation, plea...
(FIND_ENCOUNTER)
[style] ~785-~785: This expression is wordy and can make your writing hard to follow, try rephrasing to be more direct.
Context: ...install from source as described above. The reason for this is that llama.cpp
is built with compiler opti...
(THE_REASON_FOR_THIS)
[style] ~787-~787: To reduce wordiness, try specifying a number or using “many” or “numerous” instead.
Context: ...bling these optimizations or supporting a large number of pre-built binaries for each platform. ...
(LARGE_NUMBER_OF)
[formatting] ~788-~788: Use a comma after introducing a concessive statement.
Context: ... pre-built binaries for each platform. That being said there are some pre-built binaries avail...
(EVEN_SO_PREMIUM)
[uncategorized] ~791-~791: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...binaries and wheels for common platforms and I'm happy to accept any useful contribu...
(COMMA_COMPOUND_SENTENCE)
🪛 markdownlint-cli2 (0.17.2)
README.md
2-2: Images should have alternate text (alt text)
null
(MD045, no-alt-text)
5-5: Multiple spaces after hash on atx style heading
null
(MD019, no-multiple-space-atx)
13-13: No empty links
null
(MD042, no-empty-links)
20-20: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
21-21: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
22-22: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
24-24: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
25-25: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
26-26: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
27-27: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
35-35: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
36-36: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
37-37: Unordered list indentation
Expected: 2; Actual: 6
(MD007, ul-indent)
38-38: Unordered list indentation
Expected: 2; Actual: 6
(MD007, ul-indent)
39-39: Unordered list indentation
Expected: 2; Actual: 6
(MD007, ul-indent)
51-51: Emphasis used instead of a heading
null
(MD036, no-emphasis-as-heading)
124-124: Emphasis used instead of a heading
null
(MD036, no-emphasis-as-heading)
161-161: Emphasis used instead of a heading
null
(MD036, no-emphasis-as-heading)
341-341: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
342-342: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
343-343: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
344-344: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
🔇 Additional comments (82)
llama_cpp/_logger.py (6)
1-4: Clean and necessary imports.
These imports are essential for handling standard IO, C function callbacks, and Python logging. No issues found here.

5-6: Consistent module import.
Importing `llama_cpp` at the top level is aligned with Python's standard practices. Looks good.

15-22: Clear mapping of ggml to logging levels.
The dictionary correctly maps the ggml log levels to Python logging constants. This setup is straightforward and should handle the defined log levels well.

24-25: Logger naming.
Using `"llama-cpp-python"` as the logger name is descriptive and consistent. No issues.

26-27: Initialization of `_last_log_level`.
Storing the initial log level in a global variable is fine here, but be mindful of concurrency or re-entrancy if the library grows in scope.

46-48: Toggle verbosity with `set_verbose`.
Switching between `logging.DEBUG` and `logging.ERROR` is a clear, minimal approach to adjusting verbosity.

CHANGELOG.md (8)
132-132: Capitalize “macOS”.
This line refers to “MacOS” instead of “macOS.” Properly spelling Apple’s operating system helps maintain consistency.

```diff
- fix(ci): Fix MacOS release, use macos-12 image instead of ...
+ fix(ci): Fix macOS release, use macos-12 image instead of ...
```

🧰 Tools
🪛 LanguageTool
[grammar] ~132-~132: The operating system from Apple is written “macOS”.
Context: ...4c25dd9f6a421cacec7604c6 - fix(ci): Fix MacOS release, use macos-12 image instead of ... (MAC_OS)

297-297: Eliminate repeated word “by”.
The phrase “hosted by by” is duplicated. Remove the extra “by” to correct the sentence.

```diff
- - fix: Changed local API doc references to hosted by by @lawfordp2017 in #1317
+ - fix: Changed local API doc references to hosted by @lawfordp2017 in #1317
```

🧰 Tools
🪛 LanguageTool
[duplication] ~297-~297: Possible typo: you repeated a word.
Context: ...11 - fix: set LLAMA_METAL_EMBED_LIBRARY=on on MacOS arm64 by @bretello in abetlen#1289 - fea... (ENGLISH_WORD_REPEAT_RULE)
[grammar] ~297-~297: The operating system from Apple is written “macOS”.
Context: ...ix: set LLAMA_METAL_EMBED_LIBRARY=on on MacOS arm64 by @bretello in abetlen#1289 - feat: Add... (MAC_OS)

299-299: Avoid repeated “by by”.
Ensure that repeated words are removed in all related references.

```diff
- fix: Changed local API doc references to hosted by by @lawfordp2017 in #1317
+ fix: Changed local API doc references to hosted by @lawfordp2017 in #1317
```

🧰 Tools
🪛 LanguageTool
[duplication] ~299-~299: Possible typo: you repeated a word.
Context: ...nged local API doc references to hosted by by @lawfordp2017 in abetlen#1317 ## [0.2.57] - ... (ENGLISH_WORD_REPEAT_RULE)

450-450: Remove repeated word “Add”.
Repeating “Add” (“Add add_generation_prompt”) is likely a typo.

```diff
- feat: Add add_generation_prompt option for jinja2chatformatter
+ feat: Add generation_prompt option for jinja2chatformatter
```

🧰 Tools
🪛 LanguageTool
[duplication] ~450-~450: Possible typo: you repeated a word.
Context: ...91cf5dddbc4b6041aead4accc7c7a2d - feat: Add add_generation_prompt option for jinja2chat... (ENGLISH_WORD_REPEAT_RULE)

642-642: Capitalize “Windows”.
“windows” should be capitalized to “Windows” since it is a proper noun.

```diff
- Install required runtime dlls to package directory on windows
+ Install required runtime dlls to package directory on Windows
```

🧰 Tools

721-721: Hyphenate “Grammar-based”.
The expression “Grammar based” should be hyphenated for clarity.

```diff
- Grammar based sampling via LlamaGrammar
+ Grammar-based sampling via LlamaGrammar
```

🧰 Tools
🪛 LanguageTool
[uncategorized] ~721-~721: This expression is usually spelled with a hyphen.
Context: ...iring new model format) ## [0.1.78] - Grammar based sampling via LlamaGrammar which can be ... (BASED_HYPHEN)

762-762: Refine sentence structure.
“can are now interrupted” is grammatically incorrect. Use “can now be interrupted prematurely”.

```diff
- Streaming requests can are now interrupted pre-maturely
+ Streaming requests can now be interrupted prematurely
```

🧰 Tools
🪛 LanguageTool
[grammar] ~762-~762: This verb might not be in the correct form.
Context: ...[0.1.69] - (server) Streaming requests can are now interrupted pre-maturely when a con... (MD_BE)
[style] ~762-~762: To form a complete sentence, be sure to include a subject.
Context: ...rely when a concurrent request is made. Can be controlled with the `interrupt_reque... (MISSING_IT_THERE)

819-819: Capitalize “macOS”.
“MacOS” should be spelled “macOS.”

```diff
- Support both `.so` and `.dylib` for `libllama` on MacOS
+ Support both `.so` and `.dylib` for `libllama` on macOS
```

🧰 Tools
🪛 LanguageTool
[grammar] ~819-~819: The operating system from Apple is written “macOS”.
Context: ...th `.so` and `.dylib` for `libllama` on MacOS ## [v0.1.58] - (llama.cpp) Metal Sili... (MAC_OS)
llama_cpp/llama_cache.py (2)
14-14: Avoid wildcard imports.
Using `from .llama_types import *` can lead to naming conflicts and less maintainable code. Consider explicitly importing only the required names.

```diff
- from .llama_types import *
+ from .llama_types import LlamaState, LlamaParams  # or the specific types needed
```

🧰 Tools
🪛 Ruff (0.8.2)
14-14: `from .llama_types import *` used; unable to detect undefined names (F403)

145-155: Use logging instead of debug prints.
These print statements appear to be for debugging. Consider removing them or switching to the configured logger for consistency in logging behavior.

```diff
- print("LlamaDiskCache.__setitem__: called", file=sys.stderr)
- print("LlamaDiskCache.__setitem__: delete", file=sys.stderr)
- print("LlamaDiskCache.__setitem__: set", file=sys.stderr)
- print("LlamaDiskCache.__setitem__: trim", file=sys.stderr)
+ import logging
+ logging.debug("LlamaDiskCache.__setitem__: called")  # ... and so on
```

llama_cpp/llama_types.py (8)
1-8
: Docstring additions look good.
These lines provide helpful context about the file’s purpose and align with best practices for maintainable code.
14-17: Consider a more structured JSON representation.
Replacing the recursive `JsonType` with an `Any`-based union offers flexibility, but it may lose nested type safety. If you need deeper type constraints, consider a partial schema or typed config classes to balance flexibility with validation.
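If stricter nesting is wanted later, a recursive alias is one option; a sketch that mainstream type checkers such as mypy and pyright accept:

```python
from typing import Dict, List, Union

JsonType = Union[None, bool, int, float, str, List["JsonType"], Dict[str, "JsonType"]]
```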
25-28: Verify that embedding shape variations are intentional.
Using `Union[List[float], List[List[float]]]` can introduce complexity if consumers expect a single dimension. Confirm that multi-dimensional support is indeed required.
72-77: Clarify removal timeline for deprecated fields.
You marked `function_call` as DEPRECATED. Outline specific removal plans or confirm if backward compatibility is still needed.
82-82: Offer assistance for the TODO comment.
This TODO indicates that `parameters` needs more specific constraints. Would you like assistance in refining the `parameters` field to be more specific?
166-170: Offer assistance for the repeated TODO comment.
This line mirrors the existing TODO regarding type specificity for `parameters`. Let me know if you need help defining a stricter schema for function parameters.
272-273: Another TODO: refine `ChatCompletionFunctionParameters`.
This repeats the same concern about lacking specificity in the type definition. Happy to provide a more detailed approach or open an issue to track tighter parameter typing.
303-317: No concerns with aliasing removed types.
The alias definitions appear consistent with the rest of the module and maintain backward compatibility.

.github/workflows/generate-index-from-release.yaml (1)
47-48: Verify the CUDA version patterns.
The commented patterns for `cu125` and `cu126` reference `-cu124`. This mismatch may be unintentional:

```diff
-# ./scripts/releases-to-pep-503.sh index/whl/cu125 '^[v]?[0-9]+\.[0-9]+\.[0-9]+-cu124$'
-# ./scripts/releases-to-pep-503.sh index/whl/cu126 '^[v]?[0-9]+\.[0-9]+\.[0-9]+-cu124$'
+# ./scripts/releases-to-pep-503.sh index/whl/cu125 '^[v]?[0-9]+\.[0-9]+\.[0-9]+-cu125$'
+# ./scripts/releases-to-pep-503.sh index/whl/cu126 '^[v]?[0-9]+\.[0-9]+\.[0-9]+-cu126$'
```

.github/workflows/test-pypi.yaml (1)
36-36: Shell name “cmd” usage is valid only for Windows.
Actionlint flags these lines because “cmd” is unavailable on macOS or Linux. However, the step is guarded by `if: runner.os == 'Windows'`, rendering this a false positive. You may ignore or disable the linter rule for these lines.

Also applies to: 108-108

🧰 Tools
🪛 actionlint (1.7.4)
36-36: shell name "cmd" is invalid on macOS or Linux. available names are "bash", "pwsh", "python", "sh"
(shell-name)
llama_cpp/server/settings.py (2)
13-14: Avoid modifying protected namespaces directly.
This line modifies `BaseSettings.model_config["protected_namespaces"]`, which can lead to unexpected global behavior. Consider subclassing `BaseSettings` or another approach.

188-199: Ensure proper validation for dynamic defaults.
Revisit the `set_dynamic_defaults` method to confirm threads are set as intended. Adding assertions or logging can help verify correctness.

.github/workflows/build-wheels-cuda.yaml (2)
16-31: Revisit cross-platform matrix definition.
This matrix construction relies on PowerShell commands. As previously noted, you may wish to switch to a more universally supported shell or approach for cross-platform builds.

78-87: Verify reliability of remote integration source.
This step downloads Visual Studio integration files from a third-party repository (`Jimver/cuda-toolkit`). If that link changes or becomes unavailable, builds will fail. Confirm that this dependency is acceptable for your long-term needs.

.github/workflows/publish.yaml (4)
13-15: Confirm submodule recursion necessity.
Submodules can slow down checkout if they are not strictly required. Please verify that you need them before continuing with a recursive checkout.

18-20: Validate Python 3.9 choice.
Ensure Python 3.9 is the intended target environment. If you expect users on older or newer versions, consider a matrix strategy to broaden coverage.

31-41: Ignore actionlint’s warning about the 'cmd' shell.
Since this block only runs on Windows runners, the usage of `shell: cmd` is valid. The static analysis complaint does not apply here.

🧰 Tools
🪛 actionlint (1.7.4)
40-40: shell name "cmd" is invalid on macOS or Linux. available names are "bash", "pwsh", "python", "sh"
(shell-name)

44-44: Use of modern build invocation is approved.
Switching to `python -m build --sdist` aligns with current Python packaging best practices.

.github/workflows/build-wheels-metal.yaml (2)
14-14: Check viability of the “macos-15” GitHub runner.
At present, GitHub Actions commonly supports macos-12, macos-13, etc. Verify that this runner exists or that you’re intentionally referencing a future environment.

27-44: Re-evaluate Windows-specific install steps in a macOS matrix.
Your matrix includes only macOS runners, but there's a conditional block for Windows. Confirm whether these steps will ever run, and remove them if unnecessary.

🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 38-38: trailing spaces
(trailing-spaces)
llama_cpp/llama_grammar.py (1)
452-466: Strengthen error handling for remote schema fetching.
When retrieving external schemas over the network, consider potential failures or invalid content. Retry mechanisms, timeouts, and robust exception handling help maintain reliability.
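For illustration, one hedged way to add timeouts and bounded retries, assuming a `requests`-based fetch; the module may use a different HTTP stack:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def fetch_remote_schema(url: str) -> dict:
    session = requests.Session()
    retries = Retry(total=3, backoff_factor=0.5, status_forcelist=(502, 503, 504))
    session.mount("https://", HTTPAdapter(max_retries=retries))
    response = session.get(url, timeout=10)  # fail fast instead of hanging
    response.raise_for_status()
    return response.json()
```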
tests/test_llama.py (6)

18-20: Version check test looks good.
This straightforward check ensures that `llama_cpp.__version__` is defined.
36-41: Tokenization test without BOS works as expected.
The flow here is correct. Having a separate test for the case when BOS is excluded is helpful in ensuring thorough coverage.

43-51: EOS token checks look good.
Verifying that the `</s>` token is (or isn't) interpreted as the EOS token fosters confidence in correct handling of special tokens.

52-57: Empty text tokenization scenario is properly tested.
This edge-case testing with an empty string is valuable to ensure consistent behavior.

59-64: Fixture approach for model path is appropriate.
Using `llama_cpp_model_path` keeps the tests clean and separates environment-specific logic (like model download) from the test logic itself.

67-116: Model decode flow is well-structured.
Fetching tokens, decoding, and sampling in a step-by-step manner thoroughly validates the pipeline.
.github/workflows/test.yaml (6)
10-12
: New environment variables look appropriate.
Defining `REPO_ID` and `MODEL_FILE` at the workflow level keeps them consistent across all jobs.
32-62
: Linux build matrix changes look consistent.Adding Python 3.12 and relying on the existing steps is validated. Good job.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 42-42: trailing spaces
(trailing-spaces)
63-96
: Windows build matrix changes look consistent.Same matrix update as Linux. The caching strategy is aligned with the rest of the workflow.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 73-73: trailing spaces
(trailing-spaces)
[error] 92-92: trailing spaces
(trailing-spaces)
97-137
: macOS build matrix with updated Python versions is valid.Ensuring consistent environment steps across platforms improves maintainability.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 107-107: trailing spaces
(trailing-spaces)
[error] 125-125: trailing spaces
(trailing-spaces)
138-172
: New macOS Metal build job is well-structured.Enabling the Metal backend for macOS is a welcome addition for performance gains.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 145-145: trailing spaces
(trailing-spaces)
15-31
:❓ Verification inconclusive
Confirm the model is downloaded correctly.
Introducing a dedicated `download-model` job promotes reuse and centralizes the model caching logic. However, consider verifying the model retrieval process in case of version changes or repository updates.

🏁 Script executed:

```bash
#!/bin/bash
# Verify that the specified model file was indeed cached after the download-model job.
# This script checks for the existence of the model file in the huggingface cache folder.
fd -tf "$MODEL_FILE" ~/.cache/huggingface/hub
```

Length of output: 174
Action: Verify Model Download and Cache Behavior
The dedicated `download-model` job helps centralize the caching logic, but the automated verification script revealed issues. Specifically, the error indicates that either the `MODEL_FILE` variable is empty or the cache directory (`~/.cache/huggingface/hub`) is missing or incorrectly referenced. This uncertainty could lead to false negatives in model caching verification.
- Ensure that the `MODEL_FILE` environment variable is properly set in the workflow.
- Confirm that the cache directory exists (or is created) before it is used.
- Please conduct a manual verification of the model download and caching process—especially if repository updates or version changes occur—to ensure reliable behavior.
.github/workflows/publish-to-test.yaml (6)
5-11
: Support for `dev_version` input is a good enhancement.
Providing a dynamic dev version helps manage pre-release distributions smoothly.
19-22
: Upgraded `actions/checkout` with submodules set to recursive.
Consistent with other workflow changes. Allows submodules to be fetched reliably.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 22-22: trailing spaces
(trailing-spaces)
29-35
: Appending the dev version to `__version__` is appropriate.
This step ensures reflective versioning for ephemeral or staging releases.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 35-35: trailing spaces
(trailing-spaces)
36-53
: Windows shell usage is selectively valid under the Windows runner.
The static analysis warning about an invalid shell name does not apply here because `runner.os == 'Windows'` is enforced.

🧰 Tools
🪛 actionlint (1.7.4)
52-52: shell name "cmd" is invalid on macOS or Linux. available names are "bash", "pwsh", "python", "sh"
(shell-name)
🪛 YAMLlint (1.35.1)
[error] 47-47: trailing spaces
(trailing-spaces)
[error] 53-53: trailing spaces
(trailing-spaces)
56-57
: Switching to `python -m build --sdist` is aligned with modern packaging.
This approach is more flexible and recommended over direct setup-script invocation.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 57-57: trailing spaces
(trailing-spaces)
62-62
: Publishing to TestPyPI is configured correctly.
Using `repository-url: https://test.pypi.org/legacy/` is the recommended approach for test releases.

llama_cpp/server/errors.py (2)
125-135
: Consider adding coverage for embedding requests.Currently, the error handling logic handles text/chat completions but not embedding requests in the specialized formatters. Although context length exceeded errors may not apply to embedding requests, ensuring consistent coverage for embeddings (or explicitly documenting why it’s omitted) can help avoid confusion and potential unhandled edge cases.
Makefile (1)
45-47
: Double-check Vulkan vs Kompute usage.
You’ve introduced separate build targets for Vulkan (`build.vulkan`) and Kompute (`build.kompute`), both of which rely on Vulkan. Ensure that users are aware of the difference between these configurations and that any duplicative logic is minimized.

Also applies to: 48-50
README.md (5)
13-13
: Fix empty link for GitHub Downloads badge.

```diff
- []()
+ [](https://github.com/abetlen/llama-cpp-python/releases)
```

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
13-13: No empty links
null(MD042, no-empty-links)
20-27
: Fix unordered list indentation.

```diff
-    - OpenAI-like API
-    - [LangChain compatibility](https://python.langchain.com/docs/integrations/llms/llamacpp)
-    - [LlamaIndex compatibility](https://docs.llamaindex.ai/en/stable/examples/llm/llama_2_llama_cpp.html)
-    - [Local Copilot replacement](https://llama-cpp-python.readthedocs.io/en/latest/server/#code-completion)
-    - [Function Calling support](https://llama-cpp-python.readthedocs.io/en/latest/server/#function-calling)
-    - [Vision API support](https://llama-cpp-python.readthedocs.io/en/latest/server/#multimodal-models)
-    - [Multiple Models](https://llama-cpp-python.readthedocs.io/en/latest/server/#configuration-and-multi-model-support)
+  - OpenAI-like API
+  - [LangChain compatibility](https://python.langchain.com/docs/integrations/llms/llamacpp)
+  - [LlamaIndex compatibility](https://docs.llamaindex.ai/en/stable/examples/llm/llama_2_llama_cpp.html)
+  - [Local Copilot replacement](https://llama-cpp-python.readthedocs.io/en/latest/server/#code-completion)
+  - [Function Calling support](https://llama-cpp-python.readthedocs.io/en/latest/server/#function-calling)
+  - [Vision API support](https://llama-cpp-python.readthedocs.io/en/latest/server/#multimodal-models)
+  - [Multiple Models](https://llama-cpp-python.readthedocs.io/en/latest/server/#configuration-and-multi-model-support)
```

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
20-20: Unordered list indentation
Expected: 2; Actual: 4(MD007, ul-indent)
21-21: Unordered list indentation
Expected: 2; Actual: 4(MD007, ul-indent)
22-22: Unordered list indentation
Expected: 2; Actual: 4(MD007, ul-indent)
24-24: Unordered list indentation
Expected: 2; Actual: 4(MD007, ul-indent)
25-25: Unordered list indentation
Expected: 2; Actual: 4(MD007, ul-indent)
26-26: Unordered list indentation
Expected: 2; Actual: 4(MD007, ul-indent)
27-27: Unordered list indentation
Expected: 2; Actual: 4(MD007, ul-indent)
39-39: Spell "macOS" correctly.

```diff
- - MacOS: Xcode
+ - macOS: Xcode
```

🧰 Tools
🪛 LanguageTool
[grammar] ~39-~39: The operating system from Apple is written "macOS".
Context: ...Windows: Visual Studio or MinGW - MacOS: Xcode To install the package, run: `... (MAC_OS)
🪛 markdownlint-cli2 (0.17.2)
39-39: Unordered list indentation
Expected: 2; Actual: 6 (MD007, ul-indent)
291-291: Add a comma after the introductory phrase "By default".

```diff
- By default `llama-cpp-python` generates completions
+ By default, `llama-cpp-python` generates completions
```

🧰 Tools
🪛 LanguageTool
[uncategorized] ~291-~291: Did you mean: "By default,"?
Context: ...ll create_completion print(output) ``` By default `llama-cpp-python` generates completion... (BY_DEFAULT_COMMA)
340-340: Remove the repeated word "will".

```diff
- The model will will format the messages
+ The model will format the messages
```

🧰 Tools
🪛 LanguageTool
[duplication] ~340-~340: Possible typo: you repeated a word.
Context: ... custom chat handler object. The model will will format the messages into a single promp... (ENGLISH_WORD_REPEAT_RULE)
.github/workflows/build-and-release.yaml (2)
14-14: Pinning specific OS versions is good practice.
Switching from `ubuntu-latest`, `windows-latest`, and `macos-latest` to pinned versions (`ubuntu-20.04`, `windows-2019`, and `macos-13`) helps ensure more consistent builds and avoids unexpected changes when default images update.
118-118: ✅ Verification successful
Validate shell usage on non-Windows runners.
The static analysis tool flags usage of `shell: cmd` as invalid on macOS or Linux, but here it is conditionally applied only if `runner.os == 'Windows'`. Please confirm no accidental usage occurs outside Windows.
Run the following script to check if any job uses `shell: cmd` outside the Windows condition:

🏁 Script executed:

```bash
#!/bin/bash
# Searching for 'shell: cmd' usage in the entire repository to ensure it's only used for Windows
rg -A 3 'shell:\s*cmd'
```

Length of output: 24
I'll run an additional check to inspect the surrounding context of the "shell: cmd" usage in the workflow file. The initial script returned no output, which makes it hard to be 100% confident that "shell: cmd" isn’t accidentally applied outside a Windows runner. Let's verify by printing the context around line 118 in the file to ensure that the "shell: cmd" line is indeed within a conditional block for Windows only.
Below is a new shell script to display that context:
Please run the script below and manually verify that:
- The "shell: cmd" line at or near line 118 is enclosed within a block that only applies when `runner.os == 'Windows'`.
- No occurrence of "shell: cmd" appears without this Windows condition.
Once you confirm that the context is correct, you can safely resolve this review comment.
Action Required: Verify that 'shell: cmd' is only used under Windows conditions in the workflow file.
🏁 Script executed:

```bash
#!/bin/bash
# Display context around the "shell: cmd" line in the GitHub workflow file to verify its conditional usage
sed -n '110,130p' .github/workflows/build-and-release.yaml
```

Length of output: 703
Verified: "shell: cmd" usage is conditionally applied only for Windows.
The context check confirms that the line is properly nested under the conditional `if: runner.os == 'Windows'`, ensuring it will not execute on non-Windows runners. No accidental usage was found elsewhere in the code.
🧰 Tools
🪛 actionlint (1.7.4)
118-118: shell name "cmd" is invalid on macOS or Linux. available names are "bash", "pwsh", "python", "sh"
(shell-name)
llama_cpp/llava_cpp.py (1)
110-114: Confirm memory freeing success.
`llava_image_embed_free` frees the embedding, but it doesn't return a result status. Ensure subsequent calls do not use the freed pointer. Consider adding post-checks or setting the pointer to `None` in higher-level code.
Would you like a script to search for misuse of freed embeddings?
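One way to apply the "set the pointer to `None`" advice is a small owner object that frees exactly once; everything below except `llava_image_embed_free` itself is a hypothetical sketch, not the llava_cpp API:

```python
# Sketch of a "free once, then null the handle" owner. The wrapper class
# and its fields are illustrative only.
from llama_cpp.llava_cpp import llava_image_embed_free


class OwnedImageEmbed:
    def __init__(self, embed_ptr) -> None:
        self._ptr = embed_ptr  # pointer returned by the C library

    def free(self) -> None:
        """Free the embedding exactly once, then drop the pointer."""
        if self._ptr is not None:
            llava_image_embed_free(self._ptr)  # returns no status
            # Stale reuse is now detectable (None) instead of a dangling pointer.
            self._ptr = None

    def __del__(self) -> None:
        self.free()
```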
CMakeLists.txt (4)
1-1: Confirm minimum required CMake version alignment.
Raising the minimum CMake version to 3.21 may exclude some users with older setups. Verify that any 3.21-specific features used here cannot be supported in earlier versions if backward compatibility is a priority.
Would you like a shell script to scan any additional CMake files in the repository to check for advanced CMake features that might still work in older versions?
115-133: Check DLL installation on Windows.
This block installs runtime DLLs for `llama` and `ggml`. Confirm that:
- All dependent DLLs are copied (including those from transitive dependencies).
- The final location in `${SKBUILD_PLATLIB_DIR}/llama_cpp/lib` is the correct path for packaging.
135-176: Ensure LLAVA build logic is fully isolated.
Within the `LLAVA_BUILD` block, you define CUDA/Metal checks, compile definitions, and subdirectory references. If someone disables `LLAVA_BUILD`, confirm that no references to `llava` or `llava_shared` appear elsewhere in the build that might cause linking errors.
48-114: ✅ Verification successful
Verify subdirectory build order and shared library consistency.
When `LLAMA_BUILD` is enabled, multiple shared libraries (`llama`, `ggml`, etc.) are built and installed. Ensure that:
- The subdirectory `vendor/llama.cpp` is added before building or referencing these targets.
- All needed macros (like `GGML_*` definitions) are appropriately set before including related subdirectories.

🏁 Script executed:

```bash
#!/bin/bash
# Searching for the add_subdirectory call(s) to confirm build order
rg -A 5 'add_subdirectory'
```

Length of output: 1142
Subdirectory Order and Macro Setup Verified
The `add_subdirectory(vendor/llama.cpp)` call is correctly positioned before all the `llama_cpp_python_install_target` invocations, ensuring that shared library targets are built in the proper order. Additionally, the necessary macros (e.g., `GGML_*` definitions) are set prior to including related subdirectories.
No further changes are needed.
pyproject.toml (3)
1-4: Migrate any legacy build configurations if removing Poetry.
Switching to `[build-system]` with `scikit-build-core` is a solid approach. If you previously used Poetry for extra tasks (e.g., version bumping or packaging extras), ensure those tasks are replicated or replaced in the new setup so no functionality is lost.
5-29: Verify project metadata consistency.
The newly introduced `[project]` section is well structured. Confirm that the new dynamic version approach (using `version` under `[tool.scikit-build.metadata.version]`) correctly syncs with your release strategy.
63-82: Align scikit-build settings with the new build logic.
You have set `wheel.packages`, `cmake.minimum-version`, and `sdist.include`. Make sure these reflect the final structure of the Python package and that any necessary submodules or vendor directories are included if needed.
llama_cpp/_internals.py (4)
19-234: Validate memory management in `LlamaModel`.
The logic to free resources in `__del__` and the `close()` method is correct. However, verify that the `ExitStack` and its callbacks are invoked in the correct order, especially if any part of model creation fails (e.g., partial initialization). This ensures no leftover references and minimal risk of memory leaks.
🧰 Tools
🪛 Ruff (0.8.2)
19-19: `from .llama_types import *` used; unable to detect undefined names (F403)
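To make the ordering concern concrete, here is a standalone sketch (with stand-in allocation functions, not the real `LlamaModel` internals): `ExitStack` runs callbacks LIFO, so a free registered right after each successful allocation also runs on partial-initialization failure:

```python
# Standalone sketch of the cleanup-ordering pattern under discussion;
# alloc_model/free_model are stand-ins, not real llama_cpp calls.
from contextlib import ExitStack


def alloc_model(name: str) -> str:
    print(f"alloc {name}")
    return name


def free_model(handle: str) -> None:
    print(f"free {handle}")


class Model:
    def __init__(self) -> None:
        self._stack = ExitStack()
        try:
            weights = alloc_model("weights")
            self._stack.callback(free_model, weights)  # registered first, runs last
            context = alloc_model("context")
            self._stack.callback(free_model, context)  # registered last, runs first
        except Exception:
            # Partial initialization: free whatever was allocated so far, LIFO.
            self._stack.close()
            raise

    def close(self) -> None:
        self._stack.close()  # idempotent; safe to call again from __del__

    def __del__(self) -> None:
        self.close()


m = Model()
m.close()  # prints "free context" then "free weights"
```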
297-303: Missing state-management implementations.
These `TODO` comments match a previous review comment about `copy_state_data`, `set_state_data`, `llama_load_session_file`, and `llama_save_session_file`. Implement them or clarify whether they are postponed to a future version.
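For reference, the missing wrappers could be thin shims over the C state API. The sketch below assumes the ctypes bindings expose `llama_get_state_size`, `llama_copy_state_data`, and `llama_set_state_data` as in upstream llama.cpp; verify the exact names and signatures against the module before relying on it:

```python
# Hedged sketch only: assumes the bindings mirror upstream llama.cpp's
# state API. Not a drop-in implementation.
import ctypes

import llama_cpp


def copy_state_data(ctx) -> bytes:
    """Snapshot the context state (KV cache, RNG, logits) into bytes."""
    size = llama_cpp.llama_get_state_size(ctx)
    buf = (ctypes.c_uint8 * size)()
    written = llama_cpp.llama_copy_state_data(ctx, buf)
    return bytes(buf[:written])


def set_state_data(ctx, state: bytes) -> None:
    """Restore a context from a snapshot produced by copy_state_data."""
    buf = (ctypes.c_uint8 * len(state)).from_buffer_copy(state)
    llama_cpp.llama_set_state_data(ctx, buf)
```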
758-880: Check custom sampler lifecycle.
`CustomSampler` objects are appended to `self.custom_samplers`, then removed in reverse order from the chain before freeing. This approach is solid, but be sure no references to `apply_wrapper` or `free_wrapper` remain after sampler removal. This prevents potential memory or reference leaks in long-running processes.
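In miniature, the lifecycle under review looks like this (illustrative names only, not the real `_internals.py` classes): callbacks stay referenced for as long as the chain can invoke them, and removal walks the list in reverse so indices stay valid:

```python
# Illustrative sketch of the keep-alive / reverse-removal pattern;
# the list stands in for the C sampler chain.
class SamplerChain:
    def __init__(self) -> None:
        self._chain: list = []           # stand-in for the native chain
        self.custom_samplers: list = []  # (index, wrapper) keep-alive refs

    def add_custom(self, apply_wrapper) -> None:
        # Keep a reference so the callback outlives any use by the chain.
        self._chain.append(apply_wrapper)
        self.custom_samplers.append((len(self._chain) - 1, apply_wrapper))

    def close(self) -> None:
        # Remove in reverse order so earlier indices remain valid, and drop
        # the keep-alive references in the same step.
        for index, _wrapper in reversed(self.custom_samplers):
            del self._chain[index]
        self.custom_samplers.clear()


chain = SamplerChain()
chain.add_custom(lambda token_data: token_data)
chain.add_custom(lambda token_data: token_data)
chain.close()
assert not chain.custom_samplers and not chain._chain
```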
1-18: 🛠️ Refactor suggestion
Avoid star imports to prevent potential naming conflicts.
`from .llama_types import *` can obscure undefined symbols like `_LlamaTokenDataArray`. It also complicates static analysis. Consider importing only what you specifically need or creating a well-defined namespace.

```diff
-from .llama_types import *
+from . import llama_types
```

Likely an incorrect or invalid review comment.
llama_cpp/llama_tokenizer.py (1)
113-116: Use `raise ... from None` to clarify the exception chain.
This is a repeated suggestion from a previous comment.

```diff
-except ImportError:
-    raise ImportError(
-        "The `transformers` library is required ..."
-    )
+except ImportError as err:
+    raise ImportError(
+        "The `transformers` library is required ..."
+    ) from None
```

🧰 Tools
🪛 Ruff (0.8.2)
113-116: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
llama_cpp/server/app.py (5)
50-55: Encapsulate the global `_server_settings`.
Similar to past suggestions, global variables can lead to side effects. Consider a class-based or module-based approach for maintainability.
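One possible shape, sketched with stand-in names rather than the real settings model:

```python
# Sketch of encapsulating settings in a holder instead of a bare global;
# ServerSettings here is a stand-in dataclass, not the real pydantic model.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerSettings:
    host: str = "localhost"
    port: int = 8000


class SettingsHolder:
    """Single place that owns the active settings instance."""

    def __init__(self) -> None:
        self._settings: Optional[ServerSettings] = None

    def set(self, settings: ServerSettings) -> None:
        self._settings = settings

    def get(self) -> ServerSettings:
        if self._settings is None:
            raise RuntimeError("Server settings have not been initialized.")
        return self._settings


holder = SettingsHolder()
holder.set(ServerSettings(port=8080))
print(holder.get().port)  # 8080
```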
62-71: Encapsulate the global `_llama_proxy`.
Using a global proxy can complicate concurrency and testing. Reuse or encapsulate it in a class or factory method to reduce shared-state risks.
92-98: Encapsulate the global `_ping_message_factory`.
Relying on a global factory can result in hidden dependencies. Consider passing it explicitly to where it's needed.
241-259: Confirm usage of `Depends` in parameter defaults.
Static analysis flags this (B008), but it's common in FastAPI. If you prefer to silence the warning, consider injecting dependencies inside the function body or configuring Ruff to ignore this rule.
🧰 Tools
🪛 Ruff (0.8.2)
242-242: Do not perform function call `Depends` in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable (B008)
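For context, here is a generic sketch of the flagged pattern and the lint-friendly alternative (these routes are illustrative, not the actual app.py handlers):

```python
# Generic sketch of the B008 trade-off, not the real app.py routes.
from fastapi import Depends, FastAPI

app = FastAPI()


def get_settings() -> dict:
    return {"model": "default"}


# Idiomatic FastAPI: Depends(...) in the default is intentional here,
# and is exactly what Ruff's B008 flags.
@app.get("/idiomatic")
def idiomatic(settings: dict = Depends(get_settings)) -> dict:
    return settings


# Lint-friendly alternative: hoist the call to a module-level singleton.
SETTINGS_DEP = Depends(get_settings)


@app.get("/lint-friendly")
def lint_friendly(settings: dict = SETTINGS_DEP) -> dict:
    return settings
```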
529-550: Implementation for retrieving models looks correct.
Returning a list of models aligns with the required schema. No further remarks.
🧰 Tools
🪛 Ruff (0.8.2)
536-536: Do not perform function call `Depends` in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable (B008)
```python
except Exception as e:
    raise RuntimeError(f"Failed to load shared library '{lib_path}': {e}")
```
🛠️ Refactor suggestion
Adopt Python 3 exception chaining.
To clarify the original exception context, raise the new exception with `from e` or `from None`:
```diff
-    except Exception as e:
-        raise RuntimeError(f"Failed to load shared library '{lib_path}': {e}")
+    except Exception as e:
+        raise RuntimeError(f"Failed to load shared library '{lib_path}'") from e
```
🧰 Tools
🪛 Ruff (0.8.2)
69-69: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
```python
def __getitem__(self, model: str):
    return self._model_settings_dict[model].model_dump()
```
🛠️ Refactor suggestion
Handle potential missing model keys.
Accessing `self._model_settings_dict[model]` may raise a `KeyError` if the given model alias does not exist. Consider adding a fallback or an explicit error message.
E.g.:
```diff
 def __getitem__(self, model: str):
-    return self._model_settings_dict[model].model_dump()
+    if model in self._model_settings_dict:
+        return self._model_settings_dict[model].model_dump()
+    else:
+        raise KeyError(f"Model alias '{model}' not found.")
```
Actionable comments posted: 0
🧹 Nitpick comments (7)
CHANGELOG.md (7)
28-33: Capitalize "macOS" Correctly
Several changelog entries that refer to Apple's operating system should consistently use "macOS" (with a lowercase "m"). Please review all such instances in this section to ensure adherence to this standard.
🧰 Tools
🪛 LanguageTool
[grammar] ~28-~28: The operating system from Apple is written “macOS”.
Context: ...f05f - fix(ci): Fix release by updating macos runner image to non-deprecated version ...(MAC_OS)
[grammar] ~33-~33: The operating system from Apple is written “macOS”.
Context: ...## [0.3.4] - fix(ci): Build wheels for macos 13-15, cuda 12.1-12.4 by @abetlen in ca...(MAC_OS)
292-292: Typographical Improvement: Use En Dash for Numerical Ranges
In the entry describing binary wheels (e.g., "CUDA (12.1 - 12.3)"), consider replacing the hyphen with an en dash to denote numerical ranges (i.e., "12.1–12.3"). This small change improves readability.
301-303: Duplicate Word Removal
The changelog entry:
"Changed local API doc references to hosted by by @lawfordp2017 in abetlen#1317"
contains a duplicate “by”. Please remove the extra word so that it reads correctly (i.e. “…hosted by @lawfordp2017…”).
🧰 Tools
🪛 LanguageTool
[duplication] ~301-~301: Possible typo: you repeated a word.
Context: ...11 - fix: set LLAMA_METAL_EMBED_LIBRARY=on on MacOS arm64 by @bretello in abetlen#1289 - fea...(ENGLISH_WORD_REPEAT_RULE)
[grammar] ~301-~301: The operating system from Apple is written “macOS”.
Context: ...ix: set LLAMA_METAL_EMBED_LIBRARY=on on MacOS arm64 by @bretello in abetlen#1289 - feat: Add...(MAC_OS)
[duplication] ~303-~303: Possible typo: you repeated a word.
Context: ...nged local API doc references to hosted by by @lawfordp2017 in abetlen#1317 ## [0.2.57] - ...(ENGLISH_WORD_REPEAT_RULE)
454-454: Eliminate Redundant Wording in Feature Entry
The feature description:
"Add add_generation_prompt option for jinja2chatformatter"
has a duplicated word “add.” Please revise it to “Add generation_prompt option for jinja2chatformatter” for clarity.
🧰 Tools
🪛 LanguageTool
[duplication] ~454-~454: Possible typo: you repeated a word.
Context: ...91cf5dddbc4b6041aead4accc7c7a2d - feat: Add add_generation_prompt option for jinja2chat...(ENGLISH_WORD_REPEAT_RULE)
725-725: Hyphenate Compound Adjective Correctly
In the entry for version 0.1.78, the phrase "Grammar based sampling via LlamaGrammar…" appears. It is recommended to hyphenate compound adjectives; change this to "Grammar-based sampling via LlamaGrammar…" to enhance readability.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~725-~725: This expression is usually spelled with a hyphen.
Context: ...iring new model format) ## [0.1.78] - Grammar based sampling via LlamaGrammar which can be ...(BASED_HYPHEN)
766-766: Grammatical Correction in Streaming Description
The changelog currently reads:
"Streaming requests can are now interrupted pre-maturely when a concurrent request is made."
A clearer phrasing would be:
"Streaming requests can now be interrupted prematurely when a concurrent request is made."
Please update this entry accordingly.
🧰 Tools
🪛 LanguageTool
[grammar] ~766-~766: This verb might not be in the correct form.
Context: ...[0.1.69] - (server) Streaming requests can are now interrupted pre-maturely when a con...(MD_BE)
[style] ~766-~766: To form a complete sentence, be sure to include a subject.
Context: ...rely when a concurrent request is made. Can be controlled with the `interrupt_reque...(MISSING_IT_THERE)
816-816: Punctuation Enhancement for Clarity
In the entry that begins with "Truncate max_tokens in create_completion…", consider inserting a comma before "so" to improve the sentence structure. For example:
"Truncate max_tokens in create_completion, so requested tokens don't exceed context size."
🧰 Tools
🪛 LanguageTool
[uncategorized] ~816-~816: Use a comma before ‘so’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...Truncate max_tokens in create_completion so requested tokens doesn't exceed context...(COMMA_COMPOUND_SENTENCE_2)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- CHANGELOG.md (1 hunks)
- llama_cpp/__init__.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- llama_cpp/__init__.py
🧰 Additional context used
🪛 LanguageTool
CHANGELOG.md
[grammar] ~28-~28: The operating system from Apple is written “macOS”.
Context: ...f05f - fix(ci): Fix release by updating macos runner image to non-deprecated version ...
(MAC_OS)
[grammar] ~33-~33: The operating system from Apple is written “macOS”.
Context: ...## [0.3.4] - fix(ci): Build wheels for macos 13-15, cuda 12.1-12.4 by @abetlen in ca...
(MAC_OS)
[grammar] ~136-~136: The operating system from Apple is written “macOS”.
Context: ...4c25dd9f6a421cacec7604c6 - fix(ci): Fix MacOS release, use macos-12 image instead of ...
(MAC_OS)
[grammar] ~142-~142: Did you mean “too False to”?
Context: ...ate default config value for embeddings to False to fix error in text generation where logi...
(TOO_ADJECTIVE_TO)
[typographical] ~292-~292: If you want to indicate numerical ranges or time ranges, consider using an en dash.
Context: ... feat: Binary wheels for CPU, CUDA (12.1 - 12.3), Metal by @abetlen, @jllllll, and ...
(DASH_RULE)
[duplication] ~301-~301: Possible typo: you repeated a word.
Context: ...11 - fix: set LLAMA_METAL_EMBED_LIBRARY=on on MacOS arm64 by @bretello in abetlen#1289 - fea...
(ENGLISH_WORD_REPEAT_RULE)
[grammar] ~301-~301: The operating system from Apple is written “macOS”.
Context: ...ix: set LLAMA_METAL_EMBED_LIBRARY=on on MacOS arm64 by @bretello in abetlen#1289 - feat: Add...
(MAC_OS)
[duplication] ~303-~303: Possible typo: you repeated a word.
Context: ...nged local API doc references to hosted by by @lawfordp2017 in abetlen#1317 ## [0.2.57] - ...
(ENGLISH_WORD_REPEAT_RULE)
[duplication] ~454-~454: Possible typo: you repeated a word.
Context: ...91cf5dddbc4b6041aead4accc7c7a2d - feat: Add add_generation_prompt option for jinja2chat...
(ENGLISH_WORD_REPEAT_RULE)
[grammar] ~646-~646: “Windows” (operating system by Microsoft) is a proper noun and needs to be capitalized.
Context: ...ed runtime dlls to package directory on windows by @abetlen in 8d75016...
(A_WINDOWS)
[uncategorized] ~725-~725: This expression is usually spelled with a hyphen.
Context: ...iring new model format) ## [0.1.78] - Grammar based sampling via LlamaGrammar which can be ...
(BASED_HYPHEN)
[grammar] ~766-~766: This verb might not be in the correct form.
Context: ...[0.1.69] - (server) Streaming requests can are now interrupted pre-maturely when a con...
(MD_BE)
[style] ~766-~766: To form a complete sentence, be sure to include a subject.
Context: ...rely when a concurrent request is made. Can be controlled with the `interrupt_reque...
(MISSING_IT_THERE)
[uncategorized] ~816-~816: Use a comma before ‘so’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...Truncate max_tokens in create_completion so requested tokens doesn't exceed context...
(COMMA_COMPOUND_SENTENCE_2)
[grammar] ~823-~823: The operating system from Apple is written “macOS”.
Context: ...th .so
and .dylib
for libllama
on MacOS ## [v0.1.58] - (llama.cpp) Metal Sili...
(MAC_OS)
🔇 Additional comments (4)
CHANGELOG.md (4)
10-12: New Version Header for 0.3.8
The new changelog section for version 0.3.8 is clearly introduced. Please confirm that the commit reference hash (`7841fc723e059d1fd9640e5c0ef19050fcc7c698`) is correct and that the version-header format follows your project's changelog style guidelines.
14-18: Changelog Entries for 0.3.7
The entries for version 0.3.7 are well structured; both the feature and fix items are clearly listed with appropriate commit references. No issues observed.
20-24: Changelog Section for 0.3.6
The version 0.3.6 section is formatted consistently. If possible, consider adding a bit more detail about the fixes (e.g., context for the CUDA workflow fix) to aid readers in understanding their impact.
25-840: Overall Changelog Consistency and Formatting
The changelog is impressively comprehensive, covering all updates from version 0.3.8 down to earlier releases. Please take a moment to ensure overall consistency in style:
- Operating System Names: Verify that "macOS" is used consistently (with a lowercase "m") rather than variants like "MacOS" or "macos."
- Numerical Ranges: When indicating ranges (e.g. “12.1 - 12.3”), consider replacing the hyphen with an en dash (“12.1–12.3”) for enhanced typographical clarity.
- Duplicate Word Issues: There are a couple of entries with duplicated words (see details in later comments).
- Minor Grammatical Issues: For example, the description of streaming interruptions could be refined for clarity.
Standardizing these details will improve clarity and maintain a high documentation quality.
See Commits and Changes for more details.
Created by pull[bot]
Summary by CodeRabbit
New Features
- `llama-cpp-python` project for better clarity and correctness.
- `dependabot` configuration.
- `LlamaProxy` class for enhanced management of multiple Llama models.

Bug Fixes
- `llama_cpp` library.

Chores
- `pyproject.toml` for improved dependency management.
- `.gitignore` file to manage ignored files more effectively.
- `llama.cpp` subproject to include the latest enhancements.