feat(converter): fix llama.cpp build and conversion paths #39
Conversation
- Update binary name from 'quantize' to 'llama-quantize'
- Add proper path handling for convert_hf_to_gguf.py
- Fix missing convert_script_path attribute
- Improve error handling and debug messages
- Add Metal support for Apple Silicon

This commit ensures proper building and execution of the model conversion pipeline on Apple Silicon (M-series) machines, with correct path resolution for all required binaries and scripts.
This reverts commit 64634d8.
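For context, a minimal sketch of the kind of Metal-enabled build step this commit targets. This is an assumption, not the exact code in the PR: the cmake flag for Metal has varied across llama.cpp versions, and the paths are illustrative.

import subprocess
from pathlib import Path

def build_llamacpp_metal(llamacpp_path: Path) -> Path:
    """Illustrative llama.cpp build with Metal for Apple Silicon.

    Assumes a recent llama.cpp checkout where the quantizer was renamed
    to 'llama-quantize' and lands under build/bin/. The Metal flag has
    been spelled -DLLAMA_METAL=ON in older releases and -DGGML_METAL=ON
    in newer ones; check the version you pin against.
    """
    build_dir = llamacpp_path / "build"
    # Configure with Metal enabled (on by default on macOS in recent versions)
    subprocess.run(
        ["cmake", "-B", str(build_dir), "-DGGML_METAL=ON"],
        cwd=llamacpp_path, check=True,
    )
    # Build the release binaries, including llama-quantize
    subprocess.run(
        ["cmake", "--build", str(build_dir), "--config", "Release"],
        cwd=llamacpp_path, check=True,
    )
    return build_dir / "bin" / "llama-quantize"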
Reviewer's Guide by Sourcery

This PR implements a model conversion pipeline for LLaMA models, specifically targeting Apple Silicon compatibility. The implementation includes a new LLMConverter class that handles the end-to-end process of downloading, converting, quantizing, and uploading LLaMA models to the Hugging Face Hub, with proper Metal support for Apple Silicon machines.

Class diagram for the new LLMConverter class

classDiagram
class LLMConverter {
- String model_id
- String model_name
- List~String~ quantization_methods
- String hf_token
- String username
- Path base_path
- Path convert_script_path
+ __init__(model_id: String, quantization_methods: List~String~, hf_token: String, username: String)
+ setup_llamacpp() void
+ download_model() void
+ convert_to_fp16() Path
+ quantize_model(fp16_path: Path) void
+ upload_to_hub() void
+ run() void
}
note for LLMConverter "Handles downloading, converting, quantizing, and uploading LLaMA models"
Hey @leonvanbokhorst - I've reviewed your changes - here's some feedback:
Overall Comments:
- Consider moving the hardcoded configuration values (MODEL_ID, QUANTIZATION_METHODS, USERNAME) from main() to a configuration file or environment variables for better reusability and maintainability.
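A minimal sketch of the environment-variable variant, keeping the current values as defaults (the variable names are illustrative, not part of the PR):

import os

# Illustrative only: falls back to the values currently hardcoded in main().
MODEL_ID = os.getenv("MODEL_ID", "leonvanbokhorst/Llama-3.2-1B-Instruct-Complaint")
QUANTIZATION_METHODS = os.getenv("QUANTIZATION_METHODS", "q4_k_m,q5_k_m").split(",")
USERNAME = os.getenv("HF_USERNAME", "leonvanbokhorst")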
Here's what I looked at during the review
- 🟡 General issues: 5 issues found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟡 Complexity: 1 issue found
- 🟢 Documentation: all looks good
print("Conversion completed successfully!") | ||
|
||
except subprocess.CalledProcessError as e: | ||
print(f"Error during execution: {e}", file=sys.stderr) |
suggestion (bug_risk): Consider adding cleanup of temporary files in error paths
When errors occur, any partially downloaded models or incomplete conversions should be cleaned up to avoid leaving invalid artifacts and wasting disk space.
print(f"Error during execution: {e}", file=sys.stderr) | |
if os.path.exists(output_path): | |
os.remove(output_path) | |
print(f"Error during execution: {e}", file=sys.stderr) |
# Configuration
MODEL_ID = "leonvanbokhorst/Llama-3.2-1B-Instruct-Complaint"
QUANTIZATION_METHODS = ["q4_k_m", "q5_k_m"]
HF_TOKEN = os.getenv("HF_TOKEN")
issue: Add validation for HF_TOKEN environment variable
The HF_TOKEN should be checked for None and raise a clear error if not set, rather than failing later with a potentially confusing error message.
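A minimal sketch of such a check (the error message is illustrative):

import os

HF_TOKEN = os.getenv("HF_TOKEN")
if HF_TOKEN is None:
    # Fail fast with an actionable message instead of a confusing auth error later
    raise RuntimeError(
        "HF_TOKEN environment variable is not set. "
        "Export a Hugging Face access token before running the converter."
    )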
    ):
        self.model_id = model_id
        self.model_name = model_id.split("/")[-1]
        self.quantization_methods = [m.strip() for m in quantization_methods]
suggestion (performance): Validate quantization methods before starting the conversion process
Consider maintaining a list of valid quantization methods and validating against it early to fail fast before downloading large models.
Suggested change:

        VALID_QUANTIZATION_METHODS = {"q2_k", "q3_k_l", "q3_k_m", "q3_k_s", "q4_0", "q4_1", "q4_k_m", "q4_k_s", "q5_0", "q5_1", "q5_k_m", "q5_k_s", "q6_k", "q8_0"}
        invalid_methods = set(m.strip() for m in quantization_methods) - VALID_QUANTIZATION_METHODS
        if invalid_methods:
            raise ValueError(f"Invalid quantization methods: {invalid_methods}")
        self.quantization_methods = [m.strip() for m in quantization_methods]
    def download_model(self) -> None:
        """Download model from Hugging Face."""
        subprocess.run(["git", "lfs", "install"], check=True)
suggestion: Add check for git-lfs availability
Check if git-lfs is installed and provide a clear error message if it's not available.
subprocess.run(["git", "lfs", "install"], check=True) | |
try: | |
subprocess.run(["git", "lfs", "install"], check=True) | |
except FileNotFoundError: | |
raise RuntimeError("git-lfs is not installed. Please install git-lfs before proceeding.") |
print(f"Installing requirements from: {requirements_file}") | ||
subprocess.run(["pip", "install", "-r", str(requirements_file)], check=True) | ||
|
||
def download_model(self) -> None: |
suggestion: Add disk space verification before starting downloads
Check available disk space against expected model size requirements to avoid failed downloads or conversions due to insufficient space.
    def download_model(self) -> None:
        """Download model from Hugging Face."""
        free_space = shutil.disk_usage("/").free  # requires 'import shutil' at module level
        required_space = 15 * 1024 * 1024 * 1024  # 15GB minimum
        if free_space < required_space:
            raise RuntimeError(
                f"Insufficient disk space. Need {required_space/1024**3:.1f}GB, "
                f"have {free_space/1024**3:.1f}GB free"
            )
from huggingface_hub import create_repo, HfApi


class LLMConverter:
issue (complexity): Consider restructuring the monolithic LLMConverter class into focused utility functions.
The code would be simpler and more maintainable if restructured into focused utility functions rather than a monolithic class. This allows independent testing and reuse of components. Here's an example:
def setup_llamacpp(base_path: Path) -> tuple[Path, Path]:
    """Setup and build llama.cpp, returning paths to key executables."""
    llamacpp_path = base_path / "llama.cpp"
    # ... build steps ...
    return llamacpp_path / "llama-quantize", llamacpp_path / "convert_hf_to_gguf.py"

def download_model(model_id: str) -> Path:
    """Download model and return its path."""
    model_name = model_id.split("/")[-1]
    if not Path(model_name).exists():
        subprocess.run(["git", "clone", f"https://huggingface.co/{model_id}"], check=True)
    return Path(model_name)

def convert_and_quantize(
    model_path: Path,
    convert_script: Path,
    quantize_binary: Path,
    methods: List[str],
) -> List[Path]:
    """Convert to fp16 and quantize, returning paths to generated files."""
    # ... conversion and quantization logic ...
    return quantized_paths

def upload_files(
    repo_name: str,
    username: str,
    folder_path: Path,
    token: str,
) -> None:
    """Upload generated files to HF Hub."""
    api = HfApi()
    # ... upload logic ...

def convert_model(
    model_id: str,
    quantization_methods: List[str],
    username: str,
    hf_token: str,
) -> None:
    """Main orchestration function."""
    base_path = Path.cwd()
    quantize_binary, convert_script = setup_llamacpp(base_path)
    model_path = download_model(model_id)
    quantized_files = convert_and_quantize(
        model_path, convert_script, quantize_binary, quantization_methods
    )
    upload_files(f"{model_path.name}-GGUF", username, model_path, hf_token)
This approach:
- Makes each component independently testable
- Removes shared state and reduces coupling
- Allows reuse of individual components
- Makes the flow of data explicit through function parameters
- Maintains the same functionality but with clearer boundaries
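With that structure, main() reduces to a single call; for example, reusing the configuration values shown earlier (assuming HF_TOKEN has been validated as suggested above):

import os

convert_model(
    model_id="leonvanbokhorst/Llama-3.2-1B-Instruct-Complaint",
    quantization_methods=["q4_k_m", "q5_k_m"],
    username="leonvanbokhorst",
    hf_token=os.getenv("HF_TOKEN"),
)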
    def run(self):
        """Execute the full conversion pipeline."""
        try:
            print("Setting up llama.cpp...")
issue (code-quality): Extract code out into method (extract-method)
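As a sketch, run() after the extraction could delegate each stage to the methods already listed in the class diagram; only the prints visible in the diff are from the PR, the rest is assumed:

def run(self):
    """Execute the full conversion pipeline."""
    try:
        print("Setting up llama.cpp...")
        self.setup_llamacpp()
        # Each stage lives in its own method, keeping run() as pure orchestration
        self.download_model()
        fp16_path = self.convert_to_fp16()
        self.quantize_model(fp16_path)
        self.upload_to_hub()
        print("Conversion completed successfully!")
    except subprocess.CalledProcessError as e:
        print(f"Error during execution: {e}", file=sys.stderr)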
Summary by Sourcery
Fix the build and conversion paths for llama.cpp, including updating the binary name to 'llama-quantize', adding Metal support for Apple Silicon, and improving error handling and path resolution for the conversion script.