
feat(converter): fix llama.cpp build and conversion paths #39

Merged: 3 commits into main from gguf-converter on Nov 6, 2024

Conversation

@leonvanbokhorst (Owner) commented on Nov 6, 2024

  • Update binary name from 'quantize' to 'llama-quantize'
  • Add proper path handling for convert_hf_to_gguf.py
  • Fix missing convert_script_path attribute
  • Improve error handling and debug messages
  • Add Metal support for Apple Silicon

This commit ensures proper building and execution of the model conversion
pipeline on Apple Silicon (M-series) machines, with correct path resolution
for all required binaries and scripts.
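
For context, a minimal sketch of what such build and path resolution can look like. This is illustrative only: the function name setup_llamacpp and its structure are assumptions, not the PR's actual code, and the explicit Metal flag name has varied across llama.cpp releases.

    import platform
    import subprocess
    from pathlib import Path

    def setup_llamacpp(base_path: Path) -> tuple[Path, Path]:
        """Clone and build llama.cpp; return paths to the quantize binary and convert script."""
        llamacpp = base_path / "llama.cpp"
        if not llamacpp.exists():
            subprocess.run(
                ["git", "clone", "https://github.com/ggerganov/llama.cpp"],
                cwd=base_path,
                check=True,
            )
        # On Apple Silicon the Makefile enables Metal by default; the explicit
        # flag name has changed across releases (LLAMA_METAL, later GGML_METAL).
        subprocess.run(["make"], cwd=llamacpp, check=True)
        # Upstream renamed the binary from 'quantize' to 'llama-quantize'.
        quantize_binary = llamacpp / "llama-quantize"
        # The HF-to-GGUF converter script lives at the repository root.
        convert_script = llamacpp / "convert_hf_to_gguf.py"
        return quantize_binary, convert_script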

Summary by Sourcery

Fix the build and conversion paths for llama.cpp, including updating the binary name to 'llama-quantize', adding Metal support for Apple Silicon, and improving error handling and path resolution for the conversion script.

New Features:

  • Add Metal support for Apple Silicon in the llama.cpp build process.

Bug Fixes:

  • Fix path handling for convert_hf_to_gguf.py script to ensure correct execution.

Enhancements:

  • Update binary name from 'quantize' to 'llama-quantize' for clarity and consistency.
  • Improve error handling and debug messages in the model conversion pipeline.

@sourcery-ai bot (Contributor) commented on Nov 6, 2024

Reviewer's Guide by Sourcery

This PR implements a model conversion pipeline for LLaMA models, specifically targeting Apple Silicon compatibility. The implementation includes a new LLMConverter class that handles the end-to-end process of downloading, converting, quantizing, and uploading LLaMA models to the Hugging Face Hub, with proper Metal support for Apple Silicon machines.

Class diagram for the new LLMConverter class

classDiagram
    class LLMConverter {
        - String model_id
        - String model_name
        - List~String~ quantization_methods
        - String hf_token
        - String username
        - Path base_path
        - Path convert_script_path
        + __init__(model_id: String, quantization_methods: List~String~, hf_token: String, username: String)
        + setup_llamacpp() void
        + download_model() void
        + convert_to_fp16() Path
        + quantize_model(fp16_path: Path) void
        + upload_to_hub() void
        + run() void
    }
    note for LLMConverter "Handles downloading, converting, quantizing, and uploading LLaMA models"
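
For orientation, a hedged usage sketch based on this diagram. The constructor arguments follow the diagram's signature, and the configuration values are the ones shown in the review comments below; neither is verified against the actual script.

    import os

    converter = LLMConverter(
        model_id="leonvanbokhorst/Llama-3.2-1B-Instruct-Complaint",
        quantization_methods=["q4_k_m", "q5_k_m"],
        hf_token=os.getenv("HF_TOKEN"),
        username="leonvanbokhorst",
    )
    # setup_llamacpp -> download_model -> convert_to_fp16 -> quantize_model -> upload_to_hub
    converter.run()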

File-Level Changes

Implementation of the LLaMA model conversion pipeline (src/12_llm_gguf_conversion.py)
  • Created LLMConverter class to manage the conversion workflow
  • Added Metal support compilation flags for Apple Silicon
  • Implemented model download from Hugging Face
  • Added FP16 conversion functionality
  • Implemented model quantization with multiple methods
  • Added Hugging Face Hub upload capability

Enhanced build and execution environment setup (src/12_llm_gguf_conversion.py)
  • Updated the quantize binary name to 'llama-quantize'
  • Added proper path resolution for convert_hf_to_gguf.py
  • Implemented comprehensive error handling and debug messages
  • Added requirements installation from llama.cpp

Version control configuration update (.gitignore)
  • Modified .gitignore settings


@leonvanbokhorst merged commit d5edec6 into main on Nov 6, 2024 (1 check passed).
@leonvanbokhorst deleted the gguf-converter branch on Nov 6, 2024 at 06:42.
@sourcery-ai bot (Contributor) left a comment:

Hey @leonvanbokhorst - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider moving the hardcoded configuration values (MODEL_ID, QUANTIZATION_METHODS, USERNAME) from main() to a configuration file or environment variables for better reusability and maintainability.
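One way to realize that suggestion, sketched with illustrative variable names rather than the script's actual configuration code:

    import os

    MODEL_ID = os.environ.get("MODEL_ID", "leonvanbokhorst/Llama-3.2-1B-Instruct-Complaint")
    QUANTIZATION_METHODS = os.environ.get("QUANTIZATION_METHODS", "q4_k_m,q5_k_m").split(",")
    USERNAME = os.environ.get("HF_USERNAME", "leonvanbokhorst")
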
Here's what I looked at during the review
  • 🟡 General issues: 5 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 1 issue found
  • 🟢 Documentation: all looks good


print("Conversion completed successfully!")

except subprocess.CalledProcessError as e:
print(f"Error during execution: {e}", file=sys.stderr)

suggestion (bug_risk): Consider adding cleanup of temporary files in error paths

When errors occur, any partially downloaded models or incomplete conversions should be cleaned up to avoid leaving invalid artifacts and wasting disk space.

Suggested change:

-        print(f"Error during execution: {e}", file=sys.stderr)
+        if os.path.exists(output_path):
+            os.remove(output_path)
+        print(f"Error during execution: {e}", file=sys.stderr)

# Configuration
MODEL_ID = "leonvanbokhorst/Llama-3.2-1B-Instruct-Complaint"
QUANTIZATION_METHODS = ["q4_k_m", "q5_k_m"]
HF_TOKEN = os.getenv("HF_TOKEN")

issue: Add validation for HF_TOKEN environment variable

The HF_TOKEN should be checked for None and raise a clear error if not set, rather than failing later with a potentially confusing error message.
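
A minimal sketch of the suggested fail-fast check (the error message wording is illustrative):

    HF_TOKEN = os.getenv("HF_TOKEN")
    if HF_TOKEN is None:
        raise RuntimeError(
            "HF_TOKEN is not set; create a token at "
            "https://huggingface.co/settings/tokens and export it before running."
        )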

    ):
        self.model_id = model_id
        self.model_name = model_id.split("/")[-1]
        self.quantization_methods = [m.strip() for m in quantization_methods]

suggestion (performance): Validate quantization methods before starting the conversion process

Consider maintaining a list of valid quantization methods and validating against it early to fail fast before downloading large models.

Suggested change:

-        self.quantization_methods = [m.strip() for m in quantization_methods]
+        VALID_QUANTIZATION_METHODS = {"q2_k", "q3_k_l", "q3_k_m", "q3_k_s", "q4_0", "q4_1", "q4_k_m", "q4_k_s", "q5_0", "q5_1", "q5_k_m", "q5_k_s", "q6_k", "q8_0"}
+        invalid_methods = set(m.strip() for m in quantization_methods) - VALID_QUANTIZATION_METHODS
+        if invalid_methods:
+            raise ValueError(f"Invalid quantization methods: {invalid_methods}")
+        self.quantization_methods = [m.strip() for m in quantization_methods]


    def download_model(self) -> None:
        """Download model from Hugging Face."""
        subprocess.run(["git", "lfs", "install"], check=True)

suggestion: Add check for git-lfs availability

Check if git-lfs is installed and provide a clear error message if it's not available.

Suggested change:

-        subprocess.run(["git", "lfs", "install"], check=True)
+        try:
+            subprocess.run(["git", "lfs", "install"], check=True)
+        except (FileNotFoundError, subprocess.CalledProcessError) as exc:
+            # FileNotFoundError if git itself is missing; CalledProcessError
+            # if 'lfs' is not a known git subcommand (git-lfs not installed).
+            raise RuntimeError("git-lfs is not installed. Please install git-lfs before proceeding.") from exc

print(f"Installing requirements from: {requirements_file}")
subprocess.run(["pip", "install", "-r", str(requirements_file)], check=True)

def download_model(self) -> None:

suggestion: Add disk space verification before starting downloads

Check available disk space against expected model size requirements to avoid failed downloads or conversions due to insufficient space.

    def download_model(self) -> None:
        """Download model from Hugging Face."""
        # Requires `import shutil` at module level.
        free_space = shutil.disk_usage("/").free
        required_space = 15 * 1024 * 1024 * 1024  # 15GB minimum
        if free_space < required_space:
            raise RuntimeError(f"Insufficient disk space. Need {required_space/1024**3:.1f}GB, have {free_space/1024**3:.1f}GB free")

from huggingface_hub import create_repo, HfApi


class LLMConverter:

issue (complexity): Consider restructuring the monolithic LLMConverter class into focused utility functions.

The code would be simpler and more maintainable if restructured into focused utility functions rather than a monolithic class. This allows independent testing and reuse of components. Here's an example:

def setup_llamacpp(base_path: Path) -> tuple[Path, Path]:
    """Setup and build llama.cpp, returning paths to key executables."""
    llamacpp_path = base_path / "llama.cpp"
    # ... build steps ...
    return llamacpp_path / "llama-quantize", llamacpp_path / "convert_hf_to_gguf.py"

def download_model(model_id: str) -> Path:
    """Download model and return its path."""
    model_name = model_id.split("/")[-1]
    if not Path(model_name).exists():
        subprocess.run(["git", "clone", f"https://huggingface.co/{model_id}"], check=True)
    return Path(model_name)

def convert_and_quantize(
    model_path: Path,
    convert_script: Path,
    quantize_binary: Path,
    methods: List[str]
) -> List[Path]:
    """Convert to fp16 and quantize, returning paths to generated files."""
    # Conversion and quantization logic
    return quantized_paths

def upload_files(
    repo_name: str,
    username: str,
    folder_path: Path,
    token: str
) -> None:
    """Upload generated files to HF Hub."""
    api = HfApi()
    # ... upload logic ...

def convert_model(
    model_id: str,
    quantization_methods: List[str],
    username: str,
    hf_token: str
) -> None:
    """Main orchestration function."""
    base_path = Path.cwd()
    quantize_binary, convert_script = setup_llamacpp(base_path)
    model_path = download_model(model_id)
    quantized_files = convert_and_quantize(
        model_path, convert_script, quantize_binary, quantization_methods
    )
    upload_files(f"{model_path.name}-GGUF", username, model_path, hf_token)

This approach:

  • Makes each component independently testable
  • Removes shared state and reduces coupling
  • Allows reuse of individual components
  • Makes the flow of data explicit through function parameters
  • Maintains the same functionality but with clearer boundaries

    def run(self):
        """Execute the full conversion pipeline."""
        try:
            print("Setting up llama.cpp...")

issue (code-quality): Extract code out into method (extract-method)
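
A sketch of what that extract-method refactor could look like, using the method names from the class diagram above; the helper name _execute_pipeline is an assumption.

    def run(self) -> None:
        """Execute the full conversion pipeline; run() only handles control flow."""
        try:
            self._execute_pipeline()
        except subprocess.CalledProcessError as e:
            print(f"Error during execution: {e}", file=sys.stderr)
            raise

    def _execute_pipeline(self) -> None:
        """The pipeline steps, extracted from run()."""
        print("Setting up llama.cpp...")
        self.setup_llamacpp()
        self.download_model()
        fp16_path = self.convert_to_fp16()
        self.quantize_model(fp16_path)
        self.upload_to_hub()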
