Evaluate Command: Config Design and Functionality for Non-Training Model Evaluation #285

@bigximik


🎯 Goal (What & Why)

Once PR #264 is merged, we will have a separate `evaluate` command that reuses the training config to evaluate a trained model by loading its latest checkpoint.

However, we also need the capability to evaluate arbitrary models, potentially with automatic downloading from the Hugging Face Hub for any supported model.

Here, let's discuss the features we need for such a command and what the configuration file should look like.
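
To make the discussion concrete, here is a purely illustrative sketch of what such an evaluation config could carry. All class and field names below are hypothetical; none of them come from the Fast-LLM codebase or PR #264:

```python
from dataclasses import dataclass, field


@dataclass
class ModelSourceConfig:
    """Hypothetical: where the model to evaluate comes from."""

    # Either a local checkpoint directory or a Hugging Face Hub model id;
    # a Hub id would trigger an automatic download if not cached locally.
    path: str = ""
    format: str = "fast_llm"  # e.g. "fast_llm" or "huggingface" (assumed values)


@dataclass
class EvaluateConfig:
    """Hypothetical top-level config for a standalone evaluate command."""

    model: ModelSourceConfig = field(default_factory=ModelSourceConfig)
    tasks: list[str] = field(default_factory=lambda: ["lm_loss"])  # evaluations to run
    batch_size: int = 8
    output_path: str = "eval_results.json"  # where to write the metrics
```

The open questions are then which of these fields belong in the config file versus the CLI, and how much of the existing training config can be reused as-is.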

🚀 Execution Plan

(This section may start as an incomplete draft but must be defined before implementation begins.)

Step 1: What is the smallest working version?

(Describe the simplest way to implement this feature with minimal effort.)
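
As one data point for this step, a minimal version could lean entirely on the `transformers` library to fetch an arbitrary Hub model and report its language-modeling loss. This is only a sketch of the idea, assuming `transformers` and `torch` are installed; it is not tied to Fast-LLM internals:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model id; any causal LM on the Hub works the same way.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)   # downloads if not cached
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    # Passing labels makes the model return the cross-entropy loss directly.
    outputs = model(**inputs, labels=inputs["input_ids"])
print(f"loss={outputs.loss.item():.4f}  perplexity={torch.exp(outputs.loss).item():.2f}")
```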

Step 2: What additional optimizations are possible (but optional)?

(List potential refinements that can be added in later PRs if needed.)

📌 Acceptance Criteria (Must-Haves for Completion)

  • The feature must be functional and tested.
  • The implementation must be documented in practical terms.
  • The PR must include a performance/impact summary.
  • No refactors unless directly necessary for feature completion.

🛠️ Project Management

  • Assign the project to the Fast-LLM project.
  • Set the Estimate field (in days) in the GitHub project.
  • Use the Size field to categorize the PR size (Small/Medium/Large).
  • Assign an owner when opening the issue.
