
Run lm-eval-harness benchmarks during validation #199

@bigximik

Description


🎯 Goal (What & Why)

Enable Fast-LLM to run structured evaluations using lm-eval-harness.
This allows benchmarking Fast-LLM models across many standard tasks using the in-memory model during validation, leveraging the existing HuggingFace-compatible interface improved in #217.

Note that the current HuggingfaceGPTModelForCausalLM.from_pretrained(...) API always reloads the model from disk, which breaks the intended workflow of keeping the model sharded and in memory across all GPUs. We want to integrate with lm-eval-harness while reusing the model that is already in memory, avoiding redundant loading and eviction and keeping the integration simple.

🚀 Execution Plan

Step 1: Add from_existing_model() constructor

Add a new constructor method to HuggingfaceGPTModelForCausalLM that allows wrapping an existing GPTModel instance, e.g.

@classmethod
def from_existing_model(cls, model: GPTModel) -> "HuggingfaceGPTModelForCausalLM":
    # Wrap an already-constructed (and possibly sharded) GPTModel without
    # reading anything from disk.
    config = HuggingfaceGPTModelConfig(fast_llm_config=model.config)
    obj = cls(config)
    obj._fast_llm_model = model
    return obj

Notes:

  • HuggingfaceGPTModelConfig already holds a GPTModelConfig, so no need to explicitly construct it if we already have a GPTModel.
  • We need to assign fields like .runner and .schedule because they'll be used during generation.
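
For illustration, a minimal usage sketch (the fast_llm_model variable is hypothetical and stands in for the GPTModel instance that is already sharded and resident in GPU memory):

# Hypothetical usage: wrap the in-memory model without touching the checkpoint on disk.
hf_model = HuggingfaceGPTModelForCausalLM.from_existing_model(fast_llm_model)
assert hf_model._fast_llm_model is fast_llm_model  # same object: no reload, no copy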

Step 2: Implement a TemplateLM subclass for Fast-LLM

Create a subclass of lm_eval.api.model.TemplateLM that wraps an instance of HuggingfaceGPTModelForCausalLM and provides the required methods:

  • tok_encode()
  • loglikelihood(), loglikelihood_rolling()
  • generate_until()
  • eot_token_id

Use the HuggingFace tokenizer that pairs with the Fast-LLM model. Assume greedy decoding only. No need to support chat templates or SFT-specific tokenization quirks yet.
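
A minimal skeleton sketch of such a wrapper. The class name FastLLMWrapper and its constructor arguments are hypothetical, and the exact set of abstract methods varies between lm-eval versions; in recent versions, loglikelihood() is already provided by TemplateLM once _loglikelihood_tokens() is implemented.

from lm_eval.api.model import TemplateLM


class FastLLMWrapper(TemplateLM):
    def __init__(self, hf_model, tokenizer):
        super().__init__()
        self._model = hf_model        # HuggingfaceGPTModelForCausalLM wrapping the in-memory GPTModel
        self._tokenizer = tokenizer   # HF tokenizer paired with the Fast-LLM model

    @property
    def eot_token_id(self):
        return self._tokenizer.eos_token_id

    def tok_encode(self, string, **kwargs):
        return self._tokenizer.encode(string, add_special_tokens=False)

    def _loglikelihood_tokens(self, requests, **kwargs):
        # Batch (context, continuation) token pairs, run the wrapped model's
        # forward pass, and return (logprob, is_greedy) per request.
        raise NotImplementedError

    def loglikelihood_rolling(self, requests, **kwargs):
        # Sliding-window loglikelihood over whole documents (perplexity-style tasks).
        raise NotImplementedError

    def generate_until(self, requests, **kwargs):
        # Greedy decoding until one of the task-supplied stop sequences is produced.
        raise NotImplementedError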

Step 3: Integration test

  • Load a small model like HuggingFaceTB/SmolLM2-135M-Instruct.
  • Wrap the in-memory Fast-LLM model using from_existing_model(...).
  • Use lm_eval.simple_evaluate(...) to run one or more tasks (e.g., hellaswag, arc_challenge, winogrande).
  • Validate that the reported metrics fall within the expected range for the model (a test sketch follows below).
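
A sketch of what such a test could look like. Helper names are hypothetical, FastLLMWrapper refers to the Step 2 sketch, and metric key names such as "acc,none" depend on the installed lm-eval version.

import lm_eval


def test_lm_eval_smollm(fast_llm_model, tokenizer):
    # Wrap the in-memory model and hand it to the harness as an LM instance.
    hf_model = HuggingfaceGPTModelForCausalLM.from_existing_model(fast_llm_model)
    lm = FastLLMWrapper(hf_model, tokenizer)
    results = lm_eval.simple_evaluate(model=lm, tasks=["hellaswag"], num_fewshot=0, limit=32)
    # Sanity check only: the metric exists and is a plausible accuracy value.
    acc = results["results"]["hellaswag"]["acc,none"]
    assert 0.0 <= acc <= 1.0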

Step 4: Extend Fast-LLM's validation config to support lm-eval-harness tasks

  • Extend the Fast-LLM config to accept a list of generative evaluation tasks using lm-eval-harness.
    • Fields to support:
      • tasks: list of task names (e.g. ["hellaswag", "arc_challenge"])
      • num_fewshot: number of few-shot examples to use per task.
  • Implement logic that:
    • Runs the lm-eval-harness only on global rank 0.
    • Constructs the TemplateLM wrapper for the in-memory Fast-LLM model.
    • Calls simple_evaluate(...) with the configured tasks.
    • Relies on Fast-LLM’s forward() for token-level inference, which is already distributed across GPUs and hosts.
  • Add support for logging results (e.g. to stdout and WandB), and disable lm-eval progress bars, since Fast-LLM typically runs in a headless environment (a sketch of the rank-0 hook follows below).
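
A sketch of the rank-0 evaluation hook implied by the bullets above. The config field names (tasks, num_fewshot), the FastLLMWrapper class, and the WandB handling are assumptions, not existing Fast-LLM APIs.

import json

import lm_eval


def run_lm_eval_validation(lm_eval_config, hf_model, tokenizer, global_rank, wandb_run=None):
    if global_rank != 0:
        # Non-zero ranks do not run the harness loop themselves; they only take part
        # in the distributed forward() calls driven from rank 0 via Fast-LLM's runner.
        return None
    lm = FastLLMWrapper(hf_model, tokenizer)  # TemplateLM subclass from Step 2
    results = lm_eval.simple_evaluate(
        model=lm,
        tasks=list(lm_eval_config.tasks),
        num_fewshot=lm_eval_config.num_fewshot,
    )
    print(json.dumps(results["results"], indent=2))  # plain-text summary to stdout
    if wandb_run is not None:
        # Log per-task metric dicts under an "lm_eval/" prefix.
        wandb_run.log({f"lm_eval/{task}": metrics for task, metrics in results["results"].items()})
    return results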

📌 Acceptance Criteria (Must-Haves for Completion)

  • Must be able to wrap an in-memory GPTModel in a HuggingfaceGPTModelForCausalLM via from_existing_model() without disk I/O.
  • Must implement a subclass of TemplateLM that:
    • Uses Fast-LLM's HuggingFace-compatible model (HuggingfaceGPTModelForCausalLM) for all inference.
    • Implements generate_until, loglikelihood, and loglikelihood_rolling.
    • Uses the correct tokenizer, PAD token ID, and EOS token ID.
  • Must support calling lm_eval.simple_evaluate(...) using the wrapped model and produce correct results.
  • Must extend Fast-LLM's validation/evaluation configuration to support:
    • Specifying lm-eval-harness tasks by name.
    • Setting num_fewshot.
  • Must ensure lm-eval-harness runs only on global rank 0, while model.forward() is transparently distributed using Fast-LLM’s runner logic.
  • Must include:
    • A working test that evaluates at least one lm-eval task on a small model (SmolLM2-135M-Instruct or similar).
    • Logging of evaluation results (stdout and WandB).
  • Implementation must be documented:
    • Example configs in the docs showing how to run lm-eval-harness generative benchmarks.

📎 Relevant Links

🛠️ Project Management

  • Assign the project to the Fast-LLM project.
  • Set the Estimate field (in days) in the GitHub project.
  • Use the Size field to categorize the PR size (Small/Medium/Large).
  • Assign an owner when opening the issue.
