Support additional evaluation frameworks

# 🎯 **Goal (What & Why)**

The goal is to support the evaluation suites that matter most for our experiments. Ideally, we would define a unified API (an improved version of the `lm_eval_harness` model wrapper from [#282](https://github.com/ServiceNow/Fast-LLM/pull/282)) that contributors can extend to new frameworks as needed.

The main challenge is that evaluating during training requires passing the model **in memory** to each evaluation framework, rather than saving it to disk and reloading it. This avoids changes in memory allocation and lets training resume seamlessly after each evaluation step. We have already implemented such a wrapper for `lm_eval_harness` in [#282](https://github.com/ServiceNow/Fast-LLM/pull/282).
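As a rough illustration of what a unified, in-memory interface could look like, here is a minimal sketch. Everything in it is hypothetical: `InMemoryEvaluator`, `register_evaluator`, `EVALUATOR_REGISTRY`, `run_evaluations` and the `ToyEvaluator` stand-in are not existing Fast-LLM or `lm_eval_harness` APIs. The "model" is any callable so the example runs on its own; a real wrapper would receive the live training model and call the target framework's own API.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List, Type

# Hypothetical registry mapping a framework name to its wrapper class.
EVALUATOR_REGISTRY: Dict[str, Type["InMemoryEvaluator"]] = {}


def register_evaluator(name: str) -> Callable[[type], type]:
    """Hypothetical class decorator so contributors can plug in new frameworks."""

    def decorator(cls: type) -> type:
        EVALUATOR_REGISTRY[name] = cls
        return cls

    return decorator


class InMemoryEvaluator(ABC):
    """Hypothetical base class wrapping a live model for one evaluation framework."""

    def __init__(self, model: Any) -> None:
        # Keep a reference to the in-memory model; nothing is written to disk.
        self.model = model

    @abstractmethod
    def evaluate(self) -> Dict[str, float]:
        """Run the framework's tasks against self.model and return metrics."""


@register_evaluator("toy")
class ToyEvaluator(InMemoryEvaluator):
    """Stand-in for a real wrapper (for example, one around lm_eval_harness)."""

    def evaluate(self) -> Dict[str, float]:
        # A real wrapper would hand self.model to the framework here.
        return {"score": float(self.model(2))}


def run_evaluations(model: Any, names: List[str]) -> Dict[str, Dict[str, float]]:
    """Evaluate the same in-memory model with each requested framework."""
    results: Dict[str, Dict[str, float]] = {}
    for name in names:
        evaluator = EVALUATOR_REGISTRY[name](model)
        results[name] = evaluator.evaluate()
    return results


if __name__ == "__main__":
    # Toy "model": a plain callable standing in for the training model.
    print(run_evaluations(lambda x: x * 3, ["toy"]))
```

In a training loop, such a helper would be called between steps with the same model object, after which training continues without any checkpoint round-trip. If the frameworks turn out to be too different for a shared `evaluate()` contract, each one could still live behind its own subclass while keeping the same registry entry point.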

Next steps:

- Identify the evaluation frameworks that matter most to us.
- Evaluate whether we can design a unified interface to integrate them as described above.
- If a common interface isn't feasible, we may need to integrate each framework individually.

# 🚀 **Execution Plan**
> _(This section may start as an incomplete draft but must be defined before implementation begins.)_ 

### **Step 1: What is the smallest working version?**
> _(Describe the simplest way to implement this feature with minimal effort.)_  

### **Step 2: What additional optimizations are possible (but optional)?**  
> _(List potential refinements that can be added in later PRs if needed.)_  

# 📌 **Acceptance Criteria** (Must-Haves for Completion)
* The feature must be **functional and tested**.  
* The implementation must be **documented in practical terms**.  
* The PR must include a **performance/impact summary**.  
* **No refactors unless directly necessary** for feature completion.  

# 🛠️ **Project Management**
- [x] **Assign the project to the Fast-LLM project.**
- [ ] **Set the `Estimate` field (in days) in the GitHub project.**
- [ ] **Use the `Size` field to categorize the PR size (Small/Medium/Large).**
- [ ] **Assign an owner when opening the issue.**  
