@humaira-rf humaira-rf commented Nov 13, 2025

Changes

  • Updated notebooks to display a DataFrame built from the results dict
  • Added other experimentation knobs to the results dict

Testing

  • Tested on all three notebooks with downsampled data and just 2 configs

Screenshots

a. Hybrid - gsm8k:
Screenshot 2025-11-13 at 3 47 50 PM

b. Fully OpenAI - scifact (couldn't screenshot the full df):
Screenshot 2025-11-13 at 3 52 35 PM

c. Fully local - fiqa:
Screenshot 2025-11-13 at 3 41 28 PM
Screenshot 2025-11-13 at 3 41 31 PM


Note

Adds pipeline metadata to final metrics with consistent ordering and updates tutorials to display results as a DataFrame.

  • Evals Controller (rapidfireai/evals/scheduling/controller.py):
    • Final metrics pipeline: _compute_final_metrics_for_pipelines now accepts optional pipeline_id_to_info and injects pipeline metadata (e.g., model_name, search_type, rag_k, top_n, chunk_size, chunk_overlap, sampling_params, prompt_manager_k, model_config).
    • Ordering and output: Reorders cumulative metrics to run_id, model_name, hyperparams, Samples Processed, then remaining metrics; returns ordered_metrics and uses it for progress display.
    • Integration: Builds pipeline_id_to_info from pipeline_info and passes it to final-metrics computation.
  • Tutorial Notebooks:
    • Replace sample printouts with conversion of results into a pandas DataFrame (results_df) in rf-tutorial-gsm8k-fewshot.ipynb, rf-tutorial-rag-fiqa.ipynb, and rf-tutorial-scifact-full-evaluation.ipynb.
    • Minor notebook metadata/formatting tweaks (ids, kernelspec).
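
The ordering described under "Ordering and output" can be illustrated with a minimal sketch. This is not the code in rapidfireai/evals/scheduling/controller.py: the function name `order_metrics`, its arguments, and the sample values are hypothetical, and it assumes the metrics and the pipeline info for one run are plain dicts.

```python
# Hypothetical sketch of the ordering described above; not the controller's code.
def order_metrics(run_id, metrics, info=None):
    """Return a new dict ordered as: run_id, model_name, remaining
    hyperparameters, "Samples Processed", then all other metrics."""
    info = dict(info or {})  # copy so the caller's dict is not mutated
    ordered = {"run_id": run_id}
    if "model_name" in info:
        ordered["model_name"] = info.pop("model_name")
    ordered.update(info)  # remaining knobs, e.g. rag_k or chunk_size
    if "Samples Processed" in metrics:
        ordered["Samples Processed"] = metrics["Samples Processed"]
    for key, value in metrics.items():
        # Keys already placed (including name clashes with a knob) are skipped.
        if key not in ordered:
            ordered[key] = value
    return ordered


# Made-up values, for illustration only.
example = order_metrics(
    1,
    {"Accuracy": 0.5, "Samples Processed": 10},
    {"rag_k": 5, "model_name": "model-a"},
)
print(list(example))
```

Python dicts preserve insertion order, so building a fresh dict in the desired sequence is enough to fix the column order of whatever is displayed from it.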

Written by Cursor Bugbot for commit efdc979.

@humaira-rf humaira-rf requested a review from arun-rfai November 13, 2025 22:18

@arun-rfai arun-rfai left a comment

I'd suggest leaving the dict return type of run_evals as is. Convert the dict to a DataFrame in the notebook cell itself, limiting it to only the metrics columns and the config knobs, akin to the second table printed in run_evals.

Also, pipeline_id -> run_id.
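
A notebook cell along these lines might look like the sketch below. It assumes the results dict maps each run's id to a flat dict of knobs and metrics; the real shape returned by run_evals may differ. The helper `results_to_df`, the sample values, and the `debug_blob` key are hypothetical.

```python
import pandas as pd

# Made-up stand-in for the dict returned by run_evals (assumed shape).
results = {
    1: {"model_name": "model-a", "rag_k": 5, "Samples Processed": 8,
        "Accuracy": 0.5, "debug_blob": "raw trace"},
    2: {"model_name": "model-b", "rag_k": 10, "Samples Processed": 8,
        "Accuracy": 0.75, "debug_blob": "raw trace"},
}


def results_to_df(results, columns):
    """Build one row per run, keeping only the requested columns."""
    df = pd.DataFrame.from_dict(results, orient="index")
    # The dict keys become the index; expose them as a run_id column.
    df = df.rename_axis("run_id").reset_index()
    keep = ["run_id"] + [c for c in columns if c in df.columns]
    return df[keep]


results_df = results_to_df(
    results, ["model_name", "rag_k", "Samples Processed", "Accuracy"]
)
print(results_df)
```

Doing the conversion in the notebook keeps run_evals returning a plain dict, and the explicit column list controls which knobs and metrics appear in the table.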

@humaira-rf humaira-rf requested a review from arun-rfai November 14, 2025 00:49

@arun-rfai arun-rfai left a comment

LGTM

@arun-rfai arun-rfai merged commit 3570b6d into main Nov 14, 2025
1 check passed
@arun-rfai arun-rfai deleted the feature/results_to_df branch November 14, 2025 00:54
3 participants