@david-rfai david-rfai commented Nov 24, 2025

Changes

  • Restore support for run_fit() on Google Colab
  • Support run_evals() on Google Colab
  • Add rapidfireai jupyter command, which also prints tunneling recommendations depending on whether it is running in VSCode
  • Jupyter default URL is /tree
  • Automatically detect Google Colab to simplify the install/init process
  • Add checks to rapidfireai doctor that emit warnings or errors when required items are missing or incompatible
  • Add additional Python packages to rapidfireai doctor
  • Add Torch version and Torch CUDA version to rapidfireai doctor
  • Support evals with newer versions of Torch, selected based on the CUDA version
  • Add rapidfireai --test-notebooks to copy test notebook to tutorial_notebooks folder
  • New FIT test notebook runs under 5 minutes
  • Move OPENAI_API_KEY to first line of Notebooks
  • Set default Ray console port to 8855, and allow customizing the port via RF_RAY_PORT
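
For reviewers, the Colab check and the port override listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names running_in_colab and ray_dashboard_port are hypothetical, and it assumes RF_RAY_PORT is read as an environment variable. Only the default of 8855 and the RF_RAY_PORT name come from the change list.

```python
import os
import sys

DEFAULT_RAY_PORT = 8855  # default stated in this PR


def running_in_colab() -> bool:
    """Heuristic (assumption): Colab kernels have google.colab already imported."""
    return "google.colab" in sys.modules


def ray_dashboard_port(environ=None) -> int:
    """Return the port from RF_RAY_PORT if set, else the default.

    Hypothetical helper: assumes RF_RAY_PORT is an environment variable.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("RF_RAY_PORT", "").strip()
    if not raw:
        return DEFAULT_RAY_PORT
    port = int(raw)  # raises ValueError on non-numeric input
    if not 0 < port < 65536:
        raise ValueError(f"RF_RAY_PORT out of range: {port}")
    return port


if __name__ == "__main__":
    print("Colab detected:", running_in_colab())
    print("Ray dashboard port:", ray_dashboard_port())
```

Passing the environment as an argument keeps the port logic easy to test without mutating os.environ.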

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • I have tested this change manually
  • I have tested this change in the following environments:
    • AWS Ubuntu, evals
    • AWS Linux, evals
    • Colab, evals
    • PBD, evals
    • AWS Ubuntu, fit
    • AWS Linux, fit
    • Colab, fit
    • PBD, fit

Screenshots (if applicable)

image

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published

Performance Impact

If this PR affects performance, describe the impact and any optimizations made.

Related Issues

Fixes #(issue number)
Closes #(issue number)
Related to #(issue number)


Note

Adds full Google Colab support for fit and evals with a new rapidfireai jupyter command, enhanced doctor/init, dispatcher/CORS and Ray dashboard config, and updated packaging/notebooks.

  • Platform/Env Support (Colab):
    • Add rapidfireai.utils.colab and shared utils.constants; detect Colab and provide auth token helpers.
    • Adjust GPU/CPU allocation and Ray init (dashboard on 8855, RF_RAY_PORT) for Colab.
    • Early CUDA init in actors to avoid CUBLAS_STATUS_NOT_INITIALIZED.
  • CLI:
    • New rapidfireai jupyter command (sets /tree, prints port-forward tips).
    • rapidfireai doctor: show site-packages, more packages, Torch/CUDA versions, statuses.
    • init installs env-specific reqs (fit/evals, colab/local); --test-notebooks to copy test notebooks.
  • Evals/Dispatcher:
    • Dispatcher binds 0.0.0.0, permissive CORS with credentials in Colab; helpers get_dispatcher_url/get_dispatcher_headers.
    • Notebook UI uses Colab proxy and credentials; polling and clone/IC flows refined.
    • Controller/Scheduler: context build parallelism, CI-aware resource sizing; improved final/live metrics handling.
  • Fit:
    • Dispatcher/Frontend import constants from utils; startup scripts handle Colab/TensorBoard mode and PID/port cleanup.
  • Packaging/Setup:
    • Expand package data to include utils, setup requirements, tests; add optional deps groups; add setup requirements-*.txt for fit/evals.
  • Docs/Notebooks/README:
    • README: add rapidfireai jupyter, consolidated SSH forwarding, extra port-kill tips.
    • New/updated tutorial and test notebooks (Colab-friendly imports, versions, API key prompts).
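
As a rough illustration of the kind of version reporting described for rapidfireai doctor, the sketch below collects installed package versions plus the Torch and Torch CUDA versions. It is not the doctor implementation: package_versions and torch_report are hypothetical names, and the package list in the usage lines is an assumption. It relies only on importlib.metadata from the standard library and, when Torch is installed, on torch.__version__ and torch.version.cuda.

```python
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    report = {}
    for name in names:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


def torch_report():
    """Return Torch and Torch CUDA versions; both None when Torch is absent."""
    try:
        import torch
    except ImportError:
        return {"torch": None, "cuda": None}
    # torch.version.cuda is None for CPU-only builds.
    return {"torch": torch.__version__, "cuda": torch.version.cuda}


if __name__ == "__main__":
    # Illustrative package list; the real doctor command may check others.
    for name, version in package_versions(["ray", "jupyter"]).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
    print(torch_report())
```

Returning None for a missing package, rather than raising, lets a caller decide whether the gap should be a warning or an error.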

Written by Cursor Bugbot for commit 0456e55.

@david-rfai david-rfai changed the title from "Aws evals" to "Colab support and CLI improvements" Dec 3, 2025
@david-rfai david-rfai changed the title from "Colab support and CLI improvements" to "Google Colab support for both fit and evals and CLI improvements" Dec 3, 2025
@david-rfai david-rfai merged commit ac7bcd1 into main Dec 3, 2025
1 check passed
@david-rfai david-rfai deleted the awsEvals branch December 3, 2025 21:29