[BUG] Trials switch from parallel to sequential execution #132

@RichardJ1

Bug Description

When processing runs, the trials execute in parallel until just before the first epoch. They then switch to sequential execution: the first trial runs to completion, then the second, and so on.

(Two screenshots attached.)

To Reproduce

Steps to reproduce the behavior:

  1. Connect a T4 instance on Google Colab (or Colab Enterprise) to the attached notebook.
  2. Click on the "end experiment" cell, activate the command palette (Cmd/Ctrl + Shift + P), and select "Run cells before the current".
  3. Observe the training after step 12.

Expected Behavior

Trials should run in parallel throughout the experiment.
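
For triage, one way to check this expectation against timing data is sketched below. It is only an illustration: the input structure (per-trial lists of step start/end times) and the names `max_concurrency` and `trial_spans` are hypothetical and do not come from RapidFire AI or its logs. The idea is that if the second trial only starts after the first has finished, the peak number of simultaneously active trials is 1.

```python
from typing import Dict, List, Tuple

# Hypothetical input format (an assumption, not RapidFire AI output):
# for each trial, a list of (start, end) times in seconds, one per training step.
Interval = Tuple[float, float]


def trial_spans(steps: Dict[str, List[Interval]]) -> List[Interval]:
    """Collapse each trial's step intervals into one (first start, last end) span."""
    return [
        (min(start for start, _ in ivs), max(end for _, end in ivs))
        for ivs in steps.values()
        if ivs
    ]


def max_concurrency(intervals: List[Interval]) -> int:
    """Return the largest number of intervals that are active at the same time."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a trial becomes active
        events.append((end, -1))    # a trial finishes
    # At equal timestamps, process finishes (-1) before starts (+1) so that
    # back-to-back trials are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


# Made-up timings for illustration only.
parallel = {"trial_1": [(0, 5), (5, 10)], "trial_2": [(1, 6), (6, 11)]}
sequential = {"trial_1": [(0, 5), (5, 10)], "trial_2": [(10, 15), (15, 20)]}

print(max_concurrency(trial_spans(parallel)))    # 2: the trials overlap
print(max_concurrency(trial_spans(sequential)))  # 1: one trial at a time
```

With real timings, a peak of 1 for the period after the first epoch begins would match the behavior reported here.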

Environment

  • OS: Run on Colab Enterprise with these resources:
    richard-single-gpu-3
    Machine type: n1-standard-8
    GPU type: NVIDIA_TESLA_T4 x 1
    Environment: Python 3.12
    Region: us-central1
  • Python version: 3.12
  • RapidFire AI version: 0.12.8
  • Browser (if applicable): Firefox

Notebook

rf_colab_tensorboard_tutorial(2).ipynb

Error Logs

rapidfire.log

training.log

@pradyumna-rfai @arun-rfai


Labels: bug
