[test] smoke test fixes for managed jobs #4217

Merged: 5 commits, Oct 30, 2024

Commits on Oct 30, 2024

  1. [test] don't wait for old pending jobs controller messages

    `sky jobs queue` used to output a temporary "waiting" message while the managed
    jobs controller was still being provisioned/starting. Since skypilot-org#3288, this
    message is no longer shown; the queued jobs themselves show PENDING/STARTING instead.
    
    This also requires some changes to tests to permit the PENDING and STARTING
    states for managed jobs.
    cg505 committed Oct 30, 2024
    d460005
  2. fix default aws region

    cg505 committed Oct 30, 2024
    ff98231
  3. [test] wait for RECOVERING more quickly

    Smoke tests were failing because some managed jobs were fully recovering back
    to the RUNNING state before the smoke test could catch the RECOVERING case (see
    e.g. skypilot-org#4192 `test_managed_jobs_cancellation_gcp`). Change tests that manually
    terminate a managed job instance, so that they will wait for the managed job to
    change away from the RUNNING state, checking every 10s.
    cg505 committed Oct 30, 2024
    d7e6d94
  4. address PR comments

    cg505 committed Oct 30, 2024
    9ee72aa
  5. fix

    cg505 committed Oct 30, 2024
    3548076
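
The polling approach described in commit 3 (wait for the managed job to leave the RUNNING state, checking every 10s) could be sketched roughly as below. This is an illustration only, not the code from this PR: `wait_until_not_running`, `get_status`, and the timeout value are hypothetical names and defaults, and the actual smoke tests may well build shell commands around `sky jobs queue` instead.

```python
import time
from typing import Callable


def wait_until_not_running(
    get_status: Callable[[], str],
    timeout_s: float = 300.0,        # hypothetical default, not from the PR
    poll_interval_s: float = 10.0,   # "checking every 10s" per the commit message
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns a state other than RUNNING.

    get_status is a hypothetical callable that returns the managed job's
    current state as a string (e.g. 'RUNNING', 'RECOVERING').
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status != 'RUNNING':
            # The job has left RUNNING (ideally to RECOVERING); stop early so
            # the caller can assert on the state before the job recovers.
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f'job still RUNNING after {timeout_s} seconds')
        sleep(poll_interval_s)
```

Returning as soon as the state changes, rather than sleeping for a fixed long interval, is what lets a test observe a short-lived RECOVERING state before the job returns to RUNNING.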