WX-1595 GCP Batch backend refactor to include the PAPI request manager #7412

Merged: 80 commits, May 29, 2024

Commits on Apr 18, 2024

  1. Refactor GCP Batch request manager

    BatchApiRequestWorkerSpec works! The goal is to port the papiv2 request manager behavior into GCP batch.
    
    We just need to rename the methods to remove the PAPI names.
    AlexITC committed Apr 18, 2024
    ff8543c
  2. 5cca13a
  3. Fix scalafmt

    AlexITC committed Apr 18, 2024
    0823351
  4. Temporarily disable CI

    AlexITC committed Apr 18, 2024
    0f5ab24
  5. 94d1aaf
  6. af6b42d
  7. draft

    AlexITC committed Apr 18, 2024
    fdb5c24
  8. draft

    AlexITC committed Apr 18, 2024
    ff23056
  9. Yet another draft

    AlexITC committed Apr 18, 2024
    ecc28d1
  10. 2afef28
  11. Implement abort operation

    AlexITC committed Apr 18, 2024
    a249835
  12. Get has been implemented

    Now we just need to fix the runtime errors, wire up the correct messages, etc.
    AlexITC committed Apr 18, 2024
    7ef43da
  13. Tests are compiling

    AlexITC committed Apr 18, 2024
    71daebe
  14. 1e8bcef
  15. Fix GcpBatchGroupedRequests

    The queue must be cleared after executing the requests.
    AlexITC committed Apr 18, 2024
    ad51153
  16. Enable tests

    AlexITC committed Apr 18, 2024
    476a923
  17. Clean up unnecessary code + batch abort request bugfix

    The abort request handler was returning the wrong message. Also, keep the old
    behavior where an abort request is handled only once.
    AlexITC committed Apr 18, 2024
    f92974b
  18. Handle JobAbortedException

    AlexITC committed Apr 18, 2024
    0bf7a90
  19. Handle GCP errors

    AlexITC committed Apr 18, 2024
    94ce5cd
  20. 76bb726
  21. dbd6d24
  22. Enable commented tests

    AlexITC committed Apr 18, 2024
    e619ef0
  23. Huge refactor on batch RunStatus

    This borrows many details from PAPIv2; work on the tests is still pending.
    
    NOTE: This could break the current integration.
    AlexITC committed Apr 18, 2024
    057ee14
  24. 6fe1518
  25. 0412a9d
  26. Add more tests

    AlexITC committed Apr 18, 2024
    d71751d
  27. 3ccc34d
  28. Minor fixes

    AlexITC committed Apr 18, 2024
    810a061
  29. Add TODO

    AlexITC committed Apr 18, 2024
    9acc240
  30. 7a4db8c
  31. Add tests to LoadConfigSpec

    AlexITC committed Apr 18, 2024
    2db7ce0
  32. Refactor the abort workflow

    Query the job status first so that a terminal job is not deleted.
    AlexITC committed Apr 18, 2024
    62129c9
  33. Refactor GcpBatchGroupedRequests to be a data holder only

    The execution is now handled by BatchRequestExecutor.
    AlexITC committed Apr 18, 2024
    26603c8
  34. Add missing files

    AlexITC committed Apr 18, 2024
    29f3c4f
  35. Implement event list mapping

    AlexITC committed Apr 18, 2024
    778c83a
  36. Clean up RunStatus from unnecessary data

    This removes the custom error mappers that detect preemptible errors because these are supposed to be handled natively by GCP.
    AlexITC committed Apr 18, 2024
    7becc27
  37. Remove TODOs

    AlexITC committed Apr 18, 2024
    38eb051
  38. Fix minor bug in BatchRequestExecutor

    The requests were being reversed unnecessarily.
    AlexITC committed Apr 18, 2024
    64510cf
  39. be118f5
  40. Further clean up

    AlexITC committed Apr 18, 2024
    b4a1b30
  41. e8a61f1
  42. Run centaurGcpBatch tests only

    There is a failing test that we need to fix, so there is no need to run the whole suite for now.
    AlexITC committed Apr 18, 2024
    d9c362a
  43. 11a02bb
  44. Draft

    AlexITC committed Apr 18, 2024
    1e2661b
  45. Add debug option

    AlexITC committed Apr 18, 2024
    290f065
  46. Enable tests again

    AlexITC committed Apr 18, 2024
    50517e4
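
Commits ad51153, 26603c8 and 64510cf above describe three related points: the request queue must be cleared after execution, the grouped-requests type becomes a data holder only, and the executor must not reverse the request order. The sketch below illustrates that split under stated assumptions. It is not the code from this PR: `Request`, `GroupedRequests`, `RequestExecutor` and `describe` are hypothetical names chosen for illustration, and the real classes talk to GCP rather than mapping requests to strings.

```scala
// Illustrative sketch only: every name here is hypothetical, not Cromwell's code.
sealed trait Request { def jobId: String }
final case class Submit(jobId: String) extends Request
final case class Query(jobId: String) extends Request
final case class Abort(jobId: String) extends Request

// Immutable data holder: it accumulates requests but never runs them.
final case class GroupedRequests(pending: Vector[Request] = Vector.empty) {
  def enqueue(request: Request): GroupedRequests = copy(pending = pending :+ request)
  def size: Int = pending.size
}

// The executor owns execution. Results come back in enqueue order (no
// reversal), paired with an empty holder so nothing can be executed twice.
object RequestExecutor {
  def execute[A](group: GroupedRequests)(run: Request => A): (Vector[A], GroupedRequests) =
    (group.pending.map(run), GroupedRequests())
}

object GroupedRequestsDemo {
  // Stand-in for the real remote call: just labels each request.
  def describe(request: Request): String = request match {
    case Submit(id) => s"submit:$id"
    case Query(id)  => s"query:$id"
    case Abort(id)  => s"abort:$id"
  }

  def main(args: Array[String]): Unit = {
    val group = GroupedRequests().enqueue(Submit("a")).enqueue(Query("a")).enqueue(Abort("b"))
    val (results, remaining) = RequestExecutor.execute(group)(r => describe(r))
    println(results.mkString(", "))
    println(remaining.size)
  }
}
```

Returning a fresh, empty holder from `execute` is one way to make the "clear the queue after executing" rule hard to forget; a mutable queue that is explicitly cleared would be an equally valid design.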

Commits on Apr 23, 2024

  1. Remove unnecessary comments from GcpBatchAsyncBackendJobExecutionActorSpec

    A few tests were marked with TODO notes, but we found out that
    the behavior of batch is the same as the behavior of papiv2, and both of these
    backends behave differently from papi-common.
    AlexITC committed Apr 23, 2024
    a333f65

Commits on Apr 26, 2024

  1. bfd9dd1
  2. Fix scalafmt

    AlexITC committed Apr 26, 2024
    54c152d

Commits on May 2, 2024

  1. 59c9675

Commits on May 3, 2024

  1. 83891cf

Commits on May 7, 2024

  1. 0cf2ba0
  2. Clean up

    AlexITC committed May 7, 2024
    526447d
  3. 8d7c618
  4. fa18a3c
  5. Further clean up

    AlexITC committed May 7, 2024
    f872fee

Commits on May 9, 2024

  1. 0e30684

Commits on May 10, 2024

  1. c08b09b
  2. Further cleanup

    AlexITC committed May 10, 2024
    23650c0
  3. More cleanup

    AlexITC committed May 10, 2024
    35dce53
  4. Minor tweaks

    AlexITC committed May 10, 2024
    cee36d9

Commits on May 14, 2024

  1. 6119047
  2. Revert temporary flag change

    AlexITC committed May 14, 2024
    f4511b9
  3. b30be69

Commits on May 15, 2024

  1. 8f16dbc

Commits on May 16, 2024

  1. Fix merge errors

    AlexITC committed May 16, 2024
    28a62ef
  2. 0e4ddcb

Commits on May 21, 2024

  1. f1a83ff
  2. Set requestsAbortAndDiesImmediately=false

    This allows us to handle the abort result instead of blindly marking the job as aborted.
    AlexITC committed May 21, 2024
    dca1420
  3. Improve abort job handler

    Now, we look for the submit requests before sending an abort request, canceling jobs that were not submitted to GCP.
    
    Turns out that this is now unnecessary because we are already mapping query errors to a RunStatus.
    AlexITC committed May 21, 2024
    eca754f
  4. Remove customPollStatusFailure

    Turns out that this is now unnecessary because we are already mapping query errors to a RunStatus.
    AlexITC committed May 21, 2024
    0095a87
  5. Fix compile errors

    AlexITC committed May 21, 2024
    837f86e
  6. Fix abort from BatchApiRequestManager

    When aborting an individual job, only that job is aborted instead of
    all the jobs from that workflow.
    AlexITC committed May 21, 2024
    dcd0efe
  7. Try fixing preemption errors from GCP

    In theory, this solves #7407
    AlexITC committed May 21, 2024
    3c9d020
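
Two abort-related commits in this PR (62129c9 and dcd0efe) describe querying the job status before deleting, so that a terminal job is left alone, and aborting only the requested job rather than every job in the workflow. A minimal sketch of that flow follows. It assumes an abort is carried out as a delete call, as the commit message suggests, and it replaces the remote service with an in-memory map; `JobState`, `AbortOutcome`, `FakeJobService` and `AbortHandler` are hypothetical names, not the classes changed in this PR.

```scala
// Illustrative sketch only: every name here is hypothetical, not Cromwell's code.
sealed trait JobState
case object Running extends JobState
case object Succeeded extends JobState
case object Failed extends JobState

sealed trait AbortOutcome
case object AbortRequested extends AbortOutcome
case object AlreadyTerminal extends AbortOutcome
case object UnknownJob extends AbortOutcome

// In-memory stand-in for the remote job service (an assumption for this sketch).
final class FakeJobService(initial: Map[String, JobState]) {
  private var jobs: Map[String, JobState] = initial
  def status(jobId: String): Option[JobState] = jobs.get(jobId)
  def delete(jobId: String): Unit = jobs -= jobId
  def knownJobs: Set[String] = jobs.keySet
}

object AbortHandler {
  private def isTerminal(state: JobState): Boolean = state match {
    case Succeeded | Failed => true
    case Running            => false
  }

  // Query first, then delete only the requested job and only if it is still running.
  def abort(service: FakeJobService, jobId: String): AbortOutcome =
    service.status(jobId) match {
      case None                             => UnknownJob
      case Some(state) if isTerminal(state) => AlreadyTerminal
      case Some(_) =>
        service.delete(jobId)
        AbortRequested
    }
}

object AbortDemo {
  def main(args: Array[String]): Unit = {
    val service = new FakeJobService(Map("wf1-a" -> Running, "wf1-b" -> Running, "wf1-c" -> Succeeded))
    println(AbortHandler.abort(service, "wf1-a")) // the running job is deleted
    println(AbortHandler.abort(service, "wf1-c")) // the finished job is left alone
    println(service.knownJobs.toList.sorted.mkString(", "))
  }
}
```

Returning an explicit outcome, rather than assuming the abort succeeded, matches the intent stated in dca1420 of handling the abort result instead of blindly marking the job as aborted.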

Commits on May 24, 2024

  1. Rollback the preemption fixes

    Turns out that this is not the correct fix.
    AlexITC committed May 24, 2024
    e261cb9

Commits on May 25, 2024

  1. Final cleanup

    - Switch the noisy logs to debug level.
    - Remove status codes ported from PAPI because they are not useful in batch.
    - Remove all the test cases involving the PAPI codes.
    - Remove the unused args from RunStatus.
    - Rename JES occurrences to Batch.
    AlexITC committed May 25, 2024
    1050850
  2. 75acee0
  3. Yet another cleanup

    AlexITC committed May 25, 2024
    387e399
  4. Run scalafmt

    AlexITC committed May 25, 2024
    5f01bc9

Commits on May 29, 2024

  1. 37e5fde