Conversation

Collaborator

@ethany-nv ethany-nv commented Dec 18, 2025

Description

This PR uses asyncio to migrate the logs for all tasks within a workflow concurrently. Previously, the worker migrated logs from Redis to cloud storage synchronously, one task at a time, which was inefficient.
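
As a rough illustration of the concurrent migration described above (not the code in this PR), the sketch below fans out one coroutine per task, bounds concurrency with a semaphore, and pushes the blocking upload onto a thread pool. The names `migrate_workflow_logs`, `migrate_task_logs`, `upload_log_blocking`, and both constants and their values are hypothetical, and plain dicts stand in for Redis and cloud storage.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Hypothetical limits. The commit history mentions constants for the semaphore
# and the storage client executor count, but their real values are not shown.
MAX_CONCURRENT_MIGRATIONS = 8
STORAGE_EXECUTOR_WORKERS = 8


def upload_log_blocking(task_name: str, lines: list[str], store: dict[str, str]) -> int:
    """Stand-in for a blocking cloud storage upload (assumption)."""
    store[f"logs/{task_name}.log"] = "\n".join(lines)
    return len(lines)


async def migrate_task_logs(
    task_name: str,
    redis_logs: dict[str, list[str]],
    store: dict[str, str],
    semaphore: asyncio.Semaphore,
    executor: ThreadPoolExecutor,
) -> int:
    """Migrate one task's logs while holding a semaphore slot."""
    async with semaphore:
        lines = redis_logs.get(task_name, [])
        loop = asyncio.get_running_loop()
        # Run the blocking upload in a worker thread so the event loop stays free.
        return await loop.run_in_executor(
            executor, upload_log_blocking, task_name, lines, store
        )


async def migrate_workflow_logs(
    redis_logs: dict[str, list[str]], store: dict[str, str]
) -> dict[str, int]:
    """Migrate the logs of every task concurrently and return lines per task."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_MIGRATIONS)
    names = list(redis_logs)
    with ThreadPoolExecutor(max_workers=STORAGE_EXECUTOR_WORKERS) as executor:
        counts = await asyncio.gather(
            *(migrate_task_logs(n, redis_logs, store, semaphore, executor) for n in names)
        )
    return dict(zip(names, counts))
```

Calling `asyncio.run(migrate_workflow_logs(redis_logs, store))` drives the whole migration. The semaphore caps how many tasks are in flight at once, which keeps memory use and storage connections bounded for large workflows.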

This PR also batches delete operations to speed up clearing the stale Redis keys.
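
The batching idea can be sketched as below, assuming a client whose `delete(*names)` accepts several keys per call and returns the number removed (redis-py's `Redis.delete` behaves this way). The helper names, the default batch size, and the `InMemoryClient` stand-in are hypothetical and exist only for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator


class InMemoryClient:
    """Tiny stand-in for a Redis client, used only for this illustration."""

    def __init__(self, data: dict[str, str]) -> None:
        self.data = dict(data)
        self.round_trips = 0

    def delete(self, *names: str) -> int:
        # One call models one network round trip, however many keys it carries.
        self.round_trips += 1
        return sum(1 for name in names if self.data.pop(name, None) is not None)


def chunked(keys: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield successive lists of at most `size` keys."""
    iterator = iter(keys)
    while batch := list(islice(iterator, size)):
        yield batch


def delete_keys_in_batches(client, keys: Iterable[str], batch_size: int = 500) -> int:
    """Delete keys with one call per batch instead of one call per key."""
    deleted = 0
    for batch in chunked(keys, batch_size):
        deleted += client.delete(*batch)
    return deleted
```

With 1200 stale keys and a batch size of 500, this issues three delete calls rather than 1200, which is where the speedup would come from.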

Issue #174

Experiment

To measure the performance gain from this optimization, we timed the CleanupWorkflow task while running large group workflows on two different OSMO instances, one without and one with the optimization. In each workflow run, every task outputs 5000 lines.

| Num Tasks | Without Optimization | With Optimization |
|-----------|----------------------|-------------------|
| 50        | 3:40                 | 1:46              |
| 100       | 5:44                 | 3:21              |
| 200       | N/A (Probe Kill)     | 5:58              |
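
For reference, the speedup implied by the measurements above can be worked out as follows. The table does not state units, so this assumes each duration is a two-field base-60 value such as minutes:seconds; the ratio is the same either way. The 200-task row is skipped because the unoptimized run did not complete.

```python
def to_units(duration: str) -> int:
    """Convert a two-field 'a:b' duration into the smaller unit (assumed base 60)."""
    major, minor = duration.split(":")
    return int(major) * 60 + int(minor)


def speedup(before: str, after: str) -> float:
    """Ratio of the unoptimized duration to the optimized one."""
    return to_units(before) / to_units(after)


# Values copied from the table above: (without optimization, with optimization).
results = {50: ("3:40", "1:46"), 100: ("5:44", "3:21")}

for num_tasks, (before, after) in results.items():
    print(f"{num_tasks} tasks: {speedup(before, after):.2f}x faster")
# 50 tasks: 2.08x faster
# 100 tasks: 1.71x faster
```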

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@ethany-nv ethany-nv requested a review from a team December 18, 2025 01:24
xutongNV previously approved these changes Dec 18, 2025
Contributor

@xutongNV xutongNV left a comment

Have you tested this with a big workflow?

RyaliNvidia previously approved these changes Dec 18, 2025
@ethany-nv ethany-nv changed the title from "Ethany/efficient upload jobs" to "Efficient Workflow Cleanup through Using Async Operations for Log Migration" Jan 16, 2026
@github-actions

PR Preview Action v1.8.0

🚀 View preview at
https://NVIDIA.github.io/OSMO/pr-preview/pr-167/

Built to branch gh-pages/documentation at 2026-01-16 18:08 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@ethany-nv ethany-nv merged commit f1a4fd6 into main Jan 16, 2026
6 checks passed
@ethany-nv ethany-nv deleted the ethany/efficient_upload_jobs branch January 16, 2026 18:28
patclarknvidia added a commit that referenced this pull request Jan 20, 2026
* allow flexible squid proxy replicas (#241)

* allow flexible squid proxy replicas

* fix

* Efficient Workflow Cleanup through Using Async Operations for Log Migration (#167)

* Improving Performance for Uploading Workflow Artifacts in Worker Jobs

* Cleanup

* Add progress writing after upload

* Add dependency in Bazel BUILD

* Add type to mypy requirements

* Update mypy requirements

* Add to mypy_cli BUILD

* Fix lint

* Comment

* Use constant to define semaphor and storage client executor count

* #244 - Use last login url if url is not specified (#245)

* Use last login url if url is not specified

* print message

* Cannot select any text inside modals or slideouts (#248)

* Video html element not changin when selecting different video files in the UI for OSMO dataset (#249)

---------

Co-authored-by: Vivian Pan <vivianp@nvidia.com>
Co-authored-by: ethany-nv <ethany@nvidia.com>
Co-authored-by: RyaliNvidia <ryali@nvidia.com>
Co-authored-by: patclarknvidia <patc@nvidia.com>
xutongNV added a commit that referenced this pull request Jan 20, 2026
* Update the wording re: creating feature branches (#204)

* Add a link back to OSMO from the brev launchable (#205)

* Improve styling for badges in the brev launchable readme (#207)

* Fix osmo config pool update payload in backend installation docs (#210)

* Fix osmo config pool update payload in practical guide (#213)

* #147 - backend operator redesign doc (#149)

* backend operator redesign doc

* 195 - Bump quick-start version due to updated dependencies (#217)

* Perform Client Side Data Auth Check In the Event of Environment Based Auth (#177)

* Data/Dataset Auth Check CLIs

* Remove auth check from data service

* Use auth check CLIs in ctrl

* Add exit code to docs

* Fix build issues

* Fix lint

* Ctrl to use user config when validating data auth

* Use the correct CLI argument type

* Fix lint

* Use profile when looking up data credential from config

* Update quick start installation to always install latest version (#218)

* Add workflow to label external issues and pull requests (#222)

* Add workflow to label external issues and pull requests

* pin to allowed action version

* add reopened event

* allow flexible squid proxy replicas (#241)

* allow flexible squid proxy replicas

* fix

* Efficient Workflow Cleanup through Using Async Operations for Log Migration (#167)

* Improving Performance for Uploading Workflow Artifacts in Worker Jobs

* Cleanup

* Add progress writing after upload

* Add dependency in Bazel BUILD

* Add type to mypy requirements

* Update mypy requirements

* Add to mypy_cli BUILD

* Fix lint

* Comment

* Use constant to define semaphor and storage client executor count

* #244 - Use last login url if url is not specified (#245)

* Use last login url if url is not specified

* print message

* Cannot select any text inside modals or slideouts (#248)

* Video html element not changin when selecting different video files in the UI for OSMO dataset (#249)

---------

Co-authored-by: Ethan Look-Potts <elookpotts@nvidia.com>
Co-authored-by: xutongNV <xutongr@nvidia.com>
Co-authored-by: Fernando L <fernandol@nvidia.com>
Co-authored-by: Vivian Pan <vivianp@nvidia.com>
Co-authored-by: ethany-nv <ethany@nvidia.com>
Co-authored-by: RyaliNvidia <ryali@nvidia.com>
Co-authored-by: patclarknvidia <patc@nvidia.com>
RyaliNvidia added a commit that referenced this pull request Jan 20, 2026
xutongNV added a commit that referenced this pull request Jan 20, 2026
* Update the wording re: creating feature branches (#204)

* Add a link back to OSMO from the brev launchable (#205)

* Improve styling for badges in the brev launchable readme (#207)

* Fix osmo config pool update payload in backend installation docs (#210)

* Fix osmo config pool update payload in practical guide (#213)

* #147 - backend operator redesign doc (#149)

* backend operator redesign doc

* 195 - Bump quick-start version due to updated dependencies (#217)

* Perform Client Side Data Auth Check In the Event of Environment Based Auth (#177)

* Data/Dataset Auth Check CLIs

* Remove auth check from data service

* Use auth check CLIs in ctrl

* Add exit code to docs

* Fix build issues

* Fix lint

* Ctrl to use user config when validating data auth

* Use the correct CLI argument type

* Fix lint

* Use profile when looking up data credential from config

* Update quick start installation to always install latest version (#218)

* Add workflow to label external issues and pull requests (#222)

* Add workflow to label external issues and pull requests

* pin to allowed action version

* add reopened event

* allow flexible squid proxy replicas (#241)

* allow flexible squid proxy replicas

* fix

* Efficient Workflow Cleanup through Using Async Operations for Log Migration (#167)

* Improving Performance for Uploading Workflow Artifacts in Worker Jobs

* Cleanup

* Add progress writing after upload

* Add dependency in Bazel BUILD

* Add type to mypy requirements

* Update mypy requirements

* Add to mypy_cli BUILD

* Fix lint

* Comment

* Use constant to define semaphor and storage client executor count

* #244 - Use last login url if url is not specified (#245)

* Use last login url if url is not specified

* print message

* Cannot select any text inside modals or slideouts (#248)

* Video html element not changin when selecting different video files in the UI for OSMO dataset (#249)

* sync-feature-branches: fix no conflict case, allow single branch to be synced (#252)

* Fix sync-feature-branches with no merge conflicts

* Allow a single branch to be specified for sync-feature-branches

* Perform operations as OSMO CI Bot

* Add external label when the PR is created

* extract issue number

* add test cases (#247)

---------

Co-authored-by: Ethan Look-Potts <elookpotts@nvidia.com>
Co-authored-by: xutongNV <xutongr@nvidia.com>
Co-authored-by: Fernando L <fernandol@nvidia.com>
Co-authored-by: Vivian Pan <vivianp@nvidia.com>
Co-authored-by: ethany-nv <ethany@nvidia.com>
Co-authored-by: RyaliNvidia <ryali@nvidia.com>
Co-authored-by: patclarknvidia <patc@nvidia.com>
RyaliNvidia added a commit that referenced this pull request Jan 22, 2026
* allow flexible squid proxy replicas (#241)

* allow flexible squid proxy replicas

* fix

* Efficient Workflow Cleanup through Using Async Operations for Log Migration (#167)

* Improving Performance for Uploading Workflow Artifacts in Worker Jobs

* Cleanup

* Add progress writing after upload

* Add dependency in Bazel BUILD

* Add type to mypy requirements

* Update mypy requirements

* Add to mypy_cli BUILD

* Fix lint

* Comment

* Use constant to define semaphor and storage client executor count

* #244 - Use last login url if url is not specified (#245)

* Use last login url if url is not specified

* print message

* Cannot select any text inside modals or slideouts (#248)

* Video html element not changin when selecting different video files in the UI for OSMO dataset (#249)

* sync-feature-branches: fix no conflict case, allow single branch to be synced (#252)

* Fix sync-feature-branches with no merge conflicts

* Allow a single branch to be specified for sync-feature-branches

* Perform operations as OSMO CI Bot

* Add external label when the PR is created

* extract issue number

* add test cases (#247)

* Allow PR checks to run on release branches (#264)

* Database Pooling in Postgres Singleton Across Services (#251)

* Initial commit for database pooling

* Update set_session

* Fix lint

* Update PostgresConnector to have semaphor to control connections

* Lint fix

* Fix number of maxconn for test

* Address comments

* Add Go Postgres utils (#272)

* #148 - Auth Project Design Documents (#165)

* fix conflict

* fix conflict

* fix

---------

Co-authored-by: Vivian Pan <vivianp@nvidia.com>
Co-authored-by: ethany-nv <ethany@nvidia.com>
Co-authored-by: RyaliNvidia <ryali@nvidia.com>
Co-authored-by: patclarknvidia <patc@nvidia.com>
Co-authored-by: Ethan Look-Potts <elookpotts@nvidia.com>
Co-authored-by: xutongNV <xutongr@nvidia.com>
xutongNV added a commit that referenced this pull request Jan 22, 2026
* Update the wording re: creating feature branches (#204)

* Add a link back to OSMO from the brev launchable (#205)

* Improve styling for badges in the brev launchable readme (#207)

* Fix osmo config pool update payload in backend installation docs (#210)

* Fix osmo config pool update payload in practical guide (#213)

* #147 - backend operator redesign doc (#149)

* backend operator redesign doc

* 195 - Bump quick-start version due to updated dependencies (#217)

* Perform Client Side Data Auth Check In the Event of Environment Based Auth (#177)

* Data/Dataset Auth Check CLIs

* Remove auth check from data service

* Use auth check CLIs in ctrl

* Add exit code to docs

* Fix build issues

* Fix lint

* Ctrl to use user config when validating data auth

* Use the correct CLI argument type

* Fix lint

* Use profile when looking up data credential from config

* Update quick start installation to always install latest version (#218)

* Add workflow to label external issues and pull requests (#222)

* Add workflow to label external issues and pull requests

* pin to allowed action version

* add reopened event

* allow flexible squid proxy replicas (#241)

* allow flexible squid proxy replicas

* fix

* Efficient Workflow Cleanup through Using Async Operations for Log Migration (#167)

* Improving Performance for Uploading Workflow Artifacts in Worker Jobs

* Cleanup

* Add progress writing after upload

* Add dependency in Bazel BUILD

* Add type to mypy requirements

* Update mypy requirements

* Add to mypy_cli BUILD

* Fix lint

* Comment

* Use constant to define semaphor and storage client executor count

* #244 - Use last login url if url is not specified (#245)

* Use last login url if url is not specified

* print message

* Cannot select any text inside modals or slideouts (#248)

* Video html element not changin when selecting different video files in the UI for OSMO dataset (#249)

* sync-feature-branches: fix no conflict case, allow single branch to be synced (#252)

* Fix sync-feature-branches with no merge conflicts

* Allow a single branch to be specified for sync-feature-branches

* Perform operations as OSMO CI Bot

* Add external label when the PR is created

* extract issue number

* add test cases (#247)

* Allow PR checks to run on release branches (#264)

* Database Pooling in Postgres Singleton Across Services (#251)

* Initial commit for database pooling

* Update set_session

* Fix lint

* Update PostgresConnector to have semaphor to control connections

* Lint fix

* Fix number of maxconn for test

* Address comments

* Add Go Postgres utils (#272)

* #148 - Auth Project Design Documents (#165)

---------

Co-authored-by: Ethan Look-Potts <elookpotts@nvidia.com>
Co-authored-by: xutongNV <xutongr@nvidia.com>
Co-authored-by: Fernando L <fernandol@nvidia.com>
Co-authored-by: Vivian Pan <vivianp@nvidia.com>
Co-authored-by: ethany-nv <ethany@nvidia.com>
Co-authored-by: RyaliNvidia <ryali@nvidia.com>
Co-authored-by: patclarknvidia <patc@nvidia.com>
4 participants