@vvnpn-nv
Contributor

@vvnpn-nv vvnpn-nv commented Jan 20, 2026

Description

  • Add shell scripts that create the Terraform resources needed to set up OSMO and the Kubernetes resources (namespaces, secrets, ConfigMaps) needed to deploy a minimal version of OSMO without auth.
  • Updated the Azure Redis resource to no longer use the deprecated Redis Enterprise instance.
  • Updated the AWS Postgres and Kubernetes versions in Terraform.
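
For readers who have not opened the diff, the flow described above (Terraform first, then namespaces, secrets, and ConfigMaps) could look roughly like the sketch below. It is not the script from this PR: the directory, namespace, resource names, and environment variables (`TF_DIR`, `NAMESPACE`, `SECRETS_ENV_FILE`, `REDIS_HOST`, `osmo-secrets`, `osmo-config`) are all hypothetical, and the sketch only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholders; none of these names come from the PR.
TF_DIR="${TF_DIR:-./terraform}"
NAMESPACE="${NAMESPACE:-osmo-minimal}"
SECRETS_ENV_FILE="${SECRETS_ENV_FILE:-./secrets.env}"
REDIS_HOST="${REDIS_HOST:-localhost}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print each command instead of running it

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

provision() {
  # 1. Cloud resources (cluster, Postgres, Redis) via Terraform.
  run terraform -chdir="$TF_DIR" init
  run terraform -chdir="$TF_DIR" apply -auto-approve

  # 2. Kubernetes objects the deployment expects.
  run kubectl create namespace "$NAMESPACE"
  run kubectl create secret generic osmo-secrets \
    --namespace "$NAMESPACE" \
    --from-env-file="$SECRETS_ENV_FILE"
  run kubectl create configmap osmo-config \
    --namespace "$NAMESPACE" \
    --from-literal=redis_host="$REDIS_HOST"
}

provision
```

Keeping the default at dry-run means the sketch can be read and executed without cloud credentials; a real script would also need to handle resources that already exist.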

Issue #267

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@vvnpn-nv vvnpn-nv requested a review from a team January 20, 2026 22:42
@vvnpn-nv vvnpn-nv changed the title #267 - cloud deployment script #267 - cloud deployment scripts Jan 20, 2026
- Added .gitignore to ignore values/, *.env files
- Removed values/*.yaml files from git (auto-generated during deployment)
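
The two changes in this commit note can be reproduced locally along the following lines. This is a sketch, not the commit itself: the helper name `add_ignore_patterns` is hypothetical, and only the two patterns named above (`values/` and `*.env`) are taken from the PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Append the ignore patterns named in the commit note to a .gitignore file,
# skipping any pattern that is already present as an exact line.
add_ignore_patterns() {
  local gitignore="${1:-.gitignore}"
  local pattern
  touch "$gitignore"
  for pattern in 'values/' '*.env'; do
    grep -qxF -- "$pattern" "$gitignore" || printf '%s\n' "$pattern" >> "$gitignore"
  done
}
```

Running `add_ignore_patterns .gitignore` followed by `git rm -r --cached values/` inside the repository would stop tracking the generated files while leaving the local copies in place.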
RyaliNvidia
RyaliNvidia previously approved these changes Jan 20, 2026
RyaliNvidia
RyaliNvidia previously approved these changes Jan 22, 2026
@vvnpn-nv vvnpn-nv merged commit 5e90fd8 into main Jan 24, 2026
6 checks passed
@vvnpn-nv vvnpn-nv deleted the vpan/azure-deploy-script branch January 24, 2026 00:18
ethany-nv added a commit that referenced this pull request Jan 26, 2026
* Update the wording re: creating feature branches (#204)

* Add a link back to OSMO from the brev launchable (#205)

* Improve styling for badges in the brev launchable readme (#207)

* Fix osmo config pool update payload in backend installation docs (#210)

* Fix osmo config pool update payload in practical guide (#213)

* #147 - backend operator redesign doc (#149)

* backend operator redesign doc

* 195 - Bump quick-start version due to updated dependencies (#217)

* Perform Client Side Data Auth Check In the Event of Environment Based Auth (#177)

* Data/Dataset Auth Check CLIs

* Remove auth check from data service

* Use auth check CLIs in ctrl

* Add exit code to docs

* Fix build issues

* Fix lint

* Ctrl to use user config when validating data auth

* Use the correct CLI argument type

* Fix lint

* Use profile when looking up data credential from config

* Update quick start installation to always install latest version (#218)

* Add workflow to label external issues and pull requests (#222)

* Add workflow to label external issues and pull requests

* pin to allowed action version

* add reopened event

* allow flexible squid proxy replicas (#241)

* allow flexible squid proxy replicas

* fix

* Efficient Workflow Cleanup through Using Async Operations for Log Migration (#167)

* Improving Performance for Uploading Workflow Artifacts in Worker Jobs

* Cleanup

* Add progress writing after upload

* Add dependency in Bazel BUILD

* Add type to mypy requirements

* Update mypy requirements

* Add to mypy_cli BUILD

* Fix lint

* Comment

* Use constant to define semaphor and storage client executor count

* #244 - Use last login url if url is not specified (#245)

* Use last login url if url is not specified

* print message

* Cannot select any text inside modals or slideouts (#248)

* Video html element not changin when selecting different video files in the UI for OSMO dataset (#249)

* sync-feature-branches: fix no conflict case, allow single branch to be synced (#252)

* Fix sync-feature-branches with no merge conflicts

* Allow a single branch to be specified for sync-feature-branches

* Perform operations as OSMO CI Bot

* Add external label when the PR is created

* extract issue number

* add test cases (#247)

* Allow PR checks to run on release branches (#264)

* Database Pooling in Postgres Singleton Across Services (#251)

* Initial commit for database pooling

* Update set_session

* Fix lint

* Update PostgresConnector to have semaphor to control connections

* Lint fix

* Fix number of maxconn for test

* Address comments

* Add Go Postgres utils (#272)

* #148 - Auth Project Design Documents (#165)

* add args to postgres (#282)

* #267 - cloud deployment scripts (#268)

* script to create azure resources and deploy

* Remove auto-generated values files from tracking

- Added .gitignore to ignore values/, *.env files
- Removed values/*.yaml files from git (auto-generated during deployment)

* add aws script

* add aws script

* add copyright

* update copyright

* conflicts

---------

Co-authored-by: Ethan Look-Potts <elookpotts@nvidia.com>
Co-authored-by: xutongNV <xutongr@nvidia.com>
Co-authored-by: Fernando L <fernandol@nvidia.com>
Co-authored-by: Vivian Pan <vivianp@nvidia.com>
Co-authored-by: ethany-nv <ethany@nvidia.com>
Co-authored-by: RyaliNvidia <ryali@nvidia.com>
Co-authored-by: patclarknvidia <patc@nvidia.com>
patclarknvidia added a commit that referenced this pull request Jan 26, 2026
* allow flexible squid proxy replicas (#241)

* allow flexible squid proxy replicas

* fix

* Efficient Workflow Cleanup through Using Async Operations for Log Migration (#167)

* Improving Performance for Uploading Workflow Artifacts in Worker Jobs

* Cleanup

* Add progress writing after upload

* Add dependency in Bazel BUILD

* Add type to mypy requirements

* Update mypy requirements

* Add to mypy_cli BUILD

* Fix lint

* Comment

* Use constant to define semaphor and storage client executor count

* #244 - Use last login url if url is not specified (#245)

* Use last login url if url is not specified

* print message

* Cannot select any text inside modals or slideouts (#248)

* Video html element not changin when selecting different video files in the UI for OSMO dataset (#249)

* sync-feature-branches: fix no conflict case, allow single branch to be synced (#252)

* Fix sync-feature-branches with no merge conflicts

* Allow a single branch to be specified for sync-feature-branches

* Perform operations as OSMO CI Bot

* Add external label when the PR is created

* extract issue number

* add test cases (#247)

* Allow PR checks to run on release branches (#264)

* Database Pooling in Postgres Singleton Across Services (#251)

* Initial commit for database pooling

* Update set_session

* Fix lint

* Update PostgresConnector to have semaphor to control connections

* Lint fix

* Fix number of maxconn for test

* Address comments

* Add Go Postgres utils (#272)

* #148 - Auth Project Design Documents (#165)

* add args to postgres (#282)

* #267 - cloud deployment scripts (#268)

* script to create azure resources and deploy

* Remove auto-generated values files from tracking

- Added .gitignore to ignore values/, *.env files
- Removed values/*.yaml files from git (auto-generated during deployment)

* add aws script

* add aws script

* add copyright

* update copyright

---------

Co-authored-by: Vivian Pan <vivianp@nvidia.com>
Co-authored-by: ethany-nv <ethany@nvidia.com>
Co-authored-by: RyaliNvidia <ryali@nvidia.com>
Co-authored-by: patclarknvidia <patc@nvidia.com>
Co-authored-by: Ethan Look-Potts <elookpotts@nvidia.com>
Co-authored-by: xutongNV <xutongr@nvidia.com>
RyaliNvidia added a commit that referenced this pull request Jan 27, 2026
* allow flexible squid proxy replicas (#241)

* allow flexible squid proxy replicas

* fix

* Efficient Workflow Cleanup through Using Async Operations for Log Migration (#167)

* Improving Performance for Uploading Workflow Artifacts in Worker Jobs

* Cleanup

* Add progress writing after upload

* Add dependency in Bazel BUILD

* Add type to mypy requirements

* Update mypy requirements

* Add to mypy_cli BUILD

* Fix lint

* Comment

* Use constant to define semaphor and storage client executor count

* #244 - Use last login url if url is not specified (#245)

* Use last login url if url is not specified

* print message

* Cannot select any text inside modals or slideouts (#248)

* Video html element not changin when selecting different video files in the UI for OSMO dataset (#249)

* sync-feature-branches: fix no conflict case, allow single branch to be synced (#252)

* Fix sync-feature-branches with no merge conflicts

* Allow a single branch to be specified for sync-feature-branches

* Perform operations as OSMO CI Bot

* Add external label when the PR is created

* extract issue number

* add test cases (#247)

* Allow PR checks to run on release branches (#264)

* Database Pooling in Postgres Singleton Across Services (#251)

* Initial commit for database pooling

* Update set_session

* Fix lint

* Update PostgresConnector to have semaphor to control connections

* Lint fix

* Fix number of maxconn for test

* Address comments

* Add Go Postgres utils (#272)

* #148 - Auth Project Design Documents (#165)

* add args to postgres (#282)

* #267 - cloud deployment scripts (#268)

* script to create azure resources and deploy

* Remove auto-generated values files from tracking

- Added .gitignore to ignore values/, *.env files
- Removed values/*.yaml files from git (auto-generated during deployment)

* add aws script

* add aws script

* add copyright

* update copyright

* Support for Azure workload identity in AKS and Arc clusters (#141)

* feat(src): add Azure service account and extra pod labels configuration

- implement service account creation with customizable name and annotations
- enhance service templates to support extra pod labels for various services
- update Azure backend to utilize DefaultAzureCredential for authentication
- add tests for Azure credential extraction and client creation

* feat(src): extract account key from connection string for Azure Blob Storage

- add function to extract AccountKey from connection string
- update AzureBlobStorageClient to handle different credential types

* feat(test): add tests for account key extraction from Azure connection strings

* chore: clean up linting issues for tests

* refactor(src): update data credential types in PostgresConnector and TaskGroup

- change StaticDataCredential to DataCredential in get_all_data_creds method
- update fetch_creds function signature to use DataCredential

* feat(src): update Azure client creation to include storage account and account URL

- remove deprecated storage account extraction function
- modify create_client to accept storage_account and account_url parameters
- update AzureBlobStorageClientFactory to use new parameters
- adjust tests to reflect changes in client creation

🔒 - Generated by Copilot

* refactor(src): mark storage_account parameter as unused in create_client function

🔧 - Generated by Copilot

* refactor(src): remove unused storage_account parameter from client creation

🔧 - Generated by Copilot

* Fix conflicts

---------

Co-authored-by: Vivian Pan <vivianp@nvidia.com>
Co-authored-by: ethany-nv <ethany@nvidia.com>
Co-authored-by: RyaliNvidia <ryali@nvidia.com>
Co-authored-by: patclarknvidia <patc@nvidia.com>
Co-authored-by: Ethan Look-Potts <elookpotts@nvidia.com>
Co-authored-by: xutongNV <xutongr@nvidia.com>
Co-authored-by: Allen Greaves <111466195+agreaves-ms@users.noreply.github.com>
RyaliNvidia added a commit that referenced this pull request Jan 28, 2026
* allow flexible squid proxy replicas (#241)

* allow flexible squid proxy replicas

* fix

* Efficient Workflow Cleanup through Using Async Operations for Log Migration (#167)

* Improving Performance for Uploading Workflow Artifacts in Worker Jobs

* Cleanup

* Add progress writing after upload

* Add dependency in Bazel BUILD

* Add type to mypy requirements

* Update mypy requirements

* Add to mypy_cli BUILD

* Fix lint

* Comment

* Use constant to define semaphor and storage client executor count

* #244 - Use last login url if url is not specified (#245)

* Use last login url if url is not specified

* print message

* Cannot select any text inside modals or slideouts (#248)

* Video html element not changin when selecting different video files in the UI for OSMO dataset (#249)

* sync-feature-branches: fix no conflict case, allow single branch to be synced (#252)

* Fix sync-feature-branches with no merge conflicts

* Allow a single branch to be specified for sync-feature-branches

* Perform operations as OSMO CI Bot

* Add external label when the PR is created

* extract issue number

* add test cases (#247)

* Allow PR checks to run on release branches (#264)

* Database Pooling in Postgres Singleton Across Services (#251)

* Initial commit for database pooling

* Update set_session

* Fix lint

* Update PostgresConnector to have semaphor to control connections

* Lint fix

* Fix number of maxconn for test

* Address comments

* Add Go Postgres utils (#272)

* #148 - Auth Project Design Documents (#165)

* add args to postgres (#282)

* #267 - cloud deployment scripts (#268)

* script to create azure resources and deploy

* Remove auto-generated values files from tracking

- Added .gitignore to ignore values/, *.env files
- Removed values/*.yaml files from git (auto-generated during deployment)

* add aws script

* add aws script

* add copyright

* update copyright

* Support for Azure workload identity in AKS and Arc clusters (#141)

* feat(src): add Azure service account and extra pod labels configuration

- implement service account creation with customizable name and annotations
- enhance service templates to support extra pod labels for various services
- update Azure backend to utilize DefaultAzureCredential for authentication
- add tests for Azure credential extraction and client creation

* feat(src): extract account key from connection string for Azure Blob Storage

- add function to extract AccountKey from connection string
- update AzureBlobStorageClient to handle different credential types

* feat(test): add tests for account key extraction from Azure connection strings

* chore: clean up linting issues for tests

* refactor(src): update data credential types in PostgresConnector and TaskGroup

- change StaticDataCredential to DataCredential in get_all_data_creds method
- update fetch_creds function signature to use DataCredential

* feat(src): update Azure client creation to include storage account and account URL

- remove deprecated storage account extraction function
- modify create_client to accept storage_account and account_url parameters
- update AzureBlobStorageClientFactory to use new parameters
- adjust tests to reflect changes in client creation

🔒 - Generated by Copilot

* refactor(src): mark storage_account parameter as unused in create_client function

🔧 - Generated by Copilot

* refactor(src): remove unused storage_account parameter from client creation

🔧 - Generated by Copilot

* Add new project proposal to describe nvlink + topology aware scheduling (#211)

* Add new project proposal to describe nvlink + topology aware scheduling

* Split design into two docs

* Finish docs and add some updates from feedback

* Add some open items

* OSMO-6044: Application error when closing Task Details after switching Events view from Task to Workflow (#315)

* add redis utlis, update postgres utils (#313)

* add redis utlis, update postgres utils

* add deps

* Fix missing seperator in the test runner roles (#320)

* fix

* remove

* fix

---------

Co-authored-by: Vivian Pan <vivianp@nvidia.com>
Co-authored-by: ethany-nv <ethany@nvidia.com>
Co-authored-by: RyaliNvidia <ryali@nvidia.com>
Co-authored-by: patclarknvidia <patc@nvidia.com>
Co-authored-by: Ethan Look-Potts <elookpotts@nvidia.com>
Co-authored-by: xutongNV <xutongr@nvidia.com>
Co-authored-by: Allen Greaves <111466195+agreaves-ms@users.noreply.github.com>
Co-authored-by: ecolternv <ecolter@nvidia.com>
Co-authored-by: tdewanNvidia <tdewan@nvidia.com>
fernandol-nvidia pushed a commit that referenced this pull request Jan 29, 2026
* script to create azure resources and deploy

* Remove auto-generated values files from tracking

- Added .gitignore to ignore values/, *.env files
- Removed values/*.yaml files from git (auto-generated during deployment)

* add aws script

* add aws script

* add copyright

* update copyright
xutongNV added a commit that referenced this pull request Feb 3, 2026
* Update the wording re: creating feature branches (#204)

* Add a link back to OSMO from the brev launchable (#205)

* Improve styling for badges in the brev launchable readme (#207)

* Fix osmo config pool update payload in backend installation docs (#210)

* Fix osmo config pool update payload in practical guide (#213)

* #147 - backend operator redesign doc (#149)

* backend operator redesign doc

* 195 - Bump quick-start version due to updated dependencies (#217)

* Perform Client Side Data Auth Check In the Event of Environment Based Auth (#177)

* Data/Dataset Auth Check CLIs

* Remove auth check from data service

* Use auth check CLIs in ctrl

* Add exit code to docs

* Fix build issues

* Fix lint

* Ctrl to use user config when validating data auth

* Use the correct CLI argument type

* Fix lint

* Use profile when looking up data credential from config

* Update quick start installation to always install latest version (#218)

* Add workflow to label external issues and pull requests (#222)

* Add workflow to label external issues and pull requests

* pin to allowed action version

* add reopened event

* allow flexible squid proxy replicas (#241)

* allow flexible squid proxy replicas

* fix

* Efficient Workflow Cleanup through Using Async Operations for Log Migration (#167)

* Improving Performance for Uploading Workflow Artifacts in Worker Jobs

* Cleanup

* Add progress writing after upload

* Add dependency in Bazel BUILD

* Add type to mypy requirements

* Update mypy requirements

* Add to mypy_cli BUILD

* Fix lint

* Comment

* Use constant to define semaphor and storage client executor count

* #244 - Use last login url if url is not specified (#245)

* Use last login url if url is not specified

* print message

* Cannot select any text inside modals or slideouts (#248)

* Video html element not changin when selecting different video files in the UI for OSMO dataset (#249)

* sync-feature-branches: fix no conflict case, allow single branch to be synced (#252)

* Fix sync-feature-branches with no merge conflicts

* Allow a single branch to be specified for sync-feature-branches

* Perform operations as OSMO CI Bot

* Add external label when the PR is created

* extract issue number

* add test cases (#247)

* Allow PR checks to run on release branches (#264)

* Database Pooling in Postgres Singleton Across Services (#251)

* Initial commit for database pooling

* Update set_session

* Fix lint

* Update PostgresConnector to have semaphor to control connections

* Lint fix

* Fix number of maxconn for test

* Address comments

* Add Go Postgres utils (#272)

* #148 - Auth Project Design Documents (#165)

* add args to postgres (#282)

* #267 - cloud deployment scripts (#268)

* script to create azure resources and deploy

* Remove auto-generated values files from tracking

- Added .gitignore to ignore values/, *.env files
- Removed values/*.yaml files from git (auto-generated during deployment)

* add aws script

* add aws script

* add copyright

* update copyright

* Support for Azure workload identity in AKS and Arc clusters (#141)

* feat(src): add Azure service account and extra pod labels configuration

- implement service account creation with customizable name and annotations
- enhance service templates to support extra pod labels for various services
- update Azure backend to utilize DefaultAzureCredential for authentication
- add tests for Azure credential extraction and client creation

* feat(src): extract account key from connection string for Azure Blob Storage

- add function to extract AccountKey from connection string
- update AzureBlobStorageClient to handle different credential types

* feat(test): add tests for account key extraction from Azure connection strings

* chore: clean up linting issues for tests

* refactor(src): update data credential types in PostgresConnector and TaskGroup

- change StaticDataCredential to DataCredential in get_all_data_creds method
- update fetch_creds function signature to use DataCredential

* feat(src): update Azure client creation to include storage account and account URL

- remove deprecated storage account extraction function
- modify create_client to accept storage_account and account_url parameters
- update AzureBlobStorageClientFactory to use new parameters
- adjust tests to reflect changes in client creation

🔒 - Generated by Copilot

* refactor(src): mark storage_account parameter as unused in create_client function

🔧 - Generated by Copilot

* refactor(src): remove unused storage_account parameter from client creation

🔧 - Generated by Copilot

* Add new project proposal to describe nvlink + topology aware scheduling (#211)

* Add new project proposal to describe nvlink + topology aware scheduling

* Split design into two docs

* Finish docs and add some updates from feedback

* Add some open items

* OSMO-6044: Application error when closing Task Details after switching Events view from Task to Workflow (#315)

* add redis utlis, update postgres utils (#313)

* add redis utlis, update postgres utils

* add deps

* Fix missing seperator in the test runner roles (#320)

* show backend name in scheduler validation error message (#323)

* #220 - Design documentation for dynamic subpool (#221)

* Initial design spike for dynamic subpool

* Add more context to design

* Address feedback

* resolve conflict

* fix ui

---------

Co-authored-by: Ethan Look-Potts <elookpotts@nvidia.com>
Co-authored-by: xutongNV <xutongr@nvidia.com>
Co-authored-by: Fernando L <fernandol@nvidia.com>
Co-authored-by: Vivian Pan <vivianp@nvidia.com>
Co-authored-by: ethany-nv <ethany@nvidia.com>
Co-authored-by: RyaliNvidia <ryali@nvidia.com>
Co-authored-by: patclarknvidia <patc@nvidia.com>
Co-authored-by: Allen Greaves <111466195+agreaves-ms@users.noreply.github.com>
Co-authored-by: ecolternv <ecolter@nvidia.com>
Co-authored-by: tdewanNvidia <tdewan@nvidia.com>
3 participants