[App] Improve cluster creation / deletion experience #15458

Conversation

luca3rd
Contributor

@luca3rd luca3rd commented Nov 1, 2022

What does this PR do?

Fixes ENG-1513, ENG-1523

Cluster creation and deletion can take a long time. Instead of running these long-running operations in the background, the CLI should run them in the foreground. The advantage is that failures are brought to the user's attention immediately, instead of the next time they decide to run `lightning list clusters`.

While the CLI waits for the cluster to run / delete, it will display cluster status changes to the user.
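
The wait-and-report behaviour described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: `wait_for_state`, `fetch_state`, and the state names are hypothetical, and a plain callable stands in for whatever API call the real CLI makes.

```python
import time
from typing import Callable, List


def wait_for_state(
    fetch_state: Callable[[], str],
    target: str,
    failed: str = "failed",
    poll_interval: float = 1.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    echo: Callable[[str], None] = print,
) -> List[str]:
    """Poll until `target` is reached, echoing each state change once.

    All names are hypothetical; `fetch_state` stands in for an API call.
    Returns the sequence of distinct states that were observed.
    """
    seen: List[str] = []
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_state()
        # Only report when the state differs from the last one reported.
        if not seen or seen[-1] != state:
            seen.append(state)
            echo(f"Cluster is now {state}")
        if state == target:
            return seen
        if state == failed:
            # Surface failures immediately instead of on the next listing.
            raise RuntimeError("Cluster entered the failed state")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Cluster did not reach {target!r} in time")
        sleep(poll_interval)
```

Passing `sleep` and `echo` as parameters keeps the loop easy to unit test without real delays or captured stdout.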

This PR also hides the --enable-performance and --edit-before-creation creation flags, as well as the --force deletion flag. They are either not frequently used (performance mode is expensive), or prone to misuse.

asciicast

Does your PR introduce any breaking changes? If yes, please list them.

Yes:

  1. lightning create cluster and lightning delete cluster now wait by default
  2. Those commands no longer have --wait. Now they have --async.
  3. lightning create cluster no longer accepts --enable-performance and --edit-before-creation
  4. lightning delete cluster no longer accepts --force
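
To make the flag changes concrete, here is a small sketch using the standard library's `argparse`. It is not the project's CLI code: the parser name, the `do_async` attribute, and the help text are assumptions. The description above says the rarely used flags are hidden while this list says they are no longer accepted; the sketch shows the hidden variant, where a flag still parses but is left out of `--help`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the new create-cluster flags."""
    parser = argparse.ArgumentParser(prog="create-cluster-sketch")
    parser.add_argument("cluster_id")
    # Waiting is the default; --async opts out and returns immediately.
    parser.add_argument(
        "--async",
        dest="do_async",  # `async` is a Python keyword, so store it under another name
        action="store_true",
        help="Return immediately instead of waiting for the cluster.",
    )
    # Hidden flag: still parsed, but omitted from the usage and help output.
    parser.add_argument(
        "--enable-performance",
        action="store_true",
        help=argparse.SUPPRESS,
    )
    return parser
```

With this parser, `build_parser().parse_args(["my-cluster"])` leaves `do_async` as `False`, so the caller would wait by default.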

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

cc @Borda

@github-actions github-actions bot added the app (removed) Generic label for Lightning App package label Nov 1, 2022
@luca3rd luca3rd changed the title Wait by default and notify on state changes [App] Cluster creation should wait by default and display status updates Nov 1, 2022
@luca3rd luca3rd changed the title [App] Cluster creation should wait by default and display status updates [App] Cluster create / delete should wait by default and display status updates Nov 1, 2022
@luca3rd luca3rd self-assigned this Nov 1, 2022
@luca3rd luca3rd added this to the v1.8.x milestone Nov 1, 2022
@luca3rd luca3rd marked this pull request as ready for review November 1, 2022 22:00
@nmiculinic nmiculinic self-requested a review November 2, 2022 16:42
Contributor

@nmiculinic nmiculinic left a comment

Can you add a short video (for example with https://asciinema.org/) showing the user experience?

(You can use a local API server, disable the cluster controller, and manually edit the local database for the cluster to see the changes.)

Please don't take more than 20 minutes to record the video. I just want to see the user experience.

docs/source-app/workflows/byoc/index.rst (resolved)
Contributor

@nmiculinic nmiculinic left a comment

I've tested this locally. As we discussed, it needs to give the user feedback that we're still doing something and are not stuck.
E.g. every 10s print a message that the cluster is still being created, with [10s elapsed], [20s elapsed], etc., similar to how Terraform does it.

@nmiculinic
Contributor

The message just says "Cluster n-138 is now pending", but it could also print the seconds/minutes elapsed.
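
One way to produce the elapsed-time suffix suggested here is sketched below. The helper names and the exact wording of the line are assumptions for illustration, not the output the CLI ended up with.

```python
def format_elapsed(seconds: float) -> str:
    """Render a duration as seconds, or minutes and seconds once past a minute."""
    total = int(seconds)
    minutes, secs = divmod(total, 60)
    if minutes:
        return f"{minutes}m{secs:02d}s"
    return f"{secs}s"


def progress_line(cluster_id: str, state: str, elapsed_seconds: float) -> str:
    """Build one heartbeat line; the wording is hypothetical."""
    elapsed = format_elapsed(elapsed_seconds)
    return f"Cluster {cluster_id} is still {state} [{elapsed} elapsed]"
```

A polling loop could call `progress_line` on a fixed interval (every 10 seconds in the suggestion above) so the user can tell the command is not stuck.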

@nmiculinic
Contributor

Also what happens when the user sends SIGINT to the process?

Cluster n-138 is now pending
Cluster n-138 is now pending
Cluster n-138 is now pending
Cluster n-138 is now pending
^C
Aborted!

So the wait is aborted, but the cluster is still being created. Can you show a better message to the user when they abort with ^C? Maybe something like:

You've aborted waiting for cluster creation, but the cluster is still being created. If you want, you can delete it via lightning delete <>

or something similar you'd expect as a user
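
A rough sketch of the friendlier abort handling asked for here: catch the `KeyboardInterrupt` raised by Ctrl+C and explain that only the waiting stopped. The function name, the `wait` callable, and the message wording are hypothetical; only the `lightning list clusters` and `lightning delete cluster` commands come from this PR.

```python
from typing import Callable


def wait_with_abort_hint(
    wait: Callable[[], None],
    cluster_id: str,
    echo: Callable[[str], None] = print,
) -> bool:
    """Run `wait`; on Ctrl+C explain that the cluster is still being created.

    Returns True if waiting finished and False if the user aborted.
    """
    try:
        wait()
    except KeyboardInterrupt:
        # Aborting the CLI does not cancel the server-side operation.
        echo(
            f"Stopped waiting for cluster {cluster_id}, but it is still being created. "
            "Check on it with `lightning list clusters`, or remove it with "
            "`lightning delete cluster`."
        )
        return False
    return True
```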

@luca3rd luca3rd marked this pull request as draft November 3, 2022 18:34
@nmiculinic nmiculinic force-pushed the ENG-1513-for-cluster-creation-creation-default-to-waiting-until-its-ready branch from 772bfae to b176c71 Compare November 4, 2022 10:47
@nmiculinic nmiculinic force-pushed the ENG-1513-for-cluster-creation-creation-default-to-waiting-until-its-ready branch from f7e3d58 to d3215ad Compare November 4, 2022 11:46
@nmiculinic
Contributor

https://asciinema.org/a/7PLh5tPgm2YxSzw4VvbyGwq6J These are my modifications. @luca3rd can you bring this PR across the finish line and make sure things are tested properly via unit tests, etc.?

@luca3rd
Contributor Author

luca3rd commented Nov 4, 2022

> https://asciinema.org/a/7PLh5tPgm2YxSzw4VvbyGwq6J These are my modifications. @luca3rd can you bring this PR across the finish line and make sure things are tested properly via unit tests, etc.?

@nmiculinic did you pull before force-pushing? Some of my recent changes were obliterated.

@mergify mergify bot added has conflicts and removed ready PRs ready to be merged labels Nov 21, 2022
@github-actions
Contributor

github-actions bot commented Nov 22, 2022

⚡ Required checks status: All passing 🟢

Groups summary

🟢 lightning_app: Tests workflow
| Check ID | Status |
| --- | --- |
| app-pytest (macOS-11, app, 3.8, latest) | success |
| app-pytest (macOS-11, app, 3.8, oldest) | success |
| app-pytest (macOS-11, lightning, 3.9, latest) | success |
| app-pytest (ubuntu-20.04, app, 3.8, latest) | success |
| app-pytest (ubuntu-20.04, app, 3.8, oldest) | success |
| app-pytest (ubuntu-20.04, lightning, 3.9, latest) | success |
| app-pytest (windows-2022, app, 3.8, latest) | success |
| app-pytest (windows-2022, app, 3.8, oldest) | success |
| app-pytest (windows-2022, lightning, 3.8, latest) | success |

These checks are required after the changes to src/lightning_app/cli/cmd_clusters.py, src/lightning_app/cli/lightning_cli_create.py, src/lightning_app/cli/lightning_cli_delete.py, tests/tests_app/cli/test_cli.py, tests/tests_app/cli/test_cmd_clusters.py.

🟢 lightning_app: Examples
| Check ID | Status |
| --- | --- |
| app-examples (macOS-11, app, 3.9, latest) | success |
| app-examples (macOS-11, app, 3.9, oldest) | success |
| app-examples (macOS-11, lightning, 3.9, latest) | success |
| app-examples (ubuntu-20.04, app, 3.9, latest) | success |
| app-examples (ubuntu-20.04, app, 3.9, oldest) | success |
| app-examples (ubuntu-20.04, lightning, 3.9, latest) | success |
| app-examples (windows-2022, app, 3.9, latest) | success |
| app-examples (windows-2022, app, 3.9, oldest) | success |
| app-examples (windows-2022, lightning, 3.9, latest) | success |

These checks are required after the changes to src/lightning_app/cli/cmd_clusters.py, src/lightning_app/cli/lightning_cli_create.py, src/lightning_app/cli/lightning_cli_delete.py.

🟢 lightning_app: Azure
| Check ID | Status |
| --- | --- |
| App.cloud-e2e | success |

These checks are required after the changes to src/lightning_app/cli/cmd_clusters.py, src/lightning_app/cli/lightning_cli_create.py, src/lightning_app/cli/lightning_cli_delete.py.

🟢 lightning_app: Docs
| Check ID | Status |
| --- | --- |
| make-doctest (app) | success |
| make-html (app) | success |

These checks are required after the changes to src/lightning_app/cli/cmd_clusters.py, src/lightning_app/cli/lightning_cli_create.py, src/lightning_app/cli/lightning_cli_delete.py, docs/source-app/workflows/byoc/index.rst.

🟢 mypy
| Check ID | Status |
| --- | --- |
| mypy | success |

These checks are required after the changes to src/lightning_app/cli/cmd_clusters.py, src/lightning_app/cli/lightning_cli_create.py, src/lightning_app/cli/lightning_cli_delete.py.

🟢 install
| Check ID | Status |
| --- | --- |
| install-pkg (ubuntu-22.04, app, 3.7) | success |
| install-pkg (ubuntu-22.04, app, 3.10) | success |
| install-pkg (ubuntu-22.04, lite, 3.7) | success |
| install-pkg (ubuntu-22.04, lite, 3.10) | success |
| install-pkg (ubuntu-22.04, pytorch, 3.7) | success |
| install-pkg (ubuntu-22.04, pytorch, 3.10) | success |
| install-pkg (ubuntu-22.04, lightning, 3.7) | success |
| install-pkg (ubuntu-22.04, lightning, 3.10) | success |
| install-pkg (macOS-12, app, 3.7) | success |
| install-pkg (macOS-12, app, 3.10) | success |
| install-pkg (macOS-12, lite, 3.7) | success |
| install-pkg (macOS-12, lite, 3.10) | success |
| install-pkg (macOS-12, pytorch, 3.7) | success |
| install-pkg (macOS-12, pytorch, 3.10) | success |
| install-pkg (macOS-12, lightning, 3.7) | success |
| install-pkg (macOS-12, lightning, 3.10) | success |
| install-pkg (windows-2022, app, 3.7) | success |
| install-pkg (windows-2022, app, 3.10) | success |
| install-pkg (windows-2022, lite, 3.7) | success |
| install-pkg (windows-2022, lite, 3.10) | success |
| install-pkg (windows-2022, pytorch, 3.7) | success |
| install-pkg (windows-2022, pytorch, 3.10) | success |
| install-pkg (windows-2022, lightning, 3.7) | success |
| install-pkg (windows-2022, lightning, 3.10) | success |

These checks are required after the changes to src/lightning_app/cli/cmd_clusters.py, src/lightning_app/cli/lightning_cli_create.py, src/lightning_app/cli/lightning_cli_delete.py.


Thank you for your contribution! 💜

Note
This comment is automatically generated and updates for 60 minutes every 180 seconds. If you have any other questions, contact carmocca for help.

@mergify mergify bot added ready PRs ready to be merged and removed has conflicts ready PRs ready to be merged labels Nov 22, 2022
@Borda Borda requested review from dmitsf and carmocca November 22, 2022 08:33
@Borda
Member

Borda commented Nov 22, 2022

@nicolai86 is there anything else? 🦦

@mergify mergify bot added has conflicts and removed ready PRs ready to be merged labels Nov 22, 2022
@nicolai86
Contributor

@Borda no, I'll touch base with @luca3rd tmw to see if we can bring this over the finish line

@mergify mergify bot added ready PRs ready to be merged and removed has conflicts ready PRs ready to be merged labels Nov 28, 2022
@luca3rd luca3rd merged commit 33e1f93 into master Nov 28, 2022
@luca3rd luca3rd deleted the ENG-1513-for-cluster-creation-creation-default-to-waiting-until-its-ready branch November 28, 2022 16:38
Borda pushed a commit that referenced this pull request Nov 30, 2022
Cluster creation and deletion can take a long time. Instead of having these long running operations happen in the background, they should happen in the foreground. The advantage is that failures are brought to the users attention immediately, instead of the next time they decide to run `lightning list clusters`.

While the CLI waits for the cluster to run / delete, it will display cluster status changes to the user.

This PR also hides the `--enable-performance` and `--edit-before-creation` creation flags, as well as the `--force` deletion flag. They are either not frequently used (performance mode is expensive), or prone to misuse.

Co-authored-by: Neven Miculinic <neven.miculinic@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Raphael Randschau <nicolai86@users.noreply.github.com>
(cherry picked from commit 33e1f93)
lantiga added a commit that referenced this pull request Dec 7, 2022
* update chlog

* Tests/App: refactor examples - structure (#15770)

* rename _examples dir

* refactor

* clean

* path

* add inits

* skip

* e2e

* azure

* e2e

* rev

* unify single depth for ignore docs req.

* group

(cherry picked from commit 59fa320)

* feature: add `_generate_works_json` method (#15767)

(cherry picked from commit 51bb845)

* tests: split examples and pytests (#15774)

split examples and pytests

(cherry picked from commit 952b64b)

* [App] Stop App when it has succeeded (#15801)

(cherry picked from commit 3a99a25)

* Notify the user of ignored requirements (#15799)

(cherry picked from commit 9e43604)

* Add code_dir argument to tracer run (#15771)

(cherry picked from commit 0a12731)

* [App] Add CloudMultiProcessBackend to run an children App within the Flow in the cloud (#15800)

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* updte

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* Update src/lightning_app/CHANGELOG.md

Co-authored-by: Ethan Harris <ethanwharris@gmail.com>

* Update src/lightning_app/utilities/port.py

Co-authored-by: Ethan Harris <ethanwharris@gmail.com>

* Update src/lightning_app/utilities/port.py

Co-authored-by: Ethan Harris <ethanwharris@gmail.com>

* Update src/lightning_app/utilities/port.py

Co-authored-by: Ethan Harris <ethanwharris@gmail.com>

* Update src/lightning_app/utilities/port.py

Co-authored-by: Ethan Harris <ethanwharris@gmail.com>

* Update src/lightning_app/utilities/port.py

Co-authored-by: Ethan Harris <ethanwharris@gmail.com>

* Update src/lightning_app/utilities/port.py

Co-authored-by: Ethan Harris <ethanwharris@gmail.com>

Co-authored-by: Ethan Harris <ethanwharris@gmail.com>
(cherry picked from commit 8ca6dfe)

* Update lightning-utilities requirement from ==0.3.* to ==0.4.* in /requirements (#15420)

Update lightning-utilities requirement in /requirements

Updates the requirements on [lightning-utilities](https://github.com/Lightning-AI/utilities) to permit the latest version.
- [Release notes](https://github.com/Lightning-AI/utilities/releases)
- [Changelog](https://github.com/Lightning-AI/utilities/blob/main/CHANGELOG.md)
- [Commits](Lightning-AI/utilities@v0.3.0...v0.4.0)

---
updated-dependencies:
- dependency-name: lightning-utilities
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit e150d08)

* Ignore `num_nodes` when running MultiNode components locally (#15806)

(cherry picked from commit a970f09)

* lit extras (#15793)

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
(cherry picked from commit 8ee889b)

* [App] Add utility to get install command for package extras (#15809)

(cherry picked from commit f171657)

* [App] Enable Python Server and Gradio Serve to run on accelerated device such as GPU CUDA / MPS (#15813)

(cherry picked from commit 4e64391)

* Update flake8 version (#15816)

(cherry picked from commit 1e56b75)

* Checkgroup config fixes (#15787)

(cherry picked from commit cca3432)

* [App] Resolve a condition bug with spawning (#15812)

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
(cherry picked from commit 6a2a83a)

* Print the e2e app ID as early as possible (#15821)

(cherry picked from commit 76cf419)

* Added note about custom base images (#14125)

Co-authored-by: awaelchli <aedu.waelchli@gmail.com>
(cherry picked from commit 70126df)

* Add warning comment to cloud requirements (#15790)

(cherry picked from commit be699a8)

* [CLI] fix ssh listing stopped components (#15810)

* [CLI] fix ssh listing stopped components
* update CHANGELOG

(cherry picked from commit c786b3d)

* Update fairscale requirement from <=0.4.6,>=0.4.5 to >=0.4.5,<0.4.13 in /requirements (#15842)

Update fairscale requirement in /requirements

Updates the requirements on [fairscale]() to permit the latest version.

---
updated-dependencies:
- dependency-name: fairscale
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit 206fd06)

* Bump google-github-actions/get-gke-credentials from 0 to 1 (#15843)

Bumps [google-github-actions/get-gke-credentials](https://github.com/google-github-actions/get-gke-credentials) from 0 to 1.
- [Release notes](https://github.com/google-github-actions/get-gke-credentials/releases)
- [Changelog](https://github.com/google-github-actions/get-gke-credentials/blob/main/CHANGELOG.md)
- [Commits](google-github-actions/get-gke-credentials@v0...v1)

---
updated-dependencies:
- dependency-name: google-github-actions/get-gke-credentials
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit ed7707e)

* Bump hivemind from 1.0.1 to 1.1.2 in /requirements (#15839)

Bumps [hivemind](https://github.com/learning-at-home/hivemind) from 1.0.1 to 1.1.2.
- [Release notes](https://github.com/learning-at-home/hivemind/releases)
- [Commits](learning-at-home/hivemind@1.0.1...1.1.2)

---
updated-dependencies:
- dependency-name: hivemind
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit 1d94297)

* Update cloudpickle requirement from <=2.1.0,>=1.3 to >=1.3,<2.3.0 in /requirements (#15840)

Update cloudpickle requirement in /requirements

Updates the requirements on [cloudpickle](https://github.com/cloudpipe/cloudpickle) to permit the latest version.
- [Release notes](https://github.com/cloudpipe/cloudpickle/releases)
- [Changelog](https://github.com/cloudpipe/cloudpickle/blob/master/CHANGES.md)
- [Commits](cloudpipe/cloudpickle@v1.3.0...v2.2.0)

---
updated-dependencies:
- dependency-name: cloudpickle
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit 95d5ccb)

* hotfix import torch (#15849)

* fix import torch

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* plugin

* fix

* skip

* patch require

* seed

* warn

* .

* ..

* skip True

* 0.0.3

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

(cherry picked from commit ad4bd66)

* [App] Improve cluster creation / deletion experience (#15458)

Cluster creation and deletion can take a long time. Instead of having these long running operations happen in the background, they should happen in the foreground. The advantage is that failures are brought to the users attention immediately, instead of the next time they decide to run `lightning list clusters`.

While the CLI waits for the cluster to run / delete, it will display cluster status changes to the user.

This PR also hides the `--enable-performance` and `--edit-before-creation` creation flags, as well as the `--force` deletion flag. They are either not frequently used (performance mode is expensive), or prone to misuse.

Co-authored-by: Neven Miculinic <neven.miculinic@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Raphael Randschau <nicolai86@users.noreply.github.com>
(cherry picked from commit 33e1f93)

* CI: freeze docs requirements [hotfix] (#15865)

freeze ipy

(cherry picked from commit bc528fd)

* fix formatting

* [App] Raise error when launching app on multiple clusters (#15484)

* Error when running on multiple clusters

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert this in separate PR: keep this focused

* Improve testing

* fixup! Improve testing

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* pass flake8

* Update changelog

* Address PR feedback

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused import

* Reword error message

* Error if running on cluster that doesn't exist

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixup! Error if running on cluster that doesn't exist

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unsued import

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>

(cherry picked from commit c5d3bba)

* Moving `lightning_api_access` out of base requirements (#15844)

* moving the requirements to components extras

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* component requirements to devel

* importing torch in local scope

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* skipping doctest

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>

(cherry picked from commit 5864409)

* [App] Fixing Sigterm Handler causing thread lock which caused KeyboardInterrupt to hang (#15881)

* terminating only once

* changelog

(cherry picked from commit 5144160)

* CI: signal lai build (#15871)

(cherry picked from commit 36b953b)

* CI: prune dependency for benchmarks (#15879)

* prune dependency for benchmarks
* drop

(cherry picked from commit 993bd67)

* unblock legacy checkpoints (#15798)

* fixing legacy checkpoints
* Apply suggestions from code review

Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
(cherry picked from commit fee52f9)

* CI: update signalling (#15887)

(cherry picked from commit f4fcad3)

* update chlog

* waiting on feedback (#15893)

* waiting
* builds

(cherry picked from commit a86584d)

* [CLI] drop name column from cluster list (#15721)

* drop name column from cluster list

* change create cluster to accept id as well

* rename validator

* remove cluster name from logs

* fix merge with master

* more merge with master issues

(cherry picked from commit a82be2f)

* Add CLI Command to Delete Lightning App (#15783)

* initial work on deleting apps

* after PR review

* delete CLI working

* restructred to make tests easier

* revert manifest changes

* added changelog, fix mypy issue

* updates

* Update src/lightning_app/cli/cmd_apps.py

Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>

* Update src/lightning_app/cli/lightning_cli_delete.py

Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>

* Update src/lightning_app/cli/lightning_cli_delete.py

Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>

* Update src/lightning_app/cli/lightning_cli_delete.py

Co-authored-by: Sherin Thomas <sherin@lightning.ai>

* Update src/lightning_app/cli/lightning_cli_delete.py

Co-authored-by: Sherin Thomas <sherin@lightning.ai>

* import typing

* adding tests

* finished adding tests

* addressed code review comments

* fix mypy error

* make mypy happy

* make mypy happy

* make mypy happy

* make mypy happy

* fix windows cli

Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Co-authored-by: Sherin Thomas <sherin@lightning.ai>

(cherry picked from commit b4d99e3)

* [App] Support for headless apps (#15875)

* Add `is_headless` when dispatching in the cloud

* Bump cloud version

* Add tests

* Dont open app page for headless apps locally

* Refactor

* Update CHANGELOG.md

* Support dynamic UIs at runtime

* Comments

* Fix

* Updates

* Fixes and cleanup

* Fix tests

* Dont open view page for headless apps

* Fix test, resolve URL the right way

* Remove launch

* Clean

* Cleanup tests

* Fixes

* Updates

* Add test

* Increase app cloud tests timeout

* Increase timeout

* Wait for running

* Revert timeouts

* Clean

* Dont update if it hasnt changed

* Increase timeout

(cherry picked from commit 32cf1fa)

* [App] Fix hanging CI (#15913)

(cherry picked from commit ab022ac)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* version 1.8.4

* Direct support for compiled models (#15922)

* Direct support for compiled models

* Update test

* Update src/pytorch_lightning/core/module.py

Co-authored-by: Ethan Harris <ethanwharris@gmail.com>

Co-authored-by: Ethan Harris <ethanwharris@gmail.com>
(cherry picked from commit 2992002)

* update chlog

* CI: parameterize TPU tests (#15876)

* update
* param
* Apply suggestions from code review

(cherry picked from commit 77006a2)

* [App] Add ready property to the flow (#15921)

(cherry picked from commit 852089e)

* [App] Enable running with spawn context (#15923)

(cherry picked from commit d2a8fbf)

* Fix compiler support test (#15927)

(cherry picked from commit 6f54a82)

* Enable back inference mode support with hpu & update links (#15918)

* Enable back inference mode support with hpu
* Remove unused
* Update document link and address comment

Signed-off-by: Jerome <janand@habana.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

(cherry picked from commit 6aaac8b)

* [App] Introduce auto scaler (#15769)

* Exlucde __pycache__ in setuptools

* Add load balancer example

* wip

* Update example

* rename

* remove prints

* _LoadBalancer -> LoadBalancer

* AutoScaler(work)

* change var name

* remove locust

* Update docs

* include autoscaler in api ref

* docs typo

* docs typo

* docs typo

* docs typo

* remove unused loadtest

* remove unused device_type

* clean up

* clean up

* clean up

* Add docstring

* type

* env vars to args

* expose an API for users to override to customise autoscaling logic

* update example

* comment

* udpate var name

* fix scale mechanism and clean up

* Update exampl

* ignore mypy

* Add test file

* .

* update impl and update tests

* Update changlog

* .

* revert docs

* update test

* update state to keep calling 'flow.run()'

Co-authored-by: Aniket Maurya <theaniketmaurya@gmail.com>

* Add aiohttp to base requirements

* Update docs

Co-authored-by: Luca Antiga <luca.antiga@gmail.com>

* Use deserializer utility

* fake trigger

* wip: protect /system/* with basic auth

* read password at runtime

* Change env var name

* import torch as optional

* Don't overcreate works

* simplify imports

* Update example

* aiohttp

* Add work_args work_kwargs

* More docs

* remove FIXME

* Apply Jirka's suggestions

Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* clean example device

* add comment on init threshold value

* bad merge

* nit: logging format

* {in,out}put_schema -> {in,out}put_type

* lowercase

* docs on seconds

* process_time -> processing_time

* Dont modify work state from flow

* Update tests

* worker_url -> endpoint

* fix exampl

* Fix default scale logic

* Fix default scale logic

* Fix num_pending_works

* Update num_pending_works

* Fix bug creating too many works

* Remove up/downscale_threshold args

* Update example

* Add typing

* Fix example in docstring

* Fix default scale logic

* Update src/lightning_app/components/auto_scaler.py

Co-authored-by: Noha Alon <nohalon@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rename method

* rename locvar

* Add todo

* docs ci

* docs ci

* asdfafsdasdf pls docs

* Apply suggestions from code review

Co-authored-by: Ethan Harris <ethanwharris@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* .

* doc

* Update src/lightning_app/components/auto_scaler.py

Co-authored-by: Noha Alon <nohalon@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"

This reverts commit 24983a0.

* Revert "Update src/lightning_app/components/auto_scaler.py"

This reverts commit 56ea78b.

* Remove redefinition

* Remove load balancer run blocker

* raise RuntimeError

* remove has_sent

* lower the default timeout_batching from 10 to 1

* remove debug

* update the default timeout_batching

* .

* tighten condition

* fix endpoint

* typo in runtimeerror cond

* async lock update severs

* add a test

* {in,out}put_type typing

* Update examples/app_server_with_auto_scaler/app.py

Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>

* Update .actions/setup_tools.py

Co-authored-by: Aniket Maurya <theaniketmaurya@gmail.com>
Co-authored-by: Luca Antiga <luca.antiga@gmail.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Noha Alon <nohalon@gmail.com>
Co-authored-by: Ethan Harris <ethanwharris@gmail.com>
Co-authored-by: Akihiro Nitta <aki@pop-os.localdomain>
Co-authored-by: thomas chaton <thomas@grid.ai>

(cherry picked from commit 64b19fb)

* ENG-627: Docs for CloudCompute Mount Argument (#15182)

fixed conflicts

(cherry picked from commit 2041908)

* Fix LRScheduler import for PyTorch 2.0 (#15940)

* Fix LRScheduler import for PyTorch 2.0
* Add comment for posterity

(cherry picked from commit de93167)

* update chlog

Co-authored-by: Yurij Mikhalevich <yurij@grid.ai>
Co-authored-by: thomas chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Luca Antiga <luca.antiga@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Ethan Harris <ethanwharris@gmail.com>
Co-authored-by: Laverne Henderson <laverne.henderson@coupa.com>
Co-authored-by: Raphael Randschau <nicolai86@users.noreply.github.com>
Co-authored-by: Luca Furst <rlfurst@gmail.com>
Co-authored-by: Sherin Thomas <sherin@lightning.ai>
Co-authored-by: Rick Izzo <rlizzo@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jerome Anand <88475913+jerome-habana@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <nitta@akihironitta.com>
Labels
app (removed) Generic label for Lightning App package ready PRs ready to be merged
8 participants