[BUG] - Clean up build deletion #591

Open
kcpevey opened this issue Sep 21, 2023 · 2 comments
Labels
area: user experience 👩🏻‍💻 Items impacting the end-user experience project: JATIC Work item needed for the JATIC project type: bug 🐛 Something isn't working

Comments

@kcpevey
Contributor

kcpevey commented Sep 21, 2023

Describe the bug

I have an env with VERY high memory consumption: it took forever to build, made it impossible to ssh into the server, and bogged down the server so badly that no other builds could run for several hours (some are still stuck perpetually building). Meanwhile, end users are left wondering why their build is taking so long.

The conda-store react ui has no ability to delete a build.
The conda-store admin ui has no ability to delete a build that is currently building.

When this has happened in the past, there are only two options:

  • restart the conda-store server and see if that fixes it
  • go into the db, find the build, and manually remove it

Both of these options require an admin to get into the backend.

Expected behavior

  • Users should be able to delete individual builds via the UI (there may be a discussion here on whether you should be an admin to be able to do this, but users should NOT have to drop into the /conda-store/admin UI).
  • Users should be able to delete a build that is currently building.
  • Exploding memory needs are a rather common occurrence with conda. Can we provide any feedback to users when the conda solve is consuming the entire server?
  • Flushing logs frequently to the log file and making that available to the end user DURING the build would be really helpful for a lot of things.
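
To illustrate the log-flushing request, here is a minimal sketch of streaming a subprocess's output to a log file line by line, so that anything tailing the file sees progress during the build. This is not how conda-store currently handles build logs; `run_and_stream` and the `build.log` path are hypothetical names used only for illustration.

```python
import subprocess
import sys


def run_and_stream(cmd, log_path):
    """Run cmd and append each output line to log_path as soon as it arrives.

    Hypothetical helper: illustrates per-line flushing, not conda-store code.
    """
    with open(log_path, "a") as log:
        with subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # interleave stderr with stdout
            text=True,
            bufsize=1,  # line-buffered pipe on our side
        ) as proc:
            for line in proc.stdout:
                log.write(line)
                log.flush()  # make the line visible to readers right away
    # Leaving the Popen context waits for the process, so returncode is set.
    return proc.returncode


if __name__ == "__main__":
    # Stand-in for a long-running solve; a real build command would go here.
    demo = [sys.executable, "-c", "print('solving'); print('done')"]
    print(run_and_stream(demo, "build.log"))
```

With something like this, a UI could poll or tail the log file while the build is still running instead of waiting for it to finish.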

How to Reproduce the problem?

Attempt to create this environment.

channels:
- pytorch
- conda-forge
dependencies:
- adjusttext >= 0.7.3
- apache-beam >= 2.22.0
- attrs >=22.1.0, <23.0.0
- boto3
- botocore
- catboost >= 1.1.1
- click >=8.0.3
- cma => 3.3.0
- colour >= 0.1.5
- conda-lock >= 2.1.1
- coverage >= 6.5.0
- datasets >= 2.7.1
- docutils
- ffmpeg-python >= 0.2.0
- flake8 != 0.2.0
- google-cloud-storage
- h5py >= 3.8.0
- hypothesis >= 6.61.0
- jsonschema
- jupyter >= 1.0.0
- jupytext >= 1.14.0
- keras >= 2.10.0
- kornia != 0.6.12
- librosa >= 0.10.0
- libgcc-ng
- libstdcxx-ng
- lightning
- loguru
- matplotlib  >= 3.7.1
- mlflow
- nodeenv
- numba != 0.56.4
- numpy >= 1.18.5, < 1.25
- numpydoc >= 1.5.0
- pandas >= 2.0
- panel
- papermill >= 2.3.3
- pluggy
- protobuf
- pyarrow >= 7.0.0
- pycocotools >= 2.0.6
- pydub >= 0.25.1
- py-lief => 0.12.3
- pyright >= 1.1.280
- pytest > 0.2.0, != 7.3.1
- pytest-cov > 4.0.0
- pytest-flake8 != 1.1.1
- pytest-mock > 3.10.0
- pytest-xdist >= 3.3.1
- python == 3.9
- python-levenshtein
- python-pptx >= 0.6.21
- pytorch >= 1.13.1, < 1.14.0
- pytorch-lightning < 1.5.0
- requests != 2.31.0
- resampy >= 0.4.2
- scikit-image >= 0.21
- scikit-learn >= 1.2
- scipy >= 1.10.1, < 2.0
- seaborn >= 0.11.0
- setuptools >= 65.6.3
- setuptools_scm
- six >= 1.16.0
- snapshottest >= 0.6.0
- snowballstemmer
- sortedcontainers >=2.4.0
- sox
- sphinx
- statsmodels >= 0.13.5
- tensorboardx >= 2.6
- tensorflow >= 2.10.1, <2.13.0
- timm >= 0.6.12
- torchaudio >= 0.13.1
- torchmetrics >=0.11, <1.0
- torchvision >= 0.14.1
- tox >= 4.6.4
- tqdm >= 4.65.0
- transformers >=4.25.1
- types-pyyaml >= 6.0.12
- types-python-dateutil >= 2.8.19
- typing-extensions >= 4.5, != 4.6
- xgboost >= 1.7.5
- pip
- pip:
  - alibi-detect >= 0.11.4
  - huggingface-hub >= 0.11.1
  - kwcoco >= 0.2.18
  - opencv-python >= 4.5.5.62
  - opencv-python-headless >= 4.5.5.62
  - markdown_it_py
  - mdit_py_plugins
  - prophet >= 1.1.0, <2.0.0
  - pykeops >= 2.0.0, <2.2.0
  - smqtk-classifier >= 0.17.0
  - smqtk-core >= 0.19.0
  - smqtk-descriptors >= 0.16.0
  - smqtk-detection >= 0.19.0
  - smqtk-image-io >= 0.17.1
  - sphinxcontrib_applehelp
  - sphinxcontrib_devhelp
  - sphinxcontrib_htmlhelp
  - sphinxcontrib_qthelp
  - sphinxcontrib_serializinghtml
  - tensorflow-addons >= 0.13.0
  - tensorflow-datasets >= 4.6.0, != 4.9.0
  - tensorflow-probability >= 0.8.0, <0.20.0
  - torcheval == 0.0.6
  - tidecv
  - virtualenv-pyenv >= 0.3.0
  - xaitk-saliency >= 0.7.0
- ipykernel
description: null
name: jatic-toolbox-panel-app
prefix: null
variables: null

Output

No response

Versions and dependencies used.

No response

Anything else?

No response

@kcpevey kcpevey added type: bug 🐛 Something isn't working area: user experience 👩🏻‍💻 Items impacting the end-user experience project: JATIC Work item needed for the JATIC project labels Sep 21, 2023
@kcpevey
Contributor Author

kcpevey commented Oct 18, 2023

It would also be nice to give admin users a way to delete "all but the last XX builds". RE: nebari-dev/nebari#1247
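
A rough sketch of what the selection logic for "all but the last XX builds" could look like, assuming it is applied per environment. The record shape (plain dicts with `id` and `environment_id`) and the function name `builds_to_delete` are assumptions for illustration, not conda-store's actual schema or API, and the sketch assumes build ids increase with creation order.

```python
from collections import defaultdict


def builds_to_delete(builds, keep_last):
    """Return ids of builds outside the newest `keep_last` per environment.

    `builds` is a list of dicts with hypothetical keys "id" and
    "environment_id"; ids are assumed to grow with creation order.
    """
    if keep_last < 0:
        raise ValueError("keep_last must be non-negative")

    by_env = defaultdict(list)
    for build in builds:
        by_env[build["environment_id"]].append(build)

    stale = []
    for env_builds in by_env.values():
        # Newest first, then everything past the first `keep_last` is stale.
        newest_first = sorted(env_builds, key=lambda b: b["id"], reverse=True)
        stale.extend(b["id"] for b in newest_first[keep_last:])
    return sorted(stale)
```

An admin action could then pass the returned ids to whatever delete mechanism exists, ideally after showing the list for confirmation.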

@kcpevey
Contributor Author

kcpevey commented Feb 13, 2024

> The conda-store admin ui has no ability to delete a build that is currently building.

This is available now, but you have to go into the environment page to kill the build. So if you don't know which environments are stuck building, it's very difficult to find them in order to kill them.
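
For the "finding stuck builds" part, a minimal sketch of the kind of filter an overview page could apply across all environments. The record shape (dicts with `id`, `status`, and `started_on`), the `"BUILDING"` status string, and the one-hour default threshold are assumptions for illustration and may not match conda-store's actual data model.

```python
from datetime import datetime, timedelta, timezone


def stuck_builds(builds, now, max_age=timedelta(hours=1)):
    """Return builds still marked BUILDING for longer than `max_age`.

    `builds` is a list of dicts with hypothetical keys "id", "status",
    and "started_on" (a timezone-aware datetime).
    """
    return [
        b
        for b in builds
        if b["status"] == "BUILDING" and now - b["started_on"] > max_age
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    example = [
        {"id": 7, "status": "BUILDING", "started_on": now - timedelta(hours=5)},
        {"id": 8, "status": "BUILDING", "started_on": now - timedelta(minutes=5)},
    ]
    print([b["id"] for b in stuck_builds(example, now)])
```

Listing these in one place would let an admin kill them without visiting each environment page.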

Projects
Status: Ready 🛎️