
[ENH] - Make idle culler settings easily configurable and documented how to change #1283

Closed
costrouc opened this issue May 14, 2022 · 12 comments · Fixed by #1689
Labels
area: user experience 👩🏻‍💻 · needs: investigation 🔍 · type: enhancement 💅🏼

Comments

@costrouc
Member

Feature description

Currently, much of the idle-culler configuration is hard-coded. @rsignell-usgs raised this as a concern: the current timeout is too short in some cases.

Value and/or benefit

The default idle timeout does not work for everyone.

Anything else?

No response

@costrouc added the type: enhancement 💅🏼 and needs: triage 🚦 labels on May 14, 2022
@viniciusdc
Contributor

Hi @costrouc, do they want a per-user configuration, or are they happy to have it set in the qhub-config?

@rsignell-usgs
Contributor

rsignell-usgs commented May 16, 2022

@viniciusdc and @costrouc , we would be happy to set this in the qhub-config.
One of the worst aspects of the timeout being so short is that any terminal sessions disappear.
Thanks for taking a look!

@rsignell-usgs
Contributor

rsignell-usgs commented Jul 27, 2022

Folks, what would it take to enable this?

This is the top complaint I've heard from ESIP Qhub users.

Even if it weren't configurable and the qhub devs just made it longer, that would be wonderful. Right now it must be 5 minutes, right?

It would be great if dask clusters spun down in 30 min, and notebooks spun down in 90 min or 3 hours.

Just for comparison, AWS SageMaker Studio Lab, the free notebook offering from AWS, times out after 4 hours for a GPU, 12 hours for a CPU.

@iameskild
Member

Hi @rsignell-usgs, I will make sure this issue is prioritized for our next sprint (which starts next week). I can't promise it will be configurable from the qhub-config.yaml but I will work with the team to come up with a workable solution asap. Thanks again for the reminder!!

@iameskild added the needs: investigation 🔍 label on Jul 29, 2022
@iameskild self-assigned this on Jul 29, 2022
@rsignell-usgs
Contributor

Okay, thanks @iameskild. The users will definitely appreciate any improvement in the situation, even if not configurable!

@rsignell-usgs
Contributor

@iameskild , I remember you showed me how to (temporarily) override the short culler settings by connecting to some pod and editing a config file, right? After the upgrade from 0.4.3 to 0.4.4, the users are screaming again about the too-short timeout for their servers.

@iameskild
Member

Hey @rsignell-usgs, for now, you can manually edit the etc-jupyter configmap if you want to make changes to the timeout settings.
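For reference, a rough sketch of what that ConfigMap might contain (the key name jupyter_notebook_config.py and the dev namespace are assumptions for illustration; inspect the actual ConfigMap in your deployment for the real file name and namespace):

apiVersion: v1
kind: ConfigMap
metadata:
  name: etc-jupyter
  namespace: dev    # assumed namespace; use your deployment's namespace
data:
  # the key name below is an assumption; check the real ConfigMap for the actual file name
  jupyter_notebook_config.py: |
    # culling settings read by each user's Jupyter server at startup
    c.MappingKernelManager.cull_idle_timeout = 30 * 60
    c.MappingKernelManager.cull_interval = 30 * 60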

I still have to circle back to this when I have more time, but as a quick update: I was looking into using Terraform's templatefile function to make these values more easily configurable.

@viniciusdc
Contributor

viniciusdc commented Oct 25, 2022

This can also be achieved by using overrides in the jupyterhub configuration to change the idle-culling values. Right now, the values that can be changed are those here:

jupyterhub:
  overrides:
    cull:
      users: true

Some values come from the idle-culler extension; as of now, the override method above is the only way to update them.
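For illustration, here is a fuller sketch of what such an override block could look like in the qhub-config. The cull keys map onto the JupyterHub Helm chart's culler settings, so the exact keys available depend on the chart version deployed; the values below (in seconds) are assumptions, not recommendations:

jupyterhub:
  overrides:
    cull:
      enabled: true    # run the hub-side idle culler
      users: true      # also remove the user object, not just stop the server
      timeout: 5400    # seconds of inactivity before a server is culled (90 min)
      every: 600       # how often (seconds) the culler checks for idle servers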

@rsignell-usgs
Contributor

To change these, can I use k9s to ssh into the hub-** pod and then just edit them?

@iameskild
Member

@rsignell-usgs yep, just edit the file. You may need to kill the hub pod for the changes to take effect.

@rsignell-usgs
Contributor

What is the filename once I've ssh'ed into the hub pod?

@rsignell-usgs
Contributor

rsignell-usgs commented Oct 25, 2022

Here's the workaround recipe that should modify the cull settings (at least until the next qhub/nebari version is deployed):

  • in k9s, type ":configmap"
  • use the arrow keys to highlight the etc-jupyter configmap
  • hit the e key to edit (make the changes below), then press Esc
  • still in k9s, type ":pod"
  • use the arrow keys to highlight the pod that starts with hub-xx
  • kill the pod (don't worry, it will regenerate in just a few seconds)

Just for the record, I set everything to 30 minutes:


    # The interval (in seconds) on which to check for terminals exceeding the
    # inactive timeout value.
    c.TerminalManager.cull_interval = 30 * 60

    # cull_idle_timeout: timeout (in seconds) after which an idle kernel is
    # considered ready to be culled
    c.MappingKernelManager.cull_idle_timeout = 30 * 60

    # cull_interval: the interval (in seconds) on which to check for idle
    # kernels exceeding the cull timeout value
    c.MappingKernelManager.cull_interval = 30 * 60

    # cull_connected: whether to consider culling kernels which have one
    # or more connections
    c.MappingKernelManager.cull_connected = True

    # cull_busy: whether to consider culling kernels which are currently
    # busy running some code
    c.MappingKernelManager.cull_busy = False

    # Shut down the server after N seconds with no kernels or terminals
    # running and no activity.
    c.NotebookApp.shutdown_no_activity_timeout = 30 * 60
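For what it's worth, these c.* settings are read by each user's Jupyter server when it starts, so user servers that were already running will most likely only pick up the new values after they are stopped and started again.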
